
TOPIC:

Can't get HA-cfg in HA on both nodes 3 years 8 months ago #2066

  • Salvatore Costantino
  • Posts: 722
Hans,
Thanks for verifying that.
I've read through the code and can confirm that the master uuid displayed in the output of "ha-cfg status" comes from a local cache. The cache is populated every few seconds by the ha-lizard service, so as long as the service is running OK, it should hold the correct master uuid.

Before we do anything, on the slave, can you check the timestamp on /etc/ha-lizard/state/master_uuid? It should be only a few seconds behind the current time. Also, check the contents of the file to see whether it holds the wrong uuid that we are seeing in the output of "ha-cfg status".

If the timestamp is old and the content is wrong, we can deduce that the service is not running correctly or at all.
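The timestamp and content check described above can be sketched as a small shell helper. This is only an illustration, not part of ha-lizard: the function names and the 30-second threshold are assumptions, and it relies on GNU `stat -c %Y`.

```shell
# Print how many seconds ago a file was last modified.
# Returns non-zero if the file does not exist.
file_age_seconds() {
  local path="$1" now mtime
  [ -e "$path" ] || return 1
  now=$(date +%s)
  mtime=$(stat -c %Y "$path")   # GNU stat: modification time in epoch seconds
  echo $(( now - mtime ))
}

# Report whether a state file is fresh, stale, or missing.
# max_age defaults to 30 seconds (an assumed threshold, not an ha-lizard setting).
check_state_file() {
  local path="$1" max_age="${2:-30}" age
  if ! age=$(file_age_seconds "$path"); then
    echo "MISSING: $path"
    return 1
  fi
  if [ "$age" -gt "$max_age" ]; then
    echo "STALE: $path last updated ${age}s ago"
    return 1
  fi
  echo "FRESH: $path updated ${age}s ago, contents: $(cat "$path")"
}
```

For example, `check_state_file /etc/ha-lizard/state/master_uuid 30` would print the age and the cached uuid in one line, which can then be compared against the Master UUID shown by "ha-cfg status".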

- On the slave, run "service ha-lizard status" to ensure that it is running. If it is (it is running in your screen snip), then it is not running correctly. Try "ha-cfg log" and check whether there are any noticeable errors in the log. If so, try restarting the service with "service ha-lizard restart". If the master_uuid state file is still off, a config parameter could be wrong. You can send me a message at [email address hidden] to arrange a time to look at your system.

FYI - in a normally operating system, you should be able to delete all the files in /etc/ha-lizard/state/ and they would reappear in a few seconds. It seems that is not happening in your case.


Can't get HA-cfg in HA on both nodes 3 years 8 months ago #2067

  • Hans Hoeksma
  • Topic Author
  • Posts: 13
Hi Salvatore,
The slave looks OK, but on the master I see the following messages appearing:
Jun 29 16:38:29 aeudcimpnlhyp2 ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: /etc/ha-lizard/init/ha-lizard.mon: line 53: [: : integer expression expected
Jun 29 16:38:29 aeudcimpnlhyp2 ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: sleep: missing operand
Jun 29 16:38:29 aeudcimpnlhyp2 ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: Try 'sleep --help' for more information.
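Both messages are what a shell typically prints when a variable that should hold a number is empty: `[ "" -gt 0 ]` yields "integer expression expected", and `sleep` with an empty, unquoted argument yields "missing operand". The sketch below shows one way a script could guard against that. The function names and the fallback value are hypothetical and are not taken from ha-lizard.mon.

```shell
# Succeed only if the argument is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Sleep for the given number of seconds, falling back to a default
# when the value is empty or not numeric (for example, after a config
# file failed to load and left the variable unset).
safe_sleep() {
  local delay="$1" fallback="${2:-5}"
  if ! is_uint "$delay"; then
    echo "invalid delay '$delay', using fallback ${fallback}s" >&2
    delay="$fallback"
  fi
  sleep "$delay"
}
```

A guard like this would hide the symptom rather than fix the cause, so a warning on stderr is kept to make the empty parameter visible in the log.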

I will drop you a message to your email.


Can't get HA-cfg in HA on both nodes 3 years 8 months ago #2068

  • Hans Hoeksma
  • Topic Author
  • Posts: 13
Hi Salvatore,

I have got it cracked. The file ha-lizard.pool.conf on the master was corrupted and showed the following:

timeout
| ha-lizard Version: 2.2.3 |
| Operating Mode: Mode [ 2 ] Managing Individual VMs in Pool |
| Host Role: slave |
| Pool UUID: 1e461c89-52da-6ac5-9673-79a126d1d16b |
| Host UUID: 36f63b61-7f8c-441f-8c6e-d3559c5d8832 |
| Master UUID: 3afdf3d7-cf15-4a2b-86ed-b60fc8c4d12b |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: TIMEOUT |
Pool HA Status: TIMEOUT

I copied the correct info over from the slave and ran "service ha-lizard status".
Running "ha-cfg status" then immediately showed the correct Master UUID, and I am able to set "HA Enabled" to true, which is now also replicated to the slave.
Setting "HA Enabled" from the slave also works like a charm and is replicated to the master as well.

I don't know why this happened, but the problem is now solved.

Thanks for your support and for pointing me in the right direction.

Hans


Can't get HA-cfg in HA on both nodes 3 years 8 months ago #2069

  • Salvatore Costantino
  • Posts: 722
Hi Hans,
Nice catch. That file acts as a bootstrap to ensure all parameters are set before running any logic. It refreshes often to ensure new config parameters are present immediately. I've not seen that before. It could have been a race condition on shutdown while in the midst of writing to the file, leaving corrupted config parameters behind.
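If the cause really was an interrupted write (which is only a guess in this thread), a common way to avoid leaving a half-written file behind is to write to a temporary file in the same directory and then rename it over the target, since a rename within one filesystem is atomic. The sketch below is a general illustration of that technique, not how ha-lizard writes its config. `write_atomic` is a hypothetical name.

```shell
# Write stdin to a target file atomically.
# The data goes to a temp file next to the target first; only a
# complete file is renamed into place, so readers never see a
# partially written target.
write_atomic() {
  local target="$1" tmp
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if cat > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would look like `printf 'KEY=value\n' | write_atomic /path/to/some.conf`. Note that `mktemp` creates the file with restrictive permissions, so a real script may need to restore the original mode afterwards.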

Glad it all worked out.
