
TOPIC:

XCP-NG 3 Node Pool 1 Node fails quorum once a day. 5 years 7 months ago #1638

  • Mike Upson (Topic Author)
  • Posts: 4
We are currently experiencing an issue where only one of our nodes keeps being fenced by HA-Lizard. The log shows that the quorum check failed and the host was then fenced; however, 1 second later, when HA-Lizard checks quorum again, the node that was fenced passes the quorum check just fine. This keeps causing VMs to be moved to the other 2 nodes in the pool. We have checked all of the settings on the host that is having issues and can't seem to find what would cause it not to respond during the quorum check. Does anyone know what might be causing this issue?

Thanks

Mike


XCP-NG 3 Node Pool 1 Node fails quorum once a day. 5 years 7 months ago #1639

  • Salvatore Costantino
  • Posts: 722
Default HA parameters should tolerate health check failures for about 20 seconds. It's odd that you are seeing a 1 second failover. Have any of the default HA timers been changed?

Have you checked system level logs in case the interface goes down each time this happens?

Log snippets from the master and affected host would be helpful if you are able to post them.


XCP-NG 3 Node Pool 1 Node fails quorum once a day. 5 years 7 months ago #1640

  • Mike Upson (Topic Author)
  • Posts: 4
Salvatore,

Thank you for your response. I have been looking at the logs but have not seen anything showing that an interface was down. I will look more on Monday and will try to add a screenshot of the logs then as well. Thanks

Mike


XCP-NG 3 Node Pool 1 Node fails quorum once a day. 5 years 7 months ago #1641

  • Mike Upson (Topic Author)
  • Posts: 4
Salvatore,

Sorry for the delay; I had an emergency pop up that I have been working on. I have added the logs from the master and the slave that is having issues. It does appear that the slave, xen3, is having a network issue, but it never shows any drops anywhere else except in the HA-Lizard logs. Let me know what you think and if you need any more logs. Thanks
Attachments:


XCP-NG 3 Node Pool 1 Node fails quorum once a day. 5 years 7 months ago #1642

  • Salvatore Costantino
  • Posts: 722
Thanks for the logs. There is definitely a network issue going on when this occurs, and it seems to be isolated to the host "xen3", which is unreachable. Xen3 also cannot communicate with the master, as evidenced by the IP check failure and by its failure to update its configuration (which requires XAPI) just before the errors.

Have you checked xensource.log around the same timestamp to see if there are any network errors reported on xen3?
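If it helps, here is a minimal Python sketch for pulling out the lines around a given timestamp. It assumes the log lines begin with a syslog-style stamp such as "Mar  4 02:10:07", which may not match your system, and the keyword list and the function name lines_near are made up for illustration, so adjust them to whatever your logs actually contain.

```python
import re
from datetime import datetime, timedelta

# Hypothetical keyword list; adjust it to the messages your logs really use.
NETWORK_HINTS = re.compile(r"link|carrier|timeout|unreachable|bond|pif", re.IGNORECASE)


def lines_near(lines, when, window_seconds=60):
    """Return network-looking lines stamped within window_seconds of `when`.

    Assumes each line starts with a 15-character syslog-style timestamp
    (for example "Mar  4 02:10:07"). Lines that do not parse are skipped.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        try:
            # Syslog stamps carry no year, so borrow it from `when`.
            stamp = datetime.strptime(f"{when.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        if abs(stamp - when) <= window and NETWORK_HINTS.search(line):
            hits.append(line.rstrip("\n"))
    return hits


# Example use on xen3 (the date and time here are placeholders):
# with open("/var/log/xensource.log", errors="replace") as log:
#     for hit in lines_near(log, datetime(2020, 3, 4, 2, 10, 0)):
#         print(hit)
```

Use the timestamp of the failed quorum check from the HA-Lizard log as `when`, and widen window_seconds if nothing shows up.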

Since the network issue appears to clear in less than about 20 seconds, you can give the HA process more time by setting xapi_count to 5, which would give you 30 additional seconds for the issue to clear and avoid disruption of your running VMs. This is not a solution, but it would give you time to sort out the network issue. From the CLI, run "ha-cfg set xapi_count 5". This can be run from any of the 3 hosts, and it will update all the hosts in the pool with the new value.
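As a rough sketch of the arithmetic behind that suggestion: the figures in this thread (about 20 seconds at the defaults, 30 more with xapi_count at 5) are consistent with a default xapi_count of 2 and roughly 10 seconds per attempt. Both of those values are inferred from the numbers above, not read from your pool, so check them against your own settings. The function name is only for illustration.

```python
def tolerance_seconds(xapi_count, seconds_per_attempt=10):
    """Approximate time the HA process waits before acting on a failure.

    seconds_per_attempt=10 is an assumption inferred from the figures
    quoted in this thread, not a value read from a real configuration.
    """
    return xapi_count * seconds_per_attempt


default_window = tolerance_seconds(2)   # assumed default xapi_count of 2
extended_window = tolerance_seconds(5)  # the value suggested above

print("default window:", default_window, "seconds")
print("extended window:", extended_window, "seconds")
print("additional time:", extended_window - default_window, "seconds")
```

If your timers have been changed from the defaults, plug in your own values to see how much headroom the new setting really gives you.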

While you are at it, if you can, check switch logs in case a switch port is having an issue.

