HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1723

  • Nathan Scannell (Topic Author)
Hi Salvatore,

I've tested a couple of disaster scenarios using the RPM upgrade package, and for the most part it has held up well.

Thought I'd just start a new thread here to discuss it in detail.

First Test:
Simulated Master host failure (Power supply failure on Master)
The slave auto-promoted and could be accessed in GUI *tick*
Fixed the old Master and rebooted. Came back online as a Slave *tick*

Second Test:
Simulated a further failure, this time the NEW Master (old slave)
The Slave (Old Master) auto-promoted and came online *tick*
The failed new Master (the original Slave) came back online as a Slave, BUT DRBD failed to connect to the peer node, printing this message at boot (interfering with the Plymouth screen):

***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- If this node was already a degraded cluster before the
reboot, the timeout is 0 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot, the timeout
is 0 seconds. [wfc-timeout]
(These values are for resource 'iscsi1'; 0 sec -> wait forever)
To abort waiting enter 'yes' [ 14]:
***************************************************************

I've attempted to manually repair the DRBD connection at this point, but it stubbornly refuses to establish a connection to the Master, which in turn prevents the tgtd service from starting.

Any idea how to rectify this? I know the case simulated here is a little obtuse, but theoretically the hosts should be able to fail and recover indefinitely. This particular case could feasibly arise, e.g. two power supply units from the same batch are bad and fail in close succession.
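For reference, the timeouts quoted in that boot prompt come from the resource's startup section. A sketch of what that stanza typically looks like (the file path and layout here are illustrative, not taken from this pool's actual config):

```
# /etc/drbd.d/iscsi1.res -- illustrative; a value of 0 means "wait forever"
resource iscsi1 {
    startup {
        wfc-timeout      0;   # wait-for-connection timeout at boot
        degr-wfc-timeout 0;   # same, when the node was already degraded
    }
}
```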


HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1725

  • Salvatore Costantino
It could be one of two things:

- We have seen sporadic issues with the Linux bridge over a bond, where ARP peer notifications occasionally fail. This is purely kernel-related. We have worked around it in two ways: 1) don't use a bond for replication, or 2) switch the bond type to LACP. You can verify whether this is your issue by simply pinging the peer replication IP from each host. If the ping fails, then this is likely the cause.
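As a quick sketch of that check (the replication IP below is hypothetical; substitute the address configured in your pool, and run it from each host):

```shell
#!/bin/sh
# Hypothetical replication IP of the peer host -- adjust to your pool.
PEER_IP="10.10.10.2"

# 3 pings, 2s timeout each; DRBD replication rides this link, so it must answer.
if ping -c 3 -W 2 "$PEER_IP" >/dev/null 2>&1; then
    echo "replication link to $PEER_IP OK"
else
    echo "replication link to $PEER_IP DOWN -- suspect bond/ARP or cabling"
fi
```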

- If it's not the above, then it could be DRBD split brain. DRBD will do this if it cannot reliably determine which host is most up to date; it does so intentionally to preserve data integrity. If this is the issue, we ship a script that will resolve it. Run the following on each host and follow the instructions: /etc/iscsi-ha/scripts/drbd_sb_tool
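For anyone following along, the manual steps that the bundled drbd_sb_tool automates follow DRBD's documented split-brain recovery procedure. A sketch, assuming the resource name iscsi1 from the boot message above (run the discard side ONLY on the host whose data you are willing to lose):

```shell
#!/bin/sh
# Manual DRBD split-brain recovery sketch. Resource name 'iscsi1' is taken
# from the boot message in this thread; adjust to your own resource.
RES="iscsi1"

if command -v drbdadm >/dev/null 2>&1; then
    # On the "victim" host (its changes will be DISCARDED):
    drbdadm secondary "$RES"
    drbdadm connect --discard-my-data "$RES"

    # On the surviving host (run separately there):
    #   drbdadm connect iscsi1

    # Then watch the resync progress:
    cat /proc/drbd
else
    # Not a DRBD node -- nothing to do; this script is only a sketch.
    echo "drbdadm not present (sketch only)"
fi
```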


Last edit: by Salvatore Costantino.

HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1726

  • Nathan Scannell (Topic Author)
Thanks...

It's not split brain. That was my first suspicion, but there was never any sign of split brain occurring.

I ran the script anyway, but it just automated my previous manual attempts.

Also, I've avoided using bonds, but that reminds me: I never switched over to Linux bridge. Is Open vSwitch supported yet? I thought the installer would warn me if not.


I'll run the simulation again in bridge mode anyway just to document the result.


Last edit: by Nathan Scannell.

HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1727

  • Salvatore Costantino
OVS was a problem in 6.x; it's been working fine in 7.x. Are you able to ping the replication IPs from each of the hosts? It could be a firewall issue too.
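A hedged sketch of both checks, run from each host (the IP and port below are placeholders; 7788 is only DRBD's customary first resource port, so match it to your resource config):

```shell
#!/bin/sh
# Hypothetical peer replication IP and DRBD port -- adjust to your pool.
PEER_IP="10.10.10.2"
DRBD_PORT=7788

# 1) Basic reachability over the replication link.
ping -c 3 -W 2 "$PEER_IP" >/dev/null 2>&1 \
    && echo "ping to $PEER_IP OK" \
    || echo "ping to $PEER_IP FAILED"

# 2) Is anything filtering the replication port? (iptables on a dom0 host)
if command -v iptables >/dev/null 2>&1; then
    iptables -L -n 2>/dev/null | grep -q "$DRBD_PORT" \
        && echo "firewall has a rule mentioning port $DRBD_PORT" \
        || echo "no explicit rule for port $DRBD_PORT found"
fi
```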


HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1728

  • Nathan Scannell (Topic Author)
OMG... Bad crossover cable. :blush:

I made it myself because I ran out of proper stock hah

:whistle:


HA-lizard Version 2.2.0 Trial 5 years 3 months ago #1730

  • Nathan Scannell (Topic Author)
So... It wasn't my cable after all...

Intermittently, my adapters are not lighting up after reboot. It seems they have trouble negotiating a link.

The problem is resolved by installing a switch in place of the crossover cable.

They are Intel PRO/1000 Desktop PCI-E NICs. Might pay to avoid these in crossover scenarios.
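For anyone hitting the same symptom, link negotiation can be inspected with ethtool. A sketch, where eth1 is a placeholder for the replication NIC:

```shell
#!/bin/sh
# Check whether the NIC actually negotiated a link after boot.
# 'eth1' is a placeholder -- substitute your replication interface.
IFACE="eth1"

if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$IFACE" ]; then
    ethtool "$IFACE" | grep -E 'Speed|Duplex|Link detected'
else
    echo "ethtool or $IFACE not available (sketch only)"
fi

# If the link stays down on a direct cable, forcing speed/duplex sometimes
# helps -- note that gigabit copper normally requires autonegotiation, so
# try 100 Mb full duplex first:
#   ethtool -s eth1 autoneg off speed 100 duplex full
```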
