Hello, I installed HA-Lizard on a 2-node cluster with internal storage. I use XCP-ng 8.1 as my hypervisor and want to make this cluster HA, so that when node 1 crashes the other one takes over. My problem is that when my main node crashes, my secondary node can't take over, and the server (pool) no longer shows up until I add the secondary server manually in XenCenter. I looked in the logs to see what the secondary does when I disconnect the main server, and it seems to have problems finding the VDI. Also, over SSH I can't ping 10.10.10.3, which I set up as my iSCSI storage address. It also tries to migrate the VM to the primary server but obviously can't find it.
Nov 4 15:50:55 xcp-ng-secondary ha-lizard: 8351 Ready to exec [find /dev/shm/ha-lizard-mail/ -name *.msg -type f -mmin +60 -delete]
Nov 4 15:50:55 xcp-ng-secondary ha-lizard: 8351 FLUSH_MAIL_EXEC returned [0]
Nov 4 15:50:55 xcp-ng-secondary ha-lizard: 8351 check_email_enabled: Email enabled for vm_mon
Nov 4 15:50:55 xcp-ng-secondary ha-lizard: 8351 email: Duplicate message - not sending. Content = vm_mon: Error starting failed VM: Windows 10 (64-bit) (1) UUID: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:50:55 xcp-ng-secondary ha-lizard: 8351 email: Message barred for 60 minutes
Nov 4 15:50:58 xcp-ng-secondary ha-lizard: 2547 ha-lizard Watchdog: ha-lizard running - OK
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 31214 Spawning new instance of ha-lizard
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 31214 check_logger_processes: Checking logger processes
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 31214 check_logger_processes: No processes to clear
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 LOG_TERMINAL = [false]
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 This iteration is count 39
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 Checking if this host is a Pool Master or Slave
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 This host's pool status = master
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 Checking if ha-lizard is enabled for this pool
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: Checking if ha-lizard is enabled for pool: e43e608e-4ead-c0df-8f26-9d52f628752a
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: ha-lizard is enabled
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: checking whether maintenance mode enabled
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 ha-lizard is enabled
Nov 4 15:51:02 xcp-ng-secondary ha-lizard: 10078 check_xs_ha: Checking XenServer HA status
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 update_global_conf_params: Successfully updated global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf.
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 update_global_conf_params: DISABLED_VAPPS=()#012DISK_MONITOR=1#012ENABLE_ALERTS=1#012ENABLE_LOGGING=1#012FENCE_ACTION=stop#012FENCE_ENABLED=1#012FENCE_FILE_LOC=/etc/ha-lizard/fence#012FENCE_HA_ONFAIL=0#012FENCE_HEURISTICS_IPS=192.168.255.254#012FENCE_HOST_FORGET=0#012FENCE_IPADDRESS=#012FENCE_METHOD=POOL#012FENCE_MIN_HOSTS=2#012FENCE_PASSWD=#012FENCE_QUORUM_REQUIRED=1#012FENCE_REBOOT_LONE_HOST=0#012FENCE_USE_IP_HEURISTICS=1#012GLOBAL_VM_HA=1#012HOST_SELECT_METHOD=0#012MAIL_FROM="root@localhost"#012MAIL_ON=1#012MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"#012MAIL_TO="root@localhost"#012MGT_LINK_LOSS_TOLERANCE=5#012MONITOR_DELAY=15#012MONITOR_KILLALL=1#012MONITOR_MAX_STARTS=20#012MONITOR_SCANRATE=10#012OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012SLAVE_VM_STAT=0#012SMTP_PASS=""#012SMTP_PORT="25"#012SMTP_SERVER="127.0.0.1"#012SMTP_USER=""#012XAPI_COUNT=2#012XAPI_DELAY=10#012XC_FIELD_NAME='ha-lizard-enabled'#012XE_TIMEOUT=10
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Checking management interface link state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 192.168.10.2 ]
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Link [ xenbr0 ] state UP
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Master management link OK - checking prior link state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 This host detected as pool Master
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Found 2 hosts in pool
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 validate_vm_ha_state: Validating VM HA-state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 validate_vm_ha_state: VM [ 312abe84-9704-a079-3dc2-02cb08a1bf1f ] state [ false ] = OK
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Calling function write_pool_state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Calling function autoselect_slave
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Calling function check_slave_status
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Checking management interface link state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: MASTER UUID found: f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 autoselect_slave: This host UUID found: f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: MASTER UUID: f543502f-1445-429e-8220-b360cd2a6946 written to local state storage
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 autoselect_slave: MASTER host UUID found: f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Calling function get_vms_on_host for UUID: b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 192.168.10.2 ]
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_master_mgt_link_state: Link [ xenbr0 ] state UP
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Management link OK - continue
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 get_vms_on_host: No VMs found on host: b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 autoselect_slave: Removing Slave UUID from list of Hosts - Slave: b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33 is disabled or in maintenance mode
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Writing VM array to local state file host.b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33.vmlist.uuid_array
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 autoselect_slave: f543502f-1445-429e-8220-b360cd2a6946 is Master UUID - excluding from list of available slaves
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 autoselect_slave: 0 available Slave UUIDs found:
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Calling function get_vms_on_host for UUID: f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 get_pool_host_list: returned b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33#012f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 get_vms_on_host: No VMs found on host: f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Writing VM array to local state file host.f543502f-1445-429e-8220-b360cd2a6946.vmlist.uuid_array
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 No slaves available to become pool master
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Removing Slave UUID from list of Hosts - Slave: b6dcd4f7-860c-4dfc-8e7c-22c00ef08b33 is disabled or in maintenance mode
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Removing Master UUID from list of Hosts
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Host IP Address check Status Array for Slaves = ()
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Quorum check called
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Checking host IPs: 192.168.10.2
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 pool autopromote_uuid = [none_available]
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Pool autopromote_uuid=none_available
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Host IP: 192.168.10.2 Response = OK
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: autopromote_uuid unchanged - not updating
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: LIVE HOSTs = 1
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Using network points: 192.168.255.254 as possible additional vote
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Heuristic IP: 192.168.255.254 Response = OK
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Successful Replies = 1
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Total enpoints checked = 1 with total successful replies = 1
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Additional heuristic vote success. Incremeting vote by 1
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: Minimum number of hosts needed to allow fencing = 0 + 1
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_quorum: 2 Hosts found. Minimum needed = 0 + 1. Fencing allowed
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: Failed slave count = 0
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_slave_status: No Failed slaves detected
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: Checking if ha-lizard is enabled for pool: e43e608e-4ead-c0df-8f26-9d52f628752a
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 Function check_slave_status reported no failures: calling vm_mon
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: ha-lizard is enabled
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 check_ha_enabled: checking whether maintenance mode enabled
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: ha-lizard is operating mode 2 - managing pool VMs
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: Retrived list of VMs for this poll: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: Removing Control Domains from VM list
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: VM list returned = 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_state: Machine state for 312abe84-9704-a079-3dc2-02cb08a1bf1f returned: halted
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: VM 312abe84-9704-a079-3dc2-02cb08a1bf1f state = halted
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: GLOBAL_VM_HA is enabled. Adding VM: 312abe84-9704-a079-3dc2-02cb08a1bf1f to list of failed VMs on this run.
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: 1 Eligible Halted VMs found
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: Halted VMs found: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 vm_mon: Attempting to start VMs in halted state
Nov 4 15:51:03 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Pool contains 1 hosts. Writing to /etc/ha-lizard/state/pool_num_hosts
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 get_pool_host_list: enabled flag set - returning only hosts with enabled=true
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 validate_vm_safe_to_start_here: VM [312abe84-9704-a079-3dc2-02cb08a1bf1f] home pool validated [e43e608e-4ead-c0df-8f26-9d52f628752a] - safe to start here
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 get_pool_host_list: returned f543502f-1445-429e-8220-b360cd2a6946
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 get_pool_ip_list: returned 192.168.10.2
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 write_pool_state: Host IP List = 192.168.10.2
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 write_status_report: Writing status report
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 vm_mon: Starting VM: Windows 10 (64-bit) (1) UUID: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 email: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 Ready to exec [find /dev/shm/ha-lizard-mail/ -name *.msg -type f -mmin +60 -delete]
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 FLUSH_MAIL_EXEC returned [0]
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 check_email_enabled: Email enabled for vm_mon
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 email: Duplicate message - not sending. Content = vm_mon: Starting VM: Windows 10 (64-bit) (1) UUID: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 email: Message barred for 60 minutes
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 vm_mon: HOST_SELECT_METHOD set to [ 0 ] - checking for a healthy host
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 vm_mon: This host [ f543502f-1445-429e-8220-b360cd2a6946 ] start on serial [ 970 ]
Nov 4 15:51:04 xcp-ng-secondary ha-lizard: 10078 vm_mon: Host [ f543502f-1445-429e-8220-b360cd2a6946 ] health status = [ master ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 vm_mon: VM start exit result = [ 1 ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 vm_mon: VM start returned messages = [ Error code: SR_BACKEND_FAILURE_46#012Error parameters: , The VDI is not available [opterr=Command failed (5): /dev/sdc: open failed: No such device or address#012 Volume group "VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff" not found#012 Cannot process volume group VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff], ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 vm_mon: Error code: SR_BACKEND_FAILURE_46#012Error parameters: , The VDI is not available [opterr=Command failed (5): /dev/sdc: open failed: No such device or address#012 Volume group "VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff" not found#012 Cannot process volume group VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff],
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 reset_vm_vdi: Resetting VDI(s) for VM [ 312abe84-9704-a079-3dc2-02cb08a1bf1f ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 reset_vm_vdi: Found VDI [ 85a74300-0ce0-4a65-b21d-34184c4a2e8b ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard-NOTICE-/etc/ha-lizard/init/ha-lizard.mon: VDI 85a74300-0ce0-4a65-b21d-34184c4a2e8b is not marked as attached anywhere, nothing to do
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 reset_vm_vdi: VDI [ 85a74300-0ce0-4a65-b21d-34184c4a2e8b ] reset
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 reset_vm_vdi: No VDI found for VBD [ 27bc9965-8a28-ed69-b3c5-bf9a9179f879 ]
Nov 4 15:51:07 xcp-ng-secondary ha-lizard: 10078 vm_mon: Reattempting vm [ 312abe84-9704-a079-3dc2-02cb08a1bf1f ] start
Nov 4 15:51:08 xcp-ng-secondary ha-lizard: 2547 ha-lizard Watchdog: ha-lizard running - OK
Nov 4 15:51:10 xcp-ng-secondary ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: Error code: SR_BACKEND_FAILURE_46
Nov 4 15:51:10 xcp-ng-secondary ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: Error parameters: , The VDI is not available [opterr=Command failed (5): /dev/sdc: open failed: No such device or address
Nov 4 15:51:10 xcp-ng-secondary ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: Volume group "VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff" not found
Nov 4 15:51:10 xcp-ng-secondary ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: Cannot process volume group VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff],
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 vm_mon: Error starting failed VM: Windows 10 (64-bit) (1) UUID: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 email: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 Ready to exec [find /dev/shm/ha-lizard-mail/ -name *.msg -type f -mmin +60 -delete]
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 FLUSH_MAIL_EXEC returned [0]
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 check_email_enabled: Email enabled for vm_mon
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 email: Duplicate message - not sending. Content = vm_mon: Error starting failed VM: Windows 10 (64-bit) (1) UUID: 312abe84-9704-a079-3dc2-02cb08a1bf1f
Nov 4 15:51:10 xcp-ng-secondary ha-lizard: 10078 email: Message barred for 60 minutes

If you are interested in the full log, I can attach it as well. This is just an excerpt, but maybe it is enough for you to understand what I mean. Thank you for your help!
Looks like your slave has transitioned to master, but has not attached the storage.
Can you confirm whether iscsi-ha is running?
service iscsi-ha status
If it is running, can you provide the iscsi-ha logs from the slave for the time leading up to the event? The general logic works like this: master host fails -> slave becomes master -> iscsi-ha on the slave connects the storage and exposes it over iSCSI. It appears that this last step is not occurring in your case.
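To narrow down where that last step stops, a few read-only checks on the promoted slave may help. The following is only a rough sketch, not part of HA-Lizard: the function names `sr_uuid_from_vg` and `check_storage` are made up for illustration, and it assumes the usual convention that an LVM-based SR's volume group is named `VG_XenStorage-<SR UUID>`, so the SR UUID can be read from the error message in the log.

```shell
#!/bin/sh
# Diagnostic sketch for the promoted slave. Nothing here changes any state.
# Function names are hypothetical helpers, not HA-Lizard commands.

# Assumption: an LVM-based SR's volume group is named VG_XenStorage-<SR UUID>,
# so stripping the prefix from the name in the error yields the SR UUID.
sr_uuid_from_vg() {
    printf '%s\n' "${1#VG_XenStorage-}"
}

# Collect the state that shows whether the storage was brought up.
# $1 = volume group name from the error message
# $2 = storage IP (10.10.10.3 in the original post)
check_storage() {
    sr_uuid=$(sr_uuid_from_vg "$1")
    service iscsi-ha status     # is the iscsi-ha service running at all?
    ping -c 3 "$2"              # is the storage IP reachable from this host?
    iscsiadm -m session         # does this host have an open iSCSI session?
    # is the SR plugged (currently-attached) on each host?
    xe pbd-list sr-uuid="$sr_uuid" params=uuid,host-uuid,currently-attached
}
```

With the values from the log above, it could be run on the secondary as `check_storage VG_XenStorage-3aeb126f-7d32-39ac-1626-a334dc5404ff 10.10.10.3`. If the ping fails or no iSCSI session is listed, the storage was never exposed on this host, which would match the "Volume group ... not found" error.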