
TOPIC: Slave Not Promoting On New Install

Slave Not Promoting On New Install 4 months 5 days ago #1697

  • Philippe Allaire
  • Topic Author
  • Posts: 3
I have a new install of XCP and a new install of HA-Lizard noSAN. It worked great: DRBD is syncing and everything looks good, except that on the slave, when I run "ha-cfg status", both the Host UUID and the Master UUID fields are empty.

When I do a hard shutdown of server1, server2 just stays a slave and never becomes master. From the logs it seems like they can't see each other, but I can ping both of them, so I know they can see each other.

Please advise

From the logs on server1:
check_master_mgt_link_state: Checking management interface link state
Nov 20 22:20:19 xcpserver1 ha-lizard: check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 192.168.1.2 ]
Nov 20 22:20:19 xcpserver1 ha-lizard: check_master_mgt_link_state: Link [ xenbr0 ] state UP
Nov 20 22:20:19 xcpserver1 ha-lizard: Master management link OK - checking prior link state
Nov 20 22:20:19 xcpserver1 ha-lizard: This host detected as pool Master
Nov 20 22:20:19 xcpserver1 ha-lizard: Found 2 hosts in pool
Nov 20 22:20:19 xcpserver1 ha-lizard: validate_vm_ha_state: Validating VM HA-state
Nov 20 22:20:19 xcpserver1 ha-lizard: Calling function write_pool_state
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 Calling function autoselect_slave
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 Calling function check_slave_status
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_master_mgt_link_state: Checking management interface link state
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 autoselect_slave: This host UUID found: b3edf4a2-8a88-4546-86be-0f32c60d9d42
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 autoselect_slave: MASTER host UUID found: b3edf4a2-8a88-4546-86be-0f32c60d9d42
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 192.168.1.2 ]
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_master_mgt_link_state: Link [ xenbr0 ] state UP
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Management link OK - continue
Nov 20 22:20:19 xcpserver1 ha-lizard: get_vms_on_host: No VMs found on host: b3edf4a2-8a88-4546-86be-0f32c60d9d42
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 autoselect_slave: b3edf4a2-8a88-4546-86be-0f32c60d9d42 is Master UUID - excluding from list of available slaves
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 get_pool_host_list: returned b3edf4a2-8a88-4546-86be-0f32c60d9d42#0122528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 autoselect_slave: 1 available Slave UUIDs found: 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: get_vms_on_host: No VMs found on host: 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Removing Master UUID from list of Hosts
Nov 20 22:20:19 xcpserver1 ha-lizard: 32715 autoselect_slave: Selected Slave: 2528a63a-0441-49ec-a409-a7edae7a7bc6 = Current slave: 2528a63a-0441-49ec-a409-a7edae7a7bc6 - ignoring update
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 get_pool_ip_list: returned 192.168.1.3
Nov 20 22:20:19 xcpserver1 ha-lizard: check_ha_enabled: Checking if ha-lizard is enabled for pool: 0aaedd5e-cf58-7486-262b-78c77d17abca
Nov 20 22:20:19 xcpserver1 ha-lizard: check_ha_enabled: ha-lizard is enabled
Nov 20 22:20:19 xcpserver1 ha-lizard: check_ha_enabled: checking whether maintenance mode enabled
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_xapi: Pool Host 192.168.1.3 xapi status = 0
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_email_enabled: Email enabled for check_xapi
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 192.168.1.3 not responding to HTTP - manual intervention may be required
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 email: Message barred for 60 minutes
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Slave host [ 2528a63a-0441-49ec-a409-a7edae7a7bc6 ] health status = [ failed ] - break
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Host IP Address check Status Array for Slaves = (0)
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Quorum check called
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Checking host IPs: 192.168.1.2 192.168.1.3
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Host IP: 192.168.1.2 Response = OK
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: LIVE HOSTs = 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Host IP: 192.168.1.3 Response = OK
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: LIVE HOSTs = 2
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Using network points: 192.168.1.1 as possible additional vote
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Heuristic IP: 192.168.1.1 Response = OK
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Successful Replies = 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 Total enpoints checked = 1 with total successful replies = 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Additional heuristic vote success. Incremeting vote by 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: Minimum number of hosts needed to allow fencing = 1 + 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_quorum: 3 Hosts found. Minimum needed = 1 + 1. Fencing allowed
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Failed slave count = 1
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Processing failed slave: 2528a63a-0441-49ec-a409-a7edae7a7bc6 on this iteration
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 20 22:20:19 xcpserver1 ha-lizard: get_pool_host_list: enabled flag set - returning only hosts with enabled=true
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_email_enabled: Email enabled for check_slave_status
Nov 20 22:20:19 xcpserver1 ha-lizard: get_pool_host_list: returned b3edf4a2-8a88-4546-86be-0f32c60d9d42#0122528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 email: Duplicate message - not sending. Content = check_slave_status: Server xcpserver1: Some Pool Slaves not not responding , 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 email: Message barred for 60 minutes
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Some Pool Slaves not not responding , 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Calling function get_vms_on_host for UUID(s) 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: Calling function fence_host to remove unresponsive host from pool. Failed Host(s) = 2528a63a-0441-49ec-a409-a7edae7a7bc6
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 check_slave_status: fence_host 2528a63a-0441-49ec-a409-a7edae7a7bc6 executed on prior iteration - host already fenced
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 Function check_slave_status Host Power = Off, calling vm_mon
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 vm_mon: ha-lizard is operating mode 2 - managing pool VMs
Nov 20 22:20:19 xcpserver1 ha-lizard: get_pool_ip_list: returned 192.168.1.2
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 vm_mon: Retrived list of VMs for this poll:
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 vm_mon: Removing Control Domains from VM list
Nov 20 22:20:19 xcpserver1 ha-lizard: get_pool_ip_list: returned 192.168.1.2 192.168.1.3
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 vm_mon: VM list returned =
Nov 20 22:20:19 xcpserver1 ha-lizard: write_status_report: Writing status report
Nov 20 22:20:19 xcpserver1 ha-lizard: 32720 vm_mon: 0 Eligible Halted VMs found
Nov 20 22:20:23 xcpserver1 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK

From the logs on server2:
Nov 20 22:22:25 xcpserver2 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Nov 20 22:22:31 xcpserver2 ha-lizard: 12196 Spawning new instance of ha-lizard
Nov 20 22:22:31 xcpserver2 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 20 22:22:31 xcpserver2 ha-lizard: This iteration is count 238
Nov 20 22:22:31 xcpserver2 ha-lizard: Checking if this host is a Pool Master or Slave
Nov 20 22:22:31 xcpserver2 ha-lizard: This host's pool status = slave:192.168.1.2
Nov 20 22:22:31 xcpserver2 ha-lizard: update_global_conf_params: Successfully updated global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf.
Nov 20 22:22:31 xcpserver2 ha-lizard: update_global_conf_params: DISABLED_VAPPS=()#012ENABLE_LOGGING=1#012FENCE_ACTION=stop#012FENCE_ENABLED=1#012FENCE_FILE_LOC=/etc/ha-lizard/fence#012FENCE_HA_ONFAIL=0#012FENCE_HEURISTICS_IPS=192.168.1.1#012FENCE_HOST_FORGET=0#012FENCE_IPADDRESS=#012FENCE_METHOD=POOL#012FENCE_MIN_HOSTS=2#012FENCE_PASSWD=#012FENCE_QUORUM_REQUIRED=1#012FENCE_REBOOT_LONE_HOST=0#012FENCE_USE_IP_HEURISTICS=1#012GLOBAL_VM_HA=1#012HOST_SELECT_METHOD=0#012MAIL_FROM="root@localhost"#012MAIL_ON=1#012MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"#012MAIL_TO="root@localhost"#012MGT_LINK_LOSS_TOLERANCE=5#012MONITOR_DELAY=15#012MONITOR_KILLALL=1#012MONITOR_MAX_STARTS=20#012MONITOR_SCANRATE=10#012OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012SLAVE_VM_STAT=0#012SMTP_PASS=""#012SMTP_PORT="25"#012SMTP_SERVER="127.0.0.1"#012SMTP_USER=""#012XAPI_COUNT=2#012XAPI_DELAY=10#012XC_FIELD_NAME='ha-lizard-enabled'#012XE_TIMEOUT=10
Nov 20 22:22:31 xcpserver2 ha-lizard: master_ip: Pool Master IP Address = 192.168.1.2
Nov 20 22:22:31 xcpserver2 ha-lizard: Validating master is still a master
Nov 20 22:22:31 xcpserver2 ha-lizard: [ /etc/ha-lizard/scripts/timeout 1 /etc/ha-lizard/scripts/host_is_slave 192.168.1.2 ]
Nov 20 22:22:32 xcpserver2 ha-lizard: This slave - xcpserver2: selected as allowed to become master: setting ALLOW_PROMOTE_MASTER=1
Nov 20 22:22:32 xcpserver2 ha-lizard: check_xapi: Pool Host 192.168.1.2 xapi status = 0
Nov 20 22:22:32 xcpserver2 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Nov 20 22:22:32 xcpserver2 ha-lizard: check_email_enabled: Email enabled for check_xapi
Nov 20 22:22:32 xcpserver2 ha-lizard: email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 192.168.1.2 not responding to HTTP - manual intervention may be required
Nov 20 22:22:32 xcpserver2 ha-lizard: email: Message barred for 60 minutes
Nov 20 22:22:32 xcpserver2 ha-lizard: Pool Master NOT OK - Checking if ha-lizard is enabled in latest state file
Nov 20 22:22:32 xcpserver2 ha-lizard: Checking if ha-lizard is enabled
Nov 20 22:22:32 xcpserver2 ha-lizard: Statefile /etc/ha-lizard/state/ha_lizard_enabled found: checking if ha-lizard is enabled
Nov 20 22:22:32 xcpserver2 ha-lizard-ERROR-/etc/ha-lizard/init/ha-lizard.mon: /etc/ha-lizard/ha-lizard.sh: line 369: [: =: unary operator expected
Nov 20 22:22:32 xcpserver2 ha-lizard: ha-lizard is disabled - exiting

Slave Not Promoting On New Install 4 months 4 days ago #1698

  • Salvatore Costantino
  • Posts: 549
Looks like HA-Lizard is disabled. Can you try enabling it and then check whether the status fields populate?

Slave Not Promoting On New Install 4 months 3 days ago #1699

  • Philippe Allaire
  • Topic Author
  • Posts: 3
Do you mean running "ha-cfg enable"? HA is already enabled and the ha-lizard service is running. Please explain, thank you.

Slave Not Promoting On New Install 4 months 2 days ago #1700

  • Salvatore Costantino
  • Posts: 549
Actually, the slave is reporting that HA is disabled based on its last known state. The log shows that the two hosts are unable to reach each other (each host's check_xapi reports the other as "not responding to HTTP"), which is the root cause of the missing data in the status display on the slave.

There appears to be an underlying network issue that is preventing the management interfaces from reaching each other.
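
The "last known state" reading also fits the error logged on server2 right before it gives up: "/etc/ha-lizard/ha-lizard.sh: line 369: [: =: unary operator expected". Bash prints exactly that message when a "[ ... = ... ]" test contains an unquoted variable that expands to nothing, which is what one would expect if the state value the slave tried to read was empty. The snippet below is a hypothetical reproduction for illustration only; it is not the real line 369 of ha-lizard.sh, and the function names are made up.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of "[: =: unary operator expected".
# This is NOT the actual code at line 369 of ha-lizard.sh.

# Fragile: $1 is unquoted. When it is empty, the test collapses to [ = 1 ],
# bash prints "[: =: unary operator expected" and [ returns status 2.
is_enabled_unquoted() {
    [ $1 = 1 ]
}

# Robust: the quoted empty string stays an operand, so [ "" = 1 ] is simply false.
is_enabled_quoted() {
    [ "$1" = 1 ]
}

state=""    # e.g. the value read from an empty or unreadable state file
is_enabled_unquoted "$state"; echo "unquoted status: $?"   # error on stderr, then "unquoted status: 2"
is_enabled_quoted "$state";   echo "quoted status: $?"     # "quoted status: 1"
```

In both cases a caller that only treats status 0 as "enabled" ends up concluding "disabled", which matches the "ha-lizard is disabled - exiting" line that follows the error.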

Slave Not Promoting On New Install 4 months 1 day ago #1701

  • Philippe Allaire
  • Topic Author
  • Posts: 3
But when I run "ha-cfg enable" or "ha-cfg disable" on the first node, I can see the HA status change on the second node. Doesn't that mean they can reach each other?

Last edit: by Philippe Allaire.

Slave Not Promoting On New Install 4 months 1 day ago #1702

  • Salvatore Costantino
  • Posts: 549
If they cannot reach each other, then the slave is likely also unable to reach the pool DB. In that case, the slave would not pick up the configuration change made when HA is enabled from the master.
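
One way to check this directly from the slave is to ask xapi for the pool configuration with xe, since on a slave those calls are forwarded to the pool master, which holds the database. The sketch below assumes that ha-lizard keeps its enable flag as a XenCenter custom field in the pool's other-config map under a key ending in "ha-lizard-enabled" (the XC_FIELD_NAME='ha-lizard-enabled' setting in the server2 log points that way, but the exact key is an assumption). The helper name extract_flag is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: run on the slave. ASSUMPTION: the ha-lizard enable flag lives in
# the pool's other-config map under a key ending in "ha-lizard-enabled".

# Hypothetical helper: pull the flag's value out of an xe map string such as
# "XenCenter.CustomFields.ha-lizard-enabled: true; foo: bar".
extract_flag() {
    tr ';' '\n' | sed -n 's/.*ha-lizard-enabled: *//p' | tr -d ' '
}

# On a slave, xapi forwards these calls to the pool master, so a failure or a
# timeout here suggests the slave cannot reach the pool database.
if pool=$(timeout 10 xe pool-list --minimal 2>/dev/null) && [ -n "$pool" ]; then
    flag=$(timeout 10 xe pool-param-get uuid="$pool" param-name=other-config | extract_flag)
    echo "pool $pool: ha-lizard-enabled = ${flag:-<not set>}"
else
    echo "could not query the pool via xe (xe missing or master unreachable)"
fi
```

If this prints the expected value on the slave, the slave can read the pool DB; if it fails or times out, that supports the connectivity explanation.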

I suggest you try some lower-level troubleshooting. Judging by the log, chances are that the hosts cannot ping each other's management interfaces. Can you test that and work up from there to find where the communication breaks down?
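
The bottom-up approach can be scripted roughly as follows: first ICMP, then the TCP ports xapi serves on (80 for HTTP, 443 for HTTPS), since check_xapi in both logs complains specifically that the peer is "not responding to HTTP". This is a minimal sketch, not part of ha-lizard; the default peer IP is the slave's address from this thread, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Bottom-up reachability sketch (hypothetical, not shipped with ha-lizard).
# Run on each host against the OTHER host's management IP, passed as $1.
PEER="${1:-192.168.1.3}"

# Return 0 if a TCP connection to HOST:PORT opens within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
tcp_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

report() {  # report LABEL STATUS
    if [ "$2" -eq 0 ]; then echo "OK    $1"; else echo "FAIL  $1"; fi
}

# Layer 1: ICMP reachability of the peer's management interface.
ping -c 3 -W 1 "$PEER" >/dev/null 2>&1
report "ping $PEER" $?

# Layer 2: the xapi ports that check_xapi depends on.
for port in 80 443; do
    tcp_open "$PEER" "$port"
    report "tcp/$port on $PEER" $?
done
```

If ping succeeds but tcp/80 and tcp/443 fail, the problem is more likely a firewall or xapi itself than basic routing, which narrows down where to look next.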
