2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 7 months ago #1647

Hello,

I installed ha-lizard on XCP-ng 7.5 without errors. DRBD is synchronized and the noSAN storage on the NVMe drives is usable. The first strange thing is that the slave was not configured (no master UUID, no slave UUID and so on when calling ha-cfg status).

So I configured the pool with ha-cfg:
DISABLED_VAPPS=()
ENABLE_LOGGING=1
FENCE_ACTION=stop
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=0
FENCE_HEURISTICS_IPS=10.10.100.216
FENCE_HOST_FORGET=0
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOSTS=2
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=0
FENCE_USE_IP_HEURISTICS=1
GLOBAL_VM_HA=0
HOST_SELECT_METHOD=0
MAIL_FROM=xcp-pool@myself.com
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO=info@myself.com
MGT_LINK_LOSS_TOLERANCE=5
MONITOR_DELAY=15
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=20
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=0
SMTP_PASS=**Password**
SMTP_PORT=25
SMTP_SERVER=mx.myself.com
SMTP_USER=xcp-pool@myself.com
XAPI_COUNT=2
XAPI_DELAY=10
XC_FIELD_NAME='ha-lizard-enabled'
XE_TIMEOUT=15

HA is enabled per VM.

Status on Master:
| ha-lizard Version: 2.1.4 |
| Operating Mode: Mode [ 2 ] Managing Individual VMs in Pool |
| Host Role: master |
| Pool UUID: 48f956d5-7ca3-c1d8-9ef7-3bbb5bafeff9 |
| Host UUID: e6e0f489-6e39-4d5f-9262-76ea92162615 |
| Master UUID: e6e0f489-6e39-4d5f-9262-76ea92162615 |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: true |
Pool HA Status: ENABLED

Status on Slave:
| ha-lizard Version: 2.1.4 |
| Operating Mode: Mode [ 2 ] Managing Individual VMs in Pool |
| Host Role: slave |
| Pool UUID: 48f956d5-7ca3-c1d8-9ef7-3bbb5bafeff9 |
| Host UUID: d02c94c8-4cb6-496b-adbb-ab4f83226779 |
| Master UUID: e6e0f489-6e39-4d5f-9262-76ea92162615 |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: true |
Pool HA Status: ENABLED

Logs:

Master:
Aug 27 07:08:27 hyperx-01 ha-lizard: This iteration is count 5002
Aug 27 07:08:27 hyperx-01 ha-lizard: Checking if this host is a Pool Master or Slave
Aug 27 07:08:27 hyperx-01 ha-lizard: This host's pool status = master
Aug 27 07:08:27 hyperx-01 ha-lizard: Checking if ha-lizard is enabled for this pool
Aug 27 07:08:27 hyperx-01 ha-lizard: check_ha_enabled: Checking if ha-lizard is enabled for pool: 48f956d5-7ca3-c1d8-9ef7-3bbb5bafeff9
Aug 27 07:08:27 hyperx-01 ha-lizard: check_ha_enabled: ha-lizard is enabled
Aug 27 07:08:27 hyperx-01 ha-lizard: check_ha_enabled: checking whether maintenance mode enabled
Aug 27 07:08:28 hyperx-01 ha-lizard: ha-lizard is enabled
Aug 27 07:08:28 hyperx-01 ha-lizard: check_xs_ha: Checking XenServer HA status
Aug 27 07:08:28 hyperx-01 ha-lizard: update_global_conf_params: Successfully updated global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf.
Aug 27 07:08:28 hyperx-01 ha-lizard: update_global_conf_params: DISABLED_VAPPS=()#012ENABLE_LOGGING=1#012FENCE_ACTION=stop#012FENCE_ENABLED=1#012FENCE_FILE_LOC=/etc/ha-lizard/fence#012FENCE_HA_ONFAIL=0#012FENCE_HEURISTICS_IPS=10.10.100.216#012FENCE_HOST_FORGET=0#012FENCE_IPADDRESS=#012FENCE_METHOD=POOL#012FENCE_MIN_HOSTS=2#012FENCE_PASSWD=#012FENCE_QUORUM_REQUIRED=1#012FENCE_REBOOT_LONE_HOST=0#012FENCE_USE_IP_HEURISTICS=1#012GLOBAL_VM_HA=0#012HOST_SELECT_METHOD=0#012MAIL_FROM=xcp-pool@myself.com#012MAIL_ON=1#012MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"#012MAIL_TO=info@myself.com#012MGT_LINK_LOSS_TOLERANCE=5#012MONITOR_DELAY=15#012MONITOR_KILLALL=1#012MONITOR_MAX_STARTS=20#012MONITOR_SCANRATE=10#012OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012SLAVE_VM_STAT=0#012SMTP_PASS=**Password**#012SMTP_PORT=25#012SMTP_SERVER=mx.myself.com#012SMTP_USER=xcp-pool@myself.com#012XAPI_COUNT=2#012XAPI_DELAY=10#012XC_FIELD_NAME='ha-lizard-enabled'#012XE_TIMEOUT=15
Aug 27 07:08:28 hyperx-01 ha-lizard: check_master_mgt_link_state: Checking management interface link state
Aug 27 07:08:29 hyperx-01 ha-lizard: check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 10.10.100.218 ]
Aug 27 07:08:29 hyperx-01 ha-lizard: check_master_mgt_link_state: Link [ xapi0 ] state UP
Aug 27 07:08:29 hyperx-01 ha-lizard: Master management link OK - checking prior link state
Aug 27 07:08:29 hyperx-01 ha-lizard: This host detected as pool Master
Aug 27 07:08:29 hyperx-01 ha-lizard: Found 2 hosts in pool
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: Validating VM HA-state
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ e50389ce-a0d6-0bb3-a80b-96f990225daa ] state [ true ] = OK
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ b7fe7a2e-d2d0-ae39-ac33-c2749ca45566 ] state [ true ] = OK
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ dec9f7ba-4ee4-9561-7fc0-377492623689 ] state [ true ] = OK
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ bea41587-d526-2a99-738b-50d50f931063 ] state [ true ] = OK
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 857665a2-f11f-330b-d821-83718723ef9b ] state [ true ] = OK
Aug 27 07:08:29 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 35cd92fa-aade-338b-db2e-27055c8e73a5 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 7bc8c228-9877-e4f4-18af-fc414c258d46 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 8fdb4306-6e87-59c6-8579-cca6ccabcba2 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 0b463d7d-56ba-b6b2-c215-8a7ada127e3c ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 06136cd3-95ed-cf3c-9c35-0982a41745e1 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 28ed1bf3-4363-e5c0-8ed0-f07adec34c16 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 8157463f-d6db-e535-25af-b65d8cb6b22d ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: validate_vm_ha_state: VM [ 0db45136-357f-f030-6cdb-8dd832bd6038 ] state [ true ] = OK
Aug 27 07:08:30 hyperx-01 ha-lizard: Calling function write_pool_state
Aug 27 07:08:30 hyperx-01 ha-lizard: 20971 Calling function autoselect_slave
Aug 27 07:08:30 hyperx-01 ha-lizard: 20976 Calling function check_slave_status
Aug 27 07:08:30 hyperx-01 ha-lizard: 20976 check_master_mgt_link_state: Checking management interface link state
Aug 27 07:08:30 hyperx-01 ha-lizard: 20971 autoselect_slave: This host UUID found: e6e0f489-6e39-4d5f-9262-76ea92162615
Aug 27 07:08:30 hyperx-01 ha-lizard: 20971 autoselect_slave: MASTER host UUID found: e6e0f489-6e39-4d5f-9262-76ea92162615
Aug 27 07:08:30 hyperx-01 ha-lizard: get_vms_on_host: Returned e50389ce-a0d6-0bb3-a80b-96f990225daa#012b7fe7a2e-d2d0-ae39-ac33-c2749ca45566#012dec9f7ba-4ee4-9561-7fc0-377492623689#012bea41587-d526-2a99-738b-50d50f931063#012857665a2-f11f-330b-d821-83718723ef9b#01235cd92fa-aade-338b-db2e-27055c8e73a5#0120b463d7d-56ba-b6b2-c215-8a7ada127e3c#0128157463f-d6db-e535-25af-b65d8cb6b22d#0120db45136-357f-f030-6cdb-8dd832bd6038
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_master_mgt_link_state: Link State = [ true ] for management interface IP [ 10.10.100.218 ]
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_master_mgt_link_state: Link [ xapi0 ] state UP
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_slave_status: Management link OK - continue
Aug 27 07:08:31 hyperx-01 ha-lizard: 20971 autoselect_slave: e6e0f489-6e39-4d5f-9262-76ea92162615 is Master UUID - excluding from list of available slaves
Aug 27 07:08:31 hyperx-01 ha-lizard: 20971 autoselect_slave: 1 available Slave UUIDs found: d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:31 hyperx-01 ha-lizard: get_vms_on_host: Returned 7bc8c228-9877-e4f4-18af-fc414c258d46#0128fdb4306-6e87-59c6-8579-cca6ccabcba2#01206136cd3-95ed-cf3c-9c35-0982a41745e1#01228ed1bf3-4363-e5c0-8ed0-f07adec34c16#012b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 get_pool_host_list: returned e6e0f489-6e39-4d5f-9262-76ea92162615#012d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:31 hyperx-01 ha-lizard: 20971 autoselect_slave: Selected Slave: d02c94c8-4cb6-496b-adbb-ab4f83226779 = Current slave: d02c94c8-4cb6-496b-adbb-ab4f83226779 - ignoring update
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_slave_status: Removing Master UUID from list of Hosts
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 get_pool_ip_list: returned 10.10.100.219
Aug 27 07:08:31 hyperx-01 ha-lizard: check_ha_enabled: Checking if ha-lizard is enabled for pool: 48f956d5-7ca3-c1d8-9ef7-3bbb5bafeff9
Aug 27 07:08:31 hyperx-01 ha-lizard: check_ha_enabled: ha-lizard is enabled
Aug 27 07:08:31 hyperx-01 ha-lizard: check_ha_enabled: checking whether maintenance mode enabled
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_xapi: Pool Host 10.10.100.219 xapi status = 0
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_email_enabled: Email enabled for check_xapi
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 10.10.100.219 not responding to HTTP - manual intervention may be required
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 email: Message barred for 60 minutes
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_slave_status: Slave host [ d02c94c8-4cb6-496b-adbb-ab4f83226779 ] health status = [ failed ] - break
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_slave_status: Host IP Address check Status Array for Slaves = (0)
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_slave_status: Quorum check called
Aug 27 07:08:31 hyperx-01 ha-lizard: get_pool_host_list: enabled flag set - returning only hosts with enabled=true
Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_quorum: Checking host IPs: 10.10.100.218 10.10.100.219
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Host IP: 10.10.100.218 Response = OK
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: LIVE HOSTs = 1
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Host IP: 10.10.100.219 Response = OK
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: LIVE HOSTs = 2
Aug 27 07:08:32 hyperx-01 ha-lizard: get_pool_host_list: returned e6e0f489-6e39-4d5f-9262-76ea92162615#012d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Using network points: 10.10.100.216 as possible additional vote
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Heuristic IP: 10.10.100.216 Response = OK
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Successful Replies = 1
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 Total enpoints checked = 1 with total successful replies = 1
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Additional heuristic vote success. Incremeting vote by 1
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: Minimum number of hosts needed to allow fencing = 1 + 1
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_quorum: 3 Hosts found. Minimum needed = 1 + 1. Fencing allowed
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: Failed slave count = 1
Aug 27 07:08:32 hyperx-01 ha-lizard: get_pool_ip_list: returned 10.10.100.218
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: Processing failed slave: d02c94c8-4cb6-496b-adbb-ab4f83226779 on this iteration
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_email_enabled: Email enabled for check_slave_status
Aug 27 07:08:32 hyperx-01 ha-lizard: get_pool_ip_list: returned 10.10.100.218 10.10.100.219
Aug 27 07:08:32 hyperx-01 ha-lizard: write_status_report: Writing status report
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 email: Duplicate message - not sending. Content = check_slave_status: Server hyperx-01: Some Pool Slaves not not responding , d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 email: Message barred for 60 minutes
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: Some Pool Slaves not not responding , d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: Calling function get_vms_on_host for UUID(s) d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: Calling function fence_host to remove unresponsive host from pool. Failed Host(s) = d02c94c8-4cb6-496b-adbb-ab4f83226779
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 check_slave_status: fence_host d02c94c8-4cb6-496b-adbb-ab4f83226779 executed on prior iteration - host already fenced
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 Function check_slave_status Host Power = Off, calling vm_mon
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: ha-lizard is operating mode 2 - managing pool VMs
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: Retrived list of VMs for this poll: e50389ce-a0d6-0bb3-a80b-96f990225daa#012b7fe7a2e-d2d0-ae39-ac33-c2749ca45566#012dec9f7ba-4ee4-9561-7fc0-377492623689#012bea41587-d526-2a99-738b-50d50f931063#012857665a2-f11f-330b-d821-83718723ef9b#01235cd92fa-aade-338b-db2e-27055c8e73a5#0127bc8c228-9877-e4f4-18af-fc414c258d46#0128fdb4306-6e87-59c6-8579-cca6ccabcba2#0120b463d7d-56ba-b6b2-c215-8a7ada127e3c#01206136cd3-95ed-cf3c-9c35-0982a41745e1#01228ed1bf3-4363-e5c0-8ed0-f07adec34c16#012b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e#0128157463f-d6db-e535-25af-b65d8cb6b22d#0120db45136-357f-f030-6cdb-8dd832bd6038
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: Removing Control Domains from VM list
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: VM list returned = e50389ce-a0d6-0bb3-a80b-96f990225daa#012b7fe7a2e-d2d0-ae39-ac33-c2749ca45566#012dec9f7ba-4ee4-9561-7fc0-377492623689#012bea41587-d526-2a99-738b-50d50f931063#012857665a2-f11f-330b-d821-83718723ef9b#01235cd92fa-aade-338b-db2e-27055c8e73a5#0127bc8c228-9877-e4f4-18af-fc414c258d46#0128fdb4306-6e87-59c6-8579-cca6ccabcba2#0120b463d7d-56ba-b6b2-c215-8a7ada127e3c#01206136cd3-95ed-cf3c-9c35-0982a41745e1#01228ed1bf3-4363-e5c0-8ed0-f07adec34c16#012b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e#0128157463f-d6db-e535-25af-b65d8cb6b22d#0120db45136-357f-f030-6cdb-8dd832bd6038
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_state: Machine state for e50389ce-a0d6-0bb3-a80b-96f990225daa returned: running
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: VM e50389ce-a0d6-0bb3-a80b-96f990225daa state = running
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_state: Machine state for b7fe7a2e-d2d0-ae39-ac33-c2749ca45566 returned: running
Aug 27 07:08:32 hyperx-01 ha-lizard: 20976 vm_mon: VM b7fe7a2e-d2d0-ae39-ac33-c2749ca45566 state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for dec9f7ba-4ee4-9561-7fc0-377492623689 returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM dec9f7ba-4ee4-9561-7fc0-377492623689 state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for bea41587-d526-2a99-738b-50d50f931063 returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM bea41587-d526-2a99-738b-50d50f931063 state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 857665a2-f11f-330b-d821-83718723ef9b returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM 857665a2-f11f-330b-d821-83718723ef9b state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 35cd92fa-aade-338b-db2e-27055c8e73a5 returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM 35cd92fa-aade-338b-db2e-27055c8e73a5 state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 7bc8c228-9877-e4f4-18af-fc414c258d46 returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM 7bc8c228-9877-e4f4-18af-fc414c258d46 state = running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 8fdb4306-6e87-59c6-8579-cca6ccabcba2 returned: running
Aug 27 07:08:33 hyperx-01 ha-lizard: 20976 vm_mon: VM 8fdb4306-6e87-59c6-8579-cca6ccabcba2 state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 0b463d7d-56ba-b6b2-c215-8a7ada127e3c returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM 0b463d7d-56ba-b6b2-c215-8a7ada127e3c state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 06136cd3-95ed-cf3c-9c35-0982a41745e1 returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM 06136cd3-95ed-cf3c-9c35-0982a41745e1 state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 28ed1bf3-4363-e5c0-8ed0-f07adec34c16 returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM 28ed1bf3-4363-e5c0-8ed0-f07adec34c16 state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM b50ce4fa-bbfe-0e8c-8f4a-48de8266fd6e state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 8157463f-d6db-e535-25af-b65d8cb6b22d returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM 8157463f-d6db-e535-25af-b65d8cb6b22d state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_state: Machine state for 0db45136-357f-f030-6cdb-8dd832bd6038 returned: running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: VM 0db45136-357f-f030-6cdb-8dd832bd6038 state = running
Aug 27 07:08:34 hyperx-01 ha-lizard: 20976 vm_mon: 0 Eligible Halted VMs found
Aug 27 07:08:42 hyperx-01 ha-lizard: 20665 Spawning new instance of ha-lizard
Aug 27 07:08:42 hyperx-01 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail

Slave:
Aug 27 06:58:48 hyperx-02 ha-lizard: This iteration is count 2370
Aug 27 06:58:48 hyperx-02 ha-lizard: Checking if this host is a Pool Master or Slave
Aug 27 06:58:48 hyperx-02 ha-lizard: This host's pool status = slave:10.10.100.218
Aug 27 06:58:48 hyperx-02 ha-lizard: update_global_conf_params: Successfully updated global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf.
Aug 27 06:58:48 hyperx-02 ha-lizard: update_global_conf_params: DISABLED_VAPPS=()#012ENABLE_LOGGING=1#012FENCE_ACTION=stop#012FENCE_ENABLED=1#012FENCE_FILE_LOC=/etc/ha-lizard/fence#012FENCE_HA_ONFAIL=0#012FENCE_HEURISTICS_IPS=10.10.100.216#012FENCE_HOST_FORGET=0#012FENCE_IPADDRESS=#012FENCE_METHOD=POOL#012FENCE_MIN_HOSTS=2#012FENCE_PASSWD=#012FENCE_QUORUM_REQUIRED=1#012FENCE_REBOOT_LONE_HOST=0#012FENCE_USE_IP_HEURISTICS=1#012GLOBAL_VM_HA=0#012HOST_SELECT_METHOD=0#012MAIL_FROM=xcp-pool@myself.com#012MAIL_ON=1#012MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"#012MAIL_TO=info@myself.com#012MGT_LINK_LOSS_TOLERANCE=5#012MONITOR_DELAY=15#012MONITOR_KILLALL=1#012MONITOR_MAX_STARTS=20#012MONITOR_SCANRATE=10#012OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012SLAVE_VM_STAT=0#012SMTP_PASS=**Password**#012SMTP_PORT=25#012SMTP_SERVER=mx.myself.com#012SMTP_USER=xcp-pool@myself.com#012XAPI_COUNT=2#012XAPI_DELAY=10#012XC_FIELD_NAME='ha-lizard-enabled'#012XE_TIMEOUT=15
Aug 27 06:58:48 hyperx-02 ha-lizard: master_ip: Pool Master IP Address = 10.10.100.218
Aug 27 06:58:48 hyperx-02 ha-lizard: Validating master is still a master
Aug 27 06:58:48 hyperx-02 ha-lizard: [ /etc/ha-lizard/scripts/timeout 1 /etc/ha-lizard/scripts/host_is_slave 10.10.100.218 ]
Aug 27 06:58:49 hyperx-02 ha-lizard: This slave- hyperx-02: d02c94c8-4cb6-496b-adbb-ab4f83226779 not permitted to become master
Aug 27 06:58:49 hyperx-02 ha-lizard: check_xapi: Pool Host 10.10.100.218 xapi status = 0
Aug 27 06:58:49 hyperx-02 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 06:58:49 hyperx-02 ha-lizard: check_email_enabled: Email enabled for check_xapi
Aug 27 06:58:50 hyperx-02 ha-lizard: email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 10.10.100.218 not responding to HTTP - manual intervention may be required
Aug 27 06:58:50 hyperx-02 ha-lizard: email: Message barred for 60 minutes
Aug 27 06:58:50 hyperx-02 ha-lizard: Pool Master NOT OK - Checking if ha-lizard is enabled in latest state file
Aug 27 06:58:50 hyperx-02 ha-lizard: Checking if ha-lizard is enabled
Aug 27 06:58:50 hyperx-02 ha-lizard: Statefile /etc/ha-lizard/state/ha_lizard_enabled found: checking if ha-lizard is enabled
Aug 27 06:58:50 hyperx-02 ha-lizard: ha-lizard is enabled - continuing
Aug 27 06:58:50 hyperx-02 ha-lizard: Pool Master Monitor = Failed
Aug 27 06:58:50 hyperx-02 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 06:58:50 hyperx-02 ha-lizard: email: Duplicate message - not sending. Content = Server hyperx-02: Failed to contact pool master - manual intervention may be required
Aug 27 06:58:50 hyperx-02 ha-lizard: email: Message barred for 60 minutes
Aug 27 06:58:50 hyperx-02 ha-lizard: Retry Count set to 2. Retrying 2 times in 10 second intervals..
Aug 27 06:58:50 hyperx-02 ha-lizard: Attempt 0: Checking Pool Master Status
Aug 27 06:58:52 hyperx-02 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Aug 27 06:59:00 hyperx-02 ha-lizard: check_xapi: Pool Host 10.10.100.218 xapi status = 0
Aug 27 06:59:00 hyperx-02 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 06:59:00 hyperx-02 ha-lizard: check_email_enabled: Email enabled for check_xapi
Aug 27 06:59:00 hyperx-02 ha-lizard: email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 10.10.100.218 not responding to HTTP - manual intervention may be required
Aug 27 06:59:00 hyperx-02 ha-lizard: email: Message barred for 60 minutes
Aug 27 06:59:00 hyperx-02 ha-lizard: Attempt 1: Checking Pool Master Status
Aug 27 06:59:02 hyperx-02 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Aug 27 06:59:02 hyperx-02 ha-lizard: 31149 ha-lizard already running: Attempt 1 on PIDS: 31149
Aug 27 06:59:10 hyperx-02 ha-lizard: check_xapi: Pool Host 10.10.100.218 xapi status = 0
Aug 27 06:59:10 hyperx-02 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
Aug 27 06:59:10 hyperx-02 ha-lizard: check_email_enabled: Email enabled for check_xapi
Aug 27 06:59:10 hyperx-02 ha-lizard: email: Duplicate message - not sending. Content = check_xapi: Pool Host on Server: 10.10.100.218 not responding to HTTP - manual intervention may be required
Aug 27 06:59:10 hyperx-02 ha-lizard: email: Message barred for 60 minutes
Aug 27 06:59:10 hyperx-02 ha-lizard: ERROR Retrieving number of hosts in pool. Setting NUM_HOSTS = UNKNOWN
Aug 27 06:59:10 hyperx-02 ha-lizard: Failed to reach Pool Master - Checking if this host promotes to Master..
Aug 27 06:59:10 hyperx-02 ha-lizard: PROMOTE_SLAVE is disabled - Not Promoting this host - Manual Intervention Needed
Aug 27 06:59:13 hyperx-02 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Aug 27 06:59:13 hyperx-02 ha-lizard: 31149 Spawning new instance of ha-lizard
Aug 27 06:59:13 hyperx-02 ha-lizard: Mail Spool Directory Found /dev/shm/ha-lizard-mail
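
A note for anyone comparing the two logs: the #012 sequences are syslog's octal escape for a newline, so each update_global_conf_params line carries the whole settings list on a single line. Below is a minimal Python sketch for unpacking such a line into key/value pairs. The function names are made up for illustration and are not part of ha-lizard. It can help to check what a host actually loaded; for example, the slave logs "PROMOTE_SLAVE is disabled" at 06:59:10 although the settings it dumped contain PROMOTE_SLAVE=1.

```python
import re


def decode_syslog_escapes(message: str) -> str:
    """Turn #ooo octal escapes (for example #012 for newline) back into characters."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), message)


def parse_conf_dump(log_line: str) -> dict:
    """Extract KEY=VALUE pairs from an update_global_conf_params log line."""
    marker = "update_global_conf_params: "
    payload = log_line.split(marker, 1)[1]
    settings = {}
    for entry in decode_syslog_escapes(payload).splitlines():
        key, sep, value = entry.partition("=")
        if sep:  # skip lines that are not KEY=VALUE
            settings[key.strip()] = value.strip()
    return settings


# Shortened copy of the slave's log line, for illustration only.
line = (
    "Aug 27 06:58:48 hyperx-02 ha-lizard: update_global_conf_params: "
    "OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012XE_TIMEOUT=15"
)
conf = parse_conf_dump(line)
print(conf["PROMOTE_SLAVE"])
```

Running the same function over the master's and the slave's lines and comparing the two dictionaries shows quickly whether both hosts see identical pool settings.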

So I receive mails like the following at unpredictable intervals (10 to 70 minutes):

check_slave_status: Server hyperx-01: Some Pool Slaves not not responding , d02c94c8-4cb6-496b-adbb-ab4f83226779

and

Server hyperx-02: Failed to contact pool master - manual intervention may be required

So what should I do to configure ha-lizard correctly?

Kind regards,
Daniel

Last edit: by Danger. Reason: Append logs

2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 7 months ago #1648

  • Salvatore Costantino
Hi Daniel
The configuration looks OK. The issue appears to be a sporadic network problem causing a break in communication between the master and the slave. In the master's log, the connection to the slave was restored in about 1 second.

Interestingly, I saw a similar report about a week ago, also on XS 7.5. I'm not sure whether there is an issue with the latest XS/XCP release or whether this is truly a network issue.

Can you try monitoring with a rapid ping from a host on the same subnet to both the master and the slave to see if there is a break in communication?
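
A minimal Python sketch of such a monitor, to be run from a third machine on the same subnet. It is only an illustration under some assumptions: it uses a TCP connect instead of ICMP ping, because the alerts in the log say the host is "not responding to HTTP", and it assumes xapi accepts connections on port 80. The names tcp_probe and monitor are hypothetical. Each failure is printed with a timestamp so it can be lined up with the alert mails later.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host: str, port: int = 80, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Port 80 is an assumption about where xapi listens.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False


def monitor(hosts, probe=tcp_probe, interval=0.2, rounds=5, sleep=time.sleep):
    """Probe every host once per round; return (timestamp, host) for each failure."""
    failures = []
    for _ in range(rounds):
        for host in hosts:
            if not probe(host):
                stamp = datetime.now()
                failures.append((stamp, host))
                print(f"{stamp.isoformat(timespec='milliseconds')} {host} FAILED")
        sleep(interval)
    return failures


# Demo with a fake probe so that nothing touches the network here.
# A real run would look like:
#   monitor(["10.10.100.218", "10.10.100.219"], rounds=50000)
answers = iter([True, False, True, True])
demo_failures = monitor(
    ["master", "slave"],
    probe=lambda host: next(answers),
    interval=0,
    rounds=2,
    sleep=lambda seconds: None,
)
```

The probe is passed in as a parameter so that a plain ping wrapper can be swapped in without touching the loop.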

2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 7 months ago #1649

Hello,

Yes, sure. I am now monitoring pings to the Netgear M7100 switch and from each host to the other. Let's see what happens.

Another strange thing is that the slave is in the Diskless state when I run cat /proc/drbd. XenCenter shows the storage as connected to both hosts, but the backups on the slave host are broken and not restorable.

So I moved all machines to an NFS share, and I plan to reinstall ha-lizard on the slave once we have seen something in the ping monitoring.

Kind regards,
Daniel

2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 7 months ago #1650

Hello,

I started the ping tests, and here are the results:

ping hyperx-01 from hyperx-02: mostly under 1 ms, sometimes 1 ms
ping hyperx-02 from hyperx-01: mostly under 1 ms, sometimes 1 ms
ping switch (the heuristic IP) from hyperx-01: mostly under 1 ms, sometimes 1-4 ms
ping switch (the heuristic IP) from hyperx-02: mostly under 1 ms, sometimes 1-4 ms
ping Google 8.8.8.8 from hyperx-01: mostly 8 ms, sometimes 10-16 ms
ping Google 8.8.8.8 from hyperx-02: mostly 8 ms, sometimes 10-16 ms

So I would say there is no network issue, right?

Any ideas? If you want, we could make an appointment so you can look at the machines live via TeamViewer.

Kind regards,
Daniel

2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 7 months ago #1651

  • Salvatore Costantino
Did you run the ping test continually until the next occurrence of the email alert? The point of the test would be to see, from a node outside of your pool, whether there is a network issue that is observable at the same time as the momentary drops in connectivity.
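
To make that comparison concrete, here is a small Python sketch that checks whether each alert time from the ha-lizard log has a probe failure close to it. It is only an illustration: the function names are hypothetical, the 5 second window is an arbitrary choice, and the year has to be supplied by hand because syslog timestamps do not include one (2018 below is a placeholder).

```python
from datetime import datetime, timedelta


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the leading 'Aug 27 07:08:31' stamp of a syslog line.

    Syslog omits the year, so the caller has to provide it.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def alerts_with_nearby_failure(alert_times, failure_times, window=timedelta(seconds=5)):
    """Map each alert time to True if a probe failure lies within +/- window of it."""
    return {
        alert: any(abs(alert - failure) <= window for failure in failure_times)
        for alert in alert_times
    }


# Alert line taken from the master's log; the year is a placeholder.
alert = parse_syslog_time(
    "Aug 27 07:08:31 hyperx-01 ha-lizard: 20976 check_xapi: "
    "Pool Host 10.10.100.219 xapi status = 0",
    year=2018,
)
# Hypothetical failure times, standing in for the output of an external monitor.
probe_failures = [alert - timedelta(seconds=2)]
print(alerts_with_nearby_failure([alert], probe_failures))
```

If the alerts never have a failure near them, the outside monitor did not see the drop, which points away from the network path between that monitor and the hosts.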

2-Pool Xcp-Ng 7.5 Receive System Alert mails 5 years 6 months ago #1652

Hello,

Yes, I monitored pings to each destination for 3 hours: no packet loss at any time, and many mails from ha-lizard. Meanwhile my master crashed, so I had some other fun :-)

We also monitored our infrastructure from outside our colocation in the datacenter to see whether services crash or become unreachable. No alerts occurred there either.

So the machines are reachable from inside and outside the subnet where the pool is located. I would say the connection issues are not caused by the network.

Kind regards,
Daniel
