TOPIC:

drbd status - diskless 4 years 2 months ago #1961

  • gerry kernan (Topic Author)
Hi,

I just noticed that one of our two-node setups has stopped replicating via DRBD. The primary is showing as Diskless. Is there a procedure to recover from this? I'm guessing the secondary node will be out of sync, so I can't fail over to it to run the VMs.

[root@xensrva ~]# cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: 2A6B2FA4F0703B49CA9C727

1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r
ns:0 nr:8505572 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0


drbd status - diskless 4 years 2 months ago #1963

  • Salvatore Costantino
If no changes (maintenance, reboots, etc.) have been made and the block device shows as Diskless, it may be a failing disk.

Chances are your VMs are not working either. If that is the case, the secondary would have your most up-to-date data, but do not take my word on that: you will need to confirm it before invalidating any data.

If so, you can enter manual mode and promote the secondary to primary; however, this could fail depending on which disk DRBD sees as most up to date.

If you are certain that your disk is OK, you can try to detach and re-attach the disk with the drbdadm utility.
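A sketch of that detach/re-attach cycle, assuming the resource name iscsi1 that appears in the log output later in this thread (substitute your own resource name, and only run this on the affected node):

```shell
# Guarded sketch: detach the failed backing disk, then re-attach it.
# "iscsi1" is an assumption taken from the logs later in this thread.
res=iscsi1
if command -v drbdadm >/dev/null 2>&1; then
  drbdadm detach "$res"   # drop the local backing device
  drbdadm attach "$res"   # re-attach; expect a resync from the peer
  cat /proc/drbd          # ds: should return to UpToDate/UpToDate
else
  echo "drbdadm not found; run this on the DRBD node itself"
fi
```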

You could also try re-initiating all services on the master:
service iscsi-ha-watchdog stop
service iscsi-ha stop
service drbd stop
service tgtd stop

Once everything is stopped, you can restart the storage controller daemon, which will ensure that everything comes up in the correct order:
service iscsi-ha start


drbd status - diskless 4 years 2 months ago #1964

  • gerry kernan (Topic Author)
Hi Salvatore,
Thanks for getting back to me so quickly. I think I found the problem: there were lots of these entries in user.log about a duplicate PV:

Jan 8 11:52:03 xensrva iscsi-ha: 11977 check_drbd_resource_state: DRBD Resource: iscsi1 in Primary mode
Jan 8 11:52:03 xensrva iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Found duplicate PV eCCVC9K8ncE0V3HlYONKdikdtnYUXRUE: using /dev/drbd1 not /dev/sdb
Jan 8 11:52:03 xensrva iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Using duplicate PV /dev/drbd1 from subsystem DRBD, ignoring /dev/sdb
Jan 8 11:52:13 xensrva iscsi-ha: 12198 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 5 ] > [ 2 ]


I found that I had a mistake in lvm.conf. I had:
filter = [ "r|/dev/sdb|", "a|sd.*|", "r|/dev/VG_Xen.*/*|", "r|/dev/drbd.*|", "r|/dev/VGX.*|"]

Would this be causing the issue? I've corrected it to:
filter = [ "r|/dev/sdb|", "r|sd.*|", "r|/dev/VG_Xen.*/*|", "r|/dev/drbd.*|", "r|/dev/VGX.*|"]

Gerry


drbd status - diskless 4 years 2 months ago #1965

  • Salvatore Costantino
Hi Gerry,
Yes, an incorrect filter would cause LVM to read signatures from both the DRBD device and the underlying block device, which are the same.
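To see why the single "a|sd.*|" entry mattered: LVM evaluates filter patterns first-match-wins ("a|pat|" accepts, "r|pat|" rejects), and a device matching no pattern is accepted by default. A toy shell model of that ordering (not LVM's real matcher, just an illustration of the precedence) shows how the old filter still accepted sd devices that the corrected filter rejects:

```shell
# Toy first-match-wins evaluator for lvm.conf filter entries (illustration
# only; real LVM uses its own regex handling). "a|pat|" accepts, "r|pat|"
# rejects, and a device matching no pattern is accepted by default.
check() {
  dev=$1; shift
  for p in "$@"; do
    action=${p%%|*}            # leading "a" or "r"
    pat=${p#?|}; pat=${pat%|}  # pattern between the | delimiters
    if printf '%s\n' "$dev" | grep -Eq "$pat"; then
      echo "$action"; return
    fi
  done
  echo a                       # no pattern matched: accept by default
}

old='r|/dev/sdb| a|sd.*| r|/dev/VG_Xen.*/*| r|/dev/drbd.*| r|/dev/VGX.*|'
new='r|/dev/sdb| r|sd.*| r|/dev/VG_Xen.*/*| r|/dev/drbd.*| r|/dev/VGX.*|'

check /dev/sdc $old   # prints "a": old filter still scans other sd devices
check /dev/sdc $new   # prints "r": corrected filter rejects them
```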

FYI - you will also need to update the filter in /etc/lvm/master/lvm.conf to match.
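A quick way to confirm the two copies agree is to compare their filter lines. The sketch below uses temp files as stand-ins for /etc/lvm/lvm.conf and /etc/lvm/master/lvm.conf so the check can be shown end to end; point f1/f2 at the real paths in use:

```shell
# Sketch: verify the filter line is identical in both copies of lvm.conf.
# Temp files stand in for /etc/lvm/lvm.conf and /etc/lvm/master/lvm.conf.
f1=$(mktemp); f2=$(mktemp)
printf 'filter = [ "r|/dev/sdb|", "r|sd.*|" ]\n' > "$f1"
printf 'filter = [ "r|/dev/sdb|", "a|sd.*|" ]\n' > "$f2"
if [ "$(grep '^filter' "$f1")" = "$(grep '^filter' "$f2")" ]; then
  result="filters match"
else
  result="filters differ - update the master copy to match"
fi
echo "$result"
rm -f "$f1" "$f2"
```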


drbd status - diskless 4 years 3 weeks ago #1990

Dear Salvatore,

I discovered today, during a check of our two-node HA-Lizard cluster, that the primary node was in Diskless status. The dmesg log shows the root cause of the error and DRBD's default action of detaching and going into Diskless status.

So far so good. However, I would have expected the scsi-cfg watchdog to report (alert) this status change. Is this achievable? If so, could you advise me how to reach that goal or point me in the right direction?

drbd1: local READ IO error sector 115222944+8 on sdb
Fri Jan 31 09:42:33 2020 block drbd1: disk( UpToDate -> Failed )
Fri Jan 31 09:42:33 2020 block drbd1: Local IO failed in __req_mod. Detaching...
Fri Jan 31 09:42:33 2020 block drbd1: local READ IO error sector 115222952+8 on sdb
Fri Jan 31 09:42:33 2020 block drbd1: local READ IO error sector 115222960+8 on sdb
Fri Jan 31 09:42:33 2020 block drbd1: local READ IO error sector 115222968+8 on sdb
Fri Jan 31 09:42:33 2020 block drbd1: local READ IO error sector 115222976+8 on sdb
Fri Jan 31 09:42:33 2020 block drbd1: bitmap WRITE of 1 pages took 2 jiffies
Fri Jan 31 09:42:33 2020 block drbd1: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Fri Jan 31 09:42:33 2020 block drbd1: disk( Failed -> Diskless )

BR Andreas


drbd status - diskless 4 years 3 weeks ago #1991

  • Salvatore Costantino
HA-Lizard will alert on this condition, and also on disk failures; however, alerts must be enabled. For your specific case, the alert would trigger hourly until the condition has been resolved.

You can configure email alerts with the following settings:

1) Ensure you have this line in /etc/iscsi-ha/iscsi-ha.conf on both hosts:
MAIL_USE_SHARED_PARAMS=1


2) Configure the following parameters for email alerts. Once step 1 is in place, these settings can be set one time from any host using the ha-cfg CLI tool; once set, all hosts in the pool will receive the same settings.
MAIL_FROM=from_email_address
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO=to_email_address
SMTP_PASS=smtp_password
SMTP_PORT=smtp_server_port
SMTP_SERVER=smtp_host
SMTP_USER=smtp_user_name
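Assuming each parameter is set with the same "ha-cfg set <name> <value>" pattern shown for enable_alerts below (an assumption; confirm against your ha-cfg version), the one-time setup might look like this, with placeholder values:

```shell
# Assumed syntax, extrapolated from "ha-cfg set enable_alerts 1"; the
# parameter names come from the list above, and all values are placeholders.
ha-cfg set MAIL_ON 1
ha-cfg set MAIL_FROM alerts@example.com
ha-cfg set MAIL_TO admin@example.com
ha-cfg set SMTP_SERVER mail.example.com
ha-cfg set SMTP_PORT 25
ha-cfg set SMTP_USER smtp_user
ha-cfg set SMTP_PASS smtp_password
```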

You can also optionally enable all alerts to flow through XenCenter/XCP-ng Center by setting the below; this will display all alerts within XenCenter:
ha-cfg set enable_alerts 1

Hope this helps.
