
TOPIC: HA-Lizard supported PV not available

HA-Lizard supported PV not available 6 days 19 hours ago #1586

Hi,

I have a problem with my XenServer 6.5 HA-Lizard cluster. Due, I think, to an NFS server lock-up, one of my VMs (Windows 2008) pegged CPU 0 and refused to respond to input. I force-killed it, eventually using the 'destroy domain' trick to fully shut it down, but it would not clear its 'amber' status. Google suggested that rebooting the physical host would clear the issue, so I rebooted the physical host.

After a series of reboots, some very fervent prayers and a lot more time than seemed reasonable, the cluster came back up. In the interim, I had rebooted the NFS server, and both it and its SR were available again. Unfortunately, my HA-iSCSI SR was not available. I was able to force it to repair and it mounted.

However, when I tried to start the VM, I got 'the VDI is not available.' I can see the SR and its listing of VHDs, but a rescan fails as well.

At this point, some more googling led me to try pvscan, with the following results...
[root@xen01 ~]# pvscan
  Couldn't find device with uuid Pf96EV-vSny-eQpw-SLlK-0uzf-Npqa-Uxmdc9.
  PV unknown device   VG VG_XenStorage-905cf1e0-a955-b220-feaf-e4151896e6e0   lvm2 [2.18 TB / 233.72 GB free]
  Total: 1 [2.18 TB] / in use: 1 [2.18 TB] / in no VG: 0 [0   ]

and vgscan reported...
[root@xen01 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Couldn't find device with uuid Pf96EV-vSny-eQpw-SLlK-0uzf-Npqa-Uxmdc9.
  Found volume group "VG_XenStorage-905cf1e0-a955-b220-feaf-e4151896e6e0" using metadata type lvm2

I've had absolutely no luck finding a way to recover the access to this data.
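
One thing worth checking in a situation like this is whether the host still has an LVM metadata backup that mentions the missing PV UUID. LVM normally writes these to `/etc/lvm/backup` and `/etc/lvm/archive`, though that is an assumption here: a XenServer host may be configured not to keep them. A minimal sketch, where `find_lvm_backups` is a hypothetical helper name rather than an LVM or XenServer tool:

```shell
#!/bin/sh
# Sketch: list LVM metadata backup files that mention a given PV UUID.
# find_lvm_backups is a hypothetical helper, not part of LVM or XenServer.
# Usage: find_lvm_backups UUID [DIR...]
find_lvm_backups() {
    uuid="$1"
    shift
    # Assumption: fall back to LVM's usual backup/archive directories.
    [ "$#" -gt 0 ] || set -- /etc/lvm/backup /etc/lvm/archive
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -l prints only the names of matching files; -F treats the UUID literally.
        grep -l -F -- "$uuid" "$dir"/* 2>/dev/null || true
    done
}

# UUID taken from the pvscan output above.
find_lvm_backups "Pf96EV-vSny-eQpw-SLlK-0uzf-Npqa-Uxmdc9"
```

If this prints nothing, either the directories do not exist on the host or no backup references that UUID.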

iSCSI-HA status says...
| VIRTUAL IP:             10.10.10.3 is not local                                |
| ISCSI TARGET:           tgtd is stopped [expected stopped]                     |
| DRBD ROLE:              iscsi1=Secondary                                       |
| DRBD CONNECTION:        iscsi1 in Connected state                              |
----------------------------------------------------------------------------------
Control + C to exit


---------------
| DRBD Status |
---------------
-------------------------------------------------------------------------
| version: 8.4.3 (api:1/proto:86-101)                                   |
| srcversion: 19422058F8A2D4AC0C8EF09                                   |
|  1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----   |
|     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 |
-------------------------------------------------------------------------

DRBD seems happy...

[root@xen01 ~]# drbd-overview
  1:iscsi1/0  Connected Secondary/Primary UpToDate/UpToDate C r-----

I've got a restore from backup running in parallel but would love to recover the data intact since there was work done since the last backup.

Any suggestions or guidance would be most welcome.

---Devin

HA-Lizard supported PV not available 3 days 1 hour ago #1588

*** Issue resolved ***

I reached out to Sal via the support email and arranged for him to remote into my system. After quite a bit of muttering and exclamations of 'I have never seen THIS happen before...', Sal was able to determine that the metadata for the physical volume housing the HA-iSCSI volume was missing. All of the data was there; CentOS had just lost the instructions on how to find it.

With no small amount of trepidation and warnings of 'I'm not sure how this is going to go,' I authorized Sal to try to recreate the PV. He, as I understand it, destroyed and recreated the physical volume (it's iSCSI, so it didn't ACTUALLY destroy anything) using the original UUID. This allowed LVM and the rest of the stack to find the data and address it, and restored my access to the missing data.
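
For anyone who lands here with the same symptom: the usual LVM way to recreate a PV with its original UUID is `pvcreate` with `--uuid` and `--restorefile`, followed by `vgcfgrestore`. The sketch below is not a record of the commands Sal actually ran. The device path `/dev/sdX` is a placeholder, and the backup path assumes the host still has an LVM metadata backup for the volume group, which may not hold on every XenServer install. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: recreate a lost PV label with its original UUID, then restore VG metadata.
# Not the exact procedure used in this thread; paths below are assumptions.

PV_UUID="Pf96EV-vSny-eQpw-SLlK-0uzf-Npqa-Uxmdc9"   # from the pvscan output
VG_NAME="VG_XenStorage-905cf1e0-a955-b220-feaf-e4151896e6e0"
DEVICE="${DEVICE:-/dev/sdX}"                       # placeholder: the real block device
BACKUP="${BACKUP:-/etc/lvm/backup/$VG_NAME}"       # assumed metadata backup location
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_pv() {
    # Rewrite the PV label with the UUID that the VG metadata expects.
    run pvcreate --uuid "$PV_UUID" --restorefile "$BACKUP" "$DEVICE"
    # Restore the volume group metadata from the same backup file.
    run vgcfgrestore -f "$BACKUP" "$VG_NAME"
    # Re-scan to confirm the PV and VG are found again.
    run pvscan
    run vgscan
}

recover_pv
```

Run it as-is first and read the printed commands. Only set `DRY_RUN=0` once `DEVICE` and `BACKUP` have been checked by hand, since `pvcreate` against the wrong device will overwrite that device's label.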

I wouldn't wish this sort of failure on anyone, but I am very grateful to Sal for his efforts and persistence in troubleshooting and repairing this issue.

---Devin