
XenServer 7.6: SR Rescan VDI not available & The snapshot chain is too long


Ove Starckjohann

Question

Hello!

 

Between December 27th and 28th *something* happened to our XenServer 7.6 system (3 hosts) with multipath-attached iSCSI storage repositories.

As we encountered hanging VMs we restarted the master host and everything seemed fine.

But later we realized that scheduled snapshots of ONE VM were no longer being created. After shutting down this VM, it would not restart, failing with the error "VDI is not available". Reverting to the latest snapshot did not work either, because that snapshot seems to be broken. Creating a new VM from the second-newest snapshot worked, and that new VM is up and running. After that we deleted the original VM including the broken VDI. But it seems that this broken VDI is still present *somewhere*, and maybe there are more problems within the SR.

Some days later, more and more VMs report "The snapshot chain is too long", and no new snapshots can be created for VMs whose VDIs reside on that problematic SR.

When trying an "SR rescan" on this SR we get the error "VDI is not available" in XenCenter.
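
The same rescan can also be triggered from the pool master's CLI, which makes it easier to correlate with the log (standard commands only; the SR UUID is the one that appears in the SMLog excerpt below):

xe sr-scan uuid=6505febe-9166-9aaa-eb2b-33b7f980ec4c
tail -n 200 /var/log/SMLog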

Looking into /var/log/SMLog we see:

Jan 11 04:03:58 XS03 SM: [29063] LVMCache: refreshing
Jan 11 04:03:58 XS03 SM: [29063] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 11 04:03:58 XS03 SM: [29063]   pread SUCCESS
Jan 11 04:03:58 XS03 SM: [29063] ['/usr/bin/vhd-util', 'scan', '-f', '-c', '-m', 'VHD-*', '-l', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 11 04:03:59 XS03 SM: [29063]   pread SUCCESS
Jan 11 04:03:59 XS03 SM: [29063] ***** VHD scan error: vhd=VHD-043c7546-8e65-471a-b5be-e2b279634763 scan-error=-22 error-message='invalid footer'
Jan 11 04:03:59 XS03 SM: [29063] *** vhd-scan error: 043c7546-8e65-471a-b5be-e2b279634763
Jan 11 04:03:59 XS03 SM: [29063] Raising exception [46, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]]
Jan 11 04:03:59 XS03 SM: [29063] lock: released /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 11 04:03:59 XS03 SM: [29063] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 11 04:03:59 XS03 SM: [29063]     return self._run_locked(sr)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 11 04:03:59 XS03 SM: [29063]     rv = self._run(sr, target)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 11 04:03:59 XS03 SM: [29063]     return sr.scan(self.params['sr_uuid'])
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 11 04:03:59 XS03 SM: [29063]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 11 04:03:59 XS03 SM: [29063]     self._loadvdis()
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 11 04:03:59 XS03 SM: [29063]     opterr='Error scanning VDI %s' % uuid)
Jan 11 04:03:59 XS03 SM: [29063]
Jan 11 04:03:59 XS03 SM: [29063] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 372, in run
Jan 11 04:03:59 XS03 SM: [29063]     ret = cmd.run(sr)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 11 04:03:59 XS03 SM: [29063]     return self._run_locked(sr)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 11 04:03:59 XS03 SM: [29063]     rv = self._run(sr, target)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 11 04:03:59 XS03 SM: [29063]     return sr.scan(self.params['sr_uuid'])
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 11 04:03:59 XS03 SM: [29063]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 11 04:03:59 XS03 SM: [29063]     self._loadvdis()
Jan 11 04:03:59 XS03 SM: [29063]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 11 04:03:59 XS03 SM: [29063]     opterr='Error scanning VDI %s' % uuid)
Jan 11 04:03:59 XS03 SM: [29063]
Jan 11 04:03:59 XS03 SM: [29063] lock: closed /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 11 04:04:00 XS03 SM: [29263] MPATH: Time wait is done
Jan 11 04:04:00 XS03 SM: [29263] Matched SCSIid, updating 2684c337a4851424f
Jan 11 04:04:00 XS03 SM: [29263] MPATH: Updating entry for [2684c337a4851424f], current: [2, 2]
  ...
  ...
Jan 11 05:04:37 XS03 SM: [21965] Refcount for lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c:d2a298e8-7316-43a0-9d52-e48c616428b9 set => (1, 0b)
Jan 11 05:04:37 XS03 SM: [21965] lock: released /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/d2a298e8-7316-43a0-9d52-e48c616428b9
Jan 11 05:04:37 XS03 SM: [21965] lock: closed /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/d2a298e8-7316-43a0-9d52-e48c616428b9
Jan 11 05:04:37 XS03 SM: [21965] ***** generic exception: vdi_snapshot: EXCEPTION <class 'SR.SROSError'>, The snapshot chain is too long
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 11 05:04:37 XS03 SM: [21965]     return self._run_locked(sr)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 11 05:04:37 XS03 SM: [21965]     rv = self._run(sr, target)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 254, in _run
Jan 11 05:04:37 XS03 SM: [21965]     return target.snapshot(self.params['sr_uuid'], self.vdi_uuid)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/VDI.py", line 414, in snapshot
Jan 11 05:04:37 XS03 SM: [21965]     secondary=secondary, cbtlog=cbtlog)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/LVHDSR.py", line 1690, in _do_snapshot
Jan 11 05:04:37 XS03 SM: [21965]     snapResult = self._snapshot(snapType, cloneOp, cbtlog, consistency_state)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/LVHDSR.py", line 1734, in _snapshot
Jan 11 05:04:37 XS03 SM: [21965]     raise xs_errors.XenError('SnapshotChainTooLong')
Jan 11 05:04:37 XS03 SM: [21965]
Jan 11 05:04:37 XS03 SM: [21965] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The snapshot chain is too long
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 372, in run
Jan 11 05:04:37 XS03 SM: [21965]     ret = cmd.run(sr)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 11 05:04:37 XS03 SM: [21965]     return self._run_locked(sr)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 11 05:04:37 XS03 SM: [21965]     rv = self._run(sr, target)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/SRCommand.py", line 254, in _run
Jan 11 05:04:37 XS03 SM: [21965]     return target.snapshot(self.params['sr_uuid'], self.vdi_uuid)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/VDI.py", line 414, in snapshot
Jan 11 05:04:37 XS03 SM: [21965]     secondary=secondary, cbtlog=cbtlog)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/LVHDSR.py", line 1690, in _do_snapshot
Jan 11 05:04:37 XS03 SM: [21965]     snapResult = self._snapshot(snapType, cloneOp, cbtlog, consistency_state)
Jan 11 05:04:37 XS03 SM: [21965]   File "/opt/xensource/sm/LVHDSR.py", line 1734, in _snapshot
Jan 11 05:04:37 XS03 SM: [21965]     raise xs_errors.XenError('SnapshotChainTooLong')
Jan 11 05:04:37 XS03 SM: [21965]
Jan 11 05:04:37 XS03 SM: [21965] lock: closed /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr

The VDI 043c7546-8e65-471a-b5be-e2b279634763 was the defective VDI, which has already been deleted, but it is still "hanging around" somewhere and filling the SMLog.

xe vdi-list is NOT showing the VDI 043c7546-8e65-471a-b5be-e2b279634763 anymore.

The SR 6505febe-9166-9aaa-eb2b-33b7f980ec4c does not seem healthy, in my opinion.

When trying to live-migrate a VDI from that SR to another SR, it does NOT work and fails with the error "VDI is not available".

When shutting down a VM (with a VDI on that SR) and doing a COPY VM to another SR, this works.

 

The question is: (how) is it possible to remove the broken VDI, which I no longer need, and thereby re-enable coalesce processing (and snapshots) on this SR?
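
For reference, this is roughly how I would try to confirm that the VHD footer really is damaged; only a sketch, since the LV is normally inactive in dom0 and I am not sure whether activating it by hand is entirely safe on a live SR (activating via lvchange -ay is what the SM backend itself does):

lvchange -ay /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
vhd-util check -n /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
lvchange -an /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763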

 

 

Ove

 


Recommended Posts


Hi!

The VDI no longer exists in the Xen database:

[root@XS03 ~]# xe vdi-list|grep 043c7546-8e65-471a-b5be-e2b279634763
[root@XS03 ~]#
 

 

But inside /var/log/SMLog I still see complaints about it:

 

Jan 12 08:52:02 XS03 SM: [23559] ['/sbin/vgs', '--noheadings', '--nosuffix', '--units', 'b', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 12 08:52:02 XS03 SM: [23559]   pread SUCCESS
Jan 12 08:52:02 XS03 SM: [23559] LVMCache: refreshing
Jan 12 08:52:02 XS03 SM: [23559] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 12 08:52:02 XS03 SM: [23559]   pread SUCCESS
Jan 12 08:52:02 XS03 SM: [23559] ['/usr/bin/vhd-util', 'scan', '-f', '-c', '-m', 'VHD-*', '-l', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 12 08:52:03 XS03 SM: [23559]   pread SUCCESS
Jan 12 08:52:03 XS03 SM: [23559] ***** VHD scan error: vhd=VHD-043c7546-8e65-471a-b5be-e2b279634763 scan-error=-22 error-message='invalid footer'
Jan 12 08:52:03 XS03 SM: [23559] *** vhd-scan error: 043c7546-8e65-471a-b5be-e2b279634763
Jan 12 08:52:03 XS03 SM: [23559] Raising exception [46, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]]
Jan 12 08:52:03 XS03 SM: [23559] lock: released /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 12 08:52:03 XS03 SM: [23559] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 12 08:52:03 XS03 SM: [23559]     return self._run_locked(sr)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 12 08:52:03 XS03 SM: [23559]     rv = self._run(sr, target)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 12 08:52:03 XS03 SM: [23559]     return sr.scan(self.params['sr_uuid'])
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 12 08:52:03 XS03 SM: [23559]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 12 08:52:03 XS03 SM: [23559]     self._loadvdis()
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 12 08:52:03 XS03 SM: [23559]     opterr='Error scanning VDI %s' % uuid)
Jan 12 08:52:03 XS03 SM: [23559]
Jan 12 08:52:03 XS03 SM: [23559] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 372, in run
Jan 12 08:52:03 XS03 SM: [23559]     ret = cmd.run(sr)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 12 08:52:03 XS03 SM: [23559]     return self._run_locked(sr)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 12 08:52:03 XS03 SM: [23559]     rv = self._run(sr, target)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 12 08:52:03 XS03 SM: [23559]     return sr.scan(self.params['sr_uuid'])
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 12 08:52:03 XS03 SM: [23559]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 12 08:52:03 XS03 SM: [23559]     self._loadvdis()
Jan 12 08:52:03 XS03 SM: [23559]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 12 08:52:03 XS03 SM: [23559]     opterr='Error scanning VDI %s' % uuid)
Jan 12 08:52:03 XS03 SM: [23559]
Jan 12 08:52:03 XS03 SM: [23559] lock: closed /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr

 

So my question is: how do I get rid of this error and make Xen and the storage repository completely forget about this (defective?) VDI?

 

Ove


Is there an orphan VBD? There shouldn't be if the storage was destroyed properly. I assume an sr-scan has no effect on this, which is essentially what a vhd-util scan does. Based on the error messages, it appears there may be an error in the LVM database. This is a very useful, comprehensive LVM troubleshooting guide which may be beneficial to you: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Cluster_Logical_Volume_Manager/troubleshooting.html
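
If it is an inconsistency in the LVM metadata, a read-only check might show it; something along these lines (vgs and vgck should only read the metadata, but verify that before running them against production):

vgs -o vg_name,vg_attr,vg_free VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c
vgck VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c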

 

-=Tobias


Hi!

 

On no Xen host do I find a VBD for the (no longer existing) VDI 043c7546-8e65-471a-b5be-e2b279634763.

[root@XS03 /]# xe vbd-list |grep 043c7546
[root@XS03 /]#
 

When running lvmdump, I find "043c7546-8e65-471a-b5be-e2b279634763" in only one place in the tgz archive, /lvm/backup/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c:

 

...

        VHD-043c7546-8e65-471a-b5be-e2b279634763 {
            id = "Fl1AEd-5y6A-Lf5f-Babk-YTi7-fmhh-RcoVr6"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_host = "XS03"
            creation_time = 1577593303    # 2019-12-29 05:21:43 +0100
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 128252    # 500.984 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 7887527
                ]
            }
        }
...

 

Unfortunately, the linked Red Hat article does not enlighten me.

I still have no clue how to get rid of this ghost VDI so that I can use the SR "normally" again.

Being forced to do an offline COPY VM for all affected VMs (with VDIs on that SR) will be a time-consuming job with many hours of downtime for the users...

 

Ove

 


Hi!

No - that VM does not exist any more.

But as I wrote, we have some other VMs (with VDIs on that SR) that cannot be snapshotted due to "The snapshot chain is too long".

When trying to (online) migrate such a VM (or rather its VDI) to another SR, this fails with "VDI is not available". Doing an offline COPY VM of such a VM succeeds.

When looking at the output of

vhd-util scan -f -m "VHD-*" -l VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c

it looks a bit horrible to me because of the many parents of many VHDs.

This is an example for one data disk of our Exchange server:

vhd=VHD-6b63b509-8e13-4cf0-8de5-45942e2e64fc capacity=1610612736000 size=1054502551552 hidden=1 parent=none
   vhd=VHD-67343f4f-2b82-4cda-8d7b-cb8a83c01e9e capacity=1610612736000 size=8388608 hidden=0 parent=VHD-6b63b509-8e13-4cf0-8de5-45942e2e64fc
   vhd=VHD-3e54aa80-56fc-463b-85c3-a0b446fc3d4f capacity=1610612736000 size=69206016000 hidden=1 parent=VHD-6b63b509-8e13-4cf0-8de5-45942e2e64fc
      vhd=VHD-d1048ae8-23d5-4c5a-a996-756f1161cb05 capacity=1610612736000 size=3833593856 hidden=1 parent=VHD-3e54aa80-56fc-463b-85c3-a0b446fc3d4f
         vhd=VHD-5df8ab6a-466b-408e-b99f-04161b89ac94 capacity=1610612736000 size=2176843776 hidden=1 parent=VHD-d1048ae8-23d5-4c5a-a996-756f1161cb05
            vhd=VHD-51b22a41-ae5a-43a3-9ed1-2e0486cefcbc capacity=1610612736000 size=1958739968 hidden=1 parent=VHD-5df8ab6a-466b-408e-b99f-04161b89ac94
               vhd=VHD-f91e2ea7-31a4-4ad3-8717-16c576871fd6 capacity=1610612736000 size=4575985664 hidden=1 parent=VHD-51b22a41-ae5a-43a3-9ed1-2e0486cefcbc
                  vhd=VHD-3d18906a-6151-4d7b-a41a-075726a1f796 capacity=1610612736000 size=4374659072 hidden=1 parent=VHD-f91e2ea7-31a4-4ad3-8717-16c576871fd6
                     vhd=VHD-efbc3f07-a356-489a-b385-e51595b7a4f6 capacity=1610612736000 size=3602907136 hidden=1 parent=VHD-3d18906a-6151-4d7b-a41a-075726a1f796
                        vhd=VHD-4ae743e6-1f55-4100-b456-a2732dc0b52b capacity=1610612736000 size=3133145088 hidden=1 parent=VHD-efbc3f07-a356-489a-b385-e51595b7a4f6
                           vhd=VHD-c533ad57-41f4-4489-aa5a-d714bf25327b capacity=1610612736000 size=5091885056 hidden=1 parent=VHD-4ae743e6-1f55-4100-b456-a2732dc0b52b
                              vhd=VHD-12a4f6c1-9b01-4c46-b919-0ef790e2ad95 capacity=1610612736000 size=11228151808 hidden=1 parent=VHD-c533ad57-41f4-4489-aa5a-d714bf25327b
                                 vhd=VHD-0e34b7b4-0282-484c-994b-eb5efdf87968 capacity=1610612736000 size=2248146944 hidden=1 parent=VHD-12a4f6c1-9b01-4c46-b919-0ef790e2ad95
                                    vhd=VHD-79ef3a62-79cc-4140-8e64-66c2a1984f9c capacity=1610612736000 size=1174405120 hidden=1 parent=VHD-0e34b7b4-0282-484c-994b-eb5efdf87968
                                       vhd=VHD-a15817ab-d081-452e-9018-ce922cdf7ced capacity=1610612736000 size=1887436800 hidden=1 parent=VHD-79ef3a62-79cc-4140-8e64-66c2a1984f9c
                                          vhd=VHD-4b4f26d5-12dd-4e10-b8e2-6ca254ed7269 capacity=1610612736000 size=1451229184 hidden=1 parent=VHD-a15817ab-d081-452e-9018-ce922cdf7ced
                                             vhd=VHD-7eaf3121-743e-45d4-ad5c-a2a618bfa99d capacity=1610612736000 size=3028287488 hidden=1 parent=VHD-4b4f26d5-12dd-4e10-b8e2-6ca254ed7269
                                                vhd=VHD-dce4a04b-079f-4af5-87c8-533651d89cc2 capacity=1610612736000 size=2277507072 hidden=1 parent=VHD-7eaf3121-743e-45d4-ad5c-a2a618bfa99d
                                                   vhd=VHD-5d3e7ad3-1287-41d4-8196-acd204ea5cad capacity=1610612736000 size=2151677952 hidden=1 parent=VHD-dce4a04b-079f-4af5-87c8-533651d89cc2
                                                      vhd=VHD-b6f16eb0-be02-41ea-93ea-790f2b5f9c0f capacity=1610612736000 size=960495616 hidden=1 parent=VHD-5d3e7ad3-1287-41d4-8196-acd204ea5cad
                                                         vhd=VHD-b468d184-c12b-4e28-95fd-7f23865264e4 capacity=1610612736000 size=1795162112 hidden=1 parent=VHD-b6f16eb0-be02-41ea-93ea-790f2b5f9c0f
                                                            vhd=VHD-b8a5de57-4d34-45c4-833a-6d641228af11 capacity=1610612736000 size=8388608 hidden=0 parent=VHD-b468d184-c12b-4e28-95fd-7f23865264e4
                                                            vhd=VHD-8418a22e-2f65-4ff1-840a-fd7af5ac80f3 capacity=1610612736000 size=2218786816 hidden=1 parent=VHD-b468d184-c12b-4e28-95fd-7f23865264e4
                                                               vhd=VHD-c3471e67-5a3b-4927-9087-3b1d9140c365 capacity=1610612736000 size=8388608 hidden=0 parent=VHD-8418a22e-2f65-4ff1-840a-fd7af5ac80f3
                                                               vhd=VHD-9bd0dcf3-63f5-4791-be0b-31a86edbdc03 capacity=1610612736000 size=3577741312 hidden=1 parent=VHD-8418a22e-2f65-4ff1-840a-fd7af5ac80f3
                                                                  vhd=VHD-31bbcde2-783d-4ad7-9909-fcec790f6868 capacity=1610612736000 size=8388608 hidden=0 parent=VHD-9bd0dcf3-63f5-4791-be0b-31a86edbdc03
                                                                  vhd=VHD-cb228b80-952e-4d90-803f-9cd75f893cec capacity=1610612736000 size=3212836864 hidden=1 parent=VHD-9bd0dcf3-63f5-4791-be0b-31a86edbdc03
                                                                     vhd=VHD-0d0aaa7e-f122-4185-9a4f-7b09b906271e capacity=1610612736000 size=8388608 hidden=0 parent=VHD-cb228b80-952e-4d90-803f-9cd75f893cec
                                                                     vhd=VHD-2a2e28b7-eb10-4f4b-b920-7e55394be59e capacity=1610612736000 size=3560964096 hidden=1 parent=VHD-cb228b80-952e-4d90-803f-9cd75f893cec
                                                                        vhd=VHD-55f66cff-66d2-4457-b268-09277c4fa5bd capacity=1610612736000 size=8388608 hidden=0 parent=VHD-2a2e28b7-eb10-4f4b-b920-7e55394be59e
                                                                        vhd=VHD-35186aa4-9d58-4db1-a264-106981b38d0a capacity=1610612736000 size=2503999488 hidden=1 parent=VHD-2a2e28b7-eb10-4f4b-b920-7e55394be59e
                                                                           vhd=VHD-5c9b39d7-a047-45d6-991d-4835207da997 capacity=1610612736000 size=8388608 hidden=0 parent=VHD-35186aa4-9d58-4db1-a264-106981b38d0a
                                                                           vhd=VHD-fb39d9fd-d52a-461b-859e-975cb26f6e76 capacity=1610612736000 size=2109734912 hidden=1 parent=VHD-35186aa4-9d58-4db1-a264-106981b38d0a
                                                                              vhd=VHD-ec32a1f3-8434-4b0e-8b53-17b8a691760d capacity=1610612736000 size=8388608 hidden=0 parent=VHD-fb39d9fd-d52a-461b-859e-975cb26f6e76
                                                                              vhd=VHD-a2024eb9-5313-47d8-b4cf-ce427930b66c capacity=1610612736000 size=4538236928 hidden=1 parent=VHD-fb39d9fd-d52a-461b-859e-975cb26f6e76
                                                                                 vhd=VHD-02071e4d-f864-4bd6-ae4d-550ad95f5e1c capacity=1610612736000 size=8388608 hidden=0 parent=VHD-a2024eb9-5313-47d8-b4cf-ce427930b66c
                                                                                 vhd=VHD-7854b21a-1f9c-4022-8ed5-12dc5504ee54 capacity=1610612736000 size=54525952 hidden=1 parent=VHD-a2024eb9-5313-47d8-b4cf-ce427930b66c
                                                                                    vhd=VHD-6dc5337e-ec13-403d-af1a-3a33ad8b005c capacity=1610612736000 size=8388608 hidden=0 parent=VHD-7854b21a-1f9c-4022-8ed5-12dc5504ee54
                                                                                    vhd=VHD-5ba87e1f-bea4-4895-849e-40093a9c0981 capacity=1610612736000 size=3988783104 hidden=1 parent=VHD-7854b21a-1f9c-4022-8ed5-12dc5504ee54
                                                                                       vhd=VHD-226d33b7-9bc7-4807-8055-0cf26868b700 capacity=1610612736000 size=8388608 hidden=0 parent=VHD-5ba87e1f-bea4-4895-849e-40093a9c0981
                                                                                       vhd=VHD-4795558e-7440-4ec3-be5f-2a546d15896b capacity=1610612736000 size=1613766852608 hidden=0 parent=VHD-5ba87e1f-bea4-4895-849e-40093a9c0981

To me it looks like, since the problem with the one VDI began, the merging / snapshot coalescing for EVERY VM on that SR has stopped, and I believe that if I am able to make XenServer forget about this problematic VDI, then normal operations on that SR should resume...
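
To put a number on the chain depth, the scan output can be reduced to a simple count; a rough sketch that relies on the 3-spaces-per-level indentation shown above (as far as I know the SM limit is around 30 links, but I have not verified that):

vhd-util scan -f -m "VHD-*" -l VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c \
  | awk '{ match($0, /[^ ]/); d = (RSTART - 1) / 3 + 1; if (d > max) max = d } END { print "longest chain:", max }'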

 

Ove

 


Hi!

 

Simply deleting older snapshots from the chain doesn't do the job.

We have one server with 10 daily snapshots complaining about "The snapshot chain is too long". After deleting the 9 oldest snapshots and keeping only the newest, the situation did not change.

The VDI-uuid is 21d6805e-c196-4c5e-a6da-36b56a6a4ee7

The uuid of the latest-snapshot VDI is 24e32704-d41d-45e5-be14-5080a7ecade8

Here is the vhd-util output:

vhd=VHD-0f144cdf-ccb2-4e04-905c-eebc1bcc785b capacity=53687091200 size=27514634240 hidden=1 parent=none
   vhd=VHD-dba0596b-f91d-432f-a3d6-c7e2373a4008 capacity=107374182400 size=15703474176 hidden=1 parent=VHD-0f144cdf-ccb2-4e04-905c-eebc1bcc785b
      vhd=VHD-68e1c31d-03ff-4088-a2eb-df7f1498b146 capacity=107374182400 size=931135488 hidden=1 parent=VHD-dba0596b-f91d-432f-a3d6-c7e2373a4008
         vhd=VHD-11047d24-d595-4ff1-ba95-ead8391d4bee capacity=107374182400 size=2738880512 hidden=1 parent=VHD-68e1c31d-03ff-4088-a2eb-df7f1498b146
            vhd=VHD-aabaa693-77f1-4303-8613-4f3125f4756b capacity=107374182400 size=3128950784 hidden=1 parent=VHD-11047d24-d595-4ff1-ba95-ead8391d4bee
               vhd=VHD-1b2cb57c-4008-495e-8d88-c24a135586ac capacity=107374182400 size=4659871744 hidden=1 parent=VHD-aabaa693-77f1-4303-8613-4f3125f4756b
                  vhd=VHD-3e419d54-3bb2-4735-b69d-ec6ca7d37fcf capacity=107374182400 size=1946157056 hidden=1 parent=VHD-1b2cb57c-4008-495e-8d88-c24a135586ac
                     vhd=VHD-2bbd65a8-c369-48b7-915c-e352bcb98418 capacity=107374182400 size=2923429888 hidden=1 parent=VHD-3e419d54-3bb2-4735-b69d-ec6ca7d37fcf
                        vhd=VHD-05489245-9c0f-4acd-a960-4e8d1759387c capacity=107374182400 size=1551892480 hidden=1 parent=VHD-2bbd65a8-c369-48b7-915c-e352bcb98418
                           vhd=VHD-7bd1394a-71ae-42c2-a1b7-6706343f8664 capacity=107374182400 size=994050048 hidden=1 parent=VHD-05489245-9c0f-4acd-a960-4e8d1759387c
                              vhd=VHD-31c6e832-1216-4adc-a2c7-9aff32fca49f capacity=107374182400 size=9097445376 hidden=1 parent=VHD-7bd1394a-71ae-42c2-a1b7-6706343f8664
                                 vhd=VHD-976df50f-3e5b-415a-a55b-2bcf846704a9 capacity=107374182400 size=3904897024 hidden=1 parent=VHD-31c6e832-1216-4adc-a2c7-9aff32fca49f
                                    vhd=VHD-1922ff27-ca62-4f58-af6c-850e208411bc capacity=107374182400 size=1031798784 hidden=1 parent=VHD-976df50f-3e5b-415a-a55b-2bcf846704a9
                                       vhd=VHD-bd0280ce-ad39-408a-b5a1-37ff3ca2f33a capacity=107374182400 size=947912704 hidden=1 parent=VHD-1922ff27-ca62-4f58-af6c-850e208411bc
                                          vhd=VHD-946beb52-a552-44f2-aa79-df4317fb7398 capacity=107374182400 size=968884224 hidden=1 parent=VHD-bd0280ce-ad39-408a-b5a1-37ff3ca2f33a
                                             vhd=VHD-67100a6b-73e8-45d0-a7fa-f5a637f43f0c capacity=107374182400 size=2210398208 hidden=1 parent=VHD-946beb52-a552-44f2-aa79-df4317fb7398
                                                vhd=VHD-8c1dd410-38c9-42a4-96ee-6e624463b890 capacity=107374182400 size=8388608 hidden=0 parent=VHD-67100a6b-73e8-45d0-a7fa-f5a637f43f0c
                                                vhd=VHD-e3e10956-055d-4863-ab20-e074dddc1e2d capacity=107374182400 size=176160768 hidden=1 parent=VHD-67100a6b-73e8-45d0-a7fa-f5a637f43f0c
                                                   vhd=VHD-74c0669b-2fa5-428f-822c-a580af1be9b2 capacity=107374182400 size=901775360 hidden=1 parent=VHD-e3e10956-055d-4863-ab20-e074dddc1e2d
                                                      vhd=VHD-6542c793-97a0-4c57-a82e-b465a054fa5b capacity=107374182400 size=947912704 hidden=1 parent=VHD-74c0669b-2fa5-428f-822c-a580af1be9b2
                                                         vhd=VHD-343d03d8-366f-477d-810b-a4ad0a054620 capacity=107374182400 size=1929379840 hidden=1 parent=VHD-6542c793-97a0-4c57-a82e-b465a054fa5b
                                                            vhd=VHD-42e940a8-8c3e-451c-8429-3e3e4c7dc7f4 capacity=107374182400 size=1509949440 hidden=1 parent=VHD-343d03d8-366f-477d-810b-a4ad0a054620
                                                               vhd=VHD-62df5faa-85cb-4156-8909-917d786e846a capacity=107374182400 size=964689920 hidden=1 parent=VHD-42e940a8-8c3e-451c-8429-3e3e4c7dc7f4
                                                                  vhd=VHD-2353f818-8007-4da7-b2da-1f3079c729e4 capacity=107374182400 size=2382364672 hidden=1 parent=VHD-62df5faa-85cb-4156-8909-917d786e846a
                                                                     vhd=VHD-c483ac06-b762-4728-9813-3574e8a874df capacity=107374182400 size=2554331136 hidden=1 parent=VHD-2353f818-8007-4da7-b2da-1f3079c729e4
                                                                        vhd=VHD-3d721b53-6ebd-4137-8435-c4d8a93b9388 capacity=107374182400 size=1161822208 hidden=1 parent=VHD-c483ac06-b762-4728-9813-3574e8a874df
                                                                           vhd=VHD-2efb6a00-3f1d-482e-bc30-4efb238ba19f capacity=107374182400 size=973078528 hidden=1 parent=VHD-3d721b53-6ebd-4137-8435-c4d8a93b9388
                                                                              vhd=VHD-16d763d4-9519-495b-8f26-bae9a89e9801 capacity=107374182400 size=2969567232 hidden=1 parent=VHD-2efb6a00-3f1d-482e-bc30-4efb238ba19f
                                                                                 vhd=VHD-dbda7973-6336-4ac9-aa48-fba2ce026d98 capacity=107374182400 size=8518631424 hidden=1 parent=VHD-16d763d4-9519-495b-8f26-bae9a89e9801
                                                                                    vhd=VHD-1f96ad73-0970-47ec-8a3c-70b3f4058cae capacity=107374182400 size=4471128064 hidden=1 parent=VHD-dbda7973-6336-4ac9-aa48-fba2ce026d98
                                                                                       vhd=VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7 capacity=107374182400 size=107592286208 hidden=0 parent=VHD-1f96ad73-0970-47ec-8a3c-70b3f4058cae
                                                                                       vhd=VHD-24e32704-d41d-45e5-be14-5080a7ecade8 capacity=107374182400 size=8388608 hidden=0 parent=VHD-1f96ad73-0970-47ec-8a3c-70b3f4058cae

We see both UUIDs at the bottom of the list.

The excerpt from the SMLog is attached as a file. There we can see things like:

 

...
Jan 13 05:21:50 XS03 SM: [29446] vdi_snapshot {'sr_uuid': '6505febe-9166-9aaa-eb2b-33b7f980ec4c', 'subtask_of': 'DummyRef:|8e32a110-a66b-4454-a345-062735f7e195|VDI.snapshot', 'vdi_ref': 'OpaqueRef:ea8d06fd-d57f-486e-8004-e1770d5f553a', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': '21d6805e-c196-4c5e-a6da-36b56a6a4ee7', 'host_ref': 'OpaqueRef:b75a2f87-4513-4815-bfa2-3fe526a5f5df', 'session_ref': 'OpaqueRef:e8592df6-e2d7-47af-a069-4b400850ab40', 'device_config': {'target': '192.168.32.100', 'multihomelist': '192.168.32.100:3260,192.168.22.10:3260,192.168.21.10:3260,192.168.24.10:3260,10.20.30.25:3260,192.168.23.10:3260,192.168.33.100:3260', 'targetIQN': 'iqn.2019-08:jov1.target0', 'SRmaster': 'true', 'device': '/dev/disk/mpInuse/249754455466f3847', 'SCSIid': '249754455466f3847', 'port': '3260'}, 'command': 'vdi_snapshot', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:4b4c2468-5db6-491e-b450-d6b0e350e434', 'local_cache_sr': '5d0edab4-9a59-303c-838c-1db69a2e7820', 'driver_params': {'epochhint': 'a4f11c82-396a-6b08-f894-d0af0990e00a'}, 'vdi_uuid': '21d6805e-c196-4c5e-a6da-36b56a6a4ee7'}
...
Jan 13 05:21:51 XS03 SM: [29446] FAILED in util.pread: (rc 5) stdout: '', stderr: '  /run/lvm/lvmetad.socket: connect failed: No such file or directory
Jan 13 05:21:51 XS03 SM: [29446]   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Jan 13 05:21:51 XS03 SM: [29446]   Failed to find logical volume "VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7.cbtlog"
Jan 13 05:21:51 XS03 SM: [29446] '
Jan 13 05:21:51 XS03 SM: [29446] Ignoring exception for LV check: /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7.cbtlog !
Jan 13 05:21:51 XS03 SM: [29446] Pause request for 21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29446] Calling tap-pause on host OpaqueRef:b75a2f87-4513-4815-bfa2-3fe526a5f5df
Jan 13 05:21:51 XS03 SM: [29528] lock: opening lock file /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:51 XS03 SM: [29528] lock: acquired /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:51 XS03 SM: [29528] Pause for 21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29528] Calling tap pause with minor 48
Jan 13 05:21:51 XS03 SM: [29528] ['/usr/sbin/tap-ctl', 'pause', '-p', '5350', '-m', '48']
Jan 13 05:21:51 XS03 SM: [29528]  = 0
Jan 13 05:21:51 XS03 SM: [29528] lock: released /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:51 XS03 SM: [29528] lock: closed /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:51 XS03 SM: [29446] LVHDVDI._snapshot for 21d6805e-c196-4c5e-a6da-36b56a6a4ee7 (type 2)
Jan 13 05:21:51 XS03 SM: [29446] lock: opening lock file /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29446] lock: acquired /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29446] Refcount for lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c:21d6805e-c196-4c5e-a6da-36b56a6a4ee7 (0, 1) + (1, 0) => (1, 1)
Jan 13 05:21:51 XS03 SM: [29446] Refcount for lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c:21d6805e-c196-4c5e-a6da-36b56a6a4ee7 set => (1, 1b)
Jan 13 05:21:51 XS03 SM: [29446] lock: released /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29446] lock: closed /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:51 XS03 SM: [29446] ['/usr/bin/vhd-util', 'query', '--debug', '-vsf', '-n', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7']
Jan 13 05:21:51 XS03 SM: [29446]   pread SUCCESS
Jan 13 05:21:51 XS03 SM: [29446] ['/usr/bin/vhd-util', 'scan', '-f', '-c', '-m', 'VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7', '-l', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c', '-a']
Jan 13 05:21:52 XS03 SM: [29446]   pread SUCCESS
Jan 13 05:21:52 XS03 SM: [29446] lock: opening lock file /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/bd0280ce-ad39-408a-b5a1-37ff3ca2f33a
Jan 13 05:21:52 XS03 SM: [29446] lock: acquired /var/lock/sm/lvm-6505febe-9166-9aaa-eb2b-33b7f980ec4c/bd0280ce-ad39-408a-b5a1-37ff3ca2f33a
...
Jan 13 05:21:52 XS03 SM: [29446] ['/usr/bin/vhd-util', 'query', '--debug', '-d', '-n', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7']
Jan 13 05:21:52 XS03 SM: [29446]   pread SUCCESS
Jan 13 05:21:52 XS03 SM: [29446] Raising exception [109, The snapshot chain is too long]
Jan 13 05:21:52 XS03 SM: [29446] Unpause request for 21d6805e-c196-4c5e-a6da-36b56a6a4ee7 secondary=None
Jan 13 05:21:52 XS03 SM: [29446] Calling tap-unpause on host OpaqueRef:b75a2f87-4513-4815-bfa2-3fe526a5f5df
Jan 13 05:21:52 XS03 SM: [29557] lock: opening lock file /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:52 XS03 SM: [29557] lock: acquired /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:52 XS03 SM: [29557] Unpause for 21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:52 XS03 SM: [29557] Realpath: /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7
Jan 13 05:21:52 XS03 SM: [29557] Setting LVM_DEVICE to /dev/disk/by-scsid/249754455466f3847
Jan 13 05:21:53 XS03 SM: [29557] lock: opening lock file /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 13 05:21:53 XS03 SM: [29557] LVMCache created for VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c
Jan 13 05:21:53 XS03 SM: [29557] ['/sbin/vgs', '--readonly', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 13 05:21:53 XS03 SM: [29557]   pread SUCCESS
Jan 13 05:21:53 XS03 SM: [29557] Entering _checkMetadataVolume
Jan 13 05:21:53 XS03 SM: [29557] LVMCache: will initialize now
Jan 13 05:21:53 XS03 SM: [29557] LVMCache: refreshing
Jan 13 05:21:53 XS03 SM: [29557] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 13 05:21:53 XS03 SM: [29557]   pread SUCCESS
Jan 13 05:21:53 XS03 SM: [29557] ['/sbin/lvs', '--noheadings', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7.cbtlog']
Jan 13 05:21:53 XS03 SM: [29557] FAILED in util.pread: (rc 5) stdout: '', stderr: '  /run/lvm/lvmetad.socket: connect failed: No such file or directory
Jan 13 05:21:53 XS03 SM: [29557]   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Jan 13 05:21:53 XS03 SM: [29557]   Failed to find logical volume "VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7.cbtlog"
Jan 13 05:21:53 XS03 SM: [29557] '
Jan 13 05:21:53 XS03 SM: [29557] Ignoring exception for LV check: /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/21d6805e-c196-4c5e-a6da-36b56a6a4ee7.cbtlog !
Jan 13 05:21:53 XS03 SM: [29557] Calling tap unpause with minor 48
Jan 13 05:21:54 XS03 SM: [29557] ['/usr/sbin/tap-ctl', 'unpause', '-p', '5350', '-m', '48', '-a', 'vhd:/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-21d6805e-c196-4c5e-a6da-36b56a6a4ee7']
Jan 13 05:21:54 XS03 SM: [29557]  = 0
Jan 13 05:21:54 XS03 SM: [29557] lock: closed /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 13 05:21:54 XS03 SM: [29557] lock: released /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:54 XS03 SM: [29557] lock: closed /var/lock/sm/21d6805e-c196-4c5e-a6da-36b56a6a4ee7/vdi
Jan 13 05:21:54 XS03 SM: [29446] lock: released /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
...

There should be plenty of space available in the SR. But it seems that this ONE problematic VDI 043c7546-8e65-471a-b5be-e2b279634763 from the beginning of the thread is the reason why every cleaning or coalescing task on this SR is stopped, because I repeatedly see the following being logged over and over, every 30 seconds, in /var/log/SMLog:

Jan 13 09:56:55 XS03 SM: [24528]   pread SUCCESS
Jan 13 09:56:55 XS03 SM: [24528] LVMCache: refreshing
Jan 13 09:56:55 XS03 SM: [24528] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 13 09:56:56 XS03 SM: [24528]   pread SUCCESS
Jan 13 09:56:56 XS03 SM: [24528] ['/usr/bin/vhd-util', 'scan', '-f', '-c', '-m', 'VHD-*', '-l', 'VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c']
Jan 13 09:56:57 XS03 SM: [24528]   pread SUCCESS
Jan 13 09:56:57 XS03 SM: [24528] ***** VHD scan error: vhd=VHD-043c7546-8e65-471a-b5be-e2b279634763 scan-error=-22 error-message='invalid footer'
Jan 13 09:56:57 XS03 SM: [24528] *** vhd-scan error: 043c7546-8e65-471a-b5be-e2b279634763
Jan 13 09:56:57 XS03 SM: [24528] Raising exception [46, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]]
Jan 13 09:56:57 XS03 SM: [24528] lock: released /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr
Jan 13 09:56:57 XS03 SM: [24528] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 13 09:56:57 XS03 SM: [24528]     return self._run_locked(sr)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 13 09:56:57 XS03 SM: [24528]     rv = self._run(sr, target)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 13 09:56:57 XS03 SM: [24528]     return sr.scan(self.params['sr_uuid'])
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 13 09:56:57 XS03 SM: [24528]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 13 09:56:57 XS03 SM: [24528]     self._loadvdis()
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 13 09:56:57 XS03 SM: [24528]     opterr='Error scanning VDI %s' % uuid)
Jan 13 09:56:57 XS03 SM: [24528]
Jan 13 09:56:57 XS03 SM: [24528] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 372, in run
Jan 13 09:56:57 XS03 SM: [24528]     ret = cmd.run(sr)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Jan 13 09:56:57 XS03 SM: [24528]     return self._run_locked(sr)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Jan 13 09:56:57 XS03 SM: [24528]     rv = self._run(sr, target)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/SRCommand.py", line 358, in _run
Jan 13 09:56:57 XS03 SM: [24528]     return sr.scan(self.params['sr_uuid'])
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVMoISCSISR", line 535, in scan
Jan 13 09:56:57 XS03 SM: [24528]     LVHDSR.LVHDSR.scan(self, sr_uuid)
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVHDSR.py", line 690, in scan
Jan 13 09:56:57 XS03 SM: [24528]     self._loadvdis()
Jan 13 09:56:57 XS03 SM: [24528]   File "/opt/xensource/sm/LVHDSR.py", line 875, in _loadvdis
Jan 13 09:56:57 XS03 SM: [24528]     opterr='Error scanning VDI %s' % uuid)
Jan 13 09:56:57 XS03 SM: [24528]
Jan 13 09:56:57 XS03 SM: [24528] lock: closed /var/lock/sm/6505febe-9166-9aaa-eb2b-33b7f980ec4c/sr

IMHO there should be a way to make XenServer forget about this VDI and stop complaining about it, but at the moment I am unsure how this should be accomplished...
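
To see whether the garbage collector ever gets past the failing scan, I am simply watching the log for the relevant markers (plain grep, nothing clever):

tail -f /var/log/SMLog | grep -E 'vhd-scan error|SnapshotChainTooLong|coalesc'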

 

Ove

SMLog.txt


Hi!

 

Yes, I already tested this: the FULL COPY is an option.

Due to the many VMs involved (and the downtime), I am trying to find out whether there is a way to avoid these offline full COPY VM jobs.

I would prefer to get the SR into a state that allows me to do an online storage migration; that is what I am asking about.

 

So I wonder how to convince XenServer to forget this VDI 043c7546-8e65-471a-b5be-e2b279634763 that it is complaining about every 30 seconds in /var/log/SMLog?!

 

Ove


See if you can find an associated VBD for that VDI. I take it it does not show up at all on the SR? Sometimes you can improve things by migrating VMs to a different SR and cleaning up, or even re-initializing the SR, to get rid of any remnants. It can be a long process, alas. Export/import has, of course, the disadvantage of requiring the VMs to be shut down as part of the process.

 

-=Tobias


Hi Tobias,

 

I cannot find an associated VBD for the VDI 043c7546-8e65-471a-b5be-e2b279634763 on any Xen host with "vbd-list".

To me it looks like Xen THINKS the VDI 043c7546-8e65-471a-b5be-e2b279634763 still has to be available, but it cannot find it in the SR:

Of course, "vdi-list" also shows no entry for that VDI anymore...

/var/log/SMLog shows:

Jan 13 17:09:22 XS03 SM: [5873] *** vhd-scan error: 043c7546-8e65-471a-b5be-e2b279634763
...
Jan 13 17:09:22 XS03 SM: [5873] Raising exception [46, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]]
...
Jan 13 17:09:22 XS03 SM: [5873] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]
...
Jan 13 17:09:22 XS03 SM: [5873] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=Error scanning VDI 043c7546-8e65-471a-b5be-e2b279634763]

Why is Xen trying to find this VDI in the SR every 30 seconds?

Do you know a way to make Xen completely forget about this VDI?

 

Ove

 


You may have to try moving your VMs' storage to another SR to get this resolved. If the SR gets too full, it can no longer coalesce, and therefore it is nearly impossible to clean things up in this state. Is there maybe a remnant VM that shows as not having shut down properly even though it no longer exists? Perhaps run something like "xe vm-list params=power-state" and see if something shows up in an odd state.


It looks like your SR is not using all of its physical space, but it is definitely well over-allocated in terms of virtual storage.

 

I guess I'd next look at lvdisplay to see if you can associate the defunct VDI with the storage location. I take it the VM that that VDI was on doesn't show up at all now, correct? You've already tried "xe vdi-destroy uuid=...", correct?


Hi!

 

The VM that the VDI was on does not exist anymore, and the VDI is no longer known to Xen either.

lvscan and lvdisplay still show the logical volume.

[root@XS03 ~]# lvscan|grep 043
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  inactive          '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763' [500.98 GiB] inherit

[root@XS03 ~]# lvdisplay /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  --- Logical volume ---
  LV Path                /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
  LV Name                VHD-043c7546-8e65-471a-b5be-e2b279634763
  VG Name                VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c
  LV UUID                Fl1AEd-5y6A-Lf5f-Babk-YTi7-fmhh-RcoVr6
  LV Write Access        read/write
  LV Creation host, time XS03, 2019-12-29 05:21:43 +0100
  LV Status              NOT available
  LV Size                500.98 GiB
  Current LE             128252
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  
[root@XS03 ~]# xe vdi-destroy uuid=043c7546-8e65-471a-b5be-e2b279634763
The uuid you supplied was invalid.
type: VDI
uuid: 043c7546-8e65-471a-b5be-e2b279634763

Ove


Hi!

Offline COPY VM should be the last resort. I am still hoping to get the SR into a state where coalescing of snapshots starts to work again.

Would an "lvremove" maybe help?

Normally, when deleting a VDI, the LV is removed as well, but for my VDI 043c7546-8e65-471a-b5be-e2b279634763 (which is deleted) the LV /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763 still exists...
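
Before trying an lvremove I would at least double-check that nothing references the LV anymore; a quick sanity check (tap-ctl list shows the active tapdisks, so nothing belonging to this VHD should appear there):

xe vbd-list vdi-uuid=043c7546-8e65-471a-b5be-e2b279634763
lvscan | grep 043c7546
tap-ctl list | grep 043c7546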

 

Ove


[root@XS03 ~]# lvremove /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  lvremove /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763: Command not permitted while global/metadata_read_only is set.
[root@XS03 ~]#
 

 

29 minutes ago, Ove Starckjohann said:

[root@XS03 ~]# lvremove /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  lvremove /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763: Command not permitted while global/metadata_read_only is set.
[root@XS03 ~]#
 

Hi!

You need to run this: lvremove /dev/VG_XenStorage-XXX/VHD-XXX --config global{metadata_read_only=0} - but only if you are sure :)

 



Hi!

 

I did it:

[root@XS03 ~]# lvremove /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763 --config global{metadata_read_only=0}
  /run/lvm/lvmetad.socket: connect failed: No such file or directory
  WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
  Logical volume "VHD-043c7546-8e65-471a-b5be-e2b279634763" successfully removed
[root@XS03 ~]#

 

 

Now I see the following in /var/log/SMLog:
Jan 15 20:59:08 XS03 SM: [4106] Raising exception [40, The SR scan failed  [opterr=Command ['/sbin/lvchange', '-ay', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763'] failed (/run/lvm/lvmetad.socket: connect failed: No such file or directory
Jan 15 20:59:08 XS03 SM: [4106]   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Jan 15 20:59:08 XS03 SM: [4106]   Failed to find logical volume "VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763"): Input/output error]]
Jan 15 20:59:08 XS03 SM: [4106] ***** LVHD over iSCSI: EXCEPTION <class 'SR.SROSError'>, The SR scan failed  [opterr=Command ['/sbin/lvchange', '-ay', '/dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763'] failed (/run/lvm/lvmetad.socket: connect failed: No such file or directory
Jan 15 20:59:08 XS03 SM: [4106]   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Jan 15 20:59:08 XS03 SM: [4106]   Failed to find logical volume "VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/VHD-043c7546-8e65-471a-b5be-e2b279634763"): Input/output error]

 

According to https://discussions.citrix.com/topic/389477-sr_backend_failure_40-failed-to-find-logical-volume-vg_xenstorage/

I plan to use the following command:

lvrename /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/MGT /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/MGT.old --config global{metadata_read_only=0}
 

But I am really unsure whether this poses any danger to the SR?!
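
Before touching MGT I would at least keep a copy of the current LVM metadata; a minimal precaution (vgcfgbackup should only read the metadata, but I have not verified its behaviour in dom0):

vgcfgbackup -f /root/VG_XenStorage-6505febe-metadata-backup.vg VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c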

 

Ove

 


I had a similar problem on Xen 7.1 CU2.

 

First you need to release and delete all orphaned VDIs/VBDs. Sometimes you only need to move the VM from one SR to another. If you detect an orphaned VHD without an associated VDI/VBD, then delete it with lvremove.

 

In my case, when this happened, moving the VM worked most of the time, and the orphaned VHD was then deleted with lvremove if it had no VDI/VBD associated. There is a thread about this.

 

lvremove is the last command in the chain to apply.

 

In your case, move the data to another SR and clean up the old SR if possible.
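
For a single disk, the online move would normally be the per-VDI storage migration shown below; in your case it may well fail with the same "VDI is not available" error, so treat it only as the reference command:

xe vdi-pool-migrate uuid=<vdi-uuid> sr-uuid=<destination-sr-uuid>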


My plan is to move all VMs (VDIs) to another SR. But at the moment I can only do this offline, via COPY VM after a VM is shut down.

Since there are also "more important" VMs, I wonder whether I will be able to live-migrate them while they are up and running. In a test with one VM this failed with the error "VDI is not available", but afterwards an OFFLINE COPY VM succeeded.

 

So I would be happy if someone could answer the question:

Is it safe to issue the following command?

 

lvrename /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/MGT /dev/VG_XenStorage-6505febe-9166-9aaa-eb2b-33b7f980ec4c/MGT.old --config global{metadata_read_only=0}

 

Ove
