
Garbage Collection Problem - VHD Corrupted


Usuario Sistemas

Question

Hello,

 

Since 04/21/2021 we have been seeing a problem with the Garbage Collection process. The process reports the following error:

 

Apr 23 14:20:35 xxx SMGC: [13659] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 23 14:20:35 xxx SMGC: [13659]          ***********************
Apr 23 14:20:35 xxx SMGC: [13659]          *  E X C E P T I O N  *
Apr 23 14:20:35 xxx SMGC: [13659]          ***********************
Apr 23 14:20:35 xxx SMGC: [13659] coalesce: EXCEPTION <class 'util.SMException'>, VHD ee40bf51[VHD](50.000G//2.898G|a) corrupted
Apr 23 14:20:35 xxx SMGC: [13659]   File "/opt/xensource/sm/cleanup.py", line 1542, in coalesce
Apr 23 14:20:35 xxx SMGC: [13659]     self._coalesce(vdi)
Apr 23 14:20:35 xxx SMGC: [13659]   File "/opt/xensource/sm/cleanup.py", line 1732, in _coalesce
Apr 23 14:20:35 xxx SMGC: [13659]     vdi._doCoalesce()
Apr 23 14:20:35 xxx SMGC: [13659]   File "/opt/xensource/sm/cleanup.py", line 1179, in _doCoalesce
Apr 23 14:20:35 xxx SMGC: [13659]     self.parent.validate()
Apr 23 14:20:35 xxx SMGC: [13659]   File "/opt/xensource/sm/cleanup.py", line 1172, in validate
Apr 23 14:20:35 xxx SMGC: [13659]     VDI.validate(self, fast)
Apr 23 14:20:35 xxx SMGC: [13659]   File "/opt/xensource/sm/cleanup.py", line 673, in validate
Apr 23 14:20:35 xxx SMGC: [13659]     raise util.SMException("VHD %s corrupted" % self)
Apr 23 14:20:35 xxx SMGC: [13659]
Apr 23 14:20:35 xxx SMGC: [13659] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*

 

As you can see, it reports that VHD ee40bf51 is corrupted. I read this article -> https://support.citrix.com/article/CTX201296 but I couldn't resolve the problem. In case anyone wants to analyse it, I attach the entire log.
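
For reference, I understand the whole VHD tree of an LVM-based SR can be dumped with something like this (a sketch using our SR's VG name; -p prints the parent/child tree, -m matches the VHD-* LVs in the VG given with -l):

vhd-util scan -f -c -p -m "VHD-*" -l VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd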

 

These are the tests I ran:

 

Find the VHD path:

 

[root@xxx ~]# lvdisplay  | grep -B2 -A11 "ee40bf51"

  --- Logical volume ---
  LV Path                /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
  LV Name                VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
  VG Name                VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
  LV UUID                IJe1sI-ia1U-jzK1-dgZ7-SEXj-uinb-6lR4LX
  LV Write Access        read only
  LV Creation host, time ,
  LV Status              NOT available
  LV Size                2.90 GiB
  Current LE             742
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

Check the path on all of the hosts:

 

[root@xxx ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory


[root@yyy ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory


[root@zzz ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory


[root@aaa ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory


Check whether the VHD is active:

 

[root@xxx ~]# lvscan | grep "ee40bf51"
  inactive          '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0' [2.90 GiB] inherit
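
The same state could presumably be cross-checked with lvs, whose lv_attr flags show whether the LV is activated (again just a sketch with our VG name):

lvs -o lv_name,lv_attr,lv_size VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd | grep ee40bf51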

 

Try to repair the volume:

 

[root@xxx ~]# vhd-util repair -n /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
error opening /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: -2
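
I believe the -2 here is just errno ENOENT: the device node does not exist because the LV is inactive. In principle the LV could be activated first and then checked, something like the sketch below, but I have not tried it because I don't know whether it is safe on this volume:

# Sketch (not tried): temporarily activate the LV so vhd-util can open it.
lvchange -ay /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
# Check the VHD metadata (and only then attempt vhd-util repair if needed).
vhd-util check -n /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
# Deactivate it again afterwards.
lvchange -an /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0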

 

Check whether the volume is associated with any VM (no VBDs are returned):

 

[root@xxx ~]# xe vbd-list vdi-uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
[root@xxx ~]#

Do a vdi-forget (as I understand it, this only removes the VDI record from the XAPI database; it does not touch the LV itself):

 

[root@xxx ~]# xe vdi-forget uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0

 

Look up more information about the volume:

 

[root@xxx ~]#  xe vdi-param-list uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
uuid ( RO)                    : ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
              name-label ( RW): base copy
        name-description ( RW):
           is-a-snapshot ( RO): false
             snapshot-of ( RO): <not in database>
               snapshots ( RO):
           snapshot-time ( RO): 19700101T00:00:00Z
      allowed-operations (SRO): generate_config; force_unlock; update; forget; destroy; snapshot; resize; copy; clone
      current-operations (SRO):
                 sr-uuid ( RO): 1189b3ac-a362-3dd1-e657-4f11794238dd
           sr-name-label ( RO): PS405_DCP
               vbd-uuids (SRO):
         crashdump-uuids (SRO):
            virtual-size ( RO): 53687091200
    physical-utilisation ( RO): 3112173568
                location ( RO): ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
                    type ( RO): User
                sharable ( RO): false
               read-only ( RO): true
            storage-lock ( RO): false
                 managed ( RO): false
     parent ( RO) [DEPRECATED]: <not in database>
                 missing ( RO): false
            is-tools-iso ( RO): false
            other-config (MRW):
           xenstore-data (MRO):
               sm-config (MRO): vhd-blocks: eJztlEs....NGVLUI; vhd-parent: 4d5f02ca-46ac-4a5c-ab77-940362440fc3; vdi_type: vhd
                 on-boot ( RW): persist
           allow-caching ( RW): false
         metadata-latest ( RO): false
        metadata-of-pool ( RO): <not in database>
                    tags (SRW):
             cbt-enabled ( RO): false

Here I see the "vhd-parent" -> 4d5f02ca-46ac-4a5c-ab77-940362440fc3. Also, the name-label "base copy" together with managed=false suggests this VDI is an intermediate node of a VHD chain rather than a user-visible disk, which would explain why it has no VBDs.

 

Then I check whether the parent VHD is active, and look for its path again (no results either):

 

[root@xxx ~]# lvscan | grep "4d5f02ca"
  inactive          '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-4d5f02ca-46ac-4a5c-ab77-940362440fc3' [1.55 GiB] inherit

And I repeated all the same tests on the parent, with no results...


Finally, looking at the logs again, I found this line:

 

Apr 23 14:20:35 xxx SM: [19295] Failed to lock /var/lock/sm/mpathcount1/host on first attempt, blocked by PID 19232

But when I look for that PID, nothing is returned:

 

[root@xxx ~]# ps -p 19232
  PID TTY          TIME CMD
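
If it mattered, I suppose the old process could still be identified by grepping the logs for that PID, though since the PID no longer exists I assume this lock message is stale rather than the cause of the GC exception:

grep 19232 /var/log/SMlog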

 

I don't know what to do with these volumes. I haven't tried an lvremove; I suspect it wouldn't work anyway, since according to the system the device doesn't exist. And before trying anything like that I want to make sure I don't delete any virtual machines or client snapshots.
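
Since the VDI is a "base copy", I assume the real risk is that some other VDI still references it as its parent. If I understand the sm-config fields correctly, something like this (a sketch, not run yet) should list any children in the database that point at it:

xe vdi-list sr-uuid=1189b3ac-a362-3dd1-e657-4f11794238dd params=uuid,name-label,sm-config | grep -B2 "vhd-parent: ee40bf51"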

 

XCP Version:

 

[root@xxx ~]# cat /etc/redhat-release
XCP-ng release 7.5.0 (xenenterprise)

 

 

Thank you in advance.

 

Regards,

David.

Attachment: logs_exception.txt


8 answers to this question


Hello Tobias and thank you for the reply.

 

I didn't try to reactivate the volume, because I assumed that if it is inactive, it is for a reason (like a deleted snapshot or a deleted VM).

 

I searched under /etc/lvm/backup and I don't see the volume:

 

[root@xxx ~]# ls -l /etc/lvm/backup
total 100
-rw------- 1 root root  1403 May 27  2020 VG_XenStorage-06ee4789-8c9a-6489-ff1b-85dd39aeb645
-rw------- 1 root root 87743 Apr 23 17:05 VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
-rw------- 1 root root  1838 Mar 29 12:27 VG_XenStorage-413867b7-fe7a-66a7-3c93-6af274a59280

The second file corresponds to the UUID of the main storage SR.
 

Under /etc/lvm/archive (which, as I understand it, holds pre-change copies of the metadata) there's nothing:

 

[root@xxx ~]# ls -l  /etc/lvm/archive
total 0


Thank you,

 

Sincerely,

David.


Hello, 

 

Now SMLog says:

 

Apr 26 09:01:27 xxx SMGC: [32260] SR 1189 ('PS405_DCP') (156 VDIs in 114 VHD trees): no changes
Apr 26 09:01:27 xxx SMGC: [32260] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 26 09:01:27 xxx SMGC: [32260]          ***********************
Apr 26 09:01:27 xxx SMGC: [32260]          *  E X C E P T I O N  *
Apr 26 09:01:27 xxx SMGC: [32260]          ***********************
Apr 26 09:01:27 xxx SMGC: [32260] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0']
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 2712, in gc
Apr 26 09:01:27 xxx SMGC: [32260]     _gc(None, srUuid, dryRun)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 2597, in _gc
Apr 26 09:01:27 xxx SMGC: [32260]     _gcLoop(sr, dryRun)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 2561, in _gcLoop
Apr 26 09:01:27 xxx SMGC: [32260]     sr.updateBlockInfo()
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 2239, in updateBlockInfo
Apr 26 09:01:27 xxx SMGC: [32260]     if not vdi.getConfig(vdi.DB_VHD_BLOCKS):
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 477, in getConfig
Apr 26 09:01:27 xxx SMGC: [32260]     config = self.sr.xapi.getConfigVDI(self, key)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 341, in getConfigVDI
Apr 26 09:01:27 xxx SMGC: [32260]     cfg = self.session.xenapi.VDI.get_sm_config(vdi.getRef())
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 473, in getRef
Apr 26 09:01:27 xxx SMGC: [32260]     self._vdiRef = self.sr.xapi.getRefVDI(self)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 316, in getRefVDI
Apr 26 09:01:27 xxx SMGC: [32260]     return self._getRefVDI(vdi.uuid)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/opt/xensource/sm/cleanup.py", line 313, in _getRefVDI
Apr 26 09:01:27 xxx SMGC: [32260]     return self.session.xenapi.VDI.get_by_uuid(uuid)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Apr 26 09:01:27 xxx SMGC: [32260]     return self.__send(self.__name, args)
Apr 26 09:01:27 xxx SMGC: [32260]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Apr 26 09:01:27 xxx SMGC: [32260]     result = _parse_result(getattr(self, methodname)(*full_params))
Apr 26 09:01:27 xxx SMGC: [32260]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Apr 26 09:01:27 xxx SMGC: [32260]     raise Failure(result['ErrorDescription'])
Apr 26 09:01:27 xxx SMGC: [32260]
Apr 26 09:01:27 xxx SMGC: [32260] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 26 09:01:27 xxx SMGC: [32260] * * * * * SR 1189b3ac-a362-3dd1-e657-4f11794238dd: ERROR
Apr 26 09:01:27 xxx SMGC: [32260]


The UUID is the same one that was supposedly corrupted.
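
My guess is that this is a consequence of the vdi-forget: the LV is still on disk, so the GC scan finds the VHD, but the VDI record no longer exists in the XAPI database, hence UUID_INVALID. Perhaps an SR scan would re-introduce the record (not tried yet):

xe sr-scan uuid=1189b3ac-a362-3dd1-e657-4f11794238dd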

 

Thanks in advance.

 

David.


Hello Tobias,

 

I answer your questions:

 

3 hours ago, Tobias Kreidl said:

Do you see the volume mounted, or I assume still not?

 

 

The volume still appears as "inactive", as before:

 

[root@xxx backup]# lvscan | grep "ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0"
  inactive          '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0' [2.90 GiB] inherit


If I look for that path, the system says it doesn't exist:

 

[root@xxx ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory

 

3 hours ago, Tobias Kreidl said:

You could try to restore from the backup folder, using the most recent (April of this year) if you believe it to be current enough. I'd make sure all VMs are properly backed up first, before taking such a drastic step.


Regarding restoring the volume from the backup folder, the most recent backup is from April 26th:

 

[root@xxx backup]# ls -la
total 108
drwx------ 2 root root  4096 Apr 26 08:57 .
drwxr-xr-x 7 root root  4096 Mar 25  2019 ..
-rw------- 1 root root  1403 May 27  2020 VG_XenStorage-06ee4789-8c9a-6489-ff1b-85dd39aeb645
-rw------- 1 root root 90077 Apr 26 08:57 VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
-rw------- 1 root root  1838 Mar 29 12:27 VG_XenStorage-413867b7-fe7a-66a7-3c93-6af274a59280

 

However, I think the problem will persist even if I restore that copy, since the copy is very recent.
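
For reference, as I understand it the restore itself would be a metadata-only operation along these lines (not something I have tried); it rewrites the LVM metadata of the VG from the backup file without touching the data inside the LVs:

vgcfgrestore -f /etc/lvm/backup/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd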

 

There are only 15 machines on this host; I could move them to another host (like yyy, zzz or aaa) and then restore the backup. But this is a big change and I would like to consider all the possibilities first.

 

Do you have any other ideas, such as deleting the volume? (Though I don't think that would work either, because the system can't find it.)

 

Thank you again,

 

Regards.

David.

2 hours ago, Tobias Kreidl said:

At this point, do you just want to get rid of the volume (meaning you don't care about trying to salvage any of the content)?

 

Hello Tobias,

 

I want to delete the volume, but without affecting anything else. This is a production server and there are machines running on it.

 

Is there any way to check whether the volume is associated with any machine disk, snapshot, or anything else?

 

Thank you again for your time,

 

Best regards,

David.


The issue is of the chicken-and-egg variety: you probably cannot delete the volume until the SR associated with it is destroyed. Have you tried that? My guess is that will not work, because the LV isn't active and it cannot easily be activated if it's corrupted.

I would think that as long as you are sure of the LV/VG IDs, a vgremove plus an lvremove ought to work. You may need to clean up some of the SR links afterwards in the /dev directory area. You might want to deactivate the volume with lvchange first. See: https://www.thegeekdiary.com/centos-rhel-how-to-delete-lvm-volume/
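
Roughly something like this; treat it as a sketch and double-check the VG/LV names first:

# Deactivate the orphaned LV first (in case it is active anywhere), then remove it.
lvchange -an /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
lvremove /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
# Caution: vgremove removes the entire volume group (i.e. the whole SR),
# so on a live SR you would want to stay with lvremove on the specific LVs.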

See also: https://support.hpe.com/hpesc/public/docDisplay?docId=kc0129469en_us&docLocale=en_US
