
XenServer 7.6 VMs not booting, possible corruption


Jonathan Kosakowski

Question

Hello, we had to hard-reboot our XenServer yesterday, and since then none of our VMs will start. They appear to begin booting and then the console goes blank; one actually started booting Windows Server but then hung. We have one server and one NAS providing iSCSI storage for the VMs. I spent most of yesterday trying to figure this out; vhd-util check reports a bad checksum, and I don't know how to repair it. Any help would be greatly appreciated.

 

TIA

 

vhd-util check -n /dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07/
MGT                                       VHD-911ae2c3-d21e-477b-8362-788b66d820f9  VHD-dc1a29ab-f673-463f-824b-d1d4db5af0fa
VHD-18da7c03-0b76-4132-8f39-e9267f006f67  VHD-98afdb47-a509-48c0-b441-3e9bbd64331a  VHD-f2aa5666-4ea0-4be6-a88b-e418910901e1
VHD-7f0316e3-603d-4baa-99e3-2e45729249d4  VHD-b6195ef7-1064-40be-ac5a-2e0599b056b1  VHD-fe6292ef-5e6e-4fe9-a6fb-b3004e8ef57f
VHD-873d2082-657f-485a-804b-e86265c911bf  VHD-d9ef36ac-cdd7-4461-987f-c1ee82fb0dc8
[root@xenserver-icsc mapper]#  vhd-util check -n /dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07/MGT
primary footer invalid: invalid cookie
/dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07/MGT appears invalid; dumping metadata
Failed to open /dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07/MGT: -22

/dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07/MGT appears invalid; dumping headers

VHD Footer Summary:
-------------------
Cookie              : XSSM:224
Features            : (0x32353620)
File format version : Major: 8224, Minor: 8250
Data offset         : 35472027700670
Timestamp           : Sat Jan 28 22:38:08 2017
Creator Application : '    '
Creator version     : Major: 8224, Minor: 8224
Creator OS          : Unknown!
Original disk size  : 2207646876162 MB (23148855308184 Bytes)
Current disk size   : 2207646876162 MB (23148855308184 Bytes)
Geometry            : Cyl: 8224, Hds: 32, Sctrs: 32
                    : = 4112 MB (4311744512 Bytes)
Disk type           : Unknown type!

Checksum            : 0x20202020|0xffffbece (Bad!)
UUID                : 20202020-2020-2020-2020-202020202020
Saved state         : Yes
Hidden              : 32

VHD Header Summary:
-------------------
Cookie              :
Data offset (unusd) : 0
Table offset        : 0
Header version      : 0x00000000
Max BAT size        : 0
Block size          : 0 (0 MB)
Parent name         :
Parent UUID         : 00000000-0000-0000-0000-000000000000
Parent timestamp    : Fri Dec 31 19:00:00 1999
Checksum            : 0x0|0xffffffff (Bad!)
 

 


4 answers to this question


The MGT volume is not a VDI and is not even VHD-formatted; it is the SR's metadata database, so a vhd-util check against it is not expected to succeed.
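To rule out real VDI corruption, the same check can be pointed at the individual VHD-* volumes instead of MGT. A minimal sketch (the VG path is taken from the listing above; the loop uses dummy files so it can be demonstrated off-host, and only prints the vhd-util commands rather than running them):

```shell
# Volume group path quoted earlier in this thread; on the host, list the real
# LVs with e.g.: lvs VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07
VG=/dev/VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07

# Dummy stand-ins so this sketch runs anywhere; on the host, glob $VG itself.
demo=/tmp/vg-demo
mkdir -p "$demo"
touch "$demo/MGT" "$demo/VHD-18da7c03-0b76-4132-8f39-e9267f006f67"

# Check every VHD-* volume, deliberately skipping MGT (it is not VHD-formatted):
for lv in "$demo"/VHD-*; do
    echo "vhd-util check -n $VG/$(basename "$lv")"
done
```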

 

If you have support, please raise this with your support representative.

 

Otherwise, you will need to look in the logs under /var/log for errors around the time you tried to start the VMs.
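SMlog is usually the most useful of those logs. A quick filter for failure messages, shown here against a two-line sample copied from the SMlog excerpt quoted later in this thread (since /var/log/SMlog only exists on the host itself):

```shell
# Sample data copied from this thread; on the XenServer host, run the grep
# against /var/log/SMlog directly instead.
cat > /tmp/SMlog.sample <<'EOF'
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: acquired /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] Failed to lock /var/lock/sm/5108a519-bf75-1c9d-528c-9d53d17f0d07/sr on first attempt, blocked by PID 14558
EOF

# Case-insensitive filter for common failure keywords:
grep -Ei 'fail|error|exception' /tmp/SMlog.sample
```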


I am seeing a lot of "Failed to lock /var/lock/sm/xxxxxxx/sr on first attempt, blocked by PID xxxxx" messages in the SMlog.

 

Nov 26 10:55:25 xenserver-icsc SM: [26318] PATHDICT: key 192.168.0.228:3260: {'path': '/dev/iscsi/iqn.2004-08.jp.buffalo.106f3f173204.nas/192.168.0.228:3260', 'ipaddr': '192.168.0.228', 'port': 3260L}
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: opening lock file /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: acquired /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: released /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: closed /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] path /dev/iscsi/iqn.2004-08.jp.buffalo.106f3f173204.nas/192.168.0.228:3260
Nov 26 10:55:25 xenserver-icsc SM: [26318] iscsci data: targetIQN iqn.2004-08.jp.buffalo.106f3f173204.nas, portal 192.168.0.228
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: opening lock file /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: acquired /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: released /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: closed /var/lock/sm/iscsiadm/running
Nov 26 10:55:25 xenserver-icsc SM: [26318] ['ls', '/dev/iscsi/iqn.2004-08.jp.buffalo.106f3f173204.nas/192.168.0.228:3260', '-1', '--color=never']
Nov 26 10:55:25 xenserver-icsc SM: [26318]   pread SUCCESS
Nov 26 10:55:25 xenserver-icsc SM: [26318] getSCSIid: fixing invalid input sdb
Nov 26 10:55:25 xenserver-icsc SM: [26318] ['/usr/lib/udev/scsi_id', '-g', '--device', '/dev/sdb']
Nov 26 10:55:25 xenserver-icsc SM: [26318]   pread SUCCESS
Nov 26 10:55:25 xenserver-icsc SM: [26318] dev from lun sdb 36001405ca9dd939510a49ef8559d8c39
Nov 26 10:55:25 xenserver-icsc SM: [26318] lun match in /dev/iscsi/iqn.2004-08.jp.buffalo.106f3f173204.nas/192.168.0.228:3260
Nov 26 10:55:25 xenserver-icsc SM: [26318] lock: opening lock file /var/lock/sm/5108a519-bf75-1c9d-528c-9d53d17f0d07/sr
Nov 26 10:55:25 xenserver-icsc SM: [26318] LVMCache created for VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07
Nov 26 10:55:25 xenserver-icsc SM: [26318] ['/sbin/vgs', '--readonly', 'VG_XenStorage-5108a519-bf75-1c9d-528c-9d53d17f0d07']
Nov 26 10:55:25 xenserver-icsc SM: [26318]   pread SUCCESS
Nov 26 10:55:25 xenserver-icsc SM: [26318] Failed to lock /var/lock/sm/5108a519-bf75-1c9d-528c-9d53d17f0d07/sr on first attempt, blocked by PID 14558


Finding out what PID 14558 is would help; I suspect it may be the SR garbage collector, but that should not hold the lock for long periods. The message in and of itself is not concerning: it just means the locks are doing what they are supposed to do, i.e. preventing two independent parts of the system from modifying the SR at the same time.


It looks like there were two processes running the same command. I've killed both, and the SR scan completed for the iSCSI SR; it's still scanning the local storage.

 

root     14558  0.0  0.0 127844 17712 ?        Ss   Nov25   0:00 /usr/bin/python /opt/xensource/sm/LVMoISCSISR <methodCall><methodName>vdi_snapshot</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:8

