
CH 8.0 hosts take > 30 min to shut down


Jacob Degeling

Question

O Venerable Ones, hear me. I need your knowledge and wisdom.

 

My initial thought is that this is related to iSCSI. It could be related to something else, but I'm not sure how to test that.

 

I had tried MS iSCSI on XS 7.6 or thereabouts; it didn't work, and I remember discovering that it wasn't supported. Later on I remember seeing somewhere that CH 8.0 was going to support MS iSCSI (I can't find this now), so as soon as I upgraded I created a 10TB MS iSCSI SR. It all worked, so I thought it was very good.
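
For reference, I created it as a standard software iSCSI (lvmoiscsi) SR. From memory I used the New SR wizard in XenCenter, but the CLI equivalent would be roughly this (the target IP, IQN and SCSI ID are just placeholders for our setup):

xe sr-create name-label="MS iSCSI SR" type=lvmoiscsi shared=true content-type=user \
   device-config:target=<target-IP> \
   device-config:targetIQN=<target-IQN> \
   device-config:SCSIid=<SCSI-ID>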

 

The first time I had to reboot some hosts after doing this, I noticed that they were taking a very long time to shut down. I wasn't in a rush, so I waited; it was more than 30 minutes for a reboot. Then recently we had a planned power outage. Cutover to a generator wasn't supposed to take more time than our UPSs could supply, but it took longer than expected and we were forced to shut down servers. When they weren't going to finish shutting down in time, we just force powered them off. When they came back up there were some issues with some SRs, which we fixed, but ideally the servers would have shut down in a timely fashion.
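
Before the next planned reboot I want to check whether it's the iSCSI sessions holding things up, i.e. whether cleanly detaching the iSCSI SR and logging the sessions out first lets the host shut down normally. Something along these lines (the PBD UUID is a placeholder, and I'm assuming the standard open-iscsi tools in Dom0):

# detach the iSCSI SR from this host first
xe pbd-unplug uuid=<PBD-UUID>
# then list any remaining iSCSI sessions and log them all out
iscsiadm -m session
iscsiadm -m node -u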

 

I've been in touch with Citrix support, and the support tech sent me a couple of links to the HCL, which MS iSCSI wasn't on. He said that MS iSCSI works but isn't supported. I asked him for a list of supported software iSCSI vendors and he sent me a link to the HCL again. I was a bit puzzled (it's a hardware compatibility list, not a software compatibility list) but accepted his answer.

 

Do I have any hope? We are due for a storage and server refresh in the latter half of next year, and I was hoping iSCSI could get us through to then.


8 answers to this question



Sorry, I thought I was being very clear!

 

So we have 2 Dell host servers with 2 shared SRs connected and multipathed via LVM over HBA. The HBA-based storage machine is a Dell MD3400, on which we are almost out of space. Most VMs live on this MD3400 box.

 

We also have a Dell NX Windows Storage Server 2016 server for backup storage, which has a bit of spare space, and the iSCSI drive (I'm not fully up on the terminology) is shared out to the 2 hosts from that spare space. So the software iSCSI drive is mounted as an SR on Citrix Hypervisor, and on it we have a few large VDIs that can't fit on the main SRs.

 

Hope that clears it up!


Thanks for those pointers. I know CH8 has a higher memory requirement for Dom0, and ours is just at whatever that default is.

 

Dom0 CPU usage on one of the pool hosts (the master, 18 domains) is ~150–250%, and on the other (16 domains) it is ~50–150% (both have 16 vCPUs). Dom0 RAM on these hosts is 7584MB (128GB total in the host). On another host (not a pool member), Dom0 CPU is ~150% (8 vCPUs) and Dom0 RAM is 4720MB of 72GB total.

 

Top gives:

 

host 1: 

top - 10:15:00 up 17 days, 57 min,  1 user,  load average: 0.80, 1.10, 1.07
Tasks: 3883 total,   2 running, 409 sleeping,   0 stopped, 2878 zombie
%Cpu(s):  1.4 us,  1.9 sy,  0.0 ni, 89.8 id,  5.5 wa,  0.0 hi,  0.7 si,  0.6 st
KiB Mem :  7498176 total,  1270616 free,  1833132 used,  4394428 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used.  5310732 avail Mem 

 

df -h gives:

Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        3.6G  112K  3.6G   1% /dev
tmpfs                           3.6G  1.1M  3.6G   1% /dev/shm
tmpfs                           3.6G   15M  3.6G   1% /run
tmpfs                           3.6G     0  3.6G   0% /sys/fs/cgroup
/dev/sda1                        18G  2.4G   15G  15% /
xenstore                        3.6G     0  3.6G   0% /var/lib/xenstored
/dev/sda4                       512M  2.0M  510M   1% /boot/efi
/dev/sda5                       3.9G  2.4G  1.3G  65% /var/log
10.0.8.82:/var/nfs/iso-library   50G   24G   23G  51% /run/sr-mount/a32deab3-5615-1290-95ed-358c46d6f256
tmpfs                           733M     0  733M   0% /run/user/0

 

host 2:

top - 10:14:57 up 17 days, 57 min,  1 user,  load average: 0.86, 0.81, 0.87
Tasks: 759 total,   1 running, 358 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.4 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem :  7498176 total,  3432996 free,  1567288 used,  2497892 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used.  5647632 avail Mem

 

df -h gives:

Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        3.6G   68K  3.6G   1% /dev
tmpfs                           3.6G  644K  3.6G   1% /dev/shm
tmpfs                           3.6G   13M  3.6G   1% /run
tmpfs                           3.6G     0  3.6G   0% /sys/fs/cgroup
/dev/sda1                        18G  2.4G   15G  15% /
xenstore                        3.6G     0  3.6G   0% /var/lib/xenstored
/dev/sda4                       512M  2.0M  510M   1% /boot/efi
/dev/sda5                       3.9G  1.4G  2.3G  39% /var/log
10.0.8.82:/var/nfs/iso-library   50G   24G   23G  51% /run/sr-mount/a32deab3-5615-1290-95ed-358c46d6f256
tmpfs                           733M     0  733M   0% /run/user/0

 

host 3 (non pool member):

top - 10:25:54 up 17 days,  1:37,  1 user,  load average: 2.08, 2.03, 2.11
Tasks: 1620 total,   1 running, 249 sleeping,   0 stopped, 1246 zombie
%Cpu(s):  7.1 us, 12.1 sy,  0.0 ni, 70.7 id,  8.9 wa,  0.0 hi,  0.5 si,  0.7 st
KiB Mem :  4619276 total,  1955936 free,  1017128 used,  1646212 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used.  3434324 avail Mem 

 

df -h gives:

Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        2.2G   56K  2.2G   1% /dev
tmpfs                           2.3G  500K  2.3G   1% /dev/shm
tmpfs                           2.3G   11M  2.2G   1% /run
tmpfs                           2.3G     0  2.3G   0% /sys/fs/cgroup
/dev/sda1                        18G  2.4G   15G  15% /
xenstore                        2.3G     0  2.3G   0% /var/lib/xenstored
/dev/sda5                       3.9G  772M  2.9G  21% /var/log
10.0.8.82:/var/nfs/iso-library   50G   24G   23G  51% /run/sr-mount/790192f4-c72f-c542-0de0-8054afd3b3c9
tmpfs                           452M     0  452M   0% /run/user/0

 

Not sure what to do from here regarding Dom0 memory or performance. What are your suggestions?
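
If bumping Dom0 memory turns out to be the answer, I gather the usual way on CH8 is something like the following (followed by a reboot), though I'd appreciate confirmation before I touch it. The 8192M is just an example figure:

/opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=8192M,max:8192M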

 

The reason we added an iSCSI SR was to help with space issues on the HBA-based storage.

 

We did have some issues with SRs about a month ago: VMs couldn't be suspended or migrated between hosts, which I sorted out by upgrading firmware on the storage box. I have also had a couple of VMs recently with "snapshot chain too long" errors. The first one I fixed by exporting and reimporting the VM (a 64GB VDI only). There is one at the moment which has 3 VDIs totalling ~2TB, which won't be fun to export and reimport. Are there better ways of fixing that in CH8?
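
In the meantime, I assume I can at least watch whether the coalesce/garbage-collection process is doing anything on that SR by checking the SM log on the pool master (e.g. grep -i coalesce /var/log/SMlog), but correct me if that's the wrong place to look.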

 

I'm now wondering whether this performance problem really is iSCSI, because the 3rd host doesn't have an iSCSI SR, just local storage and a 40TB disk attached as a removable disk (to give a VM the whole 40TB). I'm wondering if it may be the backup software we are using, Quadric Alike's A2 (it uses the 40TB disk mentioned above as its backup storage). It provisions small helper VMs that it leaves running, and all 3 hosts are part of the regular backup schedule (which is normally working really well). I am going to turn off the persistent helper VMs in Alike's settings, kill all the resident helpers, restart the server, and see if that helps.
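
One other thing that stands out in the top output above is the huge zombie task count on hosts 1 and 3 (and none on host 2), so before restarting anything I'll try to see which parent process is leaving them behind, something like:

ps -eo stat,ppid,pid,comm | awk '$1 ~ /^Z/'

and then look up what those PPIDs belong to.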


Good idea on the backups for testing; any 3rd-party intrusion could be an issue. Unfortunately, with chain issues, exporting/importing is the best fix. I've never had any chaining issues with LUNs, only local storage. Maybe I have just been lucky. So I keep local VM storage small, not only for that reason but also for backup/restores. Anything larger than 300MB or so goes on an iSCSI LUN.

 

--Alan--

 


You can run an sr-scan from XenCenter, or try to see if any coalescing is needed on an individual VM using

xe host-call-plugin host-uuid=<host-UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM-UUID>

which I think is still supported in more recent XenServer/Citrix Hypervisor releases. If not, you can (1) try just a storage migration to a different SR, which should help clean up stale snapshots, or (2), to be 100% sure, do the export/import (which of course takes a long time as well as quite a bit of downtime).
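
From the CLI the rough equivalents (UUIDs are placeholders) are an SR rescan:

xe sr-scan uuid=<SR-UUID>

and, to move an individual disk to another SR without a full export, something like:

xe vdi-pool-migrate uuid=<VDI-UUID> sr-uuid=<destination-SR-UUID>

though check the storage docs for your exact release.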

 

-=Tobias
 
