Hypervisor 8.2 vms constantly crashing


Ryan Haas

Question

Greetings. I am fairly new to this product and I inherited this setup. I have spent the last 3 days (and nights) dealing with an issue for one of our customers (I work for an MSP). We have 2 hypervisor hosts in a pool (named ad02 and ad03). The issue started a few days ago when we had to shut down the infrastructure because the SR (a Synology NAS) needed an update. Upon bringing the infrastructure back up, we noticed that performance was slow.

A few key VMs (mostly on ad03) were constantly becoming unresponsive and/or getting a BSOD. I know it is not my SR (the Synology), and I will get to that. Even trying to power off a VM became problematic, as it would just stay in a "paused/yellow" state for hours on end. Restarting the toolstack had little, if any, effect. When a VM did eventually reach the "shut off" state, it could take a long time for it to even power on again.

Even trying to move a VM to the other host in the pool (ad02) would often not work. It would look like it had been moved, but when you tried to power it on it would just sit in a yellow state. You would have to power it back off and move it back to ad03, and only then would you be able to power it on. Rebooting the hosts brought its own issues. A rebooted host would stay in "maintenance mode" for hours, and when you tried to exit maintenance mode, XenCenter would report that it was still booting. Trying this directly on the server itself would also not work.

That brings me to day 3. I somehow managed to get all VMs onto one host (don't ask me how, I was going on 26 hours at this point). The host with no VMs was in maintenance mode/still booting up. At that point, all VMs running on ad02 were running FLAWLESSLY. It was the best performance the users had experienced in years, if ever. I then went to bed... for an hour, after which they began having issues again. It was noted that ad03 had exited maintenance mode but had no VMs running on it at that point.

Later on that night, I rebooted ad03 again and it came up in maintenance mode. Everything on ad02 was running great again. An hour later, ad03 exited maintenance mode and key VMs on ad02 started hesitating and even BSOD'ing again. It was then that I truly discovered the correlation, so I put ad03 in maintenance mode and everything stabilized. Just to be safe, I even powered off ad03.

What is going on with ad03 that causes VMs on ad02 to become unresponsive and even crash?


Thanks  

3 answers to this question

4 hours ago, Jeff Riechers1709152667 said:

It sounds like a multipathing issue on the SR.  Could the firmware update have changed something with the network connectivity?

Yeah, about that: the firmware update was never completed (long story), so I really don't think it's a multipathing issue. It seems this issue has progressively gotten worse over the last year. This was just the latest (and worst) of their VM outages, and as of yesterday, literally nothing was running on ad03 when VMs on ad02 were having issues.

However, I guess I can try bringing ad03 back up, waiting for it to come out of maintenance mode, severing its ties with the SR (in this case the Synology), and seeing what happens. Do you happen to know of any multipathing settings (for NFS) on a Synology that I should look for?
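
For anyone wanting to try the same isolation test, here is a rough sketch of how cutting one host off from the SR might be scripted. It assumes the standard `xe` CLI is available on the pool master and that "severing ties" means unplugging that host's PBD for the SR; the UUIDs are placeholders, and the helper names (`pbd_lookup_cmd`, `pbd_toggle_cmd`, `run`) are made up for illustration. It defaults to a dry run that only prints the commands, and it is not a tested procedure for this pool.

```python
import subprocess


def pbd_lookup_cmd(host_uuid, sr_uuid):
    """Build the xe command that lists the PBD joining one host to one SR."""
    return ["xe", "pbd-list", f"host-uuid={host_uuid}",
            f"sr-uuid={sr_uuid}", "--minimal"]


def pbd_toggle_cmd(pbd_uuid, plug):
    """Build the xe command that plugs (True) or unplugs (False) a PBD."""
    action = "pbd-plug" if plug else "pbd-unplug"
    return ["xe", action, f"uuid={pbd_uuid}"]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return ""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute the real UUIDs,
    # e.g. from `xe host-list` and `xe sr-list`.
    host_uuid = "<ad03-host-uuid>"
    sr_uuid = "<synology-sr-uuid>"

    run(pbd_lookup_cmd(host_uuid, sr_uuid))
    # In a real run the PBD UUID would come from the lookup output above.
    run(pbd_toggle_cmd("<pbd-uuid>", plug=False))
```

The idea would be to unplug only while ad03 has no running VMs, watch whether the VMs on ad02 stay stable, and then plug the PBD back in with `plug=True`.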
