VMs Stuck on Boot in Citrix DaaS - Random Freezes, MCS with Non-Persistent VMs


Jeremiah Alger

Question

Posted

Description: I am experiencing an issue with some of my Citrix VMs (created using Machine Creation Services (MCS)) that are randomly getting stuck on boot and freezing before they can complete the startup process. The problem occurs intermittently in a Delivery Group. After several attempts, the VMs eventually boot successfully and register in the Citrix DaaS environment. The issue has been ongoing and is not specific to a particular VM or host in the cluster. The VMs are non-persistent, and the VDA version currently in use is 2402.100.629.

For example, the Delivery Group currently has 7 XenDesk VMs. 6 of the 7 are registered, users are logged in, and there is no issue. The 7th is stuck on boot in vSphere and shows as "unregistered" in the Delivery Group. This happens randomly to any of the 7 VMs. After several forced reboots, the VM eventually finishes its boot cycle and registers. As I said, it is random: there could be 4 registered and 3 unregistered, or one day all 7 are fine.
 
Environment:
Citrix DaaS (Cloud-managed Citrix Virtual Apps and Desktops)
vSphere Cluster with 9 hypervisors (load balanced)
vSphere Client version 7.0.3.01900
VM Creation: Machine Creation Services (MCS) for creating non-persistent VMs
VDA Version: 2402.100.629
VMs are configured with 16GB RAM and 4vCPUs
Storage: Shared storage with vSAN for VM disk provisioning
Non-Persistent VMs using MCS
Network: DHCP IP allocation with proper DNS configuration
VMware Tools: Up-to-date and functional on all VMs

Issue:
Random Freezing on Boot: Some of the VMs in the Delivery Group occasionally freeze during boot, and they remain stuck in the "Starting Windows" phase or similar until they eventually boot up after several attempts.
The VMs are stuck during the boot process and don’t register with Citrix DaaS until they successfully start up after repeated retries.
This problem occurs randomly and is not consistently replicable on any particular VM, user, or time of day.
  
Steps Taken/Configuration: 

Deleted and rebuilt the Delivery Group and Machine Catalog (basically started from scratch).
VMware HA (High Availability): VM Monitoring is enabled with VM Monitoring Only and High Sensitivity to automatically restart unresponsive VMs.
vMotion: The issue persists across multiple ESXi hosts in the vSphere Cluster, indicating it's not host-specific.
Citrix VDA Logs: No specific errors in Citrix Director or VDA logs during boot-up. 
VMware Logs: No obvious errors related to VM boot or registration.

Symptoms:
The VMs are non-persistent and are provisioned via MCS. After being stuck on boot, they eventually recover and register after rebooting them several times via the delivery group in Citrix DaaS.
Citrix DaaS sessions for affected users fail to launch until the VM successfully starts and registers.
The issue is random and not tied to any specific VM or configuration.

Questions for the Community:
 
1. Has anyone experienced random boot freezes in Citrix DaaS or MCS environments, especially in non-persistent VM configurations?
 
2. Are there any known issues with VDA version 2402.100.629 that could be causing these intermittent freezing issues?
 
3. Can VMware HA or MCS configurations be tweaked to better handle this situation?
 
4. Are there any recommended best practices for load balancing VMs in vSphere environments that could help prevent these random boot freezes?
 
5. Any insights into specific vSphere settings, VMware Tools, or storage configurations that might contribute to this issue?
 

Thank you in advance for any help or guidance!

6 answers to this question

Recommended Posts

  • 0
Posted
2 minutes ago, Jeff Riechers said:

That sounds like a load issue on the hypervisor.  Any alerts there?

That's what I would think too, but there are no alerts. On the surface it looks pretty good. Everything, in my opinion, would point to resource contention.

  • 0
Posted

Just an update to drive the point home: that VM is now working perfectly fine. It is registered and available for a user to log in. The only thing I did was "force restart" it for somewhere between the 10th and 20th time. I did notice that vSphere migrated the VM to another hypervisor right before it went into a registered state.

  • 0
Posted

I still have not discovered what is causing this. One thing I noticed: if I take a XenDesk VM that is stuck on boot, delete its DHCP record from the DHCP server, delete its DNS forward and reverse lookup records from the DNS server, and then force restart it, it is more likely to boot to the OS. This does not work 100% of the time, but it definitely works often. Has anyone experienced this, or does anyone have an idea as to what is going on?
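For anyone wanting to test the same hunch before deleting records, here is a rough Python sketch that checks whether a VM's forward (A) and reverse (PTR) DNS records agree. It is only an illustration: the function name `check_forward_reverse` and the result layout are my own, and it compares short host names only, which assumes all VMs share one DNS suffix.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> PTR name and report whether the names agree.

    The resolver functions are parameters so the check can be exercised
    with stand-ins; by default the system resolver is used.
    """
    result = {"hostname": hostname, "ip": None, "ptr": None,
              "consistent": False, "error": None}
    try:
        result["ip"] = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        result["ptr"] = reverse(result["ip"])[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["error"] = f"reverse lookup failed: {exc}"
        return result

    # Assumption: comparing the short (left-most) label is good enough.
    def short(name):
        return name.split(".")[0].lower()

    result["consistent"] = short(result["ptr"]) == short(hostname)
    return result
```

Calling `check_forward_reverse("your-vm-name")` on a machine that uses the same DNS server as the VDAs returns a dictionary; a `consistent` value of `False` with a `ptr` that names a different machine would point at a stale reverse record for that IP.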

  • 0
Posted

Just in case anyone ever has an issue like this, here is how we resolved it. The issue was not the DHCP server but the DNS server. Be sure that for both Forward and Reverse Lookup Zones you have "Dynamic Updates" and "Aging / Scavenge Stale Resource Records" set up correctly. Also get rid of any stale records from old VMs. In our case, old VMs that no longer existed still had records tied to an IP address that was being used by an active VM, so we basically had duplicate IP entries. We manually deleted the old records, flushed DNS, and stopped and started DNS.
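As a rough illustration of the duplicate-entry check, here is a small Python sketch that takes (hostname, IP) pairs, for example copied out of an export of the forward lookup zone, and lists every IP claimed by more than one name. The function name `find_duplicate_ips` and the sample host names and addresses are made up for the example; how you export the records is up to your DNS tooling.

```python
from collections import defaultdict


def find_duplicate_ips(records):
    """Return {ip: sorted hostnames} for each IP claimed by more than one hostname.

    records is an iterable of (hostname, ip) pairs. Host names are compared
    case-insensitively, since DNS names are not case sensitive.
    """
    hosts_by_ip = defaultdict(set)
    for hostname, ip in records:
        hosts_by_ip[ip.strip()].add(hostname.strip().lower())
    return {ip: sorted(hosts)
            for ip, hosts in hosts_by_ip.items()
            if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical A records: one active VM and one stale record share an IP.
    sample = [
        ("xendesk-01", "10.0.0.21"),
        ("xendesk-02", "10.0.0.22"),
        ("old-vm-17", "10.0.0.21"),
    ]
    for ip, hosts in find_duplicate_ips(sample).items():
        print(f"{ip} is claimed by: {', '.join(hosts)}")
```

Any IP that shows up in the result is a candidate for manual cleanup; the script only reports, it does not delete anything.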

This seemed to help quite a bit but once in a while a VM would still hang on boot. What ultimately fixed it was setting the MCS Master image to have static DNS settings, updating the catalog, and rebooting the VMs to get the updated setting.

Hopefully this might help someone else one day. Thank you as always @Jeff Riechers for jumping in right away and trying to help! 
