Win 10, PVS, userlayer workstations lockup


Mike Gray1709151604

Question

Some of our layered Windows 10 machines, provisioned with PVS and using user layers, have been locking up for some users. Typically, Director reports a session prepare failure and the machine becomes unregistered. It still responds to ping, but you cannot connect to it in any way and it must be force-restarted. The scenario is always the same: the user has been logged in for a day or more, disconnects in the evening, reconnects in the morning, and gets the session prepare failure. I have about 50 of these machines running now, with 35 concurrent user connections. Almost daily, one or two of them lock up and become unregistered as described.

The machines use RAM cache with overflow to disk: 500 MB of RAM cache and 6 GB of overflow disk space. I suspect one of two things is happening: either the write cache fills up and the machine locks, or it loses its connection to the user layer VHD and locks. The machines rarely lock while a user is working on them. It almost always happens after the user has disconnected for the evening, and when they return in the morning the machine is locked.

I use one of these machines daily, and in the last 3 weeks it has locked up on me twice in this way. I typically log in to a new desktop on Monday, then just disconnect and reconnect all week. The machines all restart over the weekend.

My question is: what is the best way to troubleshoot this? When a machine locks up, there is no way to go back and review its event logs. Are there logs on the PVS server, or anywhere else, that record the user layer connection?

6 answers to this question

I have worked with a customer who struggled with this for a while. They found that the technology they used to provide the file share relied on Kerberos, and the Kerberos ticket expired after 12 hours, disconnecting the user layer VHD files. Network problems could cause similar symptoms.

So one test might be to use a test account and connect a session. Leave it connected overnight and see whether the connection breaks. If it does, investigate with your storage vendor.

For your cache theory, write a script that checks the available space on the cache disk and writes a log to the cache disk so it persists. Then you can see whether you are running out of space. Remember to put a sleep in that script so that, once started, it only runs every 5 to 10 minutes or so.
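A minimal sketch of such a script in Python, assuming Python is available on the image and that the write-cache disk is D:. The drive letter, the log file name (cache_space.log) and the 5-minute interval are hypothetical placeholders. It appends one timestamped line per check, so, following the suggestion above, the history should still be on the cache disk after a forced restart.

```python
import datetime
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical settings: adjust to your environment.
CACHE_DRIVE = "D:\\"                              # assumed write-cache disk
LOG_FILE = Path(CACHE_DRIVE) / "cache_space.log"  # kept on the cache disk so it persists
INTERVAL_SECONDS = 5 * 60                         # one check every 5 minutes


def log_cache_usage(drive: str, log_file: Path) -> str:
    """Append one timestamped line with free/used space for `drive` and return it."""
    usage = shutil.disk_usage(drive)
    used_pct = 100.0 * usage.used / usage.total
    line = (
        f"{datetime.datetime.now().isoformat(timespec='seconds')} "
        f"drive={drive} free_mb={usage.free // (1024 * 1024)} used_pct={used_pct:.1f}\n"
    )
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(line)
    return line


def monitor(
    drive: str = CACHE_DRIVE,
    log_file: Path = LOG_FILE,
    interval: float = INTERVAL_SECONDS,
    max_checks: Optional[int] = None,
) -> None:
    """Log disk usage every `interval` seconds; run forever unless `max_checks` is set."""
    checks = 0
    while max_checks is None or checks < max_checks:
        log_cache_usage(drive, log_file)
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)  # the sleep keeps the check cheap between samples
```

Started once at boot (for example, a scheduled task that calls monitor()), it keeps appending until the machine hangs, so the last few lines show how much overflow space was left just before a lock-up.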

Let us know what you find.

I am using a Windows Server 2016 Core server to provide a Windows file share backed by Pure SSD SAN storage. This doesn't happen to everyone, but I can start checking when each user logged in and when they disconnected. I am connected daily and typically disconnected for 14 to 15 hours each night, and my machine has locked up a couple of times in the past three weeks.
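Once you have the disconnect and reconnect times (from Director or your own logs), a short script can flag the sessions whose overnight gap exceeds the 12-hour ticket lifetime from the Kerberos case above. This is a sketch with hypothetical sample data; 12 hours is only the lifetime seen at that customer, so check the actual value in your environment.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# 12 h is the ticket lifetime from the customer case above; treat it as a parameter.
TICKET_LIFETIME = timedelta(hours=12)


def long_disconnects(
    sessions: List[Tuple[str, datetime, datetime]],
    limit: timedelta = TICKET_LIFETIME,
) -> List[Tuple[str, float]]:
    """Return (user, hours) for sessions disconnected longer than `limit`.

    Each session is (user, disconnect_time, reconnect_time).
    """
    flagged = []
    for user, disconnected, reconnected in sessions:
        gap = reconnected - disconnected
        if gap > limit:
            flagged.append((user, round(gap.total_seconds() / 3600, 1)))
    return flagged


# Hypothetical sample data for illustration only.
SAMPLE = [
    ("user1", datetime(2024, 1, 8, 18, 0), datetime(2024, 1, 9, 8, 30)),  # 14.5 h gap
    ("user2", datetime(2024, 1, 8, 20, 0), datetime(2024, 1, 9, 6, 0)),   # 10 h gap
]
```

Comparing the flagged users against the machines that actually locked up shows fairly quickly whether long disconnects line up with the failures.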

We do run a script from the PVS servers that checks each app server's write cache and takes certain actions when it gets above 80% full. I know this script works and reports on the virtual app servers, but I need to review it and make sure it also checks the virtual desktops we serve from PVS.
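For comparison, a minimal Python sketch of the same kind of check extended to a list of desktops. It assumes, hypothetically, that each machine's write-cache disk is D: and that shutil.disk_usage can query it from the PVS server through the administrative share \\host\d$; host names are placeholders. Machines that cannot be queried are reported separately, since a hung desktop may show up that way rather than as a full cache.

```python
import shutil
from typing import Callable, Iterable, List, Tuple

THRESHOLD_PCT = 80.0  # the 80% trigger used by the existing script


def cache_share(host: str, drive_letter: str = "d") -> str:
    # Hypothetical path: the cache disk reached through the host's admin share.
    return f"\\\\{host}\\{drive_letter}$\\"


def usage_via_share(host: str) -> Tuple[int, int]:
    """Return (used, total) bytes for the host's cache disk via the admin share."""
    usage = shutil.disk_usage(cache_share(host))
    return usage.used, usage.total


def check_hosts(
    hosts: Iterable[str],
    usage_of: Callable[[str], Tuple[int, int]] = usage_via_share,
    threshold: float = THRESHOLD_PCT,
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Split hosts into (over threshold with % used, unreachable)."""
    over: List[Tuple[str, float]] = []
    unreachable: List[str] = []
    for host in hosts:
        try:
            used, total = usage_of(host)
        except OSError:
            # A hung or powered-off desktop may land here rather than in `over`.
            unreachable.append(host)
            continue
        used_pct = 100.0 * used / total
        if used_pct > threshold:
            over.append((host, round(used_pct, 1)))
    return over, unreachable
```

Passing the data source in as usage_of keeps the threshold logic separate from how the numbers are collected, so it can be tested with fake data or pointed at a different collection method.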

https://www.citrix.com/blogs/2018/01/08/optimizing-windows-and-citrix-app-layering/

This article says to increase the disk I/O timeout to 200 ms. Is this the key you are referring to? HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk. Everything I have seen about that key gives the value in seconds, not milliseconds. I was wondering whether a slow-responding user layer could cause the virtual machine to hang.
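To see what a given desktop is actually configured with, here is a small Python sketch that reads the value. It assumes the setting in question is the TimeoutValue REG_DWORD under that key, which Microsoft documents in seconds. It returns None when the value is absent (Windows then uses its built-in default) or when run somewhere other than Windows.

```python
import sys
from typing import Optional

DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeoutValue"  # assumed value name; documented in seconds


def read_disk_timeout_seconds() -> Optional[int]:
    """Return the Disk TimeoutValue in seconds, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None  # value not set: the built-in default applies
    return value if value_type == winreg.REG_DWORD else None
```

Running print(read_disk_timeout_seconds()) on an affected desktop and on a healthy one is a quick way to confirm both images carry the same setting.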

I have not had my desktop lock up during the day, but a couple of users have told me theirs did. Most of what I see comes through Director, and almost all the errors occur in the morning and are communication errors. I believe this is because the workstation is hung. No antivirus runs on the server providing the share. We use CrowdStrike on the desktops, and it does not run any scheduled scans. I did run the optimizer.

Another interesting test might be to mount a user layer on a regular PC and leave it connected to see whether it gets disconnected over time. That would distinguish a problem on the network or share from one on the virtual machines. It's not a perfect test because the network path is different, so you could also do the same from a full-clone VM on the same subnet.
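A sketch of a probe for this test, in Python, that checks a path every few minutes and logs a line only when its reachability changes. The path is a placeholder: point it at the share holding the user layers or at the drive letter of the mounted VHD. One caveat: on a dead SMB session the stat call may hang rather than fail, and a log that simply stops is itself a clue.

```python
import datetime
import os
import time
from typing import Callable, List, Optional


def is_reachable(path: str) -> bool:
    """True if `path` can be stat'ed right now."""
    try:
        os.stat(path)
        return True
    except OSError:
        return False


def watch(
    path: str,
    interval_seconds: float = 300,
    max_checks: Optional[int] = None,
    log: Callable[[str], None] = print,
) -> List[str]:
    """Probe `path` repeatedly and log a line only when reachability changes."""
    events: List[str] = []
    last: Optional[bool] = None
    checks = 0
    while max_checks is None or checks < max_checks:
        ok = is_reachable(path)
        if ok != last:
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            message = f"{stamp} {path} {'reachable' if ok else 'UNREACHABLE'}"
            log(message)
            events.append(message)
            last = ok
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval_seconds)
    return events
```

For example, watch(r"\\fileserver\userlayers") runs until stopped; fileserver and userlayers are hypothetical names. Logging only the transitions keeps an overnight log short enough to read at a glance.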
