
Citrix Virtual Apps and Desktops 1912 PVS: random VDA server freezes/crashes and I have to hard reset the VM


Alex Liu

Question

Running CVAD 1912 PVS on our VMware 6.5 environment. All of our Citrix servers run Windows Server 2019 Datacenter. We currently have about 6 VDA servers that users log into for work. Randomly each week, one of those servers freezes/crashes. When I try to access it from the VM console, I can't do anything, and users report that their sessions are frozen as well. The only option is a hard reset of the VM. Once it reboots, it's fine again until the next time. I don't see anything in the Event logs that points to the culprit.

I am currently working with Citrix Support to diagnose the issue, and they are looking at memory dumps. We use Trend Micro Apex One as our antivirus, and Citrix Support suggested we put the proper exclusions in place. However, it still freezes with the exclusions in place. Once a server freezes, an error eventually shows up on the Delivery Controller saying that the Broker Agent lost contact with the server.

Yesterday, a user called to say their session seemed stuck. I checked the VM console and could still log into it, so I thought the server was still okay. I tried to shadow the user and it timed out. Shortly after, other users called to say they were frozen too, and I noticed they were all on the same server. When I went back to the VM console, it was frozen. Has anyone experienced anything like this before?


Hi,

I am having the same problem with Windows 2019 and LTSR CU1. The problem was happening about once a month, but we also use FSLogix, and after increasing the number of user profile folders redirected to the C drive, the number of crashes increased to 1-2 per day. We also have a Windows 2012 desktop, and the crashes do not occur on it at all. We use Sophos anti-virus, but I don't think it is causing the problem, since there are no crashes on the 2012 desktop or on the 2016 desktop we used before switching to the 2019 desktop.

 

After reading this thread, I'm going to upgrade to CU2 and to the latest version of FSLogix (2.9.7654.46150), and hopefully that will fix the problem.

Thanks,

Gary.


Hi Gary,

 

Welcome, and sorry to hear about your issue. Our environment is Windows Server 2019 Datacenter running LTSR 1912 CU1. Someone else also tried upgrading to CU2 but still experienced the same issue. In both cases, the problem seems to be related to shadowing. Since we started monitoring shadowing times, we have found that every time someone shadows a user, that VDA server crashes at some point later that day. As long as we don't shadow, we never get a crash. I am also using FSLogix (version 2.9.7237.48865).

 

As previously mentioned, Citrix support has given me a private fix file to try.  It still crashed with the private fix file.  However, after that, Citrix wanted to do more monitoring by tracing and turning on Debug logging for the VDA.  Since then, I haven't been able to reproduce the crash when shadowing.  Now, I just need to find time to troubleshoot further and see if I can find a reason for that.

 

From researching other threads, it does seem like there could be other reasons why servers crash, so let us know if anything you try helps your situation.

 

Thanks,

Alex


Hi Alex,

 

Thanks for the reply. I had not suspected that shadowing could be a cause, but today I was able to confirm that one of my colleagues was shadowing a user when their session stopped responding, though it may have been a coincidence. We are using Windows 2019 Standard edition, but now I wish we'd stuck with Windows 2016. I've logged a call with Citrix Support and asked whether we can get a copy of the private fix. We'll also try disabling shadowing for a few days to see if the crashes stop. I'll let you know if we make any progress.

Thanks.

Gary.


Hi Gary,

 

Yeah, the shadowing crash behavior is weird: sometimes the server stops responding during the shadow, and sometimes you can shadow fine and then the server stops responding anywhere up to a few hours later. But we have been able to reproduce it many times. The other weird part is that I've tried to reproduce it while updating the main image, so that I don't impact production users. But when I log into the update server to make changes to the main image and then try to shadow that session, I can't get the server to crash. So I don't know whether there need to be several users logged in at the same time doing things, or something else. The fact that the time of the crash varies suggests that something else determines when it crashes, but shadowing seems to act as a catalyst of some sort.
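Because the delay between a shadow session and the freeze varies from minutes to hours, one way to test the correlation is to line up shadowing times against freeze times for each server. The sketch below is a minimal illustration, assuming you keep your own record of shadow start times and freeze (or unregistration) times; the server names, timestamps, and 6-hour window are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical records: (server name, timestamp). In practice these would come
# from your own shadowing log and the times servers froze or unregistered.
shadow_events = [
    ("VDA01", datetime(2021, 4, 12, 13, 5)),
    ("VDA03", datetime(2021, 4, 13, 9, 40)),
    ("VDA02", datetime(2021, 4, 14, 15, 20)),
]
freeze_events = [
    ("VDA01", datetime(2021, 4, 12, 16, 45)),
    ("VDA02", datetime(2021, 4, 16, 11, 0)),
]


def freezes_after_shadow(shadows, freezes, window):
    """Return (server, shadow_time, freeze_time) for every freeze that happened
    on the same server no later than `window` after a shadow session started."""
    matches = []
    for server, shadow_time in shadows:
        for freeze_server, freeze_time in freezes:
            if freeze_server == server and shadow_time <= freeze_time <= shadow_time + window:
                matches.append((server, shadow_time, freeze_time))
    return matches


# The 6-hour window is an assumption based on "up to a few hours" above.
for server, shadowed, froze in freezes_after_shadow(shadow_events, freeze_events, timedelta(hours=6)):
    print(f"{server}: shadowed {shadowed:%Y-%m-%d %H:%M}, froze {froze - shadowed} later")
```

With only a handful of events this proves nothing on its own; it simply makes it easy to see whether freezes cluster after shadowing once enough data has been collected. Since one later post saw both the viewer's server and the shadowed user's server freeze, it may be worth logging both server names for each shadow session.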

 

Good luck with obtaining the private fix. Before they gave it to me, they wrote that it was specific to my environment. They needed to collect details of the programs and services we had installed before they could come up with the private fix file. But it doesn't hurt to ask! :)

 

Thanks,

Alex


Hi Alex,

 

Well, I haven't got the fix yet - I have to send support a memory dump from a crashed server first.

 

I noticed that the problem got steadily worse over the last 3 months, until it was happening to 1-3 servers per day. I've got a feeling that it may be related to the number of PVS vdisk versions, as over that time we went up to 5 versions, but since the vdisk was merged into one version the problem hasn't happened. It may be just a coincidence, but it's been a week without any crashes so far.

 

On quite a few occasions, a tech was shadowing a user on another server from the server that froze. Now we only shadow from non-Citrix servers, which has reduced the number of crashes.

 

On one occasion, user sessions on the same server started freezing, though we could still connect to that server. Director reported the users as having active sessions, and we tried logging them off, but their sessions were still listed as active, even though a remote quser query showed that they had indeed been logged off. I could still remotely connect to the server and check the event logs, but could not log on via RDP or the VM console. The event log showed a number of Application Errors for C:\Program Files\Citrix\Virtual Desktop Agent\DirectorComServer.exe. I tried restarting the Citrix services on that server, but that didn't revive it. I restarted the RDS service, but soon after, the server went into a dead hang and could no longer be connected to or even pinged. Director still showed the users as active until the server was hard reset. Usually the server goes straight into a dead hang without any useful events, so this may have been a different problem.

 

Thanks,

 

Gary.


Thank you for this discussion.  I wanted to chime in too.  We are seeing almost the exact same issue as what is described in this discussion.  We are running:

 

LTSR 1912 CU1 with Windows Server 2019 Standard Version 1809

VMware 6.7

2 x PVS servers with 22 Target Devices / Citrix session hosts

 

Approximately once or twice a week, at random times, one to three of the 22 Citrix servers freeze up. The users' sessions still show as Active in Studio, though Studio shows the server(s) as Unregistered. Our remote screen control tool still thinks the servers are connected, but when we try to connect we just get a black screen. We cannot ping the frozen servers, and the PVS console shows those targets as Down. Sometimes, amazingly, we can still access the write cache drive via UNC path even though we can't ping the frozen servers. VMware shows the server as Powered On. If we open the console, we see either an all-black screen or the Windows screen waiting for Ctrl+Alt+Del, except that sending Ctrl+Alt+Del does nothing at all. The VM appears to be completely frozen. The users who were on that server cannot log in to other servers and cannot work. The only option seems to be to reset the VM from VMware. As soon as it boots up, it's fine again: the users' sessions are released and they can log in again.

 

There's nothing in Event Viewer that helps explain the freeze. We've sent CDFControl logs and Problem Reports, but Citrix Support has not found anything in them to help. I've got Wireshark running on both PVS servers to try to get a packet capture from the PVS side when it happens. I'm looking into whether we can run Wireshark on the target device side, but we are in a multi-tenant cloud hosting environment, and getting promiscuous mode turned on for a vSwitch is probably not going to happen. We might be able to run Wireshark on the targets and save the capture to the write cache drive, since it is persistent. It's a pain to have to start it on 22 servers, though, because we don't know which one will freeze next or how to recreate the issue. I was going to see whether I could trigger a freeze on a test server by shadowing a session, per some of the posts in this discussion. I'm also going to try suspending the VM and getting the .vmss file for a memory dump the next time one freezes.

Very frustrating.  Alex, has Citrix Support figured out anything for you yet?  How did people go about disabling shadowing?

 

Thanks in advance.


Update:  We believe we have also linked the issue to Shadowing sessions.  It's not every time.  There may be 10 instances of sessions being shadowed before the issue occurs.  But there have now been multiple times where an IT staff user was in one Citrix server and shadowing a Citrix user on another Citrix server, and both servers froze shortly afterward.


We are going to avoid shadowing from Director for a few weeks and see whether the issue recurs. So far, we've gone 3 days without another freeze, but we've previously gone as long as 8 days without one, so it will take at least a couple of weeks without a freeze before we can really say there's a difference.
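To put rough numbers on "a couple of weeks minimum": if freezes arrive at random at a steady baseline rate, the chance of a freeze-free stretch happening by luck alone is e^(-rate × time). This is a minimal sketch under two assumptions that nobody in the thread has established: that freezes follow a Poisson process, and that the baseline is one freeze per week (the low end of the once-or-twice-a-week rate described earlier in this thread).

```python
import math


def prob_no_freeze(freezes_per_week, days):
    """Probability of zero freezes in `days`, assuming freezes arrive as a
    Poisson process at a constant baseline rate (an assumption)."""
    expected_freezes = freezes_per_week * days / 7
    return math.exp(-expected_freezes)


# Baseline of 1 freeze per week is an assumption; try 1.5 or 2 as well.
for days in (3, 8, 14, 21):
    print(f"{days:>2} days without a freeze: {prob_no_freeze(1.0, days):.1%} chance by luck alone")
```

Under those assumptions, a clean 3-day stretch would happen by chance about 65% of the time and a clean 8-day stretch about 32% of the time (consistent with the 8-day gap seen before), while 3 weeks without a freeze would happen by chance only about 5% of the time. A higher baseline rate shortens the window needed.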


We are not getting very far with Citrix Support. They have only asked for more logs, with no solutions. I think this discussion thread is plenty of evidence that there is a fairly widespread bug with Windows Server 2019 and PVS, most likely related to shadowing, that causes PVS target devices / Citrix session hosts to freeze completely and intermittently.

Has anyone else received any actual answers or solutions for this issue from Citrix support yet?


I work on an environment that is experiencing the same problem. This freeze is happening on both 2012 R2 Hyper-V and 2019 XenServer target devices. The Hyper-V target devices just reboot when the problem happens, but the XenServer target devices hang in a frozen state until they are manually force-restarted.

 

As far as I can tell, this problem was first reported not long after upgrading from PVS 7 1903 (maybe 1906) to LTSR 1912, all on Server 2012 R2 for both the PVS servers and the target device software. Since then I've replaced the PVS servers with Windows 2019 and built a new 2019 target device on different hardware, but the same freezing still happens. I saw somewhere else that frozen or locked-up servers were caused by the iDRAC passthrough setting, which needs to be disabled. However, our Dell PowerEdge servers are already set that way.

 

I never correlated to Shadowing until I discovered this post.  Just last week one of the help desk members indicated that a frozen server was the same one hosting a user session he shadowed.

 

Yesterday I updated the VDA, PVS and target device software to CU2. I had considered going to the latest CR instead, just to try to stabilize the environment. The post Alex Liu mentioned suggested disabling URCP, but it sounds like that did not solve the problem. I have a case open with Citrix Support, and we are at the stage of trying to collect a full memory dump.


Hi,

 

I got a memory dump to Citrix Support and they found numerous locked threads for Outlook and Chrome, plus a few for bnistack6. 

 

We are in the process of upgrading StoreFront, the Delivery Controllers, the PVS servers and the clients on the VDAs to CU2, but it sounds as though it won't help. We've also upgraded to the latest version of FSLogix. We have two other desktops based on Windows 2012 R2 and 2016 that use the same PVS servers and never have this problem.

 

Hopefully, Citrix will give this a higher profile now that more people are reporting the same problem.

 

Thanks,

 

Gary.


Hi All,

 

We are experiencing the exact same problem and, thanks to this topic, haven't had any freezes since we stopped shadowing sessions. Citrix is now asking us to disable URCP and do some further testing, but we would rather not subject our users to more freezes while testing. I haven't seen a conclusive answer: in your opinion, does disabling URCP help?
 

Thanx,

Jasper


Hi

We have exactly the same issue.

1912 CU3 PVS on XenServer

After two cases, Citrix told us to involve Microsoft, saying it is not their fault.

EDT was set to Preferred; we changed it to Disabled.

We are trying to disable URCP.

It is not a shadowing issue, because we have sessions freezing even when no one is shadowing.

The facts:

Sessions are frozen for some users.

The workaround is to hide the session with the PowerShell command, but that involves a lot of disconnections, so we cannot keep hiding sessions for the rest of our lives.

If anyone has news, it would be appreciated.

Thanks!

 

On 1/5/2022 at 5:41 PM, Theo Portaal said:

Hi
Did anyone solve this problem? Disabling shadowing is not an option for us, so a real fix would be great!

Hi

Fast Reconnect was our issue.

Setting the following registry value solved our problem:

[HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\Reconnect]
"FastReconnect"=dword:00000000
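If you want to roll the same value out consistently (for example, into the master image before updating the vDisk), one option is to generate a .reg file rather than editing by hand. This is a minimal sketch: the key path and value are exactly as reported in the post above and have not been confirmed against Citrix documentation here; the function and file names are hypothetical.

```python
# Hypothetical helper that builds a .reg file for a single DWORD value.
REG_HEADER = "Windows Registry Editor Version 5.00"


def reg_dword(key_path, value_name, value):
    """Return .reg file text that sets one DWORD value under key_path."""
    return (
        f"{REG_HEADER}\r\n\r\n"
        f"[{key_path}]\r\n"
        f"\"{value_name}\"={'dword'}:{value:08x}\r\n"
    )


# Key and value as reported in this thread (FastReconnect = 0).
content = reg_dword(r"HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\Reconnect", "FastReconnect", 0)

if __name__ == "__main__":
    # "Version 5.00" .reg files are normally UTF-16 with a BOM, which is what
    # regedit itself writes on export; newline="" keeps the \r\n line endings.
    with open("disable_fastreconnect.reg", "w", encoding="utf-16", newline="") as f:
        f.write(content)
```

The resulting file can be imported from an elevated prompt with `reg import disable_fastreconnect.reg`. Test it on a single server or a new vDisk version first, since the thread only reports that it helped in one environment.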

 

On 3/17/2022 at 1:25 AM, Markus Fumasoli said:

@Gary Lloyd: Were you able to fix the issue? We have the same issue, but Citrix Support is not really helping.

 

 

Hi,

Citrix and Microsoft support blame each other for any issue and their tactics are to procrastinate by requesting the same logs and information repeatedly until you either give up or solve it yourself.

 

I didn't try the URCP fix in the end, however, we've since upgraded our hosts and storage and have been able to allocate more memory to the Citrix servers. The crashes caused by shadowing always happened during peak usage (usually the afternoon) when the load on the farm was reaching 80-90%. Now it gets to 60-70% load and I have been cautiously testing shadowing without any crashes so far. I'm going to re-enable shadowing for the rest of the team to see how it goes - I'll let you know what happens.

 

On 4/28/2021 at 7:00 PM, Gary Lloyd said:

We haven't had any freezes since disabling shadowing either. We are going to try disabling URCP, then re-enable shadowing. In the meantime, people are using Teams to share screens.

Did this make a difference? One might suspect the issue is UDP-related if there is an MTU mismatch.
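To illustrate why an MTU mismatch matters for UDP-based traffic such as EDT: a datagram sized for one MTU will not fit unfragmented on a path with a smaller MTU. This sketch only does the IPv4 header arithmetic (20-byte IPv4 header without options, 8-byte UDP header); the 1400-byte path MTU is a hypothetical example (for instance, an overlay or VPN link), and it does not model the packet sizes EDT actually uses.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header (no options)
UDP_HEADER = 8    # bytes, fixed UDP header


def max_udp_payload(mtu):
    """Largest UDP payload that fits in a single IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fits_without_fragmentation(payload, path_mtu):
    """True if a UDP payload of `payload` bytes crosses the path unfragmented."""
    return payload <= max_udp_payload(path_mtu)


print(max_udp_payload(1500))  # 1472
# A sender sizing datagrams for a 1500-byte MTU, crossing a hypothetical 1400-byte path:
print(fits_without_fragmentation(max_udp_payload(1500), 1400))  # False
```

If the path drops fragments or the DF bit is set, oversized datagrams are simply lost, which is one way a UDP session can stall while the host itself still looks healthy.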
