
Citrix Virtual Apps and Desktops 1912 PVS random VDA server will freeze/crash and I have to hard reset the VM


Alex Liu1709161424

Question

We are running CVAD 1912 with PVS in our VMware 6.5 environment. All of our Citrix servers run Windows Server 2019 Datacenter. We currently have about 6 VDA servers that users log into for work. Each week, at random, one of those servers will freeze/crash. When I try to access it from the VM console, I can't do anything, and users report that their sessions are frozen as well. The only option is a hard reset of the VM. Once it reboots, it's fine again until the next time.

I don't see anything in the event logs that suggests what the culprit is. I am currently working with Citrix Support to diagnose the issue, and they are looking at memory dumps. We use Trend Micro Apex One as our antivirus, and Citrix Support suggested we put the proper exclusions in place. However, it's still freezing with the exclusions in place. Once the server freezes, an error eventually shows up on the Delivery Controller saying that the Broker Agent lost contact with the server.

Yesterday, a user called to say they seemed stuck. At first, I checked the VM console and saw that I could still log into it, so I thought the server was still okay. I tried to shadow the user and it timed out. Shortly after, we received calls from other users saying they couldn't move either. I noticed they were all on the same server. When I went back to the VM console, it was frozen. Has anyone experienced anything like this before?

Link to comment

I've got a similar problem, but with virtual desktops instead, so the impact is smaller, although it seems to happen more frequently than for aliu790. Out of 1000 unique connections, we may see about 5 different VMs per day lose registration with the Delivery Controller. When we try to view the console in XenCenter, the VM is unresponsive. I have a feeling it might be related to the CVAD 1912 CU1 VDA on the streamed image, as I don't recall seeing this particular problem before we applied the CU1 VDA. On one occasion we noticed a brief CDViewer.exe error pop up just before the user lost connection and the VM went unregistered.

 

The other time we see this is when brokering a session on a fresh login: some VMs fail before the user even gets fully logged in. The session gets stuck, and we have to manually shut down the VM they were trying to sign into.

Link to comment

Hi @Rick Culler,

 

Thanks for the response.  Yeah, this has been a frustrating one to deal with.  I am currently testing the VDAs with Trend Micro Apex One uninstalled and using a different AV solution, making sure the Citrix exclusions are still in place.  Now it's just wait and see if things improve or not.  Sometimes it would go almost two weeks before a freeze, so we'll see.  I'll follow up down the road with whatever happens.

Link to comment

Hi @Joe W,

 

Thanks for the response.  We are using ESXi 6.5.  The VMs are all Windows Server 2019 Datacenter.  I don't recall actually trying to ping the frozen server from an external device when it freezes.  However, I did try an experiment once where I had a script logging pings to a text file on the server itself.  It would ping the server itself, our gateway, and an external IP, like 8.8.8.8.  The next time that server froze, I checked the text file and noticed that 2 minutes before the server froze, it started logging a "General failure" error.
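
For anyone who wants to repeat the ping experiment, here is a minimal sketch of such a logger in Python. It is not the script used above. The target list (loopback standing in for the server itself, `192.0.2.1` as a placeholder gateway), the log file name, and the 10-second interval are all assumptions to adjust.

```python
import datetime
import platform
import subprocess
import time

# Assumed targets: loopback stands in for "the server itself", 192.0.2.1 is a
# placeholder for the gateway, and 8.8.8.8 is the external IP mentioned above.
TARGETS = ["127.0.0.1", "192.0.2.1", "8.8.8.8"]


def ping_once(host, timeout_s=5):
    """Send one echo request; return (ok, one-line summary of ping's output)."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    # Collapse the output onto one line so that messages such as
    # "General failure." end up in the log verbatim. The exit code is only a
    # rough indicator; the text is what you will want to read afterwards.
    summary = " | ".join(l.strip() for l in result.stdout.splitlines() if l.strip())
    return result.returncode == 0, summary or "no output"


def format_entry(timestamp, host, ok, detail):
    """Build one tab-separated log line."""
    status = "OK" if ok else "FAIL"
    return f"{timestamp.isoformat(timespec='seconds')}\t{host}\t{status}\t{detail}"


def log_round(write, targets, ping=ping_once, now=datetime.datetime.now):
    """Ping every target once and hand each formatted line to write()."""
    for host in targets:
        ok, detail = ping(host)
        write(format_entry(now(), host, ok, detail))


def run_forever(path="ping_log.txt", interval_s=10):
    """Append one round of results per interval; reopen the file each round
    so entries are flushed to disk before a possible freeze."""
    while True:
        with open(path, "a", encoding="utf-8") as fh:
            log_round(lambda line: fh.write(line + "\n"), TARGETS)
        time.sleep(interval_s)
```

Calling `run_forever()` starts the loop. On a PVS target the log file should live on a disk that survives a reboot, otherwise the last entries before the freeze are lost with the hard reset.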

 

We are still monitoring our servers after uninstalling Trend Micro Apex One, and it's been almost a week now without any freezing.  We'll see.

Link to comment

Hi, 

 

I'm currently experiencing the same issue. 

We don't have AV installed (even Defender is disabled).

I raised a case with Citrix, but they didn't find any issue in the memory dump and asked me to open a case with Microsoft.

Microsoft also didn't find the cause, but saw some Nvidia- and Citrix-related issues.

Now I have a case with Nvidia and Citrix to look into the issue. 

 

Because the event logs are written to the C: drive, all logs are lost after a reboot. I just found a script that moves every log file to the WCD (write cache disk), so after a reboot I can still check them in Event Viewer.
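
The script mentioned above is not shown in the thread. As a rough sketch of the idea, the Python below uses the built-in `wevtutil` tool (`el` lists the log names, `sl <log> /lfn:<path>` sets a log's file path). The target folder `D:\EventLogs` is an assumption, as is the choice to replace `/` with `%4` in file names, which mirrors how Windows names channel log files by default.

```python
import subprocess


def target_path(log_name, target_dir):
    """Map a log name to an .evtx path under target_dir.
    '/' is not valid in file names, so it is replaced with '%4'."""
    safe_name = log_name.replace("/", "%4")
    return target_dir.rstrip("\\") + "\\" + safe_name + ".evtx"


def build_command(log_name, target_dir):
    """Return the wevtutil command that points one log at its new file."""
    return ["wevtutil", "sl", log_name, "/lfn:" + target_path(log_name, target_dir)]


def list_logs():
    """Return all event log names reported by 'wevtutil el'."""
    out = subprocess.run(["wevtutil", "el"], capture_output=True, text=True, check=True)
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


def redirect_all(target_dir, dry_run=True):
    """Print (dry run) or execute the redirect command for every log."""
    for name in list_logs():
        cmd = build_command(name, target_dir)
        if dry_run:
            print(" ".join(cmd))
        else:
            # Some logs may refuse the change; keep going rather than abort.
            subprocess.run(cmd, check=False)
```

Running `redirect_all("D:\\EventLogs")` only prints the commands. Passing `dry_run=False` applies them, which needs an elevated prompt and an existing target folder.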

 

We are using:

Citrix Hypervisor 8.2 (latest hotfixes)
Windows 2019 with VDA 1912 CU1 and PVS 1912 CU1.

PVS servers are also Windows 2019 with PVS 1912 CU1.

 

Please let me know if any of you have a solution. 

 

Regards,

 

Sjoerd

Link to comment

Hi Sjoerd,

 

Thanks for posting. Please let us know the outcome of your case(s). 

 

We just recently upgraded our Provisioning to 1912 CU1 and since have not experienced the issue. I'm not saying that it is resolved but just wanted to document it here. 

Link to comment

Hi Sjoerd and JWoltering,

 

Regarding my case, here is the latest development:

 

We are also using Windows Server 2019 Datacenter, and 1912 CU1.  I've gone from Trend Micro to Defender, to Kaspersky.  The servers have frozen with all three.  One thing I started noticing was that it MAY have something to do with shadowing.  It first came onto my radar when I tried to shadow a user.  While I was connecting to them, the server froze.  We then had that user log onto a different server and tried to shadow them again.  Once again, it froze.  However, when we tried a 3rd time, it didn't freeze.

 

Sometime later, when I needed to shadow someone else, I made note of who I was shadowing and what server they were on.  The shadow session went fine, and I was able to close the shadow session without any issues.  Interestingly, about 2 hours later, that server froze!  The next day, I had to shadow someone again, and almost the same thing happened.  Around an hour and a half later, that server froze.

 

I mentioned this theory to the Citrix Support guy I'm currently working with, and he wanted to do a Wireshark trace session and see if I can duplicate the freeze.  We picked out one of the VDAs we were going to do the trace on.  Turned on Promiscuous Mode on the vSwitch it's using.  Then we installed Wireshark on another VM that is on the same ESX host, using the same vSwitch, and started a packet trace.  I then found a user on that server and started a shadow session with them. I just had them print a PDF in their email.  Then I requested control and printed another PDF from their email and ended the shadow session.  About 30 minutes later, that server froze!  I then stopped the packet trace and uploaded it to Citrix Support.  They are now going to analyze it and get back to me.

 

I'll post back once I have more information.

 

Alex

Link to comment

Hi Alex,

 

We saw that same behavior with another customer; they stopped using the shadowing feature and no longer had the issue.

I will test disabling shadowing at a customer.

If Citrix finds anything, hopefully it will solve our problem.

We just had a crash: users were shadowed at 8:07, and the server crashed 15 minutes later. At 8:08 all logging stopped (logging was redirected to the WCD).

I just found this forum post: https://discussions.citrix.com/topic/408153-remote-assistance-crashes-local-and-remote-session-when-it-ends/ 

 

Maybe this would solve our issue. 

 

Thanks for the update. 

 

regards.

 

Sjoerd

Edited by sjoerd van den Nieuwenhof
Extra info
Link to comment

Hi Sjoerd,

 

Thanks for the update and additional evidence/clues pointing to shadowing.  Referring to the forum post you mentioned, I actually saw that before and tried disabling URCP.  Unfortunately, for me, it didn't help.  Perhaps you can try it and let me know if it made any difference for you.  Maybe I didn't do it correctly.  I just applied it to the VDA image.

 

Alex

Link to comment
12 hours ago, Sjoerd Van den Nieuwenhof said:

We saw that same behavior with another customer; they stopped using the shadowing feature and no longer had the issue.

Strange. We've never had any issues with shadowing in any version going all the way back to XenApp 5.0. It just goes to show that there isn't any one, single solution to this issue. At least not that we've discovered yet.

Link to comment

Hi Jwoltering,

 

Yeah, definitely agreed.  I even saw another thread that looked very promising because starting on page 2, there were people posting about the same symptoms we are facing.  Then on page 4, what worked for them was adjusting the Rx Buffers. 

https://discussions.citrix.com/topic/403251-pvs-715-cu3-bnistack-failed-bsod-cvhdmpsys-xenserver/page/2/

 

I saw overflow numbers similar to the ones they were experiencing, so I thought it might be the answer. But even after adjusting the Rx Buffers and seeing good numbers like theirs, it still crashes when shadowing.

 

We're coming from a XenApp 6 environment, and we've used Citrix since Presentation Server 4.5 without any problems with shadowing before either. I guess with the complexity of all the different components now, there are many factors that can cause issues.  Hopefully, with all this documented online, it can at least help some people. :)

 

Alex

Link to comment

Hi,

 

Currently we are testing the new CU2 for Provisioning Services. They have fixed an issue where, due to network interference, the servers could crash:

 

Attempts to migrate a target device set to Cache in Device RAM with overflow on hard disk and Asynchronous IO might fail. The device experiences a fatal exception and displays a blue screen. [CVADHELP-15039]

 

We installed it on Thursday and have not had any crashes since. I will keep you posted on whether it's solved completely.

Link to comment

Hi Sjoerd,

 

That sounds like good news.  Hopefully, CU2 will help.  Are you able to test with shadowing and see if it resolves that issue as well?  The weird thing about the shadowing issue is that there can be a delayed response when it comes to crashing.  For us, since we started monitoring it more, it seems to happen anywhere from the time of shadowing, to up to 2 hours later.
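
Because the freeze can trail the shadow session by up to 2 hours, it may help to keep a simple record of shadow sessions and freezes and match them up afterwards. A minimal sketch in Python, assuming you note each event as a `(server, time)` pair; the function name and the 2-hour default window are illustrative choices, not anything Citrix provides:

```python
from datetime import datetime, timedelta


def freezes_after_shadow(shadows, freezes, window=timedelta(hours=2)):
    """Match each freeze to the most recent shadow session on the same server.

    shadows, freezes: lists of (server_name, datetime) tuples.
    Returns a list of (server_name, freeze_time, delay) for every freeze that
    happened within `window` after a shadow session on that server.
    """
    matches = []
    for server, frozen_at in freezes:
        delays = [
            frozen_at - shadowed_at
            for shadow_server, shadowed_at in shadows
            if shadow_server == server
            and timedelta(0) <= frozen_at - shadowed_at <= window
        ]
        if delays:
            # The smallest delay belongs to the latest shadow before the freeze.
            matches.append((server, frozen_at, min(delays)))
    return matches
```

Freezes that never show up in the result are the interesting ones, since they would point to a cause other than shadowing.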

Link to comment

Hi Sjoerd,

 

Yes, understood.  This is definitely a very annoying problem.  It would be interesting to know though, if CU2 will fix the shadowing problem.  I spoke with the Citrix Support Engineer yesterday, going over the Wireshark trace we did.  The only thing he could show me was to confirm that at some point, the VDA starts to try and reconnect to the PVS server, but can't.  You can see the numerous retries, over and over again, until it freezes.  Unfortunately, he still couldn't pinpoint the exact reason why it starts doing that.  I told him that, for us, the common denominator seems to be shadowing.  He is going to open a ticket with their engineers to take a closer look at that behavior now.

 

Alex

Link to comment

Hi Sjoerd,

 

Happy New Year to you too!

 

Interesting.  So the CU2 patch fixed one issue for you, but the shadowing problem still exists.  That's good to know.

 

On my side, Citrix Support wasn't able to get back to me until yesterday.  They asked me to provide a list of all the programs installed on my VDA.  They had me run this command at the command line and send them the resulting file so they can start their analysis:

 

wmic product get name,version > Software.txt

 

So now I'm waiting to see what they come up with.  Until then, we have simply avoided shadowing when troubleshooting, and it hasn't crashed.  The only time it did crash was when someone forgot and shadowed a user.
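
If you want to compare the resulting Software.txt between a VDA that freezes and one that doesn't, a small parser is enough. The sketch below is in Python and makes two assumptions: that the file is the usual fixed-width `Name` / `Version` table, and that the redirected file is UTF-16 encoded (change `encoding` if yours is not). The function names are illustrative.

```python
def parse_wmic_table(text):
    """Parse fixed-width 'Name ... Version' output into {name: version}."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    # The header tells us at which column the Version field starts.
    split_at = lines[0].find("Version")
    if split_at < 0:
        raise ValueError("expected a 'Version' column in the header line")
    inventory = {}
    for line in lines[1:]:
        name = line[:split_at].strip()
        version = line[split_at:].strip()
        if name:
            inventory[name] = version
    return inventory


def load_inventory(path, encoding="utf-16"):
    """Read a Software.txt file; the encoding is an assumption."""
    with open(path, encoding=encoding) as fh:
        return parse_wmic_table(fh.read())


def compare(a, b):
    """Return {name: (version_in_a, version_in_b)} for entries that differ.
    None means the product is missing from that inventory."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

For example, `compare(load_inventory("VDA01.txt"), load_inventory("VDA02.txt"))` lists only the products whose versions differ or that exist on one server but not the other (the file names here are placeholders).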

 

Alex

Link to comment

Hi Alex,

 

I hope they will find something. 
We are going to open a case as well, because we have 4 customers with different setups that have the same issue.

The only thing they have in common is VDA 1912, PVS 1912 and Windows 2019. 

 

If you don't want any user to accidentally shadow, you can change the role and disable shadowing. That's what we do to make sure nobody is shadowing.

 

Regards,

 

Sjoerd

Link to comment

Hi Sjoerd,

 

Since you implement Citrix for your customers, may I ask you a question regarding antivirus?  I know that this thread started out with me questioning if antivirus was the culprit for these freezes.  But it looks like that theory has been debunked at this point, after starting with Trend Micro, then trying Kaspersky and Defender, and having it crash with all three.  I think you or someone else even mentioned that it crashed without any antivirus running at one point.

 

My question is: in your experience with implementations, do you have any recommendations with regard to antivirus?  Have you had good experiences with any particular antivirus in terms of protection and minimal performance impact?  We are not tied to any one brand and are just looking to see what other IT professionals recommend.  Is Defender now at the point where it's good enough, or are other 3rd party vendors still better, and if so, which ones (for Citrix environments)?

 

Thanks,

Alex

Link to comment
On 1/6/2021 at 4:10 PM, Alex Liu1709161424 said:

Hi Sjoerd,

 

Okay, good.  I also forwarded your comment above to Citrix Support, so they will know that other people are experiencing the same issue.  Any evidence for them can hopefully allow them to come up with a fix.

 

Thanks,

Alex

 

Hi Alex. 

 

Do you have any information regarding the Citrix Case?

 

Regards, 

 

Sjoerd

Link to comment
On 1/6/2021 at 7:33 PM, Alex Liu1709161424 said:

Since you implement Citrix for your customers, may I ask you a question regarding antivirus?

Hi, 

 

We always implement Kaspersky, but I haven't done an analysis of its performance impact.

Bitdefender has Hypervisor Introspection. I haven't worked with it, but I have seen some demos.

https://www.bitdefender.com/business/enterprise-products/hypervisor-introspection.html

When you use a hypervisor security layer, it doesn't impact your VMs as much as a normal AV does.

 

Regards, 

 

Sjoerd

Link to comment

Hi Sjoerd,

 

We are currently doing some testing still.  Citrix support gave us a private fix file to replace in our VDA image to see how it would perform with the shadowing.  I believe it's the bnistack6.sys file.  I implemented it and tried shadowing, but unfortunately, the server still ended up freezing/crashing.  I reported this to Citrix Support and they wanted me to test it again, but this time with more tracing so they could analyze it further.  They had me do a Wireshark trace again, by putting the hypervisor host in promiscuous mode and setting up Wireshark on another VM that was on the same host as the VDA.  They also wanted to have something called CDFControl running with an AOT.ctl file loaded, and have the PVS console set to Debug level logging for the VDA I plan on making crash.

 

This is the twist in the story so far.  Since they have started these traces and Debug logging, I haven't been able to reproduce the crashing!  I've tried on two separate days to make a server crash by shadowing, and it hasn't happened yet.  So now the question is why.  I still need to find more time to do more testing, but this is an interesting development.

 

As for antivirus, thank you for your response.  So it sounds like you've been pretty happy with Kaspersky then.  That's good to know.  I've also heard about the hypervisor layer of security and have been interested in how well those work as well.  I'll probably check out some demos at some point.

 

Thanks,

Alex

Link to comment
