
Very slow boot times


Dennis van der Velde

Question

Hello all,

 

We're encountering the following,

 

Since last Tuesday afternoon, all of our devices suddenly started booting very slowly. During the night the problem slowly fades away, and in the morning boot times are normal again (2-3 minutes), only to slow down again as the day goes on.

We're running PVS 2203 CU1 on Windows Server 2016 (four PVS servers across two datacenters), and our clients are Windows 10 20H2.

 

All Citrix components have been restarted (some had their latest security patches reverted), AV has been disabled, and older disks have been tested.

All our other teams have performed scans and performance checks, and all our data has been shared with both Citrix and our network vendor; nothing unusual has been found yet.

 

We and support are slowly running out of ideas; hopefully someone here has some insights.

17 answers to this question


Hi folks, we also encountered this exact issue and nailed it down to the Windows Defender Cache Maintenance scheduled task. We cut a new vDisk gold image for our non-persistent environment and included the following in our MDT deployment task, which resolved the issue.

 

SCHTASKS /Change /TN "\Microsoft\Windows\Windows Defender\Windows Defender Cache Maintenance" /Disable

 

This task was firing on each target device and consuming 1-3 MB/s of reads as it lazily traversed the C:\ drive. Even with 10 Gb end-to-end and oversized RAM for the PVS cache, the Streaming Service could not keep up with more than 200 PVS Win10 target devices per server, and anything above that caused boot times to increase anywhere from 5 to 30 minutes in some cases. After the change we are back to ~600 target devices per PVS host with boot times of ~28 seconds. I've attached screenshots showing how the scheduled task start times line up with the target device network traffic spikes.

TargetDeviceNetwork.png

TargetDeviceTaskLog.png
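
If it's easier to drop into a script, here is a minimal PowerShell sketch of the same change, using the built-in ScheduledTasks cmdlets on Windows 10 / Server 2016 (the task path is the one from the SCHTASKS line above):

# Check the current state of the Defender cache maintenance task,
# then disable it in the gold image before sealing.
$taskPath = "\Microsoft\Windows\Windows Defender\"
$taskName = "Windows Defender Cache Maintenance"

$task = Get-ScheduledTask -TaskPath $taskPath -TaskName $taskName
Write-Output "Current state: $($task.State)"

$task | Disable-ScheduledTask | Out-Null

# Verify it now reports Disabled.
(Get-ScheduledTask -TaskPath $taskPath -TaskName $taskName).State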


Dennis

 

The retries are sometimes a symptom of TCP offloading.

Have you disabled TCP offloading for both the VDI targets and the PVS servers?

 

I normally do this both on the network interface in the VDI gold image and on the streaming interface of the PVS server, and also via a registry key (a belt-and-braces approach).

See this article: https://discussions.citrix.com/topic/414141-provisioning-service-tcp-offload/
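
For the registry half of the belt-and-braces approach, this is a rough sketch of what I apply (assuming the DisableTaskOffload value described in the linked thread; set it on both the gold image and the PVS server, then reboot):

# Create/overwrite the TCP task-offload value and confirm it stuck.
$params = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
New-ItemProperty -Path $params -Name "DisableTaskOffload" `
    -PropertyType DWord -Value 1 -Force | Out-Null
Get-ItemProperty -Path $params -Name "DisableTaskOffload"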

 

Additional reasons for excess retries (not exhaustive):

 

Problems if you've teamed/bonded the NICs on the XenServer hosts or PVS servers.

Other network switch issues - check the logs on your switches.

Not enough RAM on the PVS servers to cache the vDisk reads, or too many different vDisks being streamed simultaneously for the amount of RAM; the fix is to increase RAM on the PVS servers. I knew a customer that was running their PVS servers with 4 GB of RAM, streaming four different vDisks to four different machine catalogs, on old SCSI disks for storage.

You're using the PvS Server for the write cache and it's too slow. 

A faulty network cable - check all network cables/ports used by the hypervisors and PVS servers.

Anti-virus - make sure the vDisks are excluded from AV scanning on the PVS servers and that you've added all the recommended process exclusions in the VDI gold image (try disabling AV temporarily on both the VDI images and the PVS server to see if that's to blame). There's a small sketch below for checking what's currently excluded.
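
If you're on Microsoft Defender, a quick sketch for dumping the exclusions currently in effect so you can compare them against the recommended lists (third-party AV will need its own tooling):

# List the Defender exclusions active on this machine.
$prefs = Get-MpPreference
"Path exclusions:";      $prefs.ExclusionPath
"Process exclusions:";   $prefs.ExclusionProcess
"Extension exclusions:"; $prefs.ExclusionExtension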

 

Regards

 

Ken Z


Hi Dennis,

We have the exact same problem as you. In fact, for us this happened at exactly the same time, so there is a second case open with Citrix regarding this issue. We just had a session with Citrix and they went through the article below with us. Citrix has also indicated that they have an issue with Citrix PVS and the Delivery Controller with a certain Windows Defender version (possibly one that went live on December 6, 2022?). We see a lot of read actions from PVS to the target.

 

https://support.citrix.com/article/CTX475144/checking-pvs-targets-for-processes-with-high-disk-reads
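
For anyone who can't open the article: the gist is to look for processes with unusually high read I/O on a slow target. A rough sketch along those lines (my own approximation, not the exact steps from the CTX article):

# Sample per-process read I/O on a PVS target and show the top readers.
# 'IO Read Bytes/sec' counts all reads a process issues, not just disk,
# so treat the output as a starting point rather than proof.
Get-Counter '\Process(*)\IO Read Bytes/sec' -SampleInterval 5 -MaxSamples 3 |
    ForEach-Object { $_.CounterSamples } |
    Where-Object { $_.InstanceName -notin '_total', 'idle' } |
    Sort-Object CookedValue -Descending |
    Select-Object -First 10 InstanceName,
        @{ n = 'ReadBytesPerSec'; e = { [math]::Round($_.CookedValue) } }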

 

PS: I've sent you a LinkedIn request so we can discuss this further.


Dennis

 

So the problem is *only* booting, not logging on to a VDI after it has booted?

There are a lot of things you haven't mentioned, such as:

 

Are the NICs that stream from the PVS servers to the VDI VMs on a separate VLAN from the NIC that users connect through? (i.e. do the VDIs have two NICs or one?)

Have you monitored the throughput of the PVS NIC? What's the data transfer rate shown on the PVS NIC in the morning compared to the afternoon/evening? (See the sketch after this list.)

What hypervisor are you using? XenServer?

If XenServer, have you installed/configured PVS Accelerator? Have you increased the Control Domain RAM, and to what value?

If VMware, are the NICs VMXNET3 or E1000? Are the VMware Tools up to date?

Have you monitored the disk I/O (e.g. average disk queue length) of the disk hosting the PVS store?

Are the VDIs that boot slowly streaming from one particular PVS server, or are they slow from all of them?

What retries (if any) are you seeing in the PVS stream? (You can view this from within the PVS console.) Can you compare retries in the morning vs. afternoon vs. evening?

What happens if you roll back to a previous PVS image? Does the same problem occur?

Have you enabled verbose mode on the PVS boot sequence? Are there any clues to the slow boot on the console of the VDI as it boots?
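
On the throughput question, a quick way to get comparable morning vs. afternoon numbers from the PVS server is the built-in performance counters; a sketch (interface names will differ per host):

# Sample NIC throughput for one minute; run it in the morning and again
# in the afternoon/evening and compare the busiest interface.
Get-Counter '\Network Interface(*)\Bytes Total/sec' `
    -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object {
        $_.CounterSamples |
            Sort-Object CookedValue -Descending |
            Select-Object -First 1 InstanceName,
                @{ n = 'MBps'; e = { [math]::Round($_.CookedValue / 1MB, 1) } }
    }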

 

Regards

 

Ken Z

 


Hello Ken,

 

Yes, only booting is affected; once the machine is up, users are able to work without issues.

We've just collected some logging on this, and those numbers are being checked at the moment.

This is XenServer, with PVS Accelerator, and dom0 is set to 24 GB.

The average disk queue length was a bit high, but nothing extreme.

Yes, the retries for all the machines are very high.

We've rolled back to both our October and September images and the issue persists.

Yes, verbose mode is enabled but shows no clues.

 

Thanks for your reply!


Ken,

 

TCP offloading is disabled on both sides.

 

We've just completed tests with a PVS machine hosted within XenServer itself, so we could cut out the storage and most of the network components, and the issue still persists.

RAM usage on these servers is low compared to what they have available (32 GB and one disk).

The network cables are a good point; I'll make sure to get them checked out.

AV has been disabled, the exclusions have been double-checked, and all is good there.

 

Thanks again for taking the time to help with our problem! Have a good weekend.

 

On 1/9/2023 at 8:22 PM, Kyle Stewart said:

Hi folks, we also encountered this exact issue and nailed it down to the Windows Defender Cache Maintenance scheduled task. [...]

SCHTASKS /Change /TN "\Microsoft\Windows\Windows Defender\Windows Defender Cache Maintenance" /Disable

 

Hello Kyle, thanks to disabling this scheduled task we're also starting to see positive results. Thanks so much!

13 hours ago, Dennis van der Velde said:

 

With our newest build we have not seen the issue yet. We have noticed that there is some timing involved in disabling this task; have you found anything yet?

 Hi Dennis,

 

Can you share your observations on timing? Also, are you managing Microsoft Defender updates via policy, and if so, what source and method: MMPC, WSUS, an SMB share, or no auto-updates?

14 hours ago, Kyle Stewart said:

Hi Dennis, can you share your observations on timing? [...]

 

No auto-updates for Defender; we use SCCM/WSUS.

We now wait until the task gets registered before we seal the disk. If we don't, we've seen that when users log on, the task in some cases still registers and then runs, but this might be environment-specific. A rough sketch of the wait we use is below.
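
In case it's useful to others, a rough sketch of the wait we added to our sealing script (the 15-minute timeout is just our own arbitrary choice):

# Block the sealing step until the Defender cache maintenance task exists,
# then disable it; give up after 15 minutes.
$deadline = (Get-Date).AddMinutes(15)
do {
    $task = Get-ScheduledTask -TaskPath "\Microsoft\Windows\Windows Defender\" `
        -TaskName "Windows Defender Cache Maintenance" -ErrorAction SilentlyContinue
    if (-not $task) { Start-Sleep -Seconds 30 }
} until ($task -or (Get-Date) -gt $deadline)

if ($task) { $task | Disable-ScheduledTask | Out-Null }
else { Write-Warning "Task never registered; not sealing yet." }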

 

 

 

