
Very slow boot times


Dennis van der Velde

Question

Hello all,

 

We're encountering the following,

 

Since last Tuesday afternoon, all of our devices suddenly started booting very slowly. During the night the problem slowly fades away, and in the morning boot times are normal again (2-3 minutes), only to slow down again as the day goes on.

We're running PVS 2203 CU1 on Windows Server 2016 (four PVS servers across two datacenters), and our clients are Windows 10 20H2.

 

All Citrix components have been restarted (some had their latest security patches reverted), AV has been disabled, and older disks have been tested.

All our other teams have performed scans and performance checks, and all our data has been shared with both Citrix and our network vendor; nothing unusual has been found yet.

 

We and support are slowly running out of ideas; hopefully someone here has some insights.

17 answers to this question


Hi folks, we also encountered this exact issue and nailed it down to the Windows Defender Cache Maintenance scheduled task. We cut a new vDisk gold image for our non-persistent environment and included the following in our MDT deployment task, which resolved the issue.

 

SCHTASKS /Change /TN "\Microsoft\Windows\Windows Defender\Windows Defender Cache Maintenance" /Disable

 

This task was firing on each target device and consuming 1-3 MB/s of reads as it lazily traversed the C:\ drive. Even with 10 Gb end-to-end and oversized RAM for the PVS cache, the Streaming Service could not keep up with more than 200 PVS Win10 target devices per server, and anything above that caused boot times to increase anywhere from 5 to 30 minutes in some cases. After the change we are back to ~600 target devices per PVS host with boot times of ~28 seconds. I've attached screenshots showing how the scheduled task start times line up with the target device network traffic spikes.

TargetDeviceNetwork.png

TargetDeviceTaskLog.png
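
If it's easier to drop into a script, here is a minimal PowerShell sketch of the same change, using the built-in ScheduledTasks cmdlets on Windows 10 / Server 2016 (the task path is the one from the SCHTASKS line above):

# Check the current state of the Defender cache maintenance task,
# then disable it in the gold image before sealing.
$taskPath = "\Microsoft\Windows\Windows Defender\"
$taskName = "Windows Defender Cache Maintenance"

$task = Get-ScheduledTask -TaskPath $taskPath -TaskName $taskName
Write-Output "Current state: $($task.State)"

$task | Disable-ScheduledTask | Out-Null

# Verify it now reports Disabled.
(Get-ScheduledTask -TaskPath $taskPath -TaskName $taskName).State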


Dennis

 

The retries are sometimes a symptom of TCP offloading.

Have you disabled TCP offloading for both the VDI targets and the PVS servers?

 

I normally do this both on the network interface in the VDI gold image and on the streaming interface of the PVS server, and also via a registry key (a belt-and-braces approach).

See this article: https://discussions.citrix.com/topic/414141-provisioning-service-tcp-offload/
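
For the registry half of the belt-and-braces approach, this is a rough sketch of what I apply (assuming the DisableTaskOffload value described in the linked thread; set it on both the gold image and the PVS server, then reboot):

# Create/overwrite the TCP task-offload value and confirm it stuck.
$params = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
New-ItemProperty -Path $params -Name "DisableTaskOffload" `
    -PropertyType DWord -Value 1 -Force | Out-Null
Get-ItemProperty -Path $params -Name "DisableTaskOffload"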

 

Additional reasons for excess retries (not exhaustive):

 

Problems if you've teamed/bonded the NICs on the XenServer hosts or PVS servers.

Other network switch issues - check the logs on your switches.

Not enough RAM on the PVS servers to cache the vDisk reads, or too many different vDisks being streamed simultaneously for the amount of RAM; the fix is to increase RAM on the PVS servers. I knew a customer that was running their PVS servers with 4 GB of RAM, streaming four different vDisks to four different machine catalogs, on old SCSI disks for storage.

You're using the PvS Server for the write cache and it's too slow. 

A faulty network cable - check all network cables/ports used by the hypervisors and PVS servers.

Anti-virus - make sure the vDisks are excluded from AV scanning on the PVS servers and that you've added all the recommended process exclusions in the VDI gold image (try disabling AV temporarily on both the VDI images and the PVS server to see if that's to blame). There's a small sketch below for checking what's currently excluded.
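
If you're on Microsoft Defender, a quick sketch for dumping the exclusions currently in effect so you can compare them against the recommended lists (third-party AV will need its own tooling):

# List the Defender exclusions active on this machine.
$prefs = Get-MpPreference
"Path exclusions:";      $prefs.ExclusionPath
"Process exclusions:";   $prefs.ExclusionProcess
"Extension exclusions:"; $prefs.ExclusionExtension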

 

Regards

 

Ken Z


Hi Dennis,

We have the exact same problem as you. In fact, for us this happened at exactly the same time, so there is a second case open with Citrix regarding this issue. We just had a session with Citrix and they went through the article below with us. Citrix has also indicated that they have an issue with Citrix PVS and the Delivery Controller with a certain Windows Defender version (possibly one that went live on December 6, 2022?). We see a lot of read actions from PVS to the target.

 

https://support.citrix.com/article/CTX475144/checking-pvs-targets-for-processes-with-high-disk-reads
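
For anyone who can't open the article: the gist is to look for processes with unusually high read I/O on a slow target. A rough sketch along those lines (my own approximation, not the exact steps from the CTX article):

# Sample per-process read I/O on a PVS target and show the top readers.
# 'IO Read Bytes/sec' counts all reads a process issues, not just disk,
# so treat the output as a starting point rather than proof.
Get-Counter '\Process(*)\IO Read Bytes/sec' -SampleInterval 5 -MaxSamples 3 |
    ForEach-Object { $_.CounterSamples } |
    Where-Object { $_.InstanceName -notin '_total', 'idle' } |
    Sort-Object CookedValue -Descending |
    Select-Object -First 10 InstanceName,
        @{ n = 'ReadBytesPerSec'; e = { [math]::Round($_.CookedValue) } }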

 

PS: I've sent you a LinkedIn request so we can discuss this further.


Dennis

 

So the problem is *only* booting, not logging on to a VDI after it has booted?

There are a lot of things you haven't mentioned, such as:

 

Are the NICs that stream from the PVS servers to the VDI VMs on a separate VLAN from the NIC that users connect through? (i.e. do the VDIs have two NICs or one?)

Have you monitored the throughput of the PVS NIC? What's the data transfer rate shown on the PVS NIC in the morning compared to the afternoon/evening? (See the sketch after this list.)

What hypervisor are you using? XenServer?

If XenServer, have you installed/configured PVS Accelerator? Have you increased the Control Domain RAM, and to what value?

If VMware, are the NICs VMXNET3 or E1000? Are the VMware Tools up to date?

Have you monitored the disk I/O (e.g. average disk queue length) of the disk hosting the PVS store?

Are the VDIs that boot slowly streaming from one particular PVS server, or are they slow from all of them?

What retries (if any) are you seeing in the PVS stream? (You can view this from within the PVS console.) Can you compare retries in the morning vs. afternoon vs. evening?

What happens if you roll back to a previous PVS image? Does the same problem occur?

Have you enabled verbose mode on the PVS boot sequence? Are there any clues to the slow boot on the console of the VDI as it boots?
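
On the throughput question, a quick way to get comparable morning vs. afternoon numbers from the PVS server is the built-in performance counters; a sketch (interface names will differ per host):

# Sample NIC throughput for one minute; run it in the morning and again
# in the afternoon/evening and compare the busiest interface.
Get-Counter '\Network Interface(*)\Bytes Total/sec' `
    -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object {
        $_.CounterSamples |
            Sort-Object CookedValue -Descending |
            Select-Object -First 1 InstanceName,
                @{ n = 'MBps'; e = { [math]::Round($_.CookedValue / 1MB, 1) } }
    }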

 

Regards

 

Ken Z

 


Hello Ken,

 

Yes, only booting is affected; once the machine is up, users are able to work without issues.

We've just collected some logging on this, and those numbers are being checked at the moment.

This is XenServer, with PVS Accelerator, and dom0 is set to 24 GB.

The average disk queue length was a bit high, but nothing extreme.

Yes, the retries for all the machines are very high.

We've rolled back to both our October and September images and the issue persists.

Yes, verbose mode is enabled but shows no clues.

 

Thanks for your reply!


Ken,

 

TCP offloading is disabled on both sides.

 

We've just completed tests with a PVS machine hosted within XenServer itself, so we could cut out the storage and most of the network components, and the issue still persists.

RAM usage on these servers is low compared to what they have available (32 GB and one disk).

The network cables are a good point; I'll make sure to get them checked out.

AV has been disabled, the exclusions have been double-checked, and all is good there.

 

Thanks again for taking the time to help with our problem! Have a good weekend.

 

On 1/9/2023 at 8:22 PM, Kyle Stewart said:

Hi folks, we also encountered this exact issue and nailed it down to the Windows Defender Cache Maintenance scheduled task. [...]

SCHTASKS /Change /TN "\Microsoft\Windows\Windows Defender\Windows Defender Cache Maintenance" /Disable

 

Hello Kyle, thanks to disabling this scheduled task we're also starting to see positive results. Thanks so much!

13 hours ago, Dennis van der Velde said:

 

With our newest build we have not seen the issue yet. We have noticed that there is some timing involved in disabling this task; have you found anything yet?

 Hi Dennis,

 

Can you share your observations on timing? Also, are you managing Microsoft Defender updates via policy, and if so, what source and method: MMPC, WSUS, an SMB share, or no auto-updates?

14 hours ago, Kyle Stewart said:

Hi Dennis, can you share your observations on timing? [...]

 

No auto-updates for Defender; we use SCCM/WSUS.

We now wait until the task gets registered before we seal the disk. If we don't, we've seen that when users log on, the task in some cases still registers and then runs, but this might be environment-specific. A rough sketch of the wait we use is below.
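
In case it's useful to others, a rough sketch of the wait we added to our sealing script (the 15-minute timeout is just our own arbitrary choice):

# Block the sealing step until the Defender cache maintenance task exists,
# then disable it; give up after 15 minutes.
$deadline = (Get-Date).AddMinutes(15)
do {
    $task = Get-ScheduledTask -TaskPath "\Microsoft\Windows\Windows Defender\" `
        -TaskName "Windows Defender Cache Maintenance" -ErrorAction SilentlyContinue
    if (-not $task) { Start-Sleep -Seconds 30 }
} until ($task -or (Get-Date) -gt $deadline)

if ($task) { $task | Disable-ScheduledTask | Out-Null }
else { Write-Warning "Task never registered; not sealing yet." }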

 

 

 

