Jump to content
  • 1

Long logon times, especially with Elastic layers


Nick Panaccio

Question

We're looking at logon times (VDI launch to usable desktop) at around 1 minute when Elastic layering is disabled, which I'm a little concerned about. However, when we enable Elastic layering (Application only) in the image, logon times increase by an additional minute - and this is with no applications actually even being assigned elastically. Here's a snippet from the relevant log from some testing that I did yesterday:

 

2018-02-05 11:57:40,571 INFO 5 LayeringWindowsService: *** Citrix Layering Service has started as a service. build 4.7.0.13796 ***
2018-02-05 11:57:40,930 INFO 5 Program: *** Citrix Layering Service has started as a service. build 4.7.0.13796 ***
2018-02-05 11:57:46,539 INFO 5 ScheduledEventsInstaller: registering Scheduled Events
2018-02-05 11:57:48,164 INFO 5 HandlerInstaller: Installing Command/Query Handlers
2018-02-05 11:57:50,164 INFO 5 MachineIdentityService: Reading Elastic Layer Mode...
2018-02-05 11:57:50,320 INFO 5 MachineIdentityService: Elastic Layering Mode = Session
2018-02-05 11:57:50,320 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:57:50,336 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:58:03,007 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:58:03,007 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:58:03,007 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:58:03,429 INFO 5 SessionDynamicLayeringService: Created SessionDynamicLayeringService
2018-02-05 11:58:03,445 INFO 5 MachineIdentitySelector: Selected 'SessionDynamicLayeringService' for 'IDynamicLayeringService' from 2 options
2018-02-05 11:58:03,445 INFO 5 ServiceLifecycleManager: Creating start up task...
2018-02-05 11:58:03,445 INFO 5 ServiceLifecycleManager: Start up task is running...
2018-02-05 11:58:03,445 INFO 5 ServiceLifecycleManager: Attempting to request the required registry and volume privileges
2018-02-05 11:58:03,523 INFO 5 DynamicLayeringService: Cleared the configured volumes
2018-02-05 11:58:03,539 INFO 5 SessionEventRegistrar: Registering for logon/logoff events.
2018-02-05 11:58:03,554 INFO 5 SessionEventRegistrar: Registered for logon/logoff events successfully.
2018-02-05 11:58:03,586 INFO 5 LayerConfigService: Starting volume checker...
2018-02-05 11:58:03,586 INFO 5 ActiveDirectoryInfoService: Starting AD Machine group polling
2018-02-05 11:58:07,992 INFO 5 ActiveDirectoryInfoService: Machine Group SID(s): Domain\Domain Computers, S-1-5-32-574
2018-02-05 12:02:23,500 INFO 11 RequestService: Starting request 1 on behalf of Domain\MyUser
2018-02-05 12:02:23,500 INFO 11 SessionEventHandler: Received logon event
2018-02-05 12:02:23,516 INFO 11 SessionEventHandler: logging in user Domain\MyUser (S-1-5-21-X)
2018-02-05 12:02:23,672 INFO 11 SessionEventHandler: A domain user is logging in
2018-02-05 12:02:23,672 INFO 11 DynamicLayeringService: Logging in user Domain\MyUser (S-1-5-21-X)
2018-02-05 12:02:23,688 INFO 11 WindowsId: Impersonating Domain\MyUser
2018-02-05 12:02:24,594 INFO 11 WindowsId: Finished impersonating Domain\MyUser
2018-02-05 12:02:24,610 INFO 11 DynamicLayeringService: Cleared the restricted user list volumes for sid (S-1-5-21-X)
2018-02-05 12:02:24,610 INFO 11 SessionDynamicLayeringService: Running on a session host thus there will be no user layer
2018-02-05 12:02:24,610 INFO 11 SessionDynamicLayeringService: Running on a session host thus the drivers are already running
2018-02-05 12:02:24,625 INFO 11 DynamicLayeringService: Layering Complete
---------== Layering Results ==-----------
User: Domain\MyUser (S-1-5-21-X), Machine: CTXSERVER01, ULayer: 4.7.0.13796
Assigned Layers:
------------------------------------------
2018-02-05 12:02:24,641 INFO 11 DynamicLayeringService: User Domain\MyUser logged in successfully.  No elastic layers were attached.
2018-02-05 12:02:24,672 INFO 11 SessionEventHandler: Finished processing logon event
2018-02-05 12:02:24,672 INFO 11 RequestService: Ending request 1 started on behalf of Domain\MyUser
2018-02-05 12:03:50,967 INFO 13 RequestService: Starting request 2 on behalf of Domain\MyUser
2018-02-05 12:03:50,983 INFO 13 HandlerHelper: Processing Query LogonStatusQuery->LogonStatusResult
2018-02-05 12:03:50,983 INFO 13 HandlerHelper: Finished Query LogonStatusQuery->LogonStatusResult
2018-02-05 12:03:50,999 INFO 13 RequestService: Ending request 2 started on behalf of Domain\MyUser

As you can see, there's a gap of roughly 1:30 in between 'request 1' ending and 'request 2' (which isn't really detailed) starting. No layers were assigned during this test, but all of my logs show similar behavior, with the long gap happening between those two lines.

 

Any ideas? Our current Windows 7 VDI (non-layered) in Prod has a logon time of about 20 seconds, and while I expect W10 to be longer, even the current 50-55 seconds is going to be a hard sell for our users.

Link to comment

25 answers to this question

Recommended Posts

  • 1

What other filter drivers are present in your system?  Running "fltmc" in a CMD As Administrator will show you.  If you have Elastic App Layers enabled, but not User Layers and you have no actual Elastic Assignments, then everything is strictly local to the booting machine.  Here's how it breaks down.

 

The boot disk contains two partitions, partition 1 is the boot partition which contains your whole composed C: drive.  All the layers are combined there, and it contains the main registry hives.  Partition 2 is the writable partition.  It's where every file creation, modification and deletion go.  It also contains an additional hive which we call RSD (Registry Splitter Driver), in which we store all the registry modifications as well.  Again, these are two partitions on the same virtual disk.

 

When the user logs in, we're virtualizing filesystem reads and writes using a Windows minifilter driver, and we decide where a particular file-open has to happen.  There are a few possibilities in this space:

 

1. Create new file: file created on writable partition

2. Open existing file for read: if file is on writable partition, open it there.  Otherwise open on boot partition.

3. Delete file: place "delete token" on writable partition indicating the file is deleted, overwriting the file on writable partition

4. Open existing file for read/write: if the file is on writable partition, open it there.  Otherwise, copy file from boot to writable partition first, then opn it on writable partition.

 

The only one there that's possibly slow is 4, where we have to make sure the entire file is on the writable partition before you're allowed to open it to modify it.  But unless your login process is routinely modifying enormous files during login, I doubt that's what's going on.

 

When the user logs in, we're also virtualizing registry reads and writes.  If you run Regedit, you'll see the RSD hive is mounted.  What our unirsd driver does is intercept each and every registry operation, and apply basically the same logic as we have for filesystems.  Every read and write has to check both the main registry hives and the RSD to determine who receives the operation.  I suspect this is where the slowness is, because the volume of registry operations during a login absolutely dwarfs the volume of file-open calls.  So either something in your system is generating a much larger registry volume than other customers, or something in your system is slowing down those registry operations.  Like if you've got an AV which is also intercepting all registry operations, maybe together they and unirsd are creating the huge delay.

 

Why am I telling you this?  Maybe it will suggest to you additional variables you can test in your specific environment - if you know of security software which might be suspicious, or if fltmc suggests somebody else might be involved at the filesystem side.  Start by eliminating anything you can think of that might be contributing, and we'll see what we can learn from it.

  • Like 1
Link to comment
  • 1

Same here, uninstalling UPM didn't change anything for me. For the sake of clarity, this is what I've got installed in my current image for Elastic testing:

 

OS Layer:

Windows 10 1703 x64

  • .NET 3.5 [Enabled in Windows Features]
  • .NET 4.7.1
  • BGInfo 4.21.00
  • CMTrace
  • Fujitsu 9.21.1202
  • Microsoft Silverlight 5.1.50907.0
  • OrgPublisher 11.0.3256.1
  • Adobe Shockwave Player 12.3.1.201
  • Citrix Optimizer:
    • Citrix_Windows10_1703.xml
    • App_Layering_Supplement.xml

Platform Layer:

  • Citrix Provisioning Services 7.17
  • Citrix Virtual Delivery Agent 7.17 LTSR
    • Citrix Personalization for App-V – VDA

    • Citrix User Profile Manager

    • Citrix User Profile Manager WMI Plugin

  • Citrix Receiver 4.9.1000 LTSR

  • VMware OS Optimization Tool:

    • Custom template based on Login VSI's #VDILIKEAPRO Windows 10 template

    • This optimization script doesn't affect this delay, and started before I even incorporated the VMware OSOT tool into the mix

  • Like 1
Link to comment
  • 0

I believe the logon was complete at this line

2018-02-05 12:02:24,672 INFO 11 SessionEventHandler: Finished processing logon event

 

Windows 10 does take substantially longer to logon than Windows 7.

 

Maybe Gunther can jump in and let us know whats happening when it says

2018-02-05 12:03:50,967 INFO 13 RequestService: Starting request 2 on behalf of Domain\MyUser.  If nto we will try to find out from the developers.

Rob

Link to comment
  • 0

Had to ask engineering.  Rob is correct.  We return control to the login process at the end of Request 1.  Request 2 is our system tray application checking in, and that app is started by explorer when you get your desktop showing.  So request 2 starting is actually an indicator that the entire login process is complete.  Our part finished at the end of request 1, and we handed control back to UserInit, so something else was happening for those 85 seconds.  Our registry virtualization might be making it slower, but we're not actually doing any active processes during those 85 seconds.

Link to comment
  • 0

Hi Nick,

 

Did you end up getting to the bottom of this issue as I'm also seeing long log in times with a Win 10 box with elastic layering on?

 

Just spun up a Win 10 Ent 1709 on ELM 4.10 to test it out and see exactly the same problem, I also have user layers enabled to test this feature so this is the only elastic layer being attached.

 

Currently the image is just OS, 7.15.1000 VDA, Citrix quality indicator and WEM 4.6 client, no other apps installed.

 

Seeing the same pause between end of request 1 and start of request 2, about a minute of nothing.

 

Director is showing a 111sec log in time and I feel every second of it, it's painful! I can see it's broken down into 53 secs being profile load and interactive session 49 secs are the two that stand out.

 

Can see the user layer disk on the ELM storage share and have made some changes to test and it appears to be working.

 

For some perspective we have log in times on our Server 2016 shared session boxes down to 20 secs according to ControlUp insights using UPM and redirected folders, no elastic layers though.

 

All of this is on top of Nutanix kit, so we have good performance.

 

So it's a night and day difference, wondering if anyone else has long log in times or can set an expectation for me on what we could/should be seeing with the above config.

 

Thanks

 

 

Link to comment
  • 0

Unfortunately, no, we never got to the bottom of this. I created a layer that ran Process Monitor at boot time to capture everything, but ProcMon consumed so much CPU time that it made narrowing down the issue completely impossible.

 

I can replicate this with no actual layers published elastically, and just having elastic layering enabled in the image template (user or app, doesn't matter).

Link to comment
  • 0

I've been testing elastic layering (Application only) with literally no App Layers at all - OS [W10 1703 x64] and Platform [VDA/PVS/UPM 7.17] only - and still see the long delay (adding 30 seconds to 1 minute to the logon process). Disabling UPM doesn't change anything, either (that was a shot in the dark). As far as fltmc goes, this is the output for this specific vDisk:

 

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
storqosflt                              0       244000         0
wcifs                                   2       189900         0
upmjit                                  2       182400.1       0
FileCrypt                               0       141100         0
unifltr                                 2       136600         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
CFsDep2                                 3        42001         0
Wof                                     3        40700         0
FileInfo                                4        40500         0

 

Link to comment
  • 0

Spent some more time on this after having a look at gunthera's very informative post.

 

Disabled Windows 10 Defender completely in the image and tried again, same result unfortunately, a 1 min delay between end of request 1 and the start of request 2

 

The filter drivers running are 

Filter Name                     Num Instances    Altitude    Frame

------------------------------  -------------  ------------  -----

FsDepends                               5       407000         0

storqosflt                              0       244000         0

wcifs                                   1       189900         0

upmjit                                  2       182400.1       0

CldFlt                                  0       180451         0

FileCrypt                               0       141100         0

unifltr                                 2       136600         0

luafv                                   1       135000         0

npsvctrig                               1        46000         0

Wof                                     4        40700         0

FileInfo                                5        40500         0

Will continue looking at the issue and report back.

Link to comment
  • 0

Well, upmjit is the UPM filter, so it's still installed for both of you.  I'm not aware of any problems between EL and UPM, but it can't hurt to make a new platform layer and make sure the optional UPM component is not installed with the rest of the VDA.  Everything else is Windows or us, even CldFlt which I had not seen before today.

Link to comment
  • 0

Thanks for the suggestion, have tried that a new platform layer without the UPM client and no change to login time. Had a poke around the net and Cldflt appears to be a Windows Cloud Files Filter Driver.

 

Interesting thing is the time between request 1 and 2 seems to be 1 min 1 sec each time. Seems to be more than a coincidence.

Link to comment
  • 0

None of that looks obviously problematic.  Are your OS Layer optimizations from the original Gold, or did you add them after import, as a new version of the OS Layer?  If you have a version of the OS layer from before those optimizations were run, it would be worth publishing with that, to test whether this is a self-inflicted wound where our own optimizations are causing problems.  Hopefully the VMware optimizations really don't apply.

 

One other thing you can do is try uninstalling software from your published image.  Like maybe CMTrace hooks into something that causes a huge delay.  Uninstall that in the published image and try again, just to see if it matters.  We really don't know where this is coming from, so I'd want you to methodically try pushing every button you can find in the system.

Link to comment
  • 0

Citrix Optimizer was run in one of the OS Layer versions. I just tested another image with the base OS and Platform layers, which did not include any Citrix or VMware Optimizer, and only included the base Citrix PVS/VDA/UPM installs, and Receiver 4.9 LTSR - same results. That's basically just a vanilla OS with the Store apps removed, and the Citrix components.

Link to comment
  • 0

I'm seeing the same as Nick, stripped back the image to OS layer, Win 10 Ent 1709, with Citrix optimizer applied with app layering template as well, platform layer is just 7.15 LTSR CU1 loaded, without UPM client, Receiver version is 12.9.0 for Mac.

Edited by Matthew PETTIGROVE
Clarified upm client wasn't loaded
Link to comment
  • 0

I don't know why I never checked on this before, but I just noticed that on my Elastic image (same image described 2 posts up) the Virtual Disk Status is showing the cache mode as Cache on server, even though it's configured as Cache in device RAM with overflow on hard disk (1GB) on the vDisk in PVS. Is this normal/expected?

 

My Personality.ini file appears to confirm the Cache in RAM w/ overflow:

[StringData]
WriteCacheSize=6
$DiskName=ElasticTest_2018-04-30_08-54-10.vhd
$WriteCacheType=9
[ArdenceData]
_EnablePrinterSettings=0
_DiskMode=S

Link to comment
  • 0

The PVS console shows cache set to Device RAM w/ overflow, but the target itself shows server. Looking at the VDI, my 6GB vCache drive is filled thanks to dedicateddumpfile.sys. I'm going to edit the vDisk in Private mode to disable the creation of that to free up the space, which I now believe is causing the cache mode to change. This is my QA environment, so for Prod I was planning on using 15GB vCache drives, since this is where our 2GB pagefile is placed. I'll update this thread once I'm done testing.

Link to comment
  • 0

Anything new on this case?

Seeing same issue here:

Win10 1709 / Applayering 4.11 / XenDesktop (no UPM) and PVS 7.15LTSR 

 

Without elastic layering enabled i can get logon times between 20-25 seconds. With elastic layering enabled but no layer assign to a user the logon time increases up to 40-50 seconds.

Link to comment
  • 0

There's actually not a lot we can do.  If you look through your own C:\ProgramData\Unidesk\Logs\ULayerSvc.log, watch the time we spend actually handling the login and disk attachment.  From the log snippet that started this thread, for instance:

 

2018-02-05 12:02:23,500 INFO 11 RequestService: Starting request 1 on behalf of Domain\MyUser
2018-02-05 12:02:23,500 INFO 11 SessionEventHandler: Received logon event

...

2018-02-05 12:02:24,672 INFO 11 SessionEventHandler: Finished processing logon event
2018-02-05 12:02:24,672 INFO 11 RequestService: Ending request 1 started on behalf of Domain\MyUser

 

Between starting and ending the request, a little over a second was spent as we processed the login, read the files from the share, attached any assigned layers (there were none in this example), and activated our drivers.  When "Ending request 1" happens, we hand control back to the login process, and it goes about its business doing whatever else is attached to logins.  From then on, we're not directly responsible for the time it takes to login.  It's something else in the image that just really doesn't like our filesystem or registry virtualization.

 

All we're really able to do is try to pare down your environment to see if we can isolate whatever it is that hates our drivers.  Remove any unnecessary layers first.  Then consider publishing to the hypervisor without the platform layer, and manually join the domain into an OU that has no policies and inheritance blocked.  Elastic Layering is only enabled when a domain user logs in, so you can't test completely off the domain.  And then take a hard look at any installed software in the OS layer itself.  I'd tell you to take a look at policies in the OS layer, but it's nearly impossible to get a clean list of everything, because there are so many places to hide them.

 

But that's the kind of thing we'd be looking for: see if we can find anything else that is also required to make the login slower.  And having done that, well, it's not clear what we can do about it.  But at least we'd know what specifically is running materially slower in the presence of our virtualization.  Operations, including logins, will always be at least a little slower with Elastic Layers.  And logins happen to be a remarkably registry-intensive operations.  In the long run, we're investigating architectural changes that might help incrementally, but those aren't coming quickly.

Link to comment
  • 0

Hi,

 

Did anyone find a solution to this problem? I'm seeing exactly the same problem in our Windows 10 Pro login.

 

The login process in overall takes about 1min 20s (requiring that the VM is up), of which some 20s-30s is spent waiting for the windows exporer icon to appear in the task bar. If a machine needs to be started from scratch, it'll take over 3 minutes to the point when user can actually start working.

 

This is what is seen in the log:

2019-02-01 11:12:31,489 INFO 8 RequestService: Ending request 1 started on behalf of Domain\user

2019-02-01 11:13:01,067 INFO 9 RequestService: Starting request 2 on behalf of Domain\user
 

So there is the 30s gap in between.

 

BR,

Mika

 

 

 

 

 

 

Link to comment
  • 0

Same issue here:  Windows 10 1809, VDA 1811, ELM 1902, PVS 1811, non-persistent images;  Elastic layering on.

 

019-04-15 11:27:51,184 INFO 6 SessionEventHandler: Finished processing logon event
2019-04-15 11:27:51,184 INFO 6 RequestService: Ending request 1 started on behalf of DOMAIN\USERNAME
2019-04-15 11:28:23,824 INFO 8 RequestService: Starting request 2 on behalf of DOMAIN\USERNAME
2019-04-15 11:28:23,840 INFO 8 HandlerHelper: Processing Query LogonStatusQuery->LogonStatusResult
2019-04-15 11:28:23,871 INFO 8 LogonInfoService: User DOMAIN\USERNAME is not a local user

Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...