Jump to content

    Recently, I worked with a healthcare customer experiencing excessive logon times (around 50 seconds) that far exceeded the industry norm of under 30 seconds. This article details the troubleshooting process we used to identify and resolve the issue. Although this specific issue will not be common, the troubleshooting methodology can be applied to other logon time issues.

    Starting with Director

    In a prior article (How to Speed Up Logon Times) I mentioned using Director as an invaluable tool for understanding the logon time landscape and narrowing the focus for troubleshooting. In this case, Director's dashboard showed a site-wide logon performance of 55 seconds.

    AD_4nXemnPEDMF6QaXV-C_alPTyEEz1yvvxG3Fp-

     

    Drilling down into the “Trends > Logon Performance” section allowed us to quantify times for the logon sub-phases by hovering over the bar graph:

    AD_4nXc8j7azdQyo99iTq4BhLquDfSRsoJrYubAU

     

    Here, we can see an average of the sub-phases, which are described in detail in this article:

    • Brokering: 0.09s
    • HDX Connection: 5.32s
    • Authentication: 0.04s
    • GPOs: 5.57 seconds
    • Logon Scripts: 1.47s
    • Profile Load: 4.78s
    • Interactive Session: 13.17s

    Often, we can easily target one of these top areas to improve logon times, such as long profile load or GPOs. If so, we focus on leading practices in those respective areas. However, in this particular scenario, each segment is pretty fast. Also, note that Interactive Session consists of multiple phases which overlap with GPOs so we can kind of exclude that 5.57 seconds since it’s double counted. So adding up the individual times doesn’t make much sense because the total (24.87s) is much lower than the 52 or 55-second average logon duration. What’s happening?

    Focusing on the Middle Rather Than the Worst

    Under Trends > Logon Performance, it is tempting to look at the default sort by longest logon to shortest logon and start trying to address the worst offenders. 

    AD_4nXevSgExxvNSiQk-zGzmn4pjLPQ2q1ohloR-

    In my opinion, this isn’t the right approach. We actually want to address the “typical” user and bring the average time down. Then, we can start looking at outliers. To better illustrate this, I ran a Custom Report in Director, exported the logon time duration for a given period, and graphed it in Excel. I prefer to focus on those 80-90% of the sample set and bring their logon time down rather than chase what’s happening on the right. So instead, I would like to filter on a 2-hour sample set in Director and sort by Logon Time and start looking at the data that way.

     

    AD_4nXdx2WPZeOmNXkF1X-ur-WvJ2sk_rSZXKrHw

    Utilizing the New Director Timeline View

    DaaS and CVAD 2311+ have a new logon timeline view under User Details view > Session Logon, which helps visualize what is happening. By selecting some sample users, we determined that there is a 25-second gap between HDX Connection and Profile Load and that Authentication is only a blip in that. We also spot-checked several other users and saw the same behavior. On the other hand, a couple of users did not have the same gap. Now we have something valuable: comparing a “working” vs “non-working” scenario.

    AD_4nXe5Q4pr_0VBhhj6IPo2hxIfFhpXiQkxNT7O

    Testing Some Launches

    With a test user account, we tried to see what was happening. As we launched the app, we could see Workspace App would sit on this ‘Welcome’ message for 20 or so seconds, which gave us some clues.

    AD_4nXe9m4tuB1H52_tSKMCHDrRQrIWOeFkeGszb

    We also tried performing an RDP to the server, and interestingly enough, we saw the server sitting there for around 20 seconds as well.

    AD_4nXf1KcAlf9Qt4b0JJG52X9Z8dQsuAfuWvg4w

     At this time, we think this isn’t an endpoint issue but rather a server-side problem. We can try out a few things:

    • Collect logs and engage Citrix and Microsoft support
    • Build a new clean image and test, and add applications until the delay occurs again

     

    Just as importantly, we ruled out some paths:

    • Don’t need to mess around with profiles because this delay is pre-profile
    • Don’t need to adjust GPOs because those also kick in after
    • Don’t need to change the Workspace app version because this seems to happen with both with Citrix and RDP

    Detailed Logs Needed

    We decided to collect logs for the “working” and “non-working” scenarios to get more granular:

    The best course of action is to work with vendor support to analyze the logs through support cases and get closer to the resolution. Here, we decided to do some DIY investigation and see if we could come up with anything.

    Combing through Logs

    Since this pointed at a VDA-side issue, we explored Windows Performance Recorder to look at the Windows-level process flow. I’m a beginner using this tool, but after setting up WPA and looking at the process timeline, we can zoom into the area between the start and when the published app starts loading. In this case, I can find a common .exe that runs briefly each time (mpnotify.exe) and refine the timeline:

    “Non-working” Session:

    Line # 
2 
3 
4 
5 
6 
7 
8 
Lifetime 
Transient 
Process 
WmiApSrv.exe (9592) 
smss.exe (7884) 
csrss.exe (3516) 
winlogon.exe (4456) 
fontdrøhost.exe (10872) 
LogonUl.exe (7812) 
dwm.exe (8344) 
MultimediaRedirector.exe (2876) 
Duration (s) 
Modern Application Frie... 
Hosted Services 
479.265959900 
51.618783800 
0.071257900 <None» 
79.015530000 
78.973299600 <None» 
78.720149000 
32.547814100 
78.679026800 <None» 
0.083036000 
Command Line 
miApSrv. exe 
e 00000124 000000b4 
SharedSection= WIndows=On SubSystemType=Windows ServerDll=basesn.', 1 ServerD 
winlogon.exe 
"fontdrvhost.exe" 
"LogonUl.exe" /flags.•Ox2 /stateOOxa1 b14855 /state1:Ox41c64e6d 
"dwm.exe" 
MultimediaRedirector.exe -i

    “Working” Session:

    Line # 
2 
3 
4 
5 
6 
7 
8 
9 
10 
Lifetime 
P rocess 
Transient 
svchost.exe (2252) 
svchost.exe (10864) 
smss.exe (10252) 
csrss.exe (4548) 
winlogon.exe (1 1172) 
fontdrvhost.exe (100) 
LogonUl.exe (5624) 
dwm.exe (8644) 
MultimediaRedirector.exe (9924) 
Duration (s) 
Modern Application Erie... 
Hosted Services 
211.648919400 
9.645437200 <None 
20.671688600 <None» 
0.065526000 <None 
33.995332900 <None» 
33.958807000 <None» 
33.728242300 "Nones 
11.269667300 <None» 
33.671361400 <None» 
0.081080600 <None» 
Command Line 
-k LocalSystemNetworkRestricted -s WPDBusEnum 
-k netsvcs -p -s BITS 
00000138 000000b4 
SharedSection= Windows—On SubSystemVpe=Windows ServerDll:: 
winlogon.exe 
"fontdrvhost.exe" 
"LogonUl.exe" /flags:Ox2 /stateO:Oxa la96855 /state1:Ox41c64e6d 
"dwm.exe" 
MultimediaRedirector.exe -i

    Okay, so for some reason, processes like csrss.exe, winlogon.exe, LogonUI.exe, dwm.exe, and CtxGfx.exe are 30 seconds or 40 seconds longer. It doesn’t really point to an issue, but by comparing through WPA, I could also narrow the focus a little bit more.

    Next, I pulled up procmon to try and find anything relevant. Procmon is also a haystack of data, so trial and error on various filters is the only method I know of:

    • Filter out “Buffer Overflow”
    • Exclude processes like procmon.exe and others that don’t seem like offenders.
    • Filter to only file system or registry
    • Narrow the timeline if possible, like only events more mpnotify.exe
    • Try Including only specific processes (maybe winlogon.exe or LogonUI.exe in this case)
    • Search identifying information in the Path or Details

    Resolution

    As luck would have it, I tried out a filter to search the Details field to contain the username and something caught my eye. There was a process that retrieved a registry value with the user’s email. I then broadened that filter to include that entire registry path. We can even use procmon’s Activity Summary to look at the timeline, which shows that lsass.exe is spending 22 seconds longer doing something between non-working and working. 

    AD_4nXfJA8oXO2rjBd8Sbs-PljXfmGc1S-NSBh79

    By now, we’ve really narrowed our attention to what’s happening, which is an issue related to Entra ID authentication. It turns out this customer had hybrid joined their VDA to Entra ID, although this wasn’t necessary for their use case. By unjoining their servers from Entra ID, the logon time delay went away, and all users accessing the application shaved 25 seconds from their logon time!

    Conclusion

    Will this particular issue resolution help you? Likely not. The reasons for long logon times are plentiful; sometimes, they can be addressed by big hitters like Profile load, GPO, logon scripts, or resource constraints. However, an obscure issue will often require a deep dive into the particular environment. I hope walking through this particular case study can be instructive for others as they seek to improve logon times.

    • Like 1

    User Feedback

    Recommended Comments

    GuillaumeH

    Posted

    Thanks for sharing, interesting to follow the troubleshooting steps.



    Create an account or sign in to comment

    You need to be a member in order to leave a comment

    Create an account

    Sign up for a new account in our community. It's easy!

    Register a new account

    Sign in

    Already have an account? Sign in here.

    Sign In Now

×
×
  • Create New...