
Login (Profile loading) issue


Kaoru Goto1709158517

Question

Hello. I’m hoping that I can get some help for this issue.

 

About 2 months ago, users started having a logon issue when accessing the published desktop. They get the message “The Group Policy Client service failed the logon. Access is denied.” and cannot start the ICA session.

 

It affects all users across all 17 XenApp servers. It usually happens between 9:30am and 3:3pm. During quiet periods, users do not get the logon issue at all. They can eventually launch the desktop session when they land on a different XenApp server that does not have the issue (when the issue occurs, it does not affect all the servers at the same time).

 

Environment Info

  • OS: Windows Server 2008 R2 (latest windows patch installed)
  • Citrix: XenApp 6.5 (Hotfix rollup 7 installed)
  • 4 vCPUs and 16GB of memory in each XenApp server.
  • 9 to 11 sessions on each server.
  • CPU, memory, and disk usage are not very high.
  • The number of users has not increased recently.
  • The users have not changed their work patterns recently.
  • The users use a roaming profile configured by GPO.
  • vSphere 6 infrastructure.

 

Here is the troubleshooting I have done so far.

  • Event IDs 1511,1509,1508,1502,1500 appear in Application Logs

Example: Event ID 1502

“Windows cannot load the locally stored profile. Possible causes of this error include insufficient security rights or a corrupt local profile.  DETAIL - Insufficient system resources exist to complete the requested service.”
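Since the failures cluster in the busy part of the day, one way to check that pattern is to bucket the profile events by hour. The following is only a minimal sketch: it assumes the Application log has been exported to (timestamp, event ID) pairs, and the timestamp format and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Event IDs quoted in the post (profile load errors in the Application log).
PROFILE_EVENT_IDS = {1500, 1502, 1508, 1509, 1511}


def count_profile_events_by_hour(rows):
    """Count profile-related events per hour of day.

    rows: iterable of (timestamp_string, event_id) pairs. The timestamp
    format "%Y-%m-%d %H:%M:%S" is an assumption about the export.
    """
    counts = Counter()
    for stamp, event_id in rows:
        if int(event_id) in PROFILE_EVENT_IDS:
            hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour
            counts[hour] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real servers.
    sample = [
        ("2018-03-01 09:41:10", 1502),
        ("2018-03-01 09:55:02", 1511),
        ("2018-03-01 13:05:44", 1508),
        ("2018-03-01 20:15:00", 7036),  # unrelated event, ignored
    ]
    print(count_profile_events_by_hour(sample))
```

Comparing the per-hour counts across servers may show whether the errors track the number of sessions or a particular time of day.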

 

I think this indicates system resource depletion (nonpaged pool and paged pool usage). That is a typical issue on a 32-bit OS, but the XenApp servers are 64-bit.

 

  • When the issue starts happening, we can also reproduce it by connecting to the servers over RDP. A temporary profile is loaded instead of the roaming profile. So most probably the Citrix components are not the root cause? When using a local user with a local profile (no roaming), I get Event ID 1542 “Windows cannot load classes registry file. DETAIL - Insufficient system resources exist to complete the requested service.”
  • The user profiles were reset. It didn’t resolve the issue.
  • The file server hosting the user profiles has been rebooted. It didn’t resolve the issue.
  • The file server was using an Intel NIC. Replaced it with VMXNET3. It didn’t resolve the issue.
  • Based on the article https://support.citrix.com/article/CTX132648, the DEFAULT hive key has been cleaned up. It didn’t resolve the issue.
  • Based on the article https://support.microsoft.com/en-au/help/2871131/the-size-of-the-hkey-users-default-registry-hive-continuously-increase, installed the hotfix (KB2871131) and compressed the DEFAULT hive key. It didn’t resolve the issue. The registry size seems to be OK.
  • Replaced the DEFAULT profile folder in case of corruption. It didn’t resolve the issue. If the issue were caused by a corrupt default profile, I would expect it to occur every time the users log in. Also, I would not expect the corruption to happen on every XenApp server.
  • Confirmed that the ESXi servers hosting the XenApp servers have no resource contention.
  • Although this article is for 2003 servers, https://support.microsoft.com/en-us/help/935649/error-message-when-you-try-to-log-on-to-a-windows-server-2003-based-te, I tried it anyway. PoolUsageMaximum was set to 60, then changed to 40. It didn’t resolve the issue.
  • Some users online mentioned that setting the following registry keys fixed the issue. I tried those, but they did not fix the issue.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management

SessionPoolSize (DWORD) – Decimal (64)

SessionViewSize (DWORD) – Decimal (104)

 

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TermService

LogoffTimeout (DWORD) – Decimal (120)

 

  • I don’t see a .bak key created in HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList when the temp profiles were created.
  • Memory was increased from 16GB to 24GB temporarily for testing. It didn’t resolve the issue.
  • Based on the “Insufficient system resources exist to complete the requested service” message, I suspected nonpaged pool and paged pool depletion. The paged pool usage is about 2.4GB. Nonpaged pool usage is 200MB. This is well below the limit of a 64-bit OS.
  • Using NotMyFault, increased the paged pool in use to 5GB, but was unable to reproduce the issue. It does not seem that the issue is caused by excessive use of paged pool.
  • Office Scan is used for antivirus. Changed the setting described in https://success.trendmicro.com/solution/1059055-preventing-performance-issue-on-citrix-or-terminal-server-environment. Enabled “Do not allow users to access the client console from the system tray or Windows Start menu” to reduce the system resource usage in the XenApp server. It didn’t resolve the issue.
  • Based on https://success.trendmicro.com/solution/1101956-rdp-access-to-the-terminal-server-freezes-when-using-a-roaming-profile#, disabled TCP chimney offload in the XenApp server. It didn’t resolve the issue.
  • Completely uninstalled Office Scan in the XenApp server. It didn’t resolve the issue.
  • Checked the GPO changes, nothing was really changed before the issue started happening.
  • In case there was a user authentication performance issue on the Domain Controllers, checked ATQ threads on the DCs. There is no performance issue.
  • Captured Process Monitor traces when logons were working and when they were not. Compared them but could not find any clear difference or information indicating a cause of the issue.
  • Checked the User Profile Service Operational log, but apart from the temp profile being loaded, there was no other useful information.
  • When XenApp server 1 was having the issue, I vMotioned it to another ESXi host where XenApp server 2, which did not have the issue, was hosted. But XenApp server 1 still had the issue after the vMotion. So vMotion or a specific ESXi host is not the root cause.
  • Just before the issue started happening, it seems that Adobe Reader and Adobe Refresh Manager were updated. Found a scheduled task created by Adobe Refresh Manager that is triggered every time the users log in. Disabled the task, along with other scheduled tasks. It didn’t resolve the issue.
  • Uninstalled the recently installed Windows patch. It didn’t resolve the issue.
  • Based on https://support.citrix.com/article/CTX138329, set the registry key to increase the Kerberos max token size. It didn’t resolve the issue.
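The pool figures in the list above (about 2.4GB paged, 200MB nonpaged) can be put next to the documented 64-bit limits with a little arithmetic. This is a rough sketch only: the limits used here (paged pool up to 128 GB or the system commit limit, nonpaged pool up to 75% of RAM or 128 GB) are quoted from memory of Microsoft's memory-limits documentation for Windows Server 2008 R2 and should be checked, and the commit limit is an assumption because the post does not give the page file size.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def pool_limits(ram_bytes, commit_limit_bytes):
    """Return (paged_limit, nonpaged_limit) in bytes.

    Assumed limits for 64-bit Windows Server 2008 R2:
      paged pool    = min(128 GiB, system commit limit)
      nonpaged pool = min(128 GiB, 75% of RAM)
    """
    paged = min(128 * GIB, commit_limit_bytes)
    nonpaged = min(128 * GIB, int(ram_bytes * 0.75))
    return paged, nonpaged


def usage_percent(used_bytes, limit_bytes):
    """Pool usage as a percentage of its limit, rounded to one decimal."""
    return round(100.0 * used_bytes / limit_bytes, 1)


if __name__ == "__main__":
    ram = 16 * GIB
    # Assumption: commit limit equal to RAM (no page file), the most
    # pessimistic case; the real value is RAM plus page file size.
    paged_limit, nonpaged_limit = pool_limits(ram, commit_limit_bytes=ram)
    print("paged pool used %:", usage_percent(int(2.4 * GIB), paged_limit))
    print("nonpaged pool used %:", usage_percent(200 * MIB, nonpaged_limit))
```

Even under the pessimistic commit-limit assumption, both pools sit far below their ceilings, which is consistent with the NotMyFault test not reproducing the error.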

 

Does anyone have any other ideas about what the root cause may be?


1 answer to this question


Hello,

I had the same problem with 2 servers in a farm.

W2K8 R2, XA 6.5 HRP7

The servers are physical hardware with 16 cores and 256GB RAM. Last week I got events 1500, 1502 and 1508 when the 191st user was logging on. But I know that half a year earlier I had 220 users on the servers.

In addition, as an early warning, I got event 7023 in the System log: "Windows Modules Installer" was stopped, not enough resources.

Your post was the best summary of what I could do, so I did not have to test anything else.

Event 7023 went away when I increased the paging file from 32GB to 64GB.

Events 1500, 1502 and 1508 went away after I compressed the registry.

I compressed not only the printers key and the default profile. I compressed the whole registry with Eusing "Free Registry Defrag".

The software key was 200MB before and 130MB after compression.
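To compare hive sizes before and after a compression like the one described, the hive files can simply be measured on disk. This is a minimal sketch, assuming the system hives live under %SystemRoot%\System32\config (the standard location) and that the script runs from an elevated prompt; the function names are made up for illustration.

```python
import os

# Standard system hive file names; user hives (NTUSER.DAT) are not covered.
HIVE_NAMES = ("DEFAULT", "SAM", "SECURITY", "SOFTWARE", "SYSTEM")


def hive_sizes_mb(config_dir, names=HIVE_NAMES):
    """Return {hive_name: size_in_MB} for the hive files found in config_dir."""
    sizes = {}
    for name in names:
        path = os.path.join(config_dir, name)
        if os.path.isfile(path):
            sizes[name] = round(os.path.getsize(path) / (1024 * 1024), 1)
    return sizes


def percent_saved(before_mb, after_mb):
    """Percentage by which a hive shrank, rounded to one decimal."""
    return round(100.0 * (before_mb - after_mb) / before_mb, 1)


if __name__ == "__main__":
    config_dir = os.path.join(
        os.environ.get("SystemRoot", r"C:\Windows"), "System32", "config"
    )
    print(hive_sizes_mb(config_dir))
    # Figures from the reply above: SOFTWARE hive 200MB before, 130MB after.
    print(percent_saved(200, 130))
```

Running it on an affected and an unaffected server at the same time of day would show whether hive size differs between them.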
