
WEM 1808 agent not launching consistently at logon


Nick Panaccio

Question

I've got a sporadic issue with the WEM 1808 agent in our Windows 10 1709 VDA (XD 7.15 LTSR CU2) that's driving me up the wall. For background, this is a non-persistent, random desktop (shuts down after each session ends) using PVS, CPM and App Layering 1812, with the local DBs redirected to our PVS WC drive. We've got two WEM brokers load-balanced via NetScaler, and the WEM DB sits on a SQL 2016 server with AlwaysOn AG.

 

Every once in a while, I'll hit a desktop where the WEM agent simply does not run at all, thus no agent logs are generated in the user profile for me to check. I have Debug enabled everywhere, so I at least have some logging. The odd thing is, I can hit that very same VDA 15 minutes later and WEM works just fine.

 

I've got the WEM agent installed in my Platform Layer, and am using the following batch file as a Startup script running under the SYSTEM account, per CTX226494:

@echo off
net stop "Norskale Agent Host Service" /y
net start "Norskale Agent Host Service"
net start "Netlogon"
cd "C:\Program Files (x86)\Norskale\Norskale Agent Host\"
AgentCacheUtility.exe -refreshcache
exit 0

The Norskale Agent Service on the VDAs all register the same events when this happens:

Warning: No Last Known Configuration Set detected -> Offline mode will not work
Informational: Group Policy Watcher Started
Informational: Agent Host Service Initialization Completed. Agent Service Version: 1808.0.1.1
Informational: Agent Command Processor service started
Warning: Failed to updated agent callback information. Will try update later.
Informational: Using legacy agent callback WCF service.
Informational: Agent Cache Synchronization Interface service started
Informational: Agent Messaging service started
Informational: Agent Proxy service started
Error: The creator of this fault did not specify a Reason.
Warning: Agent 'VDLPT0105' is not bound to any configuration set!
Warning: No Last Known Configuration Set detected -> Offline mode will not work

My Citrix WEM Agent Host Service Debug.log file is next to useless, as well. It never seems to detect the logon. If I check a working log file, I will see SensMonitoring.TrackLogonTime() and SensMonitoring.Logon() entries. The Broker event logs are just as useless, basically only showing the VDA checking in and then showing where it's bound to.

 

I've got a ticket open with Citrix right now, but figured it was worth posting here as well. The tech said they have another case (didn't specify open or closed) where the client was deleting the LocalAgentCache.sdf and LocalAgentDatabase.sdf files as part of the Startup script mentioned earlier, but the only time I've ever seen that referenced was for upgrading to v4.7. At any rate, I did delete those files and attempt to refresh the cache, and ran into some errors, which I apparently forgot to save; I believe one of them mentioned a timeout. I then modified my Startup script to add a 15-second delay between starting the Netlogon service and running AgentCacheUtility, and I believe I was finally able to refresh the cache successfully that way.
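For reference, a minimal sketch of what that modified Startup script might look like. The post does not say which command produced the delay, so the ping-based wait here is an assumption; it is used because ping does not depend on console input. Deleting the .sdf files is left out, since the redirected cache path is not given.

```bat
@echo off
rem Sketch only: the original startup script with a pause added before the cache refresh.
net stop "Norskale Agent Host Service" /y
net start "Norskale Agent Host Service"
net start "Netlogon"
rem ASSUMPTION: ping is used for the delay; 16 echo requests sent about one
rem second apart give a wait of roughly 15 seconds.
ping -n 16 127.0.0.1 > nul
cd /d "C:\Program Files (x86)\Norskale\Norskale Agent Host\"
AgentCacheUtility.exe -refreshcache
exit 0
```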

 

So, has anyone seen issues like this in WEM? I've scoured the forums and haven't found this random inconsistent behavior with the agent not running at logon elsewhere.

 


14 answers to this question


  • 0

I think my assumption was correct: the hard-coded broker config doesn't appear to be the issue. Instead, it looks like the Norskale Agent Service starts so fast that the computer GPO hasn't fully applied yet. I removed the broker from the registry in my vDisk and let the GPO handle it all, and WEM was still broken, never actually launching. The Norskale event logs showed "Invalid Broker Connection Settings!" entries, which would make sense if the agent didn't have a broker address to point to at the time.

 

So what I did was add a 30 second timeout in my RefreshCache batch file that runs at system startup:

 

@echo off
timeout 30 /nobreak
net stop "Norskale Agent Host Service" /y
net start "Norskale Agent Host Service"
net start "Netlogon"
cd "C:\Program Files (x86)\Norskale\Norskale Agent Host\"
AgentCacheUtility.exe -refreshcache
exit 0

So far, all of my testing with this change has been successful. That delay is enough to let the GPO apply and grab the correct broker address.

 

Either way, it looks like I have two solutions - a) hard-code the Prod WEM broker in the registry (don't care much about QA, since I'm the only one who uses that environment), or b) add a timeout to the RefreshCache script. I'm unwilling at the moment to change any policy settings like Always Wait for Network, etc. The whole point of WEM was to speed up logons.

 

I chose option b), and after adding that 30 second delay, WEM has worked flawlessly in both QA and Prod (UAT).

  • 0

Were you able to confirm that the 15-second delay does indeed fix the issue every time? Outside of that, are there any GPO errors around machine startup time? I've seen some oddities around NTP etc. that stop the GPOs from applying properly on boot, which in turn stops the script from running.

  • 0
12 hours ago, James Kindon said:

Were you able to confirm that the 15-second delay does indeed fix the issue every time? Outside of that, are there any GPO errors around machine startup time? I've seen some oddities around NTP etc. that stop the GPOs from applying properly on boot, which in turn stops the script from running.

I didn't move forward with the 15 second delay because Citrix support got back to me and wanted me to try a few things (including reinstalling WEM with the following switches: WaitForNetwork=1 SyncForegroundPolicy=1 ServicesPipeTimeout=120000). I didn't want to go that route quite yet, because I believe those settings basically enable "Always wait for network...", which I don't have enabled in this VDI to speed up the logon process. I'm trying to find others who have gone this route, but so far the only thing I've found is a recent James Rankin blog on disabling it. We do use Folder Redirection, though do it through WEM.

 

I enabled WCF logging and sent the logs to Citrix. One thing that I may have done wrong was hard-coding my QA WEM broker in the registry of my vDisk while using a GPO to set the Prod broker VIP. The WCF logs do show both addresses being contacted at one point or another. That said, I see the same thing in sessions where WEM actually did run.

 

As for GPO errors, very rarely I'll come across a VDA where group policy processing fails with "The processing of Group Policy failed. Windows could not determine if the user and computer accounts are in the same forest. Ensure the user domain name matches the name of a trusted domain that resides in the same forest as the computer account.", and as a result basically everything is broken.

  • 0

Part of me thinks it could be that entry in the registry. I edited my vDisk and changed the broker to my Prod VIP, then applied it to my QA VDAs where a GPO sets the QA broker. The first 4-5 logons were broken, with WEM never actually running. The next 3-4 logons have worked fine. It's basically what I was seeing in Prod, whereas before I couldn't get QA to break, not once.

 

At the same time, I built a new vDisk for Prod, where the Prod broker VIP was baked into the registry, even though a GPO also sets it to the same VIP. On my very first logon, WEM didn't run at all. The next 2-3 logons to that VDA worked fine. So, who the hell knows at this point.

  • 0

Interesting. I worked with a friend who was hard-coding his WEM broker value into the registry, and he was having similar issues. Once he applied it via GPO instead, it was perfectly fine.

 

We also learnt that it was safest to have the GPO applied to the master image, with the WEM startup script and broker defined, so that it always had a cache of the policy applied on fresh boot. That leads me to always have a dedicated WEM GPO for all my builds.

 

I agree on those timeout settings: avoid setting them unless you 100% need to.

  • 0
On 1/4/2019 at 1:53 AM, James Kindon said:

Interesting. I worked with a friend who was hard-coding his WEM broker value into the registry, and he was having similar issues. Once he applied it via GPO instead, it was perfectly fine.

 

We also learnt that it was safest to have the GPO applied to the master image, with the WEM startup script and broker defined, so that it always had a cache of the policy applied on fresh boot. That leads me to always have a dedicated WEM GPO for all my builds.

 

I agree on those timeout settings: avoid setting them unless you 100% need to.

That's an interesting point. While I do set the broker in my master image, and then launch VUEMUIAgent.exe before finalizing the layer (I want to say that I saw this in your blog or JG Spiers'), the computer object generated by App Layering for this layer is not bound to any configuration set. So what you're saying is that I should bind that computer to my Prod WEM config set before finalizing the layer? Or, in my case, I should do this when I edit the layered vDisk since I have to run Trend Micro's TCacheGen utility, as WEM cache is directed to the D: volume which isn't present until the final vDisk is generated.

  • 0

That would have been George, I don't do much with App Layering at all.

 

It's more about having the GPO-delivered setting baked in than having a config set defined. It may be completely irrelevant, given that it makes little to no sense, but we definitely saw a difference in behavior.

 

Wonder if having the redirected cache hurts you at all?

  • 0

I had a follow-up call with Citrix again today, and the takeaway was basically me removing the hard-coded broker config and letting the GPO do its thing. I asked why that would matter, since if WEM is attempting to contact a wrong broker (current state) but can't, it doesn't work, so why would it be different if WEM had no broker to contact? I'm going to test tomorrow, but fully expect the same broken behavior.

 

I'm on the fence with redirecting the WEM cache. At this point, even with the cache up to date, it still seems to be broken, so changing it back likely doesn't matter. We'll see how tomorrow goes.

 

As always, I appreciate the assistance, James!

  • 0

This was working flawlessly for a while, but now in my QA environment, I’m having consistent issues with specific VDAs. It looks like the RefreshCache.bat file doesn’t actually complete, and remains “Running”, so WEM simply doesn’t run again. It actually appears to be hung up on the timeout command. I haven’t been able to get this to happen on other VDAs, despite the fact that they’re identical, right down to the vDisk image.

 

A coworker of mine noticed this in Prod this week (no recent changes) as well. So I’m probably going to query all online VDAs to see if that task is stuck running on others.

 

I may modify the batch file, removing the timeout command, possibly replacing it with a ping that lasts for 30 seconds. Or I could replace the scheduled task and have it delay the start 30 seconds to compensate. Either way, this is frustrating. Maybe I’ll bring this up at Synergy in a few weeks...
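As a rough sketch of the scheduled-task alternative, assuming the script is registered with schtasks: the `/delay` switch makes Task Scheduler wait 30 seconds after startup before launching the script, so the batch file needs no delay command of its own. The task name and script path below are hypothetical placeholders, not values from this thread.

```bat
@echo off
rem Sketch only. ASSUMPTIONS: the task name "WEM RefreshCache" and the path
rem C:\Scripts\RefreshCache.bat are hypothetical; substitute your own.
rem Create (or overwrite, /f) a startup task that runs as SYSTEM and starts
rem 30 seconds after boot (/delay takes the form mmmm:ss).
schtasks /create /tn "WEM RefreshCache" /tr "C:\Scripts\RefreshCache.bat" /sc onstart /delay 0000:30 /ru SYSTEM /f

rem Show the task's current state, e.g. to spot one that is stuck running.
schtasks /query /tn "WEM RefreshCache" /fo list /v | findstr /i "Status"
```

The same query can be pointed at a remote machine with the `/s` switch, which is one way to check whether the task is stuck on other VDAs.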

  • 0

In case anyone comes across this, it looks like changing the timeout command to a ping resolved the issue:

 

@echo off
ping localhost -n 30 > nul
net stop "Norskale Agent Host Service" /y
net start "Norskale Agent Host Service"
net start "Netlogon"
cd "C:\Program Files (x86)\Norskale\Norskale Agent Host\"
AgentCacheUtility.exe -refreshcache
exit 0

 

