
PROCESS_HAS_LOCKED_PAGES blue screen


Andrew Gresbach1709152664

Question

Has anyone else started seeing random blue screen crashes reporting PROCESS_HAS_LOCKED_PAGES (or DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS) since updating to the latest 1812 version? We're suddenly getting 5-6 of these every day, and I can't think of anything else I've updated in my images other than this update. (It started right after I upgraded only the ELM, so at first I thought it was because I hadn't spun out a new image after the update, but it has kept happening since then as well.) It may not be related to App Layering at all, but I wanted to see if anyone else had run into it because I'm a bit stuck. Since we're on MCS, I can't come up with a way to grab the .dmp file when it happens, because it's cleared when we restart the machine. I opened a ticket this morning as well but wanted to check in here too while I was at it.

 

[Two screenshots attached (not preserved)]



Boot up the master image, set "halt on BSOD" (OK, I've forgotten the actual setting name), shut it down, take a new snapshot, and update your catalog. Then, while you're looking at the halted BSOD screen on your provisioned machine, suspend the machine and copy the VMSS and VMEM files from the datastore. Resume it and reset.
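The post doesn't name the "halt on BSOD" setting. A minimal sketch, assuming it refers to the "Automatically restart" option under Startup and Recovery, which Windows stores as the `AutoReboot` DWORD under the `CrashControl` key. The script writes a `.reg` file that you would import into the master image before taking the snapshot. Setting `CrashDumpEnabled=1` (complete memory dump) is an extra assumption, added because a full dump is what support asks for later in the thread. The file name `halt-on-bsod.reg` and the function names are hypothetical.

```python
"""Write a .reg file that keeps a crashed VM on its blue screen.

Assumptions (not from the original post): "halt on BSOD" means the
AutoReboot value under CrashControl, and a complete dump is wanted.
"""

CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def dword_line(name: str, value: int) -> str:
    # .reg syntax stores a REG_DWORD as eight hex digits.
    return f'"{name}"=dword:{value:08x}'


def build_crash_control_reg(complete_dump: bool = True) -> str:
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        # 0 = do not restart automatically after a bugcheck.
        dword_line("AutoReboot", 0),
    ]
    if complete_dump:
        # 1 = complete memory dump (2 = kernel, 3 = small, 7 = automatic).
        lines.append(dword_line("CrashDumpEnabled", 1))
    # CRLF line endings, the usual convention for .reg files.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical output file; import it with: reg import halt-on-bsod.reg
    # newline="" keeps Python from translating the CRLFs a second time.
    with open("halt-on-bsod.reg", "w", encoding="utf-8", newline="") as handle:
        handle.write(build_crash_control_reg())
```

Because the provisioned machines are built from the master image, the setting only reaches them after the new snapshot is taken and the catalog is updated.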

 

That said, after 1812 we fixed a BSOD that had also existed before 1812, so once I see your case, I'll make sure you get the 18.12.0.17 patch. I don't know if it's yours (people report it as "Chrome causes a BSOD"), but I always want to be analyzing a crash against the very latest code.


You can disable the software_reporter_tool in Chrome with registry entries! :-)

 

We experienced the exact same blue screen at our site, and MS crash analysis found software_reporter_tool.exe to be the culprit. (Running Win10 1709 here and App Layering appliance version) We had actually excluded the SwReporter folder from UPM since early last fall. We had no issues with our image with the October MS updates, and November was fine at initial release. Then, around the second week of December, the blue screens appeared. (As a non-persistent environment with minidumps and immediate reboot, we actually missed the blue screen at first, but found the OS crash in our hypervisor logs. We enabled full dumps and sent one off to MS.)

Once the issue appeared in our November image and we were scrambling to get our customers to a stable place, we moved back to our October image, which did NOT fix the issue.

We had Chrome 68 in both releases. (Note: Chrome 71 did NOT fix the issue.)

 

ANYWAY, my main point is that we found a set of registry entries that disable the SwReporter tool completely, and it works like a charm. As a built-in solution, it'll fly better with the boss.

 

Registry:

https://www.askvg.com/how-to-disable-or-remove-chrome-software-reporter-tool-in-windows/

 

https://www.chromium.org/administrators/policy-list-3

https://www.chromium.org/administrators/policy-list-3#ChromeCleanupEnabled

https://www.chromium.org/administrators/policy-list-3#ChromeCleanupReportingEnabled

 

HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Google\Chrome

Values:

ChromeCleanupEnabled  REG_DWORD = 0

ChromeCleanupReportingEnabled  REG_DWORD = 0
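To confirm that both policy values actually landed on a desktop, a small check like the following could help. It is a sketch that assumes Python is available on the VM. `missing_policies` compares the values it is given against the expected zeros, and `read_policy_values` reads them from `HKLM\SOFTWARE\Policies\Google\Chrome` using the standard-library `winreg` module, which exists only on Windows. Both function names are hypothetical.

```python
import sys

POLICY_SUBKEY = r"SOFTWARE\Policies\Google\Chrome"
# Both Chrome policies from the post must be present and set to 0.
EXPECTED = {"ChromeCleanupEnabled": 0, "ChromeCleanupReportingEnabled": 0}


def missing_policies(values: dict) -> list:
    """Return the policy names that are absent or not set to the expected value."""
    return sorted(name for name, want in EXPECTED.items() if values.get(name) != want)


def read_policy_values() -> dict:
    """Read the policy values from HKLM (Windows only)."""
    import winreg  # standard library, but only available on Windows

    values = {}
    try:
        # KEY_WOW64_64KEY reads the native 64-bit view even from a 32-bit Python.
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_SUBKEY, 0,
                            winreg.KEY_READ | winreg.KEY_WOW64_64KEY) as key:
            for name in EXPECTED:
                try:
                    data, _value_type = winreg.QueryValueEx(key, name)
                    values[name] = data
                except FileNotFoundError:
                    pass  # this value is not set
    except FileNotFoundError:
        pass  # the policy key does not exist at all
    return values


if __name__ == "__main__" and sys.platform == "win32":
    problems = missing_policies(read_policy_values())
    print("Chrome Cleanup disabled" if not problems else f"Not set to 0: {problems}")
```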

 

Theory (conspiracy)...:

Since we all seem to have been hit with this around the same time frame, and Windows Updates don't exactly line up for all of us (and we had NOT updated our appliance version)...

One thing my co-workers and I noticed is that with the directory gone, Google Chrome appears to bring it back at a random interval after user login (it turned out to be some time after the user had launched Chrome; at one point we advised users not to use Chrome)...

So assume for the moment that it checks in with the mothership and downloads the software reporter tool content directly from Google!?

IF that is true (and we haven't proven it; we need to hit up our network guys or do a trace), perhaps Google updated the tool code on their side. I'll leave that for you all to ponder, and if we find out whether that's the case, I'll share it back here.

 

In any case, I hope the registry entries prove useful as a clean way to disable this lovely Chrome 'feature.'

ty, jw


Interestingly, we are on an older version of Chrome (65.x) and haven't updated recently. Literally the only recent change in our environment was updating the OS layer with the Nov/Dec updates in mid-December. However, we are seeing the issue on our Windows 10 desktops on App Layering 4 (19.1), which seems to contradict what we are being told, namely that it would be fixed by then.


As an FYI, we believe (tentatively) that we have traced this back to Google Chrome's software_reporter_tool.exe. If you can cause Chrome to crash, you can basically be assured that your VM will crash as well. We are experimenting with disabling and deleting the tool altogether. I will post back and let you all know how it goes for us.


Are you sure you're still seeing a STOP 0x76 or STOP 0xCB crash on 19.1.0?  We really do believe that the bug is fixed in 19.1.0, so if you're still seeing it (double-check C:\ProgramData\Unidesk\Logs\ulayersvc.log for the last version banner), we'd really like a full memory dump to look at.

 

I have another V2 customer who thinks they may be seeing this triggered by something in a recent Adobe Reader update as well.  So, another possibility to try for people who just need to get back to a stable configuration.


After looking at the ulayersvc.log file, I can see it still shows 18.12.0.16917. The ELM is updated. What I wasn't clear on was whether we needed to update the client as well.

 

Regardless, at least in our case, SO FAR it seems like removing Chrome's software_reporter_tool is doing the trick. It may be that anything that tries to run under high memory utilization triggers the issue, though.


 

FYI - We just had this same BSOD with Unidesk 4.x, but we are not on version 19.1.0 yet. This was happening on Win7 and Win10 1703 VDIs.

 

Chrome's software reporter tool was causing the issue.

 

Blocking "C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\SwReporter\VERSION\software_reporter_tool.exe" from running fixed this issue.

 

Quick fix: I created a Windows GPO "User Configuration" logon script that deletes the "C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\SwReporter" folder and then puts a blank file named "SwReporter" in that folder's place. This blocks Chrome from installing the software reporter tool and prevents it from running.
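The logon-script trick can be sketched in Python as follows. This is a minimal illustration, not the poster's actual script (which isn't shown). It assumes the `%LOCALAPPDATA%\Google\Chrome\User Data\SwReporter` path from the post and that the script runs in the user's own context at logon. `block_swreporter` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path


def block_swreporter(local_appdata: Path) -> Path:
    """Replace the SwReporter folder with an empty file of the same name.

    Per the forum post, a file occupying this path stops Chrome from
    recreating the folder and unpacking software_reporter_tool.exe into it.
    """
    target = local_appdata / "Google" / "Chrome" / "User Data" / "SwReporter"
    if target.is_dir() and not target.is_symlink():
        # Remove the downloaded tool and all of its version folders.
        shutil.rmtree(target)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # zero-byte placeholder file
    return target


if __name__ == "__main__":
    local = os.environ.get("LOCALAPPDATA")
    if local:
        block_swreporter(Path(local))
```

Running it again at every logon is harmless: once the placeholder file exists, the function leaves it alone.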

 

 

 

 


Hate to tag onto this thread but... I'm gonna anyway.

 

We are using Horizon 7.3.1 and Unidesk 2.10 with Win7, and we have been experiencing the exact same issue since applying the Nov/Dec Windows updates. We are migrating to Win10 on App Layering 4 and are already at 1901 on that platform. Any suggestions to ease the pain on the Win7 desktops on Unidesk while we work on our migration? It's become a major pain the last few days.


We ran into this exact issue with Chrome crashing Win7 desktops. We tracked it to Google Chrome and advised our client to stop using it until we had a fix. We tried upgrading Chrome and deleting profiles, and couldn't pinpoint it until I came across this post about the software reporter tool. This has caused issues for us for months, but it really came to a head at the end of January and hurt our business processes for about a week.

 

We implemented the registry entries and blocked permissions on the SwReporter folder in the Google profile before we could get Chrome back up and happy.


Note: we also just discovered that the latest App Layering appliance versions contain a fix for the 'deeper cause' of this blue screen. I don't know the exact details, but it seems to be some interaction between App Layering, the Chrome SW reporter tool, and the OS. App layering is a complex thing to accomplish, and I suspect it's no easy task (for the programmers) to account for every possible variable.

 

I think the important thing is to open tickets with Citrix early for these sorts of things. I think most of us in IT tend to beat our heads against the wall first and ask Citrix, Microsoft, etc. later. Thankfully, once opened, the ticket with Microsoft got us to the cause very quickly through memory dump analysis. The Citrix ticket later revealed that they had a fix for it in the App Layering product.

 

Anyway, the point is that it really does seem to help when we all band together and open cases for issues, so their team can make correlations, like "oh hey, these several customers all reported this..." I think if a few more of us (customers) had reported it, it might have come to a head faster.

 

Just for confirmation, try downloading the latest appliance from Citrix and see if you can turn the Chrome SW reporter tool back on. We're getting ready to try it ourselves (we had to take at least the latest version to fix an MS Word saving issue), and we've heard they have a fix for this Chrome SW tool issue too, so now there's an opportunity to give it a go.

 

Thanks! and I'm glad the reg entries helped.  :-)


Double-check that you're looking at an image that was published after the ELM upgrade. To make sure, you can check the startup banner in \ProgramData\Unidesk\Logs\ulayersvc.log, or look at the Properties of a file such as \Windows\system32\drivers\unifltr.sys. But we're pretty sure we've fixed all of those as of 19.01. If you're really still getting them, we need a full memory dump of one so we can analyze it in a support case.
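Checking the banner across many machines can be scripted. Here is a rough sketch, based on assumptions of ours rather than anything stated in the thread. It assumes the log is readable as text and that the banner line contains the word "version" (the `keyword` default is a guess, so adjust it to whatever your log actually prints). It also assumes the version is a dotted four-part number like the 18.12.0.16917 reported earlier in the thread. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

Version = Tuple[int, ...]
# Four dotted numbers, e.g. 18.12.0.16917 or 19.1.0.17251.
VERSION_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b")


def last_banner_version(log_text: str, keyword: str = "version") -> Optional[Version]:
    """Return the version on the last line that contains `keyword`, if any."""
    found = None
    for line in log_text.splitlines():
        if keyword.lower() in line.lower():
            match = VERSION_RE.search(line)
            if match:
                found = tuple(int(part) for part in match.groups())
    return found


def is_at_least(version: Version, minimum: Version = (19, 1)) -> bool:
    # Tuple comparison is numeric per component, so (19, 1) > (18, 12).
    return version[: len(minimum)] >= minimum


if __name__ == "__main__":
    log = Path(r"C:\ProgramData\Unidesk\Logs\ulayersvc.log")
    text = log.read_text(encoding="utf-8", errors="replace")
    version = last_banner_version(text)
    print(version, "OK" if version and is_at_least(version) else "older than 19.1 or not found")
```

The default minimum of (19, 1) corresponds to the 19.01 release mentioned in the reply. Filtering on a keyword keeps unrelated dotted numbers, such as IP addresses on other log lines, from being mistaken for the banner.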

 

If it's something other than 0x76, then upgrade to 19.02 (we always want you at the latest before engaging Engineering) and get us the full memory dump from that crash.


Yup, my latest published image should be up to date (confirmed: unifltr.sys shows v19.1.12109.0). It seems like most of the time the desktop is just flat-out locking up. It shows as unregistered in Director, and the console window shows the usual Windows 7 lock screen, but you can't do anything with it as you could with a responsive one. I can't access anything on it remotely either (c$, services, etc.). The only way to recover is a forced restart. I have seen the blue screen I just posted a couple of times, but as I watched more closely, it's less common than the plain freeze-up. It does seem to happen to 4-5 desktops per day, though, and for some reason, once it happens to someone, it tends to pick on that user more than others. I am testing by clearing their user layer (renaming their VHD so it gets recreated) and monitoring whether that helps, but it's been fairly random who it hits and when.

 

I just upgraded to 19.02 tonight and will publish a new image so that we're up to date. We are on AHV, so is there anything I can do while it's locked up like this to capture useful information?

Replying to Andrew Gresbach1709152664's post of 2/26/2019 at 9:17 PM:

 

Hi SocietyInsurance,

 

Did you get this issue resolved by upgrading to 19.02? I'm seeing the same behavior you're describing. I had this issue regularly before 19.01 and thought we had it fixed, but it seems to have morphed into the issue you're describing. I'm seeing it on Windows 2016. I've opened a ticket with Citrix, but I can't seem to crash the server on demand to capture the blue screen message. Unifltr.sys is 19.1.11772.0. Layering Service build 19.1.0.17251.


Unfortunately, no... I still have a case open on this as well. We're on Nutanix, so we can't use the usual tricks to suspend a blue-screened desktop. We're also having an issue generating a crash dump for some reason: it shows the blue screen but never finishes writing the dump file and stays on that screen indefinitely. So I'm not sure what else we can do. It's not a huge number of occurrences, but lately we get a few blue screens every day and even more plain locked-up desktops.


Thanks, Rob. That's pretty much what I thought too, but I thought MAYBE the ELM upgrade did some sync with our application share and changed something. It was a long shot, but like I said, it was the only thing that had changed during that stretch, so I wanted to start here (and because you guys are usually more helpful than other vendors).

 

I don't suppose you have any ideas for how we could even capture these .dmp files with MCS, or get more clarity on what is causing it? Following a Microsoft page, I enabled a registry key so we'd get more detailed info (screenshot #1), but even that is pretty fuzzy.


My Windows 10 machines have been crashing a lot lately with the same BSOD message.

 

PROCESS_HAS_LOCKED_PAGES (76)
Caused by a driver not cleaning up correctly after an I/O.
Arguments:
Arg1: 0000000000000000, Locked memory pages found in process being terminated.
Arg2: ffffd00af173e780, Process address.
Arg3: 0000000000000092, Number of locked pages.
Arg4: 0000000000000000, Pointer to driver stacks (if enabled) or 0 if not.
    Issue a !search over all of physical memory for the current process pointer.
    This will yield at least one MDL which points to it.  Then do another !search
    for each MDL found, this will yield the IRP(s) that point to it, revealing
    which driver is leaking the pages.
    Otherwise, set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory
    Management\TrackLockedPages to a DWORD 1 value and reboot.  Then the system
    will save stack traces so the guilty driver can be easily identified.
    When you enable this flag, if the driver commits the error again you will
    see a different bugcheck - DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (0xCB) -
    which can identify the offending driver(s).
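The argument block above can be decoded mechanically. As a worked example, Arg3 = 0x92 = 146 locked pages. Assuming the standard 4 KiB x64 page size, that is 146 × 4 KiB = 584 KiB left locked when the process was terminated. The parser below is a minimal sketch for text in the format shown above, which appears to be WinDbg's `!analyze -v` output. It also tolerates the backtick WinDbg often puts inside 64-bit addresses. `parse_bugcheck_args` and `locked_bytes` are hypothetical helpers, not part of any debugger tooling.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Arg3: 0000000000000092, Number of locked pages."
ARG_RE = re.compile(r"^\s*Arg(\d):\s*([0-9a-fA-F`]+),\s*(.*)$")
PAGE_SIZE = 4096  # assumption: standard x64 small page


def parse_bugcheck_args(text: str) -> Dict[int, Tuple[int, str]]:
    """Map argument number -> (value, description)."""
    args = {}
    for line in text.splitlines():
        match = ARG_RE.match(line)
        if match:
            raw = match.group(2).replace("`", "")  # drop WinDbg's address separator
            args[int(match.group(1))] = (int(raw, 16), match.group(3).strip())
    return args


def locked_bytes(args: Dict[int, Tuple[int, str]]) -> int:
    # For bugcheck 0x76, Arg3 is the number of locked pages.
    return args[3][0] * PAGE_SIZE


if __name__ == "__main__":
    sample = """Arg1: 0000000000000000, Locked memory pages found in process being terminated.
Arg2: ffffd00af173e780, Process address.
Arg3: 0000000000000092, Number of locked pages.
Arg4: 0000000000000000, Pointer to driver stacks (if enabled) or 0 if not."""
    args = parse_bugcheck_args(sample)
    print(args[3][0], "pages =", locked_bytes(args) // 1024, "KiB")  # prints: 146 pages = 584 KiB
```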
 

My issues started, I believe, after the Windows 10 November patches (Windows 10 v1703) and have continued with the December security patches. I have not upgraded the ELM to 1812 (still on version 4.15.0.25). I am currently testing rolling images back to the Windows 10 October patches and also testing a gold image upgraded to Windows 10 v1803.

 

ELM 4.15.0.25

VMware Horizon View 7.3.2

VMware ESXi 6.5 U1

Windows 10 v1703 with latest security patches.


Thank you for the help too! In our case it's Windows 7, so I'm not sure if that changes your theory, but maybe a similar patch went into both OSes in November? It's weird that we didn't start seeing it until JUST recently, when those updates have been out for a while.

 

Glad to hear yours is happening without the 1812 update, though (sorry it's happening, haha). Any chance it could be something in the 4.15.0.25 update? I'd really rather it's not, but it can't hurt to look, since we're both seeing it on different OSes.

 

I did enable the registry key you pointed out and got the screenshot I posted first. It doesn't give much more info, but I'll dig through what you suggested to see if I can cross-reference those numbers to a driver.


Andrew,

If it were easily reproducible, you could run a debugger from one machine against another to catch what happens, but that would be difficult since it's random. I assume that if you reboot the machine, it goes back to working. If not, it must be something captured in the user layer.

 

You might want to open a ticket with Microsoft.

 

You could also try going back to your pre-November OS layer and see if the problem goes away. If it does, maybe delete the newer OS layers and create a new updated one.


Archived

This topic is now archived and is closed to further replies.
