AppLayering Reliability going forward/Best Practices


Darin McClain

Question

We are current users of UniDesk 2.9.4.5 with about 130 end-user VMs. This environment has been running for several years, but a year or two ago I stopped updating Windows and other apps because of how unpredictable things had become. Layer conflicts were common: machines would not boot at all and just sat at a black screen, audio drivers conflicted, USB drivers failed to load for even a common thumb drive, and even the UniDesk system tools (changing to different layers, rebuilding VMs, repairing VMs, re-installing layers) became useless for fixing some of these issues. No joke, I now tell people it's less risky to manually uninstall Office 2010 and install 2016 into their user layer than to change the layers and rebuild the user's VM. I often had to fix these issues manually, or completely wipe the machine, to resolve them. These issues did not come from the end users either; they typically started when we tried to update or change layers/versions.

 

So, going forward with my new environment, which I plan to use college-wide for close to 1100 VMs, I want to make sure I know everything I need to build a more reliable environment than my previous one. I don't want to have to leave things alone once they're working for fear of breaking them when I run updates. The main reason I'm posting today is that I recently upgraded AppLayering to 1902. I then updated my OS layer with the latest Windows updates (1803 to 1807 as well). After that I updated my Office layer. I published new images and slowly updated my test pools. This past Saturday I updated the main pool, which has about 10 active test users on it. I have seen random reboots (probably blue screens) happen to two of them so far: one about 2 hours into the work day while the user was working on it, and a second, for a different user, at about 10pm last night when he wasn't even using it. I was not having reboots before pushing the layer updates.

 

So I'm hoping there are some tricks I don't know. For example, after doing an upgrade like 1902 and then the Windows updates from 1803 to 1807, should I go to every layer and make a new version of it to help it deal with the new version of Windows? This is the kind of thing I'm after, so that I can try to prevent reliability issues.

 

I added the Event Viewer logs from one of the two VMs I just described, if anyone is up for a look. From my experience, I don't see anything useful in them. Something called a Filter Manager logged the last few events before the crash; is that the Citrix Filter Driver? And are there other logs I should look into that might help pinpoint issues like this in the future?

 

Darin

 

 

system.evtx

applications.evtx


No, there are no special steps. The testing you're doing is right. You certainly do not need to version every layer so that it "sees" the updated Windows. A layer contains only the things that changed during its creation, and shouldn't have any information specific to the layers it was based on. Only if your software is known to be sensitive to specific Windows versions would you consider versioning or recreating those layers. Whenever I have seen issues, it's the Platform Layer that needs to be recreated, and even that's pretty uncommon.

 

As for blue-screens, get us a full memory dump (and open a case for it).  That's always what we're going to want for blue-screens.  Set Windows to not reboot automatically (you can make that setting in an app or platform layer, or by booting up your master image for whatever your provisioning system is), and when it halts while looking at the blue-screen, suspend the machine.  That will dump a VMSS and/or VMEM file into the VM directory, which can be converted into a Windows memory dump.  But a full memory dump of the crash is what we're always going to want.
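For anyone wanting to script the "do not reboot automatically" setting, here is a minimal sketch that generates a .reg file for it. It assumes the standard Windows CrashControl registry key; the `AutoReboot` value is the setting described above, while `CrashDumpEnabled` is an optional extra I added that asks Windows itself for a complete memory dump. The function name `build_reg_file` is made up for illustration.

```python
# Sketch only: builds the text of a .reg file; it does not touch the registry.
CRASH_CONTROL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
)

# Assumed values for illustration:
#   AutoReboot = 0        -> stay on the blue screen instead of restarting
#   CrashDumpEnabled = 1  -> complete memory dump (optional addition)
DEFAULT_SETTINGS = {
    "AutoReboot": 0,
    "CrashDumpEnabled": 1,
}


def build_reg_file(key=CRASH_CONTROL_KEY, settings=None):
    """Return .reg file text that sets each DWORD value under the given key."""
    if settings is None:
        settings = DEFAULT_SETTINGS
    lines = ["Windows Registry Editor Version 5.00", "", "[{}]".format(key)]
    for name, value in settings.items():
        # .reg files write DWORDs as eight hexadecimal digits.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file())
```

The output could be saved to a file and imported while you have the app or platform layer open for editing, so the setting ends up in every published image.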

 

As for overall reliability, the issues you're having with Unidesk V2 are because of machine persistence.  As you say, you can just recreate the machine and it's OK.  The problems all come from trying to preserve your existing data while switching layers and versions.  If the User Layer has recorded data that is a mismatch for the new layers/versions, absolutely that will cause problems.  A major Office upgrade is a good example.  Sure you can switch the layers, but your existing desktops are full of Office 2010 data.  Office 2016 has no idea what to do about that, and will have a cybernetic fit trying to fix itself up.  Even the licensing will break.  It is in fact our best practice to do the upgrade directly in the desktop and then switch the layers.  We can't do anything like that automatically, because we simply don't have the detailed information about how Office works and how to do that upgrade for you.  Or downgrade, if you switch back to 2010.  And the upgrader that ran in the Office layer is fine for fixing up the Office layer, but it knows nothing about all your user profiles.

 

With App Layering, we require nonpersistent base machines.  When you do a major Windows update, you are replacing one nonpersistent image with another, so there is no history that is going to break your layers when they wake up in a new configuration.  As far as the layers are concerned, this is how it's always been.  Even with the new User Layer feature, the data doesn't show up until the user logs in, allowing Office to settle in and be happy in its booted image before some old data shows up.  Having no history to preserve generally makes replacing images much easier.


Thanks for the good information! I do feel AppLayering will be more reliable than UniDesk was. I am still worried about how often I'll have to restore user layers, or even wipe them out, due to conflicts, though at least this process will be slightly easier in AppLayering. Do you guys have any ETA on a re-install type feature for layers that will clear out conflicting data in the user layer? Someone mentioned you had that planned for a future release.

 

I'll work on changing my platform layer to stop restarts on blue screens to get some dumps. If I wanted to look into these dumps myself before opening a ticket, is there anything specific to AppLayering I should look for? I've not dealt with these dump files before, even outside the virtual environment, so feel free to share any info that might save your tech support some work. :)

 

Darin


We're actually in the middle of working on a Reinstall-like feature for User Layers.  I think it's currently called User Layer Repair, but it does the same thing as V2 reinstall: finds collisions between app layers and user layers, and scrubs those out of the user layer. We're building the underlying functions now, and then we still have to develop the UI side.  So, no ETA, but it's not a matter of us deciding to do it; it's just a matter of us finishing it.  Months, not years.
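To make the "find collisions and scrub them" idea concrete, here is a purely conceptual toy in Python. It is not how User Layer Repair is implemented; it just treats each layer as a list of file paths (an assumption for illustration) and calls a path a collision when it appears in both an app layer and the user layer. The function names are hypothetical.

```python
def find_collisions(app_layer_paths, user_layer_paths):
    """Return user-layer paths that also exist in the app layer.

    Paths are compared case-insensitively, as Windows file names are.
    """
    app = {p.lower() for p in app_layer_paths}
    return sorted(p for p in set(user_layer_paths) if p.lower() in app)


def scrub(user_layer_paths, collisions):
    """Return the user-layer paths with the colliding entries removed."""
    drop = {p.lower() for p in collisions}
    return [p for p in user_layer_paths if p.lower() not in drop]
```

After scrubbing, the app layer's copy of each colliding file would be the one the desktop sees, which is the effect a reinstall is after.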

 

But backing up, restoring, recreating, and anything else you want to do with User Layers is definitely much easier.  It's just a matter of copying the User Layer VHD files around.  You can keep multiple versions around if you want, and can just replace them any time you want.  All we do is look for the file with the right name.
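Since it really is just file copying, a backup and restore routine can be sketched in a few lines of Python. The paths and function names here are hypothetical, and this assumes the copy is made while the user is logged off so the VHD is not in use.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_user_layer(vhd, backup_dir):
    """Copy a User Layer VHD into backup_dir under a timestamped name."""
    vhd = Path(vhd)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / "{}.{}{}".format(vhd.stem, stamp, vhd.suffix)
    shutil.copy2(vhd, target)  # copy2 also preserves file timestamps
    return target


def restore_user_layer(backup, live_vhd):
    """Put a saved copy back under the file name the system looks for."""
    shutil.copy2(Path(backup), Path(live_vhd))
```

Keeping the timestamp in the backup name lets you hold several versions side by side and pick which one to copy back.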

 

As for analyzing the DMP file, all I do is run !analyze -v, see what it says, and check whether unifltr.sys or unirsd.sys is on the stack somewhere.  That and the STOP code are what I'll punt up to Engineering.  Rarely, there can be a crash we're involved in where we're not even on the stack, but for those, I definitely want multiple dump files for triangulation.
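If you end up with several dumps, the "is one of our drivers mentioned" check is easy to automate once the !analyze -v output is saved to a text file. This is a minimal sketch: the function name is made up, and it only does a plain text search for the two driver names mentioned above, so it does not actually parse the stack.

```python
import re

# Driver names taken from the post above.
LAYERING_DRIVERS = ("unifltr", "unirsd")


def drivers_in_output(analyze_text, drivers=LAYERING_DRIVERS):
    """Return the driver names that appear anywhere in saved debugger output."""
    found = []
    for name in drivers:
        # Match the bare module name, e.g. "unifltr.sys" or "unifltr+0x10".
        if re.search(r"\b{}\b".format(re.escape(name)), analyze_text, re.IGNORECASE):
            found.append(name)
    return found
```

An empty result does not rule the drivers out, for the reason given above: some crashes involve them without putting them on the stack.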

On 4/3/2019 at 11:44 AM, Gunther Anderson said:

We're actually in the middle of working on a Reinstall-like feature for User Layers.  I think it's currently called User Layer Repair, but it does the same thing as V2 reinstall: finds collisions between app layers and user layers, and scrubs those out of the user layer. We're building the underlying functions now, and then we still have to develop the UI side.  So, no ETA, but it's not a matter of us deciding to do it; it's just a matter of us finishing it.  Months, not years.

 

But backing up, restoring, recreating, and anything else you want to do with User Layers is definitely much easier.  It's just a matter of copying the User Layer VHD files around.  You can keep multiple versions around if you want, and can just replace them any time you want.  All we do is look for the file with the right name.

 

As for analyzing the DMP file, all I do is a !analyze -v and see what it says, and see if unifltr.sys or unirsd.sys is on the stack somewhere.  That and the STOP code are what I'll punt up to Engineering.  Rarely, there can be a crash we're involved in where we're not even on the stack, but for those, I definitely want multiple  dump files for triangulation.

Hey Gunther,

 

I had been talking to Rob about a feature request just like that (the 'reinstall' option) not that long ago and saw this post tonight. I think this was a few months ago, so I'm curious to check in and see how it's progressing. I just created a post about an Office 365 issue pulling into the User Layer, where something like this would have come in quite handy over the last month. This will be a great new feature once it's ready!
