
GRID card boot loop.


Paul Fulbright1709158471

Question

Created the image master VM without a GRID card attached to test something. When I create the platform layer, attach a GRID card (in this case an M60) to the platform layer packaging VM, and then publish the layer to PVS, it gets into Windows and boot loops. It appears to be rebooting because it found new hardware, even though the PCI device IDs match, everything else I can find matches, and the target devices are clones of the same VM I used to create the OS layer.
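
(For anyone who wants to reproduce the ID comparison, something like this dumps the PCI device IDs on a VM so the output from the packaging VM and a target can be diffed. It's only a rough sketch; the wmic query is standard Windows, and the M60 ID in the comment is an example.)

```python
# Sketch: dump PCI device instance IDs on a Windows VM so the output from
# the packaging VM and a PVS target can be diffed.
import subprocess

def pci_device_ids():
    # wmic ships with Server 2016; Win32_PnPEntity lists every PnP device.
    out = subprocess.check_output(
        ["wmic", "path", "Win32_PnPEntity", "get", "DeviceID,Name", "/format:csv"],
        text=True)
    for line in out.splitlines():
        # Keep only PCI devices, e.g. PCI\VEN_10DE&DEV_13F2&... for a Tesla M60.
        if "PCI\\VEN_" in line:
            yield line.strip()

if __name__ == "__main__":
    for dev in sorted(pci_device_ids()):
        print(dev)
```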

 

We use the following:

  • ESXi 6.0 U2
  • PVS 7.15
  • XenApp 7.15

Now some of our hosts run K2s and some run M60s.

 

If I can't attach the GRID cards at the platform layer without causing the underlying OS to boot loop, then I cannot use one OS layer (and therefore one set of app layers) for both the K-series hosts and the M-series hosts.

 

That is a pretty big deal breaker. Is there any guidance anywhere on what does and does not get omitted from the platform layers? I would expect this to have worked, but I have tried it multiple times now, and with Server 2016 the only way I can get it to work is to have the GRID card attached to the VM I install the OS on BEFORE I import the OS layer.

 

To me that again seems to indicate that something is picked up when the OS layer is imported but not when the platform layer is captured (my guess is that it doesn't make it through the filter driver?).


18 answers to this question


If I recall correctly, the solution I came up with was to create a template in VMware and assign it as the template for the VMware connector in the ELM. Then I created the platform layer: install PVS, the VDA, and the GRID driver, join the domain, reboot, install AppSense, disjoin the domain, reboot, delete the domain user's profile, and finalize. I then deployed my targets in VMware using the same template I used to create the layer. For sanity's sake the template itself didn't actually have the GRID card added, since I use that template for everything, GRID and non-GRID; I just added a 4Q GRID profile to it after spinning up the packaging and target VMs.

 

It is VERY specific about these things, and if you have problems you may also need to make sure the PCI device locations match.
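
(A quick way to compare those locations without clicking through Device Manager is a sketch like this, which reads the LocationInformation values Windows records under the PCI enum key. Illustrative only.)

```python
# Sketch: list PCI devices with the LocationInformation string Windows
# recorded for them (bus/device/function), to verify placement matches
# between the packaging VM and the targets.
import winreg

PCI_ENUM = r"SYSTEM\CurrentControlSet\Enum\PCI"

def subkeys(path):
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        for i in range(winreg.QueryInfoKey(key)[0]):
            yield winreg.EnumKey(key, i)

for hwid in subkeys(PCI_ENUM):
    for inst in subkeys(PCI_ENUM + "\\" + hwid):
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                PCI_ENUM + "\\" + hwid + "\\" + inst) as k:
                loc, _ = winreg.QueryValueEx(k, "LocationInformation")
                print(hwid + "\\" + inst + ": " + str(loc))
        except OSError:
            pass  # some device instances carry no LocationInformation value
```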


You should not need to attach the GRID cards to the OS layer before import. Whatever you put in the platform layer should be captured the same as it is in the OS layer, and anything we remove based on your platform layer selections, we would also remove from the OS layer. (Specifically, we clone the OS layer, play in the app and platform layers, and then run the cleanup scripts from the combined disk; the cleanup scripts remove any hypervisor tools for hypervisors you did not specify in the platform layer configuration.)

 

If you're right, it wouldn't be the filter driver exactly, since that's responsible for filesystem modifications.  It would be our registry virtualization system (called RSD internally) which would have to have dropped something.  Or the finalize process which might be intentionally removing something in an effort to be helpful.

 

You might need to have two platform layers, one for each GRID card, though.  But a platform layer created with one card should be fine in a target machine identically configured.  You did say that your target machines are clones of your OS layer, but you didn't say anything about how the packaging machines are made.  Those might be subtly different.

 

Try explicitly setting a template in the vSphere Connector so that you have complete control over how the packaging machine is created. Clone one of your target machines (or the template they're based on), delete any attached disks, add one of your GRID cards, and convert that to a template. Then edit the Connector (or make a new one specifically for this platform layer) and select the template you just made as the template for VM creation. Then version your platform layer using that connector.
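
(If you script your vSphere work, that clone/strip/convert flow looks roughly like the sketch below in pyVmomi. It's a sketch, not a supported procedure: the vCenter host, credentials, and VM names are placeholders, task-waiting is elided, and the GRID/vGPU device itself still has to be added in vCenter before converting to a template.)

```python
# Sketch of the clone -> remove disks -> mark-as-template flow. Placeholder
# names throughout; task completion waits are omitted for brevity.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab only; verify certs in production
si = SmartConnect(host="vcenter.example.com",   # placeholder vCenter
                  user="admin", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

def find_vm(name):
    """Return the first VM with the given name, searching the whole inventory."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next(vm for vm in view.view if vm.name == name)
    finally:
        view.DestroyView()

src = find_vm("target-vm-01")                   # one of your PVS target clones
spec = vim.vm.CloneSpec(location=vim.vm.RelocateSpec(), powerOn=False)
src.Clone(folder=src.parent, name="platform-gold", spec=spec)
# ...wait for the clone task to finish (e.g. pyVim.task.WaitForTask), then:
clone = find_vm("platform-gold")

# Remove every virtual disk so the ELM attaches its own boot disk.
removals = [vim.vm.device.VirtualDeviceSpec(
                operation=vim.vm.device.VirtualDeviceSpec.Operation.remove,
                fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.destroy,
                device=dev)
            for dev in clone.config.hardware.device
            if isinstance(dev, vim.vm.device.VirtualDisk)]
clone.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=removals))
# Add the GRID/vGPU device to "platform-gold" in vCenter by hand, then:
clone.MarkAsTemplate()
Disconnect(si)
```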

 

Actually, that's an interesting side question: if you add a version to your existing platform layer, using your existing connector configuration, is the GRID card detected all over again? If you're right that data isn't getting captured, then it would be missing the next time you edited the layer.

 

Otherwise, open a case and we'll see what we can figure out.


I have a template of the VM I created the OS layer from, so what I did last night was clone a VM off it, drop the disks, make sure it had a GRID card, convert it to a template, and set that as the template in the connector. I then created a new version of the platform layer, messed around a bit, made sure the drivers were working, ran the PVS Optimizer script, set the NIC settings, and finalized. Same loop.

 

I HAVE seen it reboot when it creates a packaging VM for the platform layer.

 

I forgot to mention two specs: Server 2016 and ELM 4.4.


Has there been a solution to this issue?

I am just getting into the ELM and am facing the same problem with a GRID card.

 

Citrix recommends installing the NVIDIA drivers in the platform layer, but this causes looping in the published image. The provisioned machines discover new hardware once launched.

The only way I could get it working is to have the NVIDIA driver in the OS layer.

 

Any other ideas?


Check C:\Windows\INF\setupapi.dev.log. Are you sure it's the GRID card being detected? And are you sure that the GRID card in your end-user VMs is identical in type, placement, and whatever else you can configure (I don't actually know how they work) to what was attached to the platform layer packaging machine?
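
(Something like this will pull the device-install section headers out of that log so you can see exactly which hardware IDs Windows is installing. A quick sketch, assuming the standard setupapi.dev.log format.)

```python
# Sketch: print the device-install section headers from setupapi.dev.log.
# Sections open with lines like:
#   >>>  [Device Install (Hardware initiated) - PCI\VEN_10DE&DEV_13F2&...]
import re

LOG = r"C:\Windows\INF\setupapi.dev.log"

with open(LOG, errors="replace") as f:
    for line in f:
        match = re.match(r">>>\s+\[(.+)\]", line)
        if match:
            print(match.group(1))
```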

 


The looping behavior only occurs once the GRID card is attached to the provisioned clients, and it is quite difficult to access the setupapi.dev.log file due to the looping.

 

The GRID profile on the provisioned clients is the same as the one used during the platform layer preparation.

I do not experience any problem during layer preparation; the whole issue occurs in the published image.

 

As indicated, it seems to work when the GRID card and driver installation are done in the OS layer. These are exactly the procedures I tried in the platform layer:

  • Create the platform layer
  • Attach the GRID card
  • Install the NVIDIA driver
  • Install the PVS tools
  • Install the VDA agent
  • Shut down for finalization

I have tried finalizing both with and without the GRID card attached, and the problem is the same.
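
(If it helps to make the runs repeatable, the install steps above can be scripted along these lines. This is a sketch only; every path and silent switch in it is an assumption to be checked against the documented options for the exact NVIDIA, PVS, and VDA versions in use.)

```python
# Sketch: scripting the installs above inside the packaging VM. Every path
# and silent switch here is an ASSUMPTION; verify against the documented
# options for your exact NVIDIA, PVS, and VDA versions before relying on it.
import subprocess

steps = [
    # NVIDIA GRID driver ("-s" is NVIDIA's silent switch; "-noreboot" assumed)
    [r"C:\Temp\nvidia-grid\setup.exe", "-s", "-noreboot"],
    # PVS target device software (InstallShield wrapper, switches assumed)
    [r"C:\Temp\PVS_Device_x64.exe", "/s", "/v/qn"],
    # Citrix VDA 7.15 metainstaller (controller FQDN is a placeholder)
    [r"C:\Temp\VDAServerSetup.exe", "/quiet", "/components", "VDA",
     "/controllers", "ddc01.example.com", "/noreboot"],
]

for cmd in steps:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)   # stop at the first failed installer
```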

 

Is there any other way I should go about getting it to work in the platform layer?

 

4 hours ago, Akinola Oke said:

The looping behavior only occurs once the GRID card is attached to the provisioned clients, and it is quite difficult to access the setupapi.dev.log file due to the looping.

 

I think I'm missing something here about the boot loop.  What exactly are you seeing?  A Windows request to reboot after detecting new hardware is just that: a request.  The user still has to click "Restart Now".  That's what I'm expecting, so you could simply not click the button, and collect the setupapi.dev.log file instead.  RDP or the C$ share should be sufficient to get to the file before somebody reboots the machine.
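
(Something along these lines will grab the file over the admin share before anyone reboots the machine. A sketch; "TARGET01" is a placeholder for the target's hostname.)

```python
# Sketch: copy setupapi.dev.log off a looping target over the C$ admin share
# before anything reboots it. "TARGET01" is a placeholder hostname.
import shutil
import time

src = r"\\TARGET01\C$\Windows\INF\setupapi.dev.log"
dst = "setupapi.dev.%s.log" % time.strftime("%Y%m%d-%H%M%S")
shutil.copyfile(src, dst)
print("saved", dst)
```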

 

However, Paul's approach above is certainly the best way to approach device weirdness: using a carefully crafted VM Template, make sure everything from the packaging machines to the end-user VMs always boots from exactly the same hardware.


Well, crumbs. I guess that does strongly suggest it's the graphics card, since other devices are optional. The only other thing I could do to press for that file (which is only going to tell us what we already know) is to make the machine persistent, let the reboot happen, and capture the file after the successful reboot. But the templates approach above is going to be where you wind up anyway.

 

On 4/20/2018 at 3:23 PM, Gunther Anderson said:

 

I think I'm missing something here about the boot loop.  What exactly are you seeing?  A Windows request to reboot after detecting new hardware is just that: a request.  The user still has to click "Restart Now".  That's what I'm expecting, so you could simply not click the button, and collect the setupapi.dev.log file instead.  RDP or the C$ share should be sufficient to get to the file before somebody reboots the machine.

 

However, Paul's approach above is certainly the best way to approach device weirdness: using a carefully crafted VM Template, make sure everything from the packaging machines to the end-user VMs always boots from exactly the same hardware.

Users are not getting any dialog, as the looping does not allow any access. It is continuous rebooting, even on the direct console or over RDP. Eventually I was able to get the attached log files.

 

I eventually worked around it before seeing your reply. What I did was set the disk to “Private” and observe that the automatic reboot occurred only once; after that the system stayed up. Then I returned it to “Standard”. It seems I might have to add this step to my procedure.

 

However, I didn’t have to do that when the NVIDIA driver was installed in the OS layer; I just provisioned the published disks without any further configuration.

NVIDIA_setupapi.dev.log

NVIDIA2setupapi.dev.log

NVIDIAsetupapi.dev.log


I see one thing I really don't expect in the logs: "Sysprep Respecialize".  What Sysprep mode are you specifying in your image?  For MCS and PVS, the answer should be "None", not "Generalize Offline."  And you're not running Sysprep in your OS or Platform layers, right?

 

What differentiates your three log files?  What should I be looking for in them?  And what exactly does your reboot cycle look like (assuming you can see the console at all and it's not blacked out, as vGPUs often do)?
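
(If it helps to compare the three files quickly, a rough sketch like this counts the device-install sections and Sysprep mentions in each; the filenames are the attachments above.)

```python
# Sketch: quick comparison of the three attached logs, counting the
# device-install sections and any Sysprep mentions in each.
import re

FILES = ["NVIDIA_setupapi.dev.log", "NVIDIA2setupapi.dev.log",
         "NVIDIAsetupapi.dev.log"]

for name in FILES:
    installs = sysprep = 0
    with open(name, errors="replace") as f:
        for line in f:
            if re.match(r">>>\s+\[Device Install", line):
                installs += 1
            if "Sysprep" in line:
                sysprep += 1
    print("%s: %d device installs, %d Sysprep lines" % (name, installs, sysprep))
```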

22 hours ago, Gunther Anderson said:

I see one thing I really don't expect in the logs: "Sysprep Respecialize".  What Sysprep mode are you specifying in your image?  For MCS and PVS, the answer should be "None", not "Generalize Offline."  And you're not running Sysprep in your OS or Platform layers, right?

 

What differentiates your three log files?  What should I be looking for in them?  And what exactly does your reboot cycle look like (assuming you can see the console at all and it's not blacked out, as vGPUs often do)?

I am not running any Sysprep, unless it is incorporated into the ELM.

I just created a VM and manually installed the OS and updates, ran Citrix Optimizer, and used the ELM OS Machine Tools to finalize and import into the ELM.

The imported OS layer is then used to prepare the platform and app layers.

 

All logs are from different problematic images.

NVIDIA_setupapi.dev.log and NVIDIA2setupapi.dev.log were extracted while the images were set to Private. As for NVIDIAsetupapi.dev.log, the image was still in “Standard”, but I was able to quickly extract the file through the C$ share before the automatic reboot.

 

I believe we are looking for the cause of the looping. As I said, I do not experience this situation when the NVIDIA driver is in the OS layer, and I do not have to set the disk to “Private” first. So what could be causing the NVIDIA installation not to register correctly in the platform layer?

 

Yes, the console is normally blacked out once a vGPU is present, but I can see the looping when the machine is restarting. Unfortunately, the looping does not allow any RDP access.

On 20.04.2018 at 3:17 PM, Paul Fulbright1709158471 said:

If I recall correctly, the solution I came up with was to create a template in VMware and assign it as the template for the VMware connector in the ELM. Then I created the platform layer: install PVS, the VDA, and the GRID driver, join the domain, reboot, install AppSense, disjoin the domain, reboot, delete the domain user's profile, and finalize. I then deployed my targets in VMware using the same template I used to create the layer. For sanity's sake the template itself didn't actually have the GRID card added, since I use that template for everything, GRID and non-GRID; I just added a 4Q GRID profile to it after spinning up the packaging and target VMs.

 

It is VERY specific about these things, and if you have problems you may also need to make sure the PCI device locations match.

Paul, thanks for the tips.

I actually follow the same steps, though maybe not in the same order. However, if the GRID driver is installed in the platform layer as indicated in the Citrix documentation (https://support.citrix.com/article/CTX225997), I believe the packaging machine has to be finalized while it is still a domain member.

 

I guess you might have set the image to “Private” after publishing to perform some additional configuration before making it available to users. That could be what resolved the looping issue.

On 19.05.2018 at 3:56 PM, Brett Molitor1709157015 said:

I've found this same issue to be true in an environment I'm working on. I don't like that the answer is to boot once in Private mode every time I spin up a new version of a published image. To me, it takes what is convenient and flexible about the App Layering solution and makes it tedious and clunky.

 

I share the same view; this shortcoming is killing the convenience and flexibility App Layering is supposed to provide.

I worked around the issue by installing the GRID component in the OS layer, though that is not what Citrix suggests.

8 hours ago, LEVI PINGEL said:

I'm going to clone a new VM from the template with the original virtual disk ID to see if that eliminates the issue.

 

I was also able to boot in Private mode to complete the hardware changes and reseal the VHD. Dealing with a VHD is identical to maintaining a VHDX for PVS. Not as intimidating as it sounds.

