
NVIDIA vGPU in Platform Layer - cannot get it working


Question

No matter how I dice it up, I cannot get App Layering to work consistently with my NVIDIA M60 profiles on VMware 6.5. It never seems to work if I install the driver in the Platform layer, and it only works sometimes when I install it in the OS layer; even then, I have to make new versions of the OS layer to change between 2Q, 8Q, etc.

 

Here is my process:

  • Build a Windows 10 1709 Enterprise image - disable Windows Update - remove Windows Store apps using the scripts from Citrix - install the App Layering image prep agent - shut down
  • Capture the VM as a new OS layer
  • Create new version of OS layer
  • Run Windows Updates
  • Install BIS-F - configure Citrix Optimizer to run with the default 1709 XML template
  • Use BIS-F to shutdown
  • Finalize the OS layer
  • Create Platform Layer based off of OS layer
  • Join to domain - restart
  • Attach M60_1Q vGPU profile to packaging VM
  • Install NVIDIA driver version 391.03_grid_win10_server2016_64bit_international on the packaging VM - restart
  • RDP into machine and install VDA 7.15.2000 as HDX 3D Pro - restart
  • Install PVS 7.15.3 Target - restart
  • Install WEM agent 4.6 - restart
  • Use BIS-F to shutdown
  • Finalize the Platform Layer
  • Create Image Template using the OS and Platform Layer
  • Publish image to PVS connector
  • Assign the vDisk to a VM with the 1Q profile attached - the machine boots, but the only video driver I ever get is the Microsoft Basic Display Adapter
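When the published target comes up with only the Basic Display Adapter, a quick way to confirm what Windows actually loaded is a hedged sketch like the following, using standard Windows commands on the booted PVS target (output will vary by build and driver version):

```shell
:: Hedged diagnostic sketch - run on the booted PVS target.
:: Lists each display adapter with its driver version and status,
:: so you can see whether the NVIDIA GRID driver loaded or the
:: machine fell back to the Microsoft Basic Display Adapter.
wmic path Win32_VideoController get Name,DriverVersion,Status
```

If only "Microsoft Basic Display Adapter" shows up here, the GRID driver either never made it into the published image or failed to bind at boot.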

 

Does anyone have any experience with this configuration to shine a light on what I may be doing wrong? Thanks in advance.


25 answers to this question


Hi Brett,

 

Do you have elastic layering enabled in your image? If so, the first test I would try is publishing an image without it to see how that behaves. Since you are still having issues after that, I would also try it without the BIS-F framework, because that introduces an unknown into the testing mix. For the issue of it not working with different profiles, you might want to introduce each profile into the layer you are installing the NVIDIA drivers into - meaning, boot the packaging machine with each NVIDIA profile you want to support, so that Windows and the NVIDIA drivers have seen them all before booting with PVS. Lastly, it would be interesting to see if booting in private mode helps at all.

 

Rob

 


So - booting it into private mode after one restart installed the driver. It did NOT capture the license server that I configured, though. I added that back in, shut it down, and put it back into standard mode. Booted up the target, and 1Q was functional. I then shut it down, added a 4Q profile, and that worked too...

 

This is progress, but does that mean elastic layering can't be used?


I doubt you're skipping it, but it's not on your list: are you installing VMware Tools into your OS layer? Make sure you install the latest 10.2.5 version, as it contains some display driver fixes.

 

I am using VMware View with vGPU (P40s) and elastic layering without (that specific) issue.

 

If you look in Device Manager, do you see the NVIDIA GPU? Does it show any errors there - likely the driver not loading correctly?

 

I know that in my View environment I have to re-add the vGPU to my published machine before importing it into View. After publishing the template from App Layering, the devices don't stick.


Good point - I am not skipping the VMware Tools install; I install it right away in the OS layer. Thanks. Device Manager shows no NVIDIA driver when this happens, just the Microsoft Basic Display Adapter. You can't even repair or manually install the driver.
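As a hedged follow-up diagnostic (assuming the modern pnputil syntax, available on Windows 10 1607 and later), it can be worth checking whether the GRID package even survived into the driver store of the published image:

```shell
:: Hedged sketch: check whether the NVIDIA GRID package is present in
:: the driver store of the published image (modern pnputil syntax,
:: Windows 10 1607+). findstr filters the enumeration for NVIDIA entries.
pnputil /enum-drivers | findstr /i nvidia

:: If nothing is listed, the package never made it into the image,
:: which could explain why a manual repair has nothing to work with.
```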

 

I tested Rob's suggestion to install all the drivers into the Platform layer and then try an EL-enabled image again in PVS. For the record, I experienced the same issue. Not only that, but as a side issue the Start menu corrupts when I have elastic layering enabled. If I create an image without EL enabled, the Start menu is fully functional.

 

To your point about needing to re-add the vGPU to your machine before importing it: that is likely the equivalent of what I need to do in PVS. Basically, I need to publish the image, change it to private mode, let the vGPU be detected, restart, and then shut it down. Then I can set it back to standard mode and attach vGPU profiles. That's fine because it technically works... but having to do that for every image, every time I need to republish, makes this solution very difficult to sell to my client. Add to that the fact that EL breaks the Start menu, at least in all of my testing on 1709 with UPM enabled...

 

I am going to open a case on this one, because it seems like a bug to me.


Hi,

 

I had a similar issue, but with XenServer. I ended up creating a cmd file that contains:

 

pnputil /add-driver C:\NVIDIA\369.95\Display.Driver\nvgridsw.inf /install

 

and it is run from kmssetup.cmd.

 

I added it under COMMANDS TO RUN EVERY BOOT with

 

    REM Load NVidia drivers
    If EXIST NVidiaMount.cmd (
         echo !date!-!time!-kmssetup.cmd:Call NVidiaMount.cmd >> NVidiaMount.txt
         Call NVidiaMount.cmd >> NVidiaMount.txt
    )

 

Not sure whether this will help, I hope it will.

 

Csaba

 

12 hours ago, Csaba Keresztessy said:

This is pretty fantastic - thanks! I haven't had a chance to try it yet, but I know it may prove handy. This doesn't require a restart?

On 5/18/2018 at 10:15 AM, Brett Molitor1709157015 said:

 

Did you ever open a case on this? What came of it? We are having the same issue in our Server 2016 deployment with the GRID GPUs. Everything is good until we deploy the PVS disk, and then it wants to reboot. If we do a maintenance version or boot it in private mode, it finishes its restarts and then it works... I'd like to skip having to boot up the disk after every publish if we can avoid it.


Yeah - the official answer is that it needs to be detected in private mode and then shut down if you install it in the Platform layer. You also need to do it again if you change to a different vGPU profile. I ended up putting it in the OS layer because it sticks there (most of the time - not all the time)... and you can't switch profiles without versioning your OS layer and attaching a new profile. Actually, with this particular project we ended up going with traditional PVS images - not using App Layering. The many caveats with layering proved to be more than the customer was willing to add to their administration duties.

1 hour ago, Brett Molitor1709157015 said:


 

When you say you ended up putting "it" in the OS layer, what are you referring to?  GPU drivers? vGPU profile?

On 7/27/2018 at 9:30 AM, Mike Kelly1709153237 said:

 

When you say you ended up putting "it" in the OS layer, what are you referring to?  GPU drivers? vGPU profile?

Yeah - sorry, I wasn't clear on that. I meant I installed the NVIDIA drivers in the OS layer with the specific profile attached that I wanted the image to use. If I wanted a different profile, I had to detect it in a new version of the OS layer. Platform layer testing required a private-mode boot of the PVS disk in order to work.

On 7/27/2018 at 2:20 PM, Brett Molitor1709157015 said:

Yeah - the official answer is that it needs to be detected in private mode and then shutdown if you install it in the platform layer.

 

The problem seems to be that the drivers from the platform layer are put into the layered image, but not the actual status of any of the devices, i.e. the hardware configuration. Therefore when the image is booted it has to re-detect everything. This is problematic as "Getting devices ready" causes high load at boot time and worse yet, you can connect to a session before it's even got all the devices ready. I've not seen any explanation for why this design decision has been made as naively you'd expect the contents of the platform layer to be added wholesale to the layered image. The platform layer itself has all the hardware detected and drivers configured as required, but it's being ignored.
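Given that the layered image re-detects everything at every boot, one boot-time mitigation is the same idea as the kmssetup.cmd snippet earlier in the thread - force the GRID driver back into place before the re-detection runs. A hedged sketch (the path and .inf name are taken from that earlier post and will differ per driver release):

```shell
:: Hedged boot-time sketch, same approach as the kmssetup.cmd snippet
:: above. The path and .inf name are from that post and will differ
:: per GRID driver release.
pnputil /add-driver C:\NVIDIA\369.95\Display.Driver\nvgridsw.inf /install

:: On newer Windows 10 builds (1903+), a rescan can trigger device
:: re-enumeration without waiting for "Getting devices ready":
pnputil /scan-devices
```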

 

On 7/27/2018 at 2:20 PM, Brett Molitor1709157015 said:

The many caveats with layering proved to be more than the customer was willing to add to their administration duties.

 

*nod* Having to boot in private mode before promoting to production and replicating the vDisk is already off-putting, and it gets really quite problematic if you need to involve other hardware or hypervisors.

On 5/31/2018 at 8:34 PM, Brett Molitor1709157015 said:

This is pretty fantastic - thanks! I haven't had a chance to try it yet, but I know it may prove handy. This doesn't require a restart?

 

Sorry for the late reply, but I got no notification that there was an update. A restart was not needed.

 

Since then I've done a new version of the Platform layer, and this was no longer needed. The driver version we are using is:

 

NVIDIA-GRID-vSphere-6.5-430.27-430.30-431.02.zip

 

 
