ELM performance Azure

Wojciech Kruczkowski · March 19, 2020

Hi,

We run an ELM appliance on Azure to create a Windows 10 master image with user layers (UL) enabled, which we use to deploy thousands of VDIs through MCS. We have noticed several issues.

ELM performance, mostly write speed when uploading an image back from Azure to the ELM appliance or when composing a layer: it drops to 5 MB/s, or even below 1 MB/s, and sometimes times out. Uploading an image from the ELM to Azure is quite fast at 50 MB/s. We use the Azure connector with a standard storage account; we didn't notice any improvement with a premium storage account. The ELM runs on an Azure Standard D4s v3 with a premium SSD repository disk.

Is there any way to improve the performance, especially write speed? Importing layers takes ages in our case.

One more thing: how should the page file be managed in Azure? Should we leave it on the temporary D: drive when creating layers, or set it to the C: drive in the OS layer?

Rob Zylowski · March 19, 2020

I have been told that performance is much better when the appliance in Azure uses a 1 TB disk rather than a 512 GB one, because the disk transfer rate goes up by 100 MB/s, but I have not tried this myself. I think Azure manages the page file location itself, but I'm not sure about that.

Richard Buffone · March 19, 2020

My test ELM has a 1 TB disk, as Rob suggested, and I see very good throughput of about 120-150 MB/s. Is there a specific message you see when the task times out? Are you seeing a Server Busy message? If so, you are likely hitting the disk I/O limits and Azure is throttling it. Increasing the repository disk size to 1 TB or larger should take care of that, as it allows much more I/O at any given time.
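The throttling argument can be sketched as a quick calculation: the sustained rate is bounded by whichever is lower, the disk's cap or the VM size's own disk cap. The figures below are assumptions, based on Azure's published limits for P20/P30 premium SSDs and the uncached disk throughput of a Standard D4s v3 around the time of this thread; verify them against current Azure documentation before relying on them.

```python
# Assumed throughput caps in MB/s; not taken from this thread.
DISK_CAP_MB_S = {"P20 (512 GiB)": 150, "P30 (1 TiB)": 200}
VM_UNCACHED_CAP_MB_S = {"Standard_D4s_v3": 96}


def effective_cap(disk_tier, vm_size):
    """Return the lower of the disk cap and the VM's uncached disk cap."""
    return min(DISK_CAP_MB_S[disk_tier], VM_UNCACHED_CAP_MB_S[vm_size])


for tier in DISK_CAP_MB_S:
    print(tier, effective_cap(tier, "Standard_D4s_v3"), "MB/s")
```

Under these assumed figures, both disk tiers end up limited by the VM size rather than by the disk, so it is worth checking the VM cap as well as the disk cap when sizing the repository disk.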

For the page file, Azure handles this on its own. I would suggest leaving the default settings, which use the D: drive. This will not affect the layering process.

Wojciech Kruczkowski · March 20, 2020

We extended the disk to 1 TB and saw no improvement at all. Below are the error message and the throughput while importing a layer to the ELM.

ELM performance: the blue line shows when the import started; throughout the import it stayed at 2-5 MB/s at most. The second screenshot shows the error.

IOPS are on the last screenshot. A 1 TB premium SSD has 5000 IOPS, so we are not even close to the limit.
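The relationship between the observed rate and the IOPS limit can be checked with simple arithmetic, since throughput is IOPS multiplied by I/O size. This is a minimal sketch; the I/O sizes are chosen purely for illustration and are not measurements from the appliance.

```python
def throughput_mb_s(iops, io_size_kib):
    """Throughput (MiB/s) implied by an IOPS figure at a fixed I/O size."""
    return iops * io_size_kib / 1024


def iops_needed(mb_s, io_size_kib):
    """IOPS required to sustain mb_s (MiB/s) at a fixed I/O size."""
    return mb_s * 1024 / io_size_kib


# Illustrative I/O sizes only; the real size depends on the workload.
for size_kib in (4, 64, 256):
    print(f"{size_kib} KiB I/O: 5 MB/s needs {iops_needed(5, size_kib):.0f} IOPS")
```

Even at a small 4 KiB I/O size, 5 MB/s corresponds to well under 5000 IOPS, which is consistent with the IOPS limit not being the cause here.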

Richard Buffone · March 24, 2020

It's possible the network is the bottleneck. Please connect to the ELM with two SSH sessions. In the first session, please run the following:

sudo dstat -d --disk-tps

In the second session, please run:

sudo iptraf-ng

Select "Detailed interface statistics" and then "eth0".

If iptraf-ng is not installed, please install it using:

sudo yum install iptraf-ng

Now start a layering process; adding a version to an OS layer would be a good test. Note the current step shown in the App Layering web console, then watch the disk read/write rate from dstat. When the disks are being copied out to the storage account or copied back in from storage, watch the rate in iptraf-ng and how it compares to the disk rate. If the disk copying steps are fast but the network copy steps are slow, then we know it is a network bottleneck. If the disk steps are slow no matter which step is running, then I would suggest opening a case with Microsoft to understand why the VM is not performing up to the specs of the underlying hardware. We have no internal optimizations that would need to be applied; the ELM will simply try to run as fast as possible based on the hardware.
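For anyone who prefers numbers to watching two terminals, the comparison described above can be roughly sketched in Python. This is not part of App Layering: the function names and the 20 MB/s "slow" threshold are hypothetical, and the sampler assumes a Linux host that exposes /proc/net/dev.

```python
import time


def parse_net_dev(text, iface):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev content."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            return int(fields[0]), int(fields[8])  # rx bytes, tx bytes
    raise KeyError(iface)


def rate_mb_s(bytes_before, bytes_after, seconds):
    """Average rate in MB/s between two byte counters."""
    return (bytes_after - bytes_before) / seconds / 1e6


def sample_network_rate(iface="eth0", seconds=5.0):
    """Measure combined rx+tx MB/s on the interface over an interval."""
    with open("/proc/net/dev") as f:
        before = sum(parse_net_dev(f.read(), iface))
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = sum(parse_net_dev(f.read(), iface))
    return rate_mb_s(before, after, seconds)


def classify(disk_step_mb_s, network_step_mb_s, slow_below_mb_s=20.0):
    """Apply the rule from the post; the threshold is an assumed value."""
    if disk_step_mb_s < slow_below_mb_s:
        return "disk slow regardless of step: raise with Microsoft"
    if network_step_mb_s < slow_below_mb_s:
        return "network bottleneck"
    return "no obvious bottleneck"
```

As a usage example, `classify(120, 3)` returns "network bottleneck", where 120 would be the dstat rate during a local disk step and 3 the rate seen while copying to or from the storage account.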

One side note: I assumed you installed the ELM from a recent version (1910, 1911 or 2001). If this ELM was installed at 4.12 or an earlier version (whether or not you have upgraded the appliance since), it will have the older, slower network interface. This can be swapped out for the Azure accelerated networking interface for better throughput, or you can simply install a new ELM based on a more recent version.

Please see here for how to swap the network interface on an App Layering appliance installed at 4.12 or earlier:

https://support.citrix.com/article/CTX237606

Wojciech Kruczkowski · March 25, 2020

Hi,

Thanks for the reply. We did further investigation and it seems that network throughput is lower than it should be. I built a new ELM appliance in a new subscription and region, and performance was much better: a flat 100 MB/s write speed, except for compositing (40 MB/s). Expanding the disk to 1 TB did not improve performance at all. Enabling read/write host caching dropped performance by 30%. We also enabled managed disks for the repository and OS disks, but we are not able to check whether that affects performance.

In parallel, we face an issue with logon times when full user layers are enabled: logon time increases by 20-30 seconds. With the same image, UL disabled and a persistent VDI, logon takes 12 seconds; on a non-persistent random VDI with UL enabled, logon takes around 1 minute. It looks like the scheduler service times out (stopping -> starting).
