
XenServer 7.5 - NFS performance... could be better

Matthew Weiner


I'm curious what kind of numbers others are seeing in practice on NFS.  I have a mix of Dell R730s and R740s, all with Broadcom 10 gig NICs, talking to a Dell FS8600 10 gigabit NAS on a flash hybrid SAN.  I can get up to about 120-150 MB/s through the control domain if I, say, copy or move a VM, usually anywhere from 60-90 MB/s in the VM, and if I just do a straight dd from /dev/zero to the NFS mount in dom0 I can get over 300 MB/s.  Of course it's probably "cheating" if I just write a string of zeroes.  Dom0 has 16 gig of RAM, and the servers aren't CPU constrained at all during testing.


Being pure 10 gigabit and predominately on SSDs, I figured I should be able to go faster, but these numbers seem to correlate with what I've seen other people say they're getting so I'm just curious for my own knowledge what I'm getting vs. what I theoretically "should."
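On the "cheating" concern with /dev/zero: a minimal sketch of a slightly fairer dd write test, assuming GNU dd in dom0. The function name write_test, the file name ddtest.bin and the default size are made up for illustration. It stages random data first so the source is not a run of zeroes, then asks dd to flush to the server before it reports a rate. It is still only a rough single-stream number, not a proper benchmark.

```shell
#!/bin/sh
# write_test DIR [MIB]: write MIB mebibytes of random data into DIR and
# report dd's own throughput line. Assumes GNU dd (bs=1M, conv=fdatasync).
# Note: the staging file lives in the default temp dir, so keep MIB modest.
write_test() {
    dir=$1
    mib=${2:-256}
    src=$(mktemp) || return 1
    # Stage the random data first so /dev/urandom speed is not being timed.
    dd if=/dev/urandom of="$src" bs=1M count="$mib" 2>/dev/null
    # conv=fdatasync makes dd flush the output file before it finishes,
    # so the reported rate is not just the speed of filling the page cache.
    dd if="$src" of="$dir/ddtest.bin" bs=1M conv=fdatasync 2>&1 | tail -n 1
    rm -f "$src" "$dir/ddtest.bin"
}

# Example (the path is a placeholder for your NFS mount point):
# write_test /path/to/nfs/mount 256
```

The call is left commented out so nothing gets written until a real mount point is filled in.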

Link to comment

14 answers to this question


  • 1

Tobias, agreed, but where we'd rather be is not requiring our customers to go in and manually tune things. It's better for everyone if the system has enough smarts in it to adapt to the environment. The new Linux kernel gives us a lot of that, and we've made changes of our own to the storage IO components to adaptively manage resource use vs. performance.


I should have pointed out that the data above was a basic install, straight out of the box, with no additional tuning applied to the XenServer (the storage server is heavily tuned to extract every bit of performance out of the SSDs but that's an external thing as far as XenServer is concerned).

Link to comment
  • 0

There are a number of tweaks you can do to improve 10 Gb performance. Here's what I use. Some of these may be less effective on XS 7.X, but they sure helped under 6.5:


# customizations

# modify below to match your 10 Gb NICs
ifconfig eth4 txqueuelen 300000
ifconfig eth5 txqueuelen 300000
# modprobe tcp_cubic
sysctl -w net.ipv4.tcp_congestion_control=cubic
# raise the maximum socket receive/send buffer sizes (bytes)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# TCP buffer autotuning limits: min, default, max (bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
# let more received packets queue up when the NIC outpaces the kernel
sysctl -w net.core.netdev_max_backlog=300000
# add the following to help with jumbo frames and 10 Gb Ethernet:
sysctl -w net.ipv4.tcp_mtu_probing=1


I put these into /etc/rc.local

I'd be interested to see your before and after metrics. These can be changed on the fly, BTW.
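For a sense of scale on the 16777216-byte buffer ceilings above, here is a rough bandwidth-delay sketch. The function names and the 1 ms round-trip time are illustrative assumptions (measure your own RTT with ping), and the usable TCP window is somewhat smaller than the raw buffer size, so treat the results as ballpark figures only.

```shell
#!/bin/sh
# bdp_bytes GBPS RTT_US: bytes that must be in flight to keep a link of
# GBPS Gbit/s full at a round-trip time of RTT_US microseconds.
bdp_bytes() {
    # GBPS * 1e9 bits/s / 8 bits per byte * RTT_US / 1e6 s = GBPS * 125 * RTT_US
    echo $(( $1 * 125 * $2 ))
}

# max_rtt_us GBPS BUF_BYTES: longest round-trip time (microseconds) that a
# buffer of BUF_BYTES can cover at GBPS Gbit/s.
max_rtt_us() {
    echo $(( $2 / ($1 * 125) ))
}

bdp_bytes 10 1000        # hypothetical 1 ms RTT at 10 Gbit/s: prints 1250000
max_rtt_us 10 16777216   # 16 MiB ceiling at 10 Gbit/s: prints 13421
```

In other words, a 16 MiB ceiling leaves a lot of headroom for a single stream on a local storage network. Whether it helps in practice still depends on the before and after measurements.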

One other point: I don't really care for Broadcom NICs, and am using exclusively Intel for 10 Gb connections.


Link to comment
  • 0

Depending on block size, queue depth and concurrency, you should be able to hit close to wire speed both in Dom0 and inside the guest. In general, bigger block sizes, deeper IO queues and more concurrency will give you higher throughput than smaller block sizes and shallower IO queues.


A typical test that we run covers:


Block sizes: 512b, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1M, 2M, 4M

Queue depth: 1, 64

Job concurrency: 1, 2, 4


With storage that is faster than the network, you end up with data that graphs out as a ramp tending towards the 10g wire speed as block sizes increase. Increasing the queue depth and/or job count moves the ramp towards the smaller block sizes.
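The matrix above could be enumerated along these lines. This is only a sketch: the post does not say which tool was used, so fio is an assumption here, as are the target path, the 1g file size and the 30 second runtime. The script prints one fio command per combination rather than running anything.

```shell
#!/bin/sh
# Print (not run) one fio command per cell of the block size / queue depth /
# job count matrix. TARGET is a hypothetical test file; change it before use.
TARGET=${TARGET:-/mnt/test/fio.dat}

gen_matrix() {
    rw=$1   # randread or randwrite
    for jobs in 1 2 4; do
        for qd in 1 64; do
            for bs in 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m; do
                echo "fio --name=bs${bs}-qd${qd}-j${jobs} --filename=$TARGET" \
                     "--rw=$rw --bs=$bs --iodepth=$qd --numjobs=$jobs" \
                     "--ioengine=libaio --direct=1 --size=1g" \
                     "--runtime=30 --time_based --group_reporting"
            done
        done
    done
}

gen_matrix randwrite
```

That is 14 block sizes x 2 queue depths x 3 job counts, so 84 runs per direction. Piping the output to sh inside a test VM would execute them in sequence.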


Mark, Citrix Hypervisor Storage Engineering.

Link to comment
  • 0

Tobias makes a good point on tuning but ideally we'd expect the newer 4.x kernel in XenServer 7.5 to have better defaults anyway.


There is a good description of the options here - https://cromwell-intl.com/open-source/performance-tuning/tcp.html


Tobias - Note that you should probably be setting 


net.ipv4.tcp_mem = 1638400 1638400 1638400


or similar in order to accommodate your increased read and write buffer sizes.
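One thing worth keeping in mind with this setting: tcp_mem is counted in memory pages, not bytes, unlike tcp_rmem and tcp_wmem. A quick sketch of the arithmetic follows. The helper name pages_to_mib is made up, and the page size falls back to whatever getconf reports if you do not pass one.

```shell
#!/bin/sh
# pages_to_mib PAGES [PAGE_SIZE]: convert a tcp_mem value (in pages) to MiB.
pages_to_mib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1024 / 1024 ))
}

pages_to_mib 1638400 4096   # with 4 KiB pages: prints 6400
```

So with 4 KiB pages, 1638400 works out to 6400 MiB (6.25 GiB) allowed for TCP buffers overall, which is worth comparing against how much RAM dom0 actually has.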

Link to comment
  • 0

I'd be very interested in NFS parameter optimizations. And some of the other TCP thresholds might have to be adjusted as well. That particular setting opens up a lot of "headroom" and likely depends on similar adjustments elsewhere. Many thanks in advance, Mark; it's a very interesting area, and with so many parameters and so many inter-dependencies, it's hard to hit upon a combination that's close to ideal, especially with 10 Gb or faster NICs.



Link to comment
  • 0

Thanks, everyone.  I'll try some of these numbers and see what works best for my environment.  I'll also have to see what this NAS is capable of.  I did a pure 10 gigabit line speed SMB benchmark test on it and only got about two, so I'm going to have to have a conversation with Dell.  I know the two protocols do have a lot of overhead, but I still feel that with 16 cores of CPU, 96 gigs of RAM, and eight 10 gig connections, it should certainly be better.


The additional TCP/IP memory specified by net.ipv4.tcp_mem = 1638400 1638400 1638400 actually appears to work better for me than the defaults.  Not just in NFS, but in general.  Possibly because I increased the dom0 RAM quite a bit.  These servers all have quite a bit of memory and I have had issues with dom0 memory exhaustion in the past so I set them all to 16 gig.


Funny you say you have had issues with Broadcom vs. Intel 10 gig NICs.  I decided to go Broadcom in this server order because I had the opposite issue: I always felt like my Intel NICs were "driving with the brakes on" so to speak and never performed as well as I felt they should.  In vSphere they seemed alright, but in XenServer I never felt like I got the performance they ought to deliver.  I always had tremendous luck with Broadcom gigabit NICs so I decided to give their 10 gigabit ones a try.


Link to comment
  • 0

I have Intel NICs and they seem to do the job for me. I only have 10Gb for storage, so moving VMs still falls back to 1Gb links.

Performance testing is as much voodoo as it is science. There are so many different ways/kinds of tests, and all of the caching at various places can skew your numbers. You just have to establish a baseline and go for it.  I mostly use Windows VMs, so I use CrystalDiskMark to hit my C drive on Windows 10, which is provisioned over 10Gb links to my Tegile SAN. I get about 330MB/sec reads and 100MB/sec writes on it.




Link to comment
  • 0

I promised to do this yesterday but time got away from me.


Benchmark tests on a XenServer 7.6 NFS SR over 10 Gb networking to SSD storage.


CentOS 7.5, 4GB, 4vCPU


Random Write

Description, 512b, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m, 2m, 4m
Job count 1 - QD 1,1626,3460,5372,11410,16557,5495,8113,4502,51668,58555,94629,147821,243751,309772
Job count 1 - QD 64,1814,7334,20974,82170,100568,90216,136388,353701,684489,950830,924146,957909,956722,962963
Job count 2 - QD 1,4940,10017,15625,31543,61486,117176,212054,308920,510634,710918,894331,981640,959983,939244
Job count 2 - QD 64,27541,53852,103987,200830,356566,596115,848285,851060,978316,954520,939261,949149,972896,954149
Job count 4 - QD 1,9205,17657,28285,57795,101986,193881,346847,531450,773202,926117,925658,937227,947647,942464
Job count 4 - QD 64,26534,52582,100709,195533,350616,573559,839565,826653,941959,928941,934213,958228,979855,993689


Random Read

Description, 512b, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m, 2m, 4m
Job count 1 - QD 1,3272,5857,11894,23745,44546,61648,104957,176115,167683,61202,44883,379380,793492,614156
Job count 1 - QD 64,31120,60825,122239,230133,404268,637630,865729,851942,917228,934143,903215,927052,924123,908222
Job count 2 - QD 1,5313,10590,21014,42130,77890,139287,225783,366176,530246,652067,805052,909550,911842,938018
Job count 2 - QD 64,29739,58346,117589,228823,400269,639579,854140,852870,909434,910938,897033,889479,901377,912276
Job count 4 - QD 1,9802,18430,32956,65366,124186,232446,408620,579127,761299,868748,946394,947105,1005920,910786
Job count 4 - QD 64,29815,58436,114240,222899,398828,621872,844214,831626,943367,892948,920189,927536,911233,961590


Windows Server 2016, 4GB, 4vCPU


Random Write

Description, 512b, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m, 2m, 4m
Job count 1 - QD 1,1047,2223,1398,10727,19817,40180,62434,103596,175221,335671,403808,519166,651895,738935
Job count 1 - QD 64,6349,18198,51601,190149,344629,576624,819575,800348,786842,858241,867343,869018,876273,884907
Job count 2 - QD 1,3776,8784,14839,29552,59713,106726,191291,282457,458539,628194,784685,922195,934657,931116
Job count 2 - QD 64,6261,18303,51658,191263,339814,563082,817079,810828,891706,881979,908600,919828,921597,916651
Job count 4 - QD 1,6183,14319,25823,52749,94138,178601,318291,469704,559863,824649,924219,932703,939366,911538
Job count 4 - QD 64,6304,18435,51769,190169,339232,575284,831966,806484,876924,882263,884962,904529,906493,912717



Random Read

Description, 512b, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m, 2m, 4m
Job count 1 - QD 1,1930,4269,9106,18443,29917,36933,65581,127247,205038,80248,134024,345632,185493,353499
Job count 1 - QD 64,27778,56113,109374,220614,381110,611408,848405,876960,938078,895546,910435,874805,933275,897849
Job count 2 - QD 1,4639,8917,16502,33041,62890,119063,196816,332272,473733,659084,782157,858436,892539,909166
Job count 2 - QD 64,26579,53032,104711,200964,355429,572840,828700,796704,834748,840796,836705,864667,902990,907086
Job count 4 - QD 1,8348,15825,29393,59012,110720,203237,350964,506929,682170,819743,911385,901539,912426,883836
Job count 4 - QD 64,28351,57182,110328,215316,376348,606075,826243,774148,854977,870245,872032,884693,896974,882668



Test server is a Dell R430 with an Intel X520 10g NIC; storage is from a Dell R730XD with 128GB RAM, 12x Samsung 850 EVO SSDs and dual bonded Intel X710.

Link to comment
  • 0

Very interesting, thanks for posting this, Mark. Some vendors say you really ought to use their recommended block size for the storage appliance (32k, for example). Also, depending on other factors, real I/O chunks may be restricted to just 4k blocks anyway, so the big block sizes won't help you.


I'd really like to see Citrix come up with some better network parameter settings for the default XenServer install. Years ago, some of you folks also wrote articles on optimization and how to tweak your servers. It'd be really helpful and appreciated to see something like that again. This article is from 2013!!!: https://www.citrix.com/blogs/2013/12/02/xenserver-performance-tuning-top-5-recommended-guides/ Plus, that article recommends using dd for benchmarking, which IMO is a horrible utility that fails to take into account cache and such.


This article by Rok Strniša is one of the better ones I've ever seen: https://wiki.xen.org/wiki/Network_Throughput_and_Performance_Guide

This is the sort of thing that needs to be revisited and published as an update.



Link to comment
  • 0


Also agreed, and the key phrase is "not required": out-of-the-box should be pretty darn good so that those who are uncomfortable with the system internals do not need to touch them.  Some hints on further optional (another key word) tuning for getting the most out of individual configurations would still be helpful.  And, yes, there are a ton of parameters and tweaking any one can cause issues with others (as seen above); it is a bit of an art, but can make a big difference overall.


And of course, just because an array is composed of SSD drives doesn't make it automatically super great to begin with; the RAID configuration, controller, network, and other factors all weigh in on performance. I actually had an array with spinning disks that handled writes better than an array of pure SSD drives.



Link to comment
