The Citrix Blog
Personal Blog
Simon Crosby
Version 1 by Simon Crosby
on Apr 13, 2009 01:11.


 
Current by Simon Crosby
on Apr 13, 2009 01:12.


 
In a [recent post|http://community.citrix.com/blogs/citrite/simoncr/2009/03/31/XenServer+and+Nehalem+-+A+Perfect+Pairing] I shared some data showing that we are getting terrific performance results for XenServer on Intel Nehalem based servers. In the first formal set of tests we found that the performance bottleneck is that the hypervisor still has to perform I/O on behalf of all guests, so the system scaling limit is the rate at which we can scale the internal I/O stack. I postulated that we would get some impressive numbers for Nehalem based platforms using IOV enhanced 10Gb/s NICs, and contacted our friends at Solarflare to ask if they would help run some numbers using their 10Gb/s NICs. These offer a powerful direct hardware-to-guest acceleration path that removes the need for the hypervisor to process I/O on behalf of the guests, allowing guests to interact with the hardware directly.

Below is a summary of the initial findings for the Nehalem tests using XenServer 5.0 and Solarflare I/O acceleration. Thanks to Steve Pope of Solarflare for his help. It turns out that with a smart I/O architecture such as the Solarflare offload stack, when guests interact directly with I/O safe hardware, we can dramatically change the system performance and basically saturate a 10Gb/s link, in both directions at the same time\!
  
Here's how the experiment is set up. We have 2 physical servers, A and B, connected back to back with Solarflare 10G Ethernet gear. Each server runs XenServer 5.0 Update 3 on a single Nehalem CPU with 8 logical cores.
  
To create a traffic workload between the servers we ran NetPerf TCP_STREAM pairs between Linux RHEL 5 guests (each pair spans server A and server B) and measured the aggregate throughput both with and without acceleration.
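As a rough illustration of that workload, here is a minimal Python sketch that lays out the eight stream pairs as netperf command lines. The guest names and the 60 second duration are hypothetical, since the post does not give the exact options used, and it assumes a {{netserver}} instance is listening in each receiving guest. The {{-H}}, {{-t}} and {{-l}} options are standard netperf flags for the remote host, the test name and the test length.

```python
def netperf_command(peer_host, seconds=60):
    """Build a netperf TCP_STREAM invocation as an argument list.

    seconds is an assumed test length; the original runs may have differed.
    """
    return ["netperf", "-H", peer_host, "-t", "TCP_STREAM", "-l", str(seconds)]


def stream_pairs(senders, receivers, seconds=60):
    """Pair each sending guest with one receiving guest on the other server."""
    if len(senders) != len(receivers):
        raise ValueError("each sender needs exactly one receiver")
    return {s: netperf_command(r, seconds) for s, r in zip(senders, receivers)}


# Hypothetical guest names: 8 guests per server, 4 senders and 4 receivers each.
a_senders = [f"a-tx{i}" for i in range(1, 5)]
a_receivers = [f"a-rx{i}" for i in range(1, 5)]
b_senders = [f"b-tx{i}" for i in range(1, 5)]
b_receivers = [f"b-rx{i}" for i in range(1, 5)]

plan = {}
plan.update(stream_pairs(a_senders, b_receivers))  # A -> B streams
plan.update(stream_pairs(b_senders, a_receivers))  # B -> A streams

for guest, cmd in plan.items():
    # Each command would be run inside the named sending guest.
    print(f"{guest}: {' '.join(cmd)}")
```

Each command runs inside the sending guest, and the per-stream throughput that netperf reports is what gets summed into the per-direction totals.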
  
*Non-accelerated*
The configuration used 4 guests transmitting from A to B and 4 guests from B to A.  Raw results were:
* (A \-> B) 1094 + 1068 + 1046 + 1128 = 4336 Mbps
* (B \-> A) 1019 + 1028 + 1050 + 1021 = 4118 Mbps
  
 +Total:+ *{+}8.45{+}* +Gbps;+ Bottleneck: Hypervisor CPU
  
In other words, we confirmed the hypothesis that there is plenty more system capacity, but that the hypervisor becomes the bottleneck when it has to process I/O on behalf of the guests.
  
 *Accelerated*
 As previously, the configuration used 4 guests transmitting from A to B and 4 guests from B to A.  Raw results were:
 * (A->B) 2355 + 2318 + 2296 + 2289 = 9258 Mbps
 * (B->A) 2285 + 2295 + 2315 + 2350 = 9245 Mbps
  
 +Total:+ *{+}18.50{+}* +Gbps+
  
In the accelerated scenario we have basically maxed out bidirectional I/O on a single 10Gb/s link, with only 4 guests transmitting in each direction\! This is awesome.  I should also mention that the Solarflare architecture is remarkably clean and avoids much of the pain of dealing with SR-IOV (which deserves a full post in its own right, and which I'm halfway through noodling on).
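To make the comparison concrete, here is a small Python sketch that recomputes the aggregates from the raw per-guest numbers above. The 10,000 Mbps nominal line rate per direction is my assumption for the utilization figure, and it ignores protocol overhead.

```python
LINK_MBPS = 10_000  # assumed nominal line rate of a 10Gb/s link, per direction

# Per-guest netperf results in Mbps, copied from the raw results above.
results = {
    "non-accelerated": {
        "A->B": [1094, 1068, 1046, 1128],
        "B->A": [1019, 1028, 1050, 1021],
    },
    "accelerated": {
        "A->B": [2355, 2318, 2296, 2289],
        "B->A": [2285, 2295, 2315, 2350],
    },
}


def summarize(directions):
    """Return per-direction Mbps, total Gbps, and per-direction link utilization."""
    per_direction = {name: sum(streams) for name, streams in directions.items()}
    total_gbps = sum(per_direction.values()) / 1000
    utilization = {name: mbps / LINK_MBPS for name, mbps in per_direction.items()}
    return per_direction, total_gbps, utilization


summary = {label: summarize(directions) for label, directions in results.items()}

for label, (per_direction, total_gbps, utilization) in summary.items():
    print(f"{label}: {total_gbps:.2f} Gbps total")
    for name, mbps in per_direction.items():
        print(f"  {name}: {mbps} Mbps ({utilization[name]:.1%} of line rate)")

speedup = summary["accelerated"][1] / summary["non-accelerated"][1]
print(f"speedup from acceleration: {speedup:.2f}x")
```

Under that assumption, each direction runs at a little over 92% of line rate with acceleration, and the aggregate is roughly 2.2 times the non-accelerated total.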