The Citrix Blog
posted by Simon Crosby

In a recent post I shared some data showing that we are getting terrific performance results for XenServer on Intel Nehalem based servers. In the first formal set of tests we found that the performance bottleneck is that the hypervisor still has to perform I/O on behalf of all guests, so the system scaling limit is the rate at which we can scale the internal I/O stack. I postulated that we would get some impressive numbers on Nehalem based platforms using IOV enhanced 10Gb/s NICs, and contacted our friends at Solarflare to ask if they would help run some numbers using their 10Gb/s NICs. These offer a powerful direct hardware-to-guest acceleration path that avoids the need for the hypervisor to process I/O on behalf of the guests, allowing guests to interact with the hardware directly.

Below is a summary of the initial findings for the Nehalem tests using XenServer 5.0 and Solarflare I/O acceleration. Thanks to Steve Pope of Solarflare for his help. It turns out that with a smart I/O architecture such as the Solarflare offload stack, when guests interact directly with I/O safe hardware, we can dramatically change the system performance and basically saturate a 10Gb/s link, in both directions at the same time!

Here's how the experiment is set up. We have 2 physical servers, A and B, connected back to back with Solarflare 10G Ethernet gear. Each server runs XenServer 5.0 Update 3 on a single Nehalem CPU with 8 logical cores.

To create a traffic workload between the servers we ran NetPerf TCP_STREAM pairs between Linux RHEL 5 guests (each pair spans server A and server B) and measured the aggregate throughput both with and without acceleration.

Non-accelerated

The configuration used 4 guests transmitting from A to B and 4 guests from B to A.  Raw results were: 

  • (A -> B) 1094 + 1068 + 1046 + 1128 = 4336 Mbps
  • (B -> A) 1019 + 1028 + 1050 + 1021 = 4118 Mbps

Total: 8.45 Gbps; Bottleneck: Hypervisor CPU

In other words, we confirmed the hypothesis that there is plenty more system capacity but that the hypervisor is I/O bottlenecked on behalf of the guests.

Accelerated

As before, the configuration used 4 guests transmitting from A to B and 4 guests from B to A.  Raw results were:

  • (A -> B) 2355 + 2318 + 2296 + 2289 = 9258 Mbps
  • (B -> A) 2285 + 2295 + 2315 + 2350 = 9245 Mbps

Total: 18.50 Gbps
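
For anyone who wants to check the arithmetic, here is a minimal Python sketch that sums the per-guest NetPerf numbers above and compares the two runs. The per-guest figures are copied from the raw results; the names `RESULTS`, `LINK_MBPS`, `summarize` and `speedup` are just illustrative, and the 10,000 Mbps per-direction line rate is an assumption based on the nominal 10Gb/s link speed, not a measured value.

```python
# Per-guest NetPerf TCP_STREAM throughput in Mbps, copied from the raw results.
RESULTS = {
    "non-accelerated": {
        "A->B": [1094, 1068, 1046, 1128],
        "B->A": [1019, 1028, 1050, 1021],
    },
    "accelerated": {
        "A->B": [2355, 2318, 2296, 2289],
        "B->A": [2285, 2295, 2315, 2350],
    },
}

# Assumption: nominal line rate of the 10Gb/s link in each direction.
LINK_MBPS = 10_000


def summarize(per_direction):
    """Return (per-direction totals, aggregate total) in Mbps."""
    totals = {direction: sum(rates) for direction, rates in per_direction.items()}
    return totals, sum(totals.values())


def speedup(baseline, improved):
    """Ratio of aggregate throughput between two runs."""
    return summarize(improved)[1] / summarize(baseline)[1]


if __name__ == "__main__":
    for name, per_direction in RESULTS.items():
        totals, aggregate = summarize(per_direction)
        print(f"{name}: {aggregate / 1000:.2f} Gbps aggregate")
        for direction, mbps in totals.items():
            # Share of the assumed nominal line rate used in this direction.
            print(f"  {direction}: {mbps} Mbps ({mbps / LINK_MBPS:.1%} of line rate)")
    ratio = speedup(RESULTS["non-accelerated"], RESULTS["accelerated"])
    print(f"accelerated / non-accelerated: {ratio:.2f}x")
```

The aggregate totals come out at 8454 Mbps and 18503 Mbps, which match the 8.45 Gbps and 18.50 Gbps quoted for the two runs.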

In the accelerated scenario we have basically maxed out bidirectional I/O on a single 10Gb/s link, with only 4 guests! This is awesome.  I should mention also that the Solarflare architecture is remarkably clean and avoids much of the pain of dealing with SR-IOV (which deserves a full post in its own right, and which I'm halfway through noodling on).

Labels: xenserver, grp-cto, lang-eng
  1. Aug 31

    Max Olivas says:

    Is the hardware you used on the Citrix HCL for Xenserver?

