I'm running some performance tests to understand the behavior of Xen and NUMA allocation. The brief summary of results: using the various sysbench memory benchmarks, I have demonstrated that memory performance is significantly impacted (~30%) if the vCPUs are either left unpinned or split 50/50, with half of a VM's vCPUs pinned to each physical CPU. That all makes sense. The result that doesn't make sense to me is this: if I run the test with all vCPUs pinned to one physical CPU, and then switch them all to be pinned to the other physical CPU, I see the same memory benchmark performance.
The system under test has two 20-core (40-thread) Intel Xeon Gold 6230 processors. For the test I created two VMs, each allocated 40 vCPUs, and ran the sysbench tests on both VMs simultaneously to exercise all cores and both NUMA nodes at once.
In that last test my expectation was that I would see worse performance than in the 50/50 split test, but the performance was the same as in my initial test. I also ran a test where each VM's vCPUs stayed pinned to its own physical CPU, but every 2 seconds I swapped which physical CPU each VM was pinned to (each sysbench call used 10-second runs, so I would expect 4 swaps during each run; a sketch of the swap loop is below). It still ran at the same rate as the first test, where each VM stayed pinned to its own physical CPU for the whole run.
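For reference, here is a minimal sketch of that swap loop as a small Java program run from dom0. The domain names vm1 and vm2 are hypothetical, and the pCPU ranges 0-19 and 20-39 for the two sockets are assumptions; the real socket-to-pCPU mapping should be confirmed with xl info -n.

```java
// Sketch of the pin-swap test (assumes guests named "vm1" and "vm2" and the
// xl toolstack in dom0; pCPU ranges are placeholders -- check `xl info -n`).
public class PinSwap {
    static void pin(String dom, String cpus) throws Exception {
        // "all" repins every vCPU of the domain in one call
        new ProcessBuilder("xl", "vcpu-pin", dom, "all", cpus)
                .inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        boolean flipped = false;
        // 5 iterations x 2 s covers one 10-second sysbench run:
        // the first pin sets the starting layout, then 4 swaps follow.
        for (int i = 0; i < 5; i++) {
            pin("vm1", flipped ? "20-39" : "0-19");
            pin("vm2", flipped ? "0-19" : "20-39");
            flipped = !flipped;
            Thread.sleep(2000);
        }
    }
}
```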
This is leading me to wonder how memory is allocated to VMs. My original goal with this test was to see whether I could cause a VM's memory to be moved to a different NUMA node by changing the vCPU pinning. I thought this would require shutting down the VM to release the memory from use, but that doesn't seem to be the case. Our VMs use fixed memory allocation. The only explanation I can come up with is that reserved physical memory is allocated to a VM on demand, when something in the VM tries to allocate memory, and that sysbench allocates and releases memory frequently, so the pinning swap had essentially no effect.
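One way to check that hypothesis directly, rather than inferring it from benchmark numbers, might be to ask Xen where each domain's memory actually lives before and after a repin. As I understand it, debug key 'u' makes the hypervisor dump its NUMA info into the console ring, which xl dmesg can read. A sketch, again with a hypothetical guest name and an assumed pCPU range for the second socket:

```java
// Sketch: dump Xen's NUMA placement before and after repinning a guest.
// Assumes dom0 with the xl toolstack and a guest named "vm1" (hypothetical).
import java.io.IOException;

public class NumaPlacementCheck {
    static void run(String... cmd) throws IOException, InterruptedException {
        new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        run("xl", "debug-keys", "u");                 // ask Xen to dump NUMA info
        run("sh", "-c", "xl dmesg | tail -n 40");     // read the dump from the console ring
        run("xl", "vcpu-pin", "vm1", "all", "20-39"); // repin all vCPUs to the other socket
        run("xl", "debug-keys", "u");                 // dump again and compare
        run("sh", "-c", "xl dmesg | tail -n 40");
    }
}
```

If the per-node page counts for the domain don't change after the repin, that would confirm the memory itself stays put.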
I'm going to try another test now using a JVM that reserves a large block of memory on startup, to see whether I get the same results.
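Something along these lines is what I have in mind: allocate and touch one large block up front, then stream over the same pages for the duration of the run instead of reallocating the way sysbench does. The sizes are just examples; -XX:+AlwaysPreTouch makes the JVM touch every heap page at startup, so placement is decided before any pinning change.

```java
// Sketch of the planned JVM test. Run inside the guest with, e.g.:
//   java -Xms3g -Xmx3g -XX:+AlwaysPreTouch MemTouch
public class MemTouch {
    public static void main(String[] args) {
        final int words = 1 << 28;            // 2 GiB of longs; adjust to taste
        long[] block = new long[words];       // allocated once, up front
        for (int i = 0; i < words; i++) {
            block[i] = i;                     // first-touch every page before the timed run
        }

        long sink = 0, bytes = 0;
        long end = System.nanoTime() + 10_000_000_000L; // ~10 s, matching the sysbench runs
        while (System.nanoTime() < end) {
            for (int i = 0; i < words; i++) sink += block[i]; // sequential reads over the same pages
            bytes += (long) words * Long.BYTES;
        }
        System.out.printf("read %.1f GiB in ~10 s (checksum %d)%n",
                bytes / (1024.0 * 1024 * 1024), sink);
    }
}
```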
Can someone shed more light on how I can optimize for NUMA allocation after a VM has been created? Or am I right about how sysbench works, and we only need to pin before the application under test allocates its memory?