
CPU cores and sockets sizing for Windows Server 2016 MCS VDA


osgnetova_ctx

Question

Hi,

 

We have six ESXi 6.0 hosts, each with 2 sockets, 14 cores per socket, and 512 GB RAM. There are 36 MCS-provisioned Windows Server 2016 VDAs (XenDesktop 7.15 CU4), 6-7 VMs on each host.

 

Each VDA has:

- 4 vCPUs in a 1-core-per-socket configuration

- 40 GB RAM

 

Earlier we had on average 5-7 users on each VDA and everything was OK. After many more users came, so that it's now 9-11 per VDA, we started facing high CPU usage.

 

The question is: what is the optimal CPU, cores, and sockets configuration for the environment we have? Would performance be better if I changed the core count, for instance configuring 4 vCPUs as 4 cores per socket, or as 2 cores per socket on 2 sockets? I am aware of the possible licensing issues.

 

This environment was built to serve 360 users, but in reality 280-300 active sessions already cause high CPU usage, especially for users working with Chrome (a single user's Chrome sometimes eats above 50% CPU).

 

Thank you for any advice!

 

Regards, Olga


11 answers to this question


No problem, but also be aware of https://docs.citrix.com/en-us/tech-zone/design/design-decisions/single-server-scalability.html

Citrix suggests a 2:1 ratio of virtual CPUs to physical CPUs.

 

If you take the numbers you gave and assign 4 vCPUs and 32 GB per VM, then maxing out the number of virtual machines on a single host gives 16 VMs (64 vCPUs and 512 GB).

That works out to an overcommit ratio of 64 vCPUs / 28 pCPUs ≈ 2.29:1.
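The same arithmetic in a few lines of Python, as a quick sanity check (the host and VM figures are the ones from this thread; this is a sketch, not a sizing tool):

# Overcommit ratio check for one ESXi host from this thread.
HOST_SOCKETS = 2
CORES_PER_SOCKET = 14
HOST_RAM_GB = 512
VM_VCPUS = 4
VM_RAM_GB = 32

pcpu = HOST_SOCKETS * CORES_PER_SOCKET   # 28 physical cores
max_vms = HOST_RAM_GB // VM_RAM_GB       # 16 VMs fit by memory
total_vcpus = max_vms * VM_VCPUS         # 64 vCPUs
print(f"{total_vcpus} vCPUs on {pcpu} pCPUs -> {total_vcpus / pcpu:.2f}:1")
# Output: 64 vCPUs on 28 pCPUs -> 2.29:1, just above the 2:1 Citrix suggests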

 

You should play with the ratios, if you can, to size the virtual machines for your workload.

 

You will find, though, that if you have too much memory assigned to the machine and not enough cores, then when the server comes under load from users, all of the VM's vCPU cores will be saturated. The VM will not be able to keep up with the number of sessions its memory allows, and user experience will tank (you would see this in ICA RTT as users experience delays).

 

You want memory and vCPU usage to both be maximized on each virtual machine. I found I can get around 42 sessions on each machine with that sizing, but that is with a shared desktop. For me, 4 vCPUs and 32 GB RAM was the sweet spot; yours may be different.

 

You can also use load evaluator rules to mark servers as full and prevent new connections, say at 90% CPU utilization and 80% RAM utilization.

Then examine the servers and make sure both memory and CPU usage really are maxed out inside the virtual machine when the servers get marked as full. That confirms the sizing is right, although it may be difficult to do in production. Ideally, the physical host's RAM and CPU usage should also be maxed out when its servers are full; a quick sketch of the mark-as-full check is below.
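As a rough illustration of that rule (the thresholds are the ones suggested above; the function and sample values are illustrative, not a Citrix API):

CPU_FULL_PCT = 90   # mark the server full at 90% CPU utilization
RAM_FULL_PCT = 80   # ... or at 80% RAM utilization

def server_is_full(cpu_pct: float, ram_pct: float) -> bool:
    # The server stops accepting new sessions once either rule trips.
    return cpu_pct >= CPU_FULL_PCT or ram_pct >= RAM_FULL_PCT

# A CPU-bound VDA: marked full at 93% CPU while RAM sits at 61%.
# That is the imbalance described above (too much memory, too few cores).
print(server_is_full(cpu_pct=93.0, ram_pct=61.0))  # True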


Best is to pay attention to NUMA and try to get all your cores running on a single socket. I'd try 4 cores per socket first and, if that's not enough, 6 or 8 cores per socket. Make sure you start up the VMs with more vCPUs first, so they can fit on one socket before the others launch. See for example https://www.mycugc.org/blogs/tobias-kreidl/2019/04/30/a-tale-of-two-servers-part-3 for tips and tricks.

 

-=Tobias


VMware takes care of cores versus sockets at the scheduler level now, so configuring cores versus sockets in the GUI is strictly for licensing purposes. For example, Windows 10 desktops can only use two sockets.

 

I would start at 4 vCPUs and 32 GB. Note that you will have some CPU overcommitment: for example, 9 machines have a total of 36 vCPUs assigned, so when the servers are busy they will have to fight over CPU time. Check %RDY and similar metrics; that will tell you whether overcommitment is the problem (see the conversion sketch below).
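If you pull the CPU Ready counter from vCenter rather than esxtop, it arrives as milliseconds summed over a 20-second real-time sample, so it has to be converted into the %RDY figure esxtop shows. A minimal sketch of that conversion (assuming the counter aggregates across all vCPUs, so dividing by the vCPU count gives the per-vCPU average; the sample numbers are made up):

SAMPLE_INTERVAL_MS = 20_000  # vCenter real-time chart interval

def ready_percent(ready_ms: float, num_vcpus: int) -> float:
    # Average %RDY per vCPU over one sampling interval.
    return ready_ms / (SAMPLE_INTERVAL_MS * num_vcpus) * 100

# A 4-vCPU VDA reporting 4,000 ms of ready time in one 20 s sample:
print(ready_percent(ready_ms=4_000, num_vcpus=4))  # 5.0
# Around 5% per vCPU is a common rule-of-thumb point where CPU
# contention starts to hurt interactive sessions.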

 

Normally it's okay to overcommit; as a general rule, you want to stay under a 3:1 ratio if you can. The closer you are to 1:1, the less wait time you will have.

 

Chrome's high CPU usage can be caused by its cleanup task. Make sure to disable it via the ChromeCleanupEnabled policy: https://www.chromium.org/administrators/policy-list-3#ChromeCleanupEnabled
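On Windows, Chrome reads machine policies from the registry under HKLM\Software\Policies\Google\Chrome, so on a single test VDA you could switch the cleanup scanner off like this (a sketch; across a fleet you would push the same value through a GPO instead):

import winreg

KEY_PATH = r"Software\Policies\Google\Chrome"

# ChromeCleanupEnabled = 0 disables the Chrome Cleanup scanner.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "ChromeCleanupEnabled", 0,
                      winreg.REG_DWORD, 0)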

 

 

8 hours ago, Tobias Kreidl said:

Best is to pay attention to NUMA and try to get all your cores running on a single socket. […]

Thank you, Tobias!

2 hours ago, Adam Walker said:

VMware takes care of cores versus sockets at the scheduler level now, so configuring cores versus sockets in the GUI is strictly for licensing purposes. […]

Thank you, Adam!

2 minutes ago, Adam Walker said:

Note that what you found also aligns with the general rule; see https://www.citrix.com/blogs/2017/03/20/citrix-scalability-the-rule-of-5-and-10/

 

You have 28 pCPU cores × 10 ≈ 280 users. You could easily fit more users with the RAM you have if you had higher-core-count processors.
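For reference, the rule of 5 and 10 from that blog post is simple enough to write down (a sketch; 28 pCPUs is the figure quoted above):

# Citrix "rule of 5 and 10": roughly 5 users per physical core for
# VDI (single-user) workloads, 10 per core for RDSH/shared desktops.
def estimated_users(pcpu_cores: int, shared_desktop: bool = True) -> int:
    return pcpu_cores * (10 if shared_desktop else 5)

print(estimated_users(28))  # 280 users, matching the estimate above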

 

Adam, thank you for such a detailed answer :) There is a lot of new info for me. I'll try both VM configurations (32 GB RAM with 4 cores per socket for 4 vCPUs, and the same with 40 GB RAM), thanks!

I'm also thinking about NUMA (it's mentioned here: https://virtualfeller.com/2017/09/18/sizing-windows-2016-windows-2012-and-windows-10-virtual-machines/ ). I checked a few servers with the current configuration using the Coreinfo tool and got "No NUMA nodes" in the result. Is that because I assigned fewer vCPUs to the VMs than there are pCPUs on the host, so NUMA is not used? Or does it mean all VMs on a single host use only one NUMA node? Is that good or bad? :) Sorry if it is a silly question, but I'm new to this topic and it is quite hardcore for me. Another complication is that besides the MCS VDAs, the hosts also run golden master VMs and a few test servers (not provisioned; they have less RAM and CPU assigned). Should I give them the same vCPU count, cores per socket, and RAM to get proper CPU sizing and better performance?


It's NUMA on the hypervisor, not inside the virtual machines. See https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html

 

Based on the article, your NUMA layout would be two pNUMA nodes per host, each with 14 cores and 256 GB.

 

Since your virtual machines are under the NUMA limit (fewer than 14 cores and less than 256 GB) and don't span NUMA nodes, you should be good :). Follow the advice in the article (a quick fit-check sketch follows the list):

 

  1. Always configure the virtual machine vCPU count to be reflected as Cores per Socket, until you exceed the physical core count of a single physical NUMA node OR until you exceed the total memory available on a single physical NUMA node.
  2. When you need to configure more vCPUs than there are physical cores in the NUMA node, OR if you assign more memory than a NUMA node contains, evenly divide the vCPU count across the minimum number of NUMA nodes.
  3. Don’t assign an odd number of vCPUs when the size of your virtual machine, measured by vCPU count or configured memory, exceeds a physical NUMA node.
  4. Don’t enable vCPU Hot Add unless you’re okay with vNUMA being disabled.
  5. Don’t create a VM larger than the total number of physical cores of your host.
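To make rule 1 concrete for the hosts in this thread, here is a minimal fit-check sketch (the node sizes are the ones above; the function itself is illustrative):

# Each host: 2 sockets x 14 cores, 512 GB -> two pNUMA nodes of
# 14 cores / 256 GB. A VM within those limits should be configured
# as cores on a single socket (rule 1); a larger VM should be split
# evenly across the minimum number of nodes (rule 2).
PNUMA_CORES = 14
PNUMA_RAM_GB = 256

def fits_one_pnuma(vcpus: int, ram_gb: int) -> bool:
    return vcpus <= PNUMA_CORES and ram_gb <= PNUMA_RAM_GB

print(fits_one_pnuma(vcpus=4, ram_gb=40))    # True: 1 socket, 4 cores
print(fits_one_pnuma(vcpus=16, ram_gb=128))  # False: split across 2 nodes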

 

Look at the following article:

 

https://download3.vmware.com/vcat/vmw-vcloud-architecture-toolkit-spv1-webworks/index.html#page/Core Platform/Architecting a vSphere Compute Platform/Architecting a vSphere Compute Platform.1.019.html

 

As mentioned, VMware's recommendation is to stay under 3:1, while Citrix is more conservative at 2:1. I would tend to go with the Citrix recommendation, as XenApp is a heavy workload.

 

If performance is poor, monitor the metrics:

http://www.gabesvirtualworld.com/how-too-many-vcpus-can-negatively-affect-your-performance/

Look specifically at co-stop. You could bump memory up to 36 GB with 4 vCPUs, for a total of 14 VMs per host (14 × 36 GB = 504 GB fits in 512 GB, and 14 × 4 = 56 vCPUs on 28 pCPUs is right at the 2:1 ratio).

 

Play around a little until you find the right number.

 

 

 

 

On 10/27/2019 at 4:35 AM, Adam Walker said:

It's NUMA on the hypervisor, not inside the virtual machines. […]

Thank you very much! 

20 hours ago, Adam Walker said:

Were you able to successfully resolve the issue?

Hi Adam, I got new hosts with a different configuration than in my original question, so I configured 24 VMs with 3 vCPUs at 3 cores per socket and 40 GB RAM with a 10 GB RAM cache, on 6 ESXi hosts (2 sockets, 6 cores per socket, 256 GB RAM per host). I believe these VMs will serve 240 concurrent user sessions. At least for now, a VM with 3 vCPUs and 3 cores per socket shows much better performance and density than a VM with 4 vCPUs and 1 core per socket (with the same RAM settings). Next week I plan to reconfigure the 36 VMs from my question to 4 vCPUs at 4 cores per socket, and also to increase the RAM cache from 8 GB to 10 GB, following the Citrix VDI design recommendations for heavy users (we have many of them). So in the end I expect the 60 VMs to be able to serve 600 user sessions.

