
Delivering superior performance on Linux with NetScaler ADC BLX


Guest Sara Austin


Submitted on: September 29, 2021

Author: Ioannis Dounis

 

NetScaler BLX is a software version of NetScaler that delivers high performance and a rich set of features on your Linux server. Because it runs as a Linux (daemon) process, getting the most out of it requires tuning parts of your host system. Check out this video to learn more about NetScaler BLX and how to deploy it.

NetScaler BLX gives you lightning-fast performance on your Linux server, along with extensive configurability. It runs on a custom user-space networking stack, so its performance isn't affected by constant switching between kernel and user-space contexts. Also, as a form of NetScaler, BLX stands out from other products because it lets you configure protocol- and feature-level options in great detail without changing host system settings.

For example, by creating one or more TCP profiles, you can set any TCP protocol option individually for each service (e.g., per load balancing server). This level of granularity removes the need to change system-wide settings, which can lead to configuration errors and, possibly, different behaviour across Linux kernel versions.

NetScaler BLX can be deployed with or without DPDK support. In this post, the first in a series, we'll look at how to optimise a Linux server for deployment in DPDK mode. Enabling DPDK for BLX means that packets reach BLX's user-space networking stack directly, bypassing Linux kernel processing.

You can learn more about deploying NetScaler BLX in DPDK Mode in our documentation.

Step 1 – Considering NUMA topology

The first step in our optimisation journey is to understand our server's NUMA topology. (You can learn more about NUMA here.)

We'll need to identify the NUMA nodes of the data plane network interfaces that we'll bind to DPDK drivers and configure NetScaler BLX to use. We'll use this information in Step 2 to choose the set of CPUs our BLX worker processes will run on.

With the NIC still bound to its kernel driver (that is, before binding it to a DPDK poll mode driver, as described in the BLX DPDK documentation linked earlier), we can easily get the NUMA node of our NIC(s) by reading the following file:

/sys/class/net/$NIC_NAME/device/numa_node

where $NIC_NAME is the Linux name of the NIC (e.g., eth0, eth1, ens1).

We'll allocate one network interface, named "ens1f0", to NetScaler BLX's data plane on our Linux server. Reading its numa_node file, we get:

[Screenshot: reading the numa_node file for ens1f0 returns 0]

So, our NIC belongs to NUMA node 0. A value of -1 means the kernel has no NUMA locality information for the device, which usually means we're on a non-NUMA machine; in that case, we can use any combination of CPUs without any NUMA-related impact on performance.
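With more than one data NIC, the lookup is easy to script. The following is a minimal Python sketch, not part of BLX: the helper names nic_numa_node and nic_numa_nodes, and the sysfs_net parameter (which lets you point it at a copy of the sysfs tree for testing), are our own. It reads the numa_node attribute of each interface's underlying PCI device through the interface's device symlink, so it has to run while the NIC is still bound to its kernel driver.

```python
from pathlib import Path


def nic_numa_node(nic_name: str, sysfs_net: str = "/sys/class/net") -> int:
    """Return the NUMA node the kernel reports for a network interface.

    The value comes from the NIC's underlying PCI device, reached through
    the interface's "device" symlink. A result of -1 means the kernel has
    no NUMA locality information (typically a non-NUMA machine).
    """
    path = Path(sysfs_net) / nic_name / "device" / "numa_node"
    return int(path.read_text().strip())


def nic_numa_nodes(nic_names: list[str], sysfs_net: str = "/sys/class/net") -> dict[str, int]:
    """Map each data plane NIC name to its reported NUMA node."""
    return {nic: nic_numa_node(nic, sysfs_net) for nic in nic_names}
```

On the server in this post, we'd expect nic_numa_nodes(["ens1f0"]) to return {"ens1f0": 0}.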

If there are multiple data NICs, take the NUMA nodes of all of them into account and split the NetScaler BLX worker processes' CPU affinity between those nodes accordingly.
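The post doesn't prescribe an exact split policy. As one possible approach, assuming each data NIC should receive an equal share of worker processes, the hypothetical helper below (split_worker_cpus is our own name) assigns workers round-robin over the NICs' NUMA nodes, taking CPUs from each node in order. Treat it as a sketch of the idea rather than a BLX requirement.

```python
def split_worker_cpus(
    nic_nodes: list[int], node_cpus: dict[int, list[int]], workers: int
) -> list[int]:
    """Choose one CPU per BLX worker, spread across the data NICs' NUMA nodes.

    Assumption: every data NIC gets an equal share of workers. nic_nodes has
    one entry per NIC, so a node hosting two NICs gets twice the workers of a
    node hosting one. Within a node, CPUs are used in the order listed.
    """
    if not nic_nodes:
        raise ValueError("at least one data NIC is required")
    next_free = {node: 0 for node in nic_nodes}  # next unused CPU index per node
    chosen = []
    for i in range(workers):
        node = nic_nodes[i % len(nic_nodes)]  # round-robin over the NICs
        cpus = node_cpus[node]
        if next_free[node] >= len(cpus):
            raise ValueError(f"NUMA node {node} has no CPUs left for worker {i}")
        chosen.append(cpus[next_free[node]])
        next_free[node] += 1
    return sorted(chosen)
```

For example, with one NIC on node 0 and one on node 1 of a 24-CPU server split 0-11 and 12-23, six workers would land on CPUs 0, 1, 2, 12, 13 and 14.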

We have observed a 20 percent performance improvement just by taking NUMA locality into account!

Step 2 – Isolating CPU cores

Now we can decide which set of CPU cores to isolate for our NetScaler BLX worker processes. Isolating a core removes it from the Linux scheduler's general pool, so the kernel won't schedule other user-space processes on it unless they are explicitly pinned there, and our BLX worker processes get its full attention.

Note that only BLX worker processes will run on the isolated cores. Other BLX processes will be scheduled on the remaining system cores.

Let’s begin by identifying the NUMA node for each CPU of our server using the “lscpu” command:

[Screenshot: lscpu output for our 24-core server]

That's a lot of output about our system's CPUs, but we're only interested in the NUMA node{n} CPU(s) lines. They show that on our 24-core system, cores 0-11 are on node0 and cores 12-23 are on node1. Our NIC belongs to node0, and because we pin BLX worker processes to CPUs on a one-to-one basis, we should take as many of the worker CPUs as possible from node0. If we're deploying BLX with five worker processes, we'll use five CPUs from node0: we'll pick CPUs 0-4 and then isolate them.
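Reading those lines and picking the CPUs can also be scripted. Here's a minimal Python sketch, assuming the classic lscpu line format "NUMA node0 CPU(s):   0-11"; the helper names are hypothetical. It expands Linux CPU list syntax (such as 0-11 or 0,2,4-6) into integers and then takes the first N CPUs from the NIC's node, matching the one-worker-per-CPU pinning described above.

```python
import re

# Matches lines like "NUMA node0 CPU(s):   0-11" (optionally indented).
_NUMA_LINE = re.compile(r"^[ \t]*NUMA node(\d+) CPU\(s\):[ \t]*(\S+)", re.MULTILINE)


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a Linux CPU list such as '0-11' or '0,2,4-6' into integers."""
    cpus = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_cpus_from_lscpu(lscpu_output: str) -> dict[int, list[int]]:
    """Map each NUMA node number to its CPUs, using lscpu's text output."""
    return {int(node): parse_cpu_list(spec) for node, spec in _NUMA_LINE.findall(lscpu_output)}


def pick_worker_cpus(node_cpus: dict[int, list[int]], nic_node: int, workers: int) -> list[int]:
    """Take one CPU per worker from the NIC's NUMA node, in listed order."""
    candidates = node_cpus[nic_node]
    if workers > len(candidates):
        raise ValueError(f"node {nic_node} has only {len(candidates)} CPUs")
    return candidates[:workers]
```

On a live system, the input would come from subprocess.run(["lscpu"], capture_output=True, text=True).stdout; for our server, picking five workers on node 0 gives CPUs 0-4.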

We can then isolate our CPU set with isolcpus, a kernel command line parameter. With GRUB, we set it by editing the GRUB_CMDLINE_LINUX setting in /etc/default/grub and adding (or modifying) the isolcpus option: isolcpus=0,1,2,3,4. More details are available at https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html and https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html.

[Screenshot: GRUB_CMDLINE_LINUX in /etc/default/grub with isolcpus=0,1,2,3,4]
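Editing the file by hand is fine, but if you manage several servers, a small script helps avoid typos. Below is a minimal Python sketch (set_isolcpus is a hypothetical helper) that adds or replaces the isolcpus token in a double-quoted GRUB_CMDLINE_LINUX line, leaving the other parameters and lines, including GRUB_CMDLINE_LINUX_DEFAULT, untouched. It only transforms text; writing the file back as root and regenerating the GRUB configuration remain separate steps.

```python
import re

# Assumes the usual double-quoted form: GRUB_CMDLINE_LINUX="..."
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def set_isolcpus(grub_defaults: str, cpus: list[int]) -> str:
    """Return /etc/default/grub text with isolcpus=<cpus> in GRUB_CMDLINE_LINUX."""
    isol = "isolcpus=" + ",".join(str(c) for c in cpus)

    def rewrite(m: re.Match) -> str:
        # Drop any existing isolcpus= token, keep everything else, append ours.
        params = [p for p in m.group(2).split() if not p.startswith("isolcpus=")]
        params.append(isol)
        return m.group(1) + " ".join(params) + m.group(3)

    new_text, count = _CMDLINE.subn(rewrite, grub_defaults)
    if count == 0:
        raise ValueError("no double-quoted GRUB_CMDLINE_LINUX line found")
    return new_text
```

For example, GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet" becomes GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet isolcpus=0,1,2,3,4".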

Next, we'll regenerate the GRUB configuration file, writing it to the correct path for our system. For example, on an EFI CentOS-based server, run:

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

Then we’ll reboot our system for the isolation to take effect.

Please refer to your Linux distribution guide for distribution-specific tools and paths.

Step 3 – Setting BLX Main Worker CPU Affinity

After our server reboots, we can verify that isolcpus was passed on the kernel command line by running cat /proc/cmdline. The output should include the isolcpus setting we added in /etc/default/grub.

[Screenshot: cat /proc/cmdline output including the isolcpus setting]
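To make that check scriptable, here's a minimal Python sketch (isolated_cpus is a hypothetical helper) that extracts the isolated CPU set from a kernel command line string. It accepts plain lists and ranges such as 0,1,2,3,4 or 0-4, and skips non-numeric flags that newer kernels allow before the CPU list (for example domain); more exotic CPU list syntax isn't handled.

```python
def isolated_cpus(cmdline: str) -> set[int]:
    """Return the CPUs named by isolcpus= in a kernel command line string."""
    cpus: set[int] = set()
    for param in cmdline.split():
        if not param.startswith("isolcpus="):
            continue
        for part in param[len("isolcpus="):].split(","):
            if not part[:1].isdigit():
                continue  # a flag such as "domain", not a CPU number
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus
```

On our server, isolated_cpus(Path("/proc/cmdline").read_text()) (with Path from pathlib) should return {0, 1, 2, 3, 4}.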

Having isolated the appropriate cores, we can use them to set the affinity for our BLX worker processes.

As described in this BLX DPDK deployment article, for BLX in DPDK mode, we’ll need to edit

/etc/blx/blx.conf

We’ll set CPU affinity by manipulating this field:

worker-processes: -c

The -c option of this field takes a hexadecimal bit mask of the CPUs we want to set the affinity to. In our case, for CPUs 0-4, we can calculate the bit mask as follows:

[Figure: calculating the bit mask for CPUs 0-4]

Each hexadecimal digit represents four CPUs because it's composed of four binary digits, and bit n of the mask corresponds to CPU n. Setting a bit to "1" assigns a BLX worker process to that CPU. In our example, a 24-core system, the bit mask has six hexadecimal digits; to set the affinity to CPUs 0-4 for a five-worker BLX, we set bits 0-4 to "1".

Our final hexadecimal bit mask is 0x1f (binary 11111), so the option is set as "worker-processes: -c 0x1f".
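The bit mask arithmetic is easy to get wrong by hand, especially for CPUs above 3 or on a second NUMA node. This small Python sketch (helper names are our own, not part of BLX) builds the mask from a CPU list by setting bit n for CPU n, decodes a mask back into CPUs as a sanity check, and formats the blx.conf line. The optional width argument zero-pads to a fixed number of hexadecimal digits, such as six for our 24-core system; the padded form has the same numeric value.

```python
from collections.abc import Iterable


def cpu_affinity_mask(cpus: Iterable[int], width: int = 0) -> str:
    """Build a hexadecimal CPU bit mask: bit n is set when CPU n is listed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    if width:
        return f"0x{mask:0{width}x}"  # zero-padded to `width` hex digits
    return hex(mask)


def cpus_from_mask(mask: str) -> list[int]:
    """Decode a hexadecimal mask such as '0x1f' back into CPU numbers."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def blx_worker_line(cpus: Iterable[int]) -> str:
    """Format the worker-processes field for /etc/blx/blx.conf."""
    return f"worker-processes: -c {cpu_affinity_mask(cpus)}"
```

For CPUs 0-4, blx_worker_line(range(5)) returns "worker-processes: -c 0x1f", and cpus_from_mask("0x1f") returns [0, 1, 2, 3, 4].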

Summary

NetScaler ADC BLX gives you the feature-rich NetScaler ADC as a standalone process on Linux machines. With the right system optimisations in place, your BLX deployment can make full use of your server resources to lower latency and maximise performance.

In our next post, we’ll look at optimising our BLX DPDK deployment at the network protocol level according to an organisation’s specific needs, and we’ll demonstrate the unique, fine-grained configurability of NetScaler ADC!

