How to do Connection Draining in GCP

Overview

Connection draining is an important feature of autoscaling support that handles client connection termination gracefully when a scale-in happens.

Google Cloud Platform (GCP) does not expose the lifecycle hooks and scale-in control APIs that third-party load balancers such as NetScaler need in order to implement connection draining in the GCP environment.

This article discusses a potential solution for achieving connection draining with NetScaler on Google Cloud Platform.

Backend autoscaling in NetScaler

The backend autoscaling solution with NetScaler is built around a server farm. On GCP, this server farm is a managed service offered by Google called a managed instance group (MIG).

Every MIG is associated with a scaling policy that the customer defines and is used by the cloud in determining scaling decisions to provide elasticity to the group. Typical parameters that determine the load are CPU utilization and throughput.

The MIG is fronted by NetScaler, which acts as the load balancer. A scale-in operation shuts down one or more servers in the farm when the perceived load falls below the threshold levels. A scale-out operation adds one or more servers to the farm when the load rises above the threshold levels.

NetScaler today periodically polls the backend server farm to get an update on whether there is a change in the server farm and makes changes to the local load balancing configuration based on the latest state of the servers it queries.
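To make the polling model concrete, the following is a minimal sketch, not NetScaler's actual implementation, of how two successive membership snapshots could be compared. The function name diff_members and the one-instance-name-per-line file format are assumptions made for illustration. The snapshots could come, for example, from the output of gcloud compute instance-groups managed list-instances.

```shell
#!/bin/sh
# Hypothetical helper: compare two membership snapshots of the server farm
# (one instance name per line) and print the changes a load balancer would
# have to apply to its local configuration.
diff_members() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    # Lines present only in the new snapshot: servers to add.
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/add /'
    # Lines present only in the old snapshot: servers to remove.
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/remove /'
    rm -f "$old_sorted" "$new_sorted"
}
```

With a polling model like this, a server appears in the "remove" list only once it has already left the group, so the load balancer learns about a scale-in after the fact.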

What is connection draining?

Connection draining is a process that ensures that existing and in-progress requests to a server are given time to complete when the policy engine decides to remove the server from the server farm.

This is typically accomplished by delaying the shutdown of the server by various means so that existing connections being served by the server are completed.

The other aspect of connection draining is that new connections that would have been hashed to the server being removed must instead be hashed to a different server.

Problem with scale-in operation on GCP

When a server in the backend pool is being shut down as part of a scale-in decision, NetScaler does not find out until the server has actually shut down. This is because the scale-in and scale-out operations are completely managed and executed by Google.

As a result, all the sessions currently being handled by that server can be terminated abruptly, and the client-server connections are reset. Some applications (such as gaming and finance) are sensitive to this kind of loss and require connection draining so that connection terminations are handled gracefully.

Ideally, GCP would provide lifecycle hooks so that, on a scale-in operation, NetScaler is notified before the server is shut down and GCP waits for NetScaler to give the go-ahead. The problem could then be solved by introducing a preconfigured wait time.

Unfortunately, GCP does not provide any such hooks to achieve graceful termination, although it implements connection draining internally for its native load balancers.

Delayed shutdown of the application server

One approach is to introduce a delay into the shutdown process of the application server, so that NetScaler has time to absorb the change and redirect new connections to a different server while the server being removed gets extra time to serve its existing connections.

We can delay the shutdown of an instance by adding custom code to the modules that manage system startup and shutdown, for example systemd on Linux. Systemd units can be modified to add delays and to alter the shutdown and startup sequence of the system. However, we need to make sure that the relevant backend process (for example nginx or Apache) keeps running while connections are still being served. Otherwise, connections are reset and delaying the shutdown serves no purpose.

Note: The example below uses the Apache HTTP server, controlled by the httpd service systemd unit file. Adjust the solution accordingly if you use a different service.

There are two ways we can achieve this.

Method 1

The systemd unit for the httpd service (typically apache2.service on Debian-based systems or httpd.service on RHEL-based systems) can be modified to add an ExecStop directive that introduces a delay of up to 120 seconds.

The ExecStop directive specifies a command to run when the service is stopped. If a sleep command is listed first, it runs before the Apache service is stopped. Note that the position of the directive matters: ExecStop directives are executed in the order in which they appear.

The sample unit file below shows the change.

[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target
Documentation=https://httpd.apache.org/docs/2.4/

[Service]
Type=forking
Environment=APACHE_STARTED_BY_SYSTEMD=true
ExecStart=/usr/sbin/apachectl start
# Added line: wait 120 seconds before Apache is stopped
ExecStop=/bin/sleep 120
ExecStop=/usr/sbin/apachectl stop
ExecReload=/usr/sbin/apachectl graceful
# Assumption: the default stop timeout (commonly 90 seconds) is shorter than the sleep, so raise it
TimeoutStopSec=150
PrivateTmp=true
Restart=on-abort

[Install]
WantedBy=multi-user.target
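As an alternative to editing the packaged unit file directly, the same change can be kept in a systemd drop-in so that package upgrades do not overwrite it. The following is a sketch under stated assumptions: the function name write_drain_dropin, the file name 10-drain.conf, and the unit name apache2.service are illustrative, and the 30-second margin on the stop timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that delays the stop of a service.
# Unit name, delay, and base directory are illustrative assumptions.
write_drain_dropin() {
    unit=$1                          # e.g. apache2.service (check your distribution)
    delay=$2                         # seconds to keep serving before stopping
    base=${3:-/etc/systemd/system}   # where systemd looks for local overrides
    dir="$base/$unit.d"
    mkdir -p "$dir"
    cat > "$dir/10-drain.conf" <<EOF
[Service]
# Clear the packaged ExecStop list, then re-add it with the delay first.
ExecStop=
ExecStop=/bin/sleep $delay
ExecStop=/usr/sbin/apachectl stop
# Give the stop sequence more time than the sleep itself.
TimeoutStopSec=$((delay + 30))
EOF
}

# Example (run as root on the backend image), then reload systemd:
#   write_drain_dropin apache2.service 120
#   systemctl daemon-reload
```

The empty ExecStop= line resets the list inherited from the packaged unit, so the sleep is guaranteed to run before the apachectl stop command.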

Method 2

A shutdown script can be defined in the metadata section of the VM in the GCP console. Such shutdown scripts are plugged into the shutdown process of the server. However, GCP places the metadata shutdown script after the http service in the shutdown sequence.

As a result, the http service is shut down before the shutdown script is executed. To mitigate this problem, the shutdown sequence of the services (units) needs to be altered so that shutdown scripts are executed before the http service is stopped.
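The shutdown script itself can be very small. The following is a sketch only, assuming the backend listens on TCP port 80 and that the ss utility from iproute2 is available on the image. The variable names DRAIN_MAX, DRAIN_PORT, and POLL and the function names are illustrative. It waits until no established connections remain or the maximum delay has elapsed, whichever comes first. A plain sleep of 120 seconds would also work. It is only useful once the ordering problem described above has been addressed, because otherwise the http service is already gone when the script runs.

```shell
#!/bin/sh
# Sketch of a drain routine for a GCP shutdown script.
# All names and defaults below are illustrative assumptions.
DRAIN_MAX=${DRAIN_MAX:-120}   # maximum seconds to wait
DRAIN_PORT=${DRAIN_PORT:-80}  # local port served by the backend
POLL=${POLL:-5}               # seconds between checks

# Print the number of established TCP connections on the local port.
count_established() {
    ss -Htn state established "( sport = :$DRAIN_PORT )" | wc -l | tr -d ' '
}

# Return 0 as soon as no connections remain, 1 if DRAIN_MAX elapsed first.
drain() {
    waited=0
    while [ "$waited" -lt "$DRAIN_MAX" ]; do
        if [ "$(count_established)" -eq 0 ]; then
            return 0
        fi
        sleep "$POLL"
        waited=$((waited + POLL))
    done
    return 1
}
```

To use this as the shutdown script, add a final line that calls drain, then attach the file to the instance or instance template, for example with the gcloud option --metadata-from-file shutdown-script=drain.sh (the file name is an assumption).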

A sample unit file with the altered ordering is shown below.

[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target
# Added line: started before the shutdown-script unit, and therefore stopped after it
Before=google-shutdown-scripts.service
Documentation=https://httpd.apache.org/docs/2.4/

[Service]
Type=forking
Environment=APACHE_STARTED_BY_SYSTEMD=true
ExecStart=/usr/sbin/apachectl start
ExecStop=/usr/sbin/apachectl stop
ExecReload=/usr/sbin/apachectl graceful
PrivateTmp=true
Restart=on-abort

[Install]
WantedBy=multi-user.target
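After editing the unit and running systemctl daemon-reload, it may be worth confirming that systemd picked up the new ordering. The helper below is a small sketch. The function name is_ordered_before is hypothetical and apache2.service is an assumed unit name. It relies on systemctl show printing the Before= property as a space-separated list.

```shell
#!/bin/sh
# Sketch: succeed when unit $1 is ordered before unit $2 according to systemd.
# Because systemd stops units in the reverse of their start order, this also
# means $2 is stopped before $1 at shutdown.
is_ordered_before() {
    systemctl show --property=Before "$1" \
        | tr ' =' '\n\n' \
        | grep -Fxq "$2"
}

# Example:
#   is_ordered_before apache2.service google-shutdown-scripts.service \
#       && echo "shutdown scripts run while Apache is still up"
```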

Conclusion

Though the changes are small, both approaches require altering scripts or configuration in the image, which is typically in the customer's administrative domain. With both approaches, shutdown can be delayed by up to 120 seconds.

Note that true connection draining may still not be achieved within this time window for workloads that have longer-lived connections.

For full connection draining support, NetScaler needs full control over the shutdown timing of the backend servers. However, the current approach can be used to prevent abrupt termination of short-lived connections.
