
Actions Based on Server 'FIN' Response



Been hunting around and I suspect it's just a knowledge shortfall on my part, but before I wander too far down the rabbit hole I figured it might be worth an ask on here first...

 

I have a VPX LB entry that is configured on TCP and has a single server in its service group, which forwards on to a Windows service - end to end it all works.

 

I have protection configured on the above, with a secondary server, which has the same service running, and again end to end all works fine. 

 

The issue I currently have is trying to reduce the delay when traffic switches from the primary server to the secondary (and vice versa).

 

We are currently testing with a clean shutdown of the Windows service, which looks like it has been configured to send a 'FIN' response. This appears to be passed back to the client, and then after a timeout period of approximately 5 minutes the connection fails over to the other node.

 

My question really here would be: is there a way for the VPX to read the 'FIN' generated by the service shutdown and act on it - perhaps to handle the enable/disable of the primary/secondary nodes?

 

I've stumbled on AppExpert > Responder > Policies, which I think is the area to look into, but I currently have very limited exposure to these functions.

 

Any help / advice would be welcome,

 

Thanks, Wayne


I think we need more information on what you want your load balancing to do and what the condition for the failover to the second service needs to be.

 

What monitor are you using on your service? And are you trying to load balance or fail over between the services?

 

It sounds like you have lb_vsrv_1 going to your first service and lb_vsrv_2 going to the second service, and you have lb_vsrv_2 configured as the backup vserver for lb_vsrv_1 (I'm assuming the "protection" you refer to is the NS's own protection settings on the virtual server). If it's something else, we would need more information, and you could share your lb vserver/service and backup config to help clarify this.

 

In this case, all traffic will go to lb_vsrv_1 and therefore svc_1 only until the service is DOWN/fails; requests will then be handled by lb_vsrv_2/svc_2 only while lb_vsrv_1 is unavailable (if, again, the "backup vserver" method is what's configured). If you aren't using failover and you are trying to load balance, then we need to understand your lb method/persistence/persistence timeout requirements.
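
As a quick sketch of that backup vserver arrangement from the CLI (every name, IP, and port here is a placeholder, not your actual config):

add lb vserver lb_vsrv_1 TCP 10.0.0.10 8080
add lb vserver lb_vsrv_2 TCP 0.0.0.0 0
bind lb vserver lb_vsrv_1 svc_1
bind lb vserver lb_vsrv_2 svc_2
set lb vserver lb_vsrv_1 -backupVServer lb_vsrv_2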

 

First, check your monitors:

If a "FIN" is received, usually this just means a connection is closed and not a "down" state indicator. So your traffic will continue to lb_vsrv_1/svc_1 until the lb vserver is DOWN (meaning all services it is bound to is down).  If the service is not being detected as down soon enough or under the correct condition, you need to look at the monitor in use on the server.  For example, if you are relying on a PING monitor, then the service (port) not responding would NOT fail the monitor.  If you have the tcp-default monitor, and you stop the application service, but another service is still running on the port so that the syn/syn-ack is still valid on the service port, then the monitor will not fail and the service may remain up.  So, we may just need to make sure the monitor is better able to detect the up/down state of the service you are stopping to ensure failover is faster.  But other settings such as persistence timeout or connection timeouts might need to be adjusted as well to meet the application requirements.

 

Next, the other settings:

In addition, if your tcp app has long-lived connections, the existing connections may be trying to honor graceful shutdown, meaning new requests would have failed over but the original service still appears up to the existing connections. On the virtual server are two properties under "traffic settings": Down State Flush and TROFS Persistence.

If down state flush is enabled, then when the service goes down existing connections are flushed and should fail over to the new service; if this is disabled, then existing/established connections will still try to go to their original service until their transactions complete, and only new connections will go to the non-down services. If the application must complete transactions first, then enabling this can be a bad idea.

So, in this case you have to decide whether down state flush should be on or not: https://docs.citrix.com/en-us/netscaler/12/load-balancing/load-balancing-advanced-settings/graceful-shutdown.html (these particular settings are summarized at the bottom of the TROFS section, starting at "Disabling a Service").
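
It's a single flag on the lb vserver if you want to flip it from the CLI (vserver names are placeholders):

set lb vserver lb_vsrv_1 -downStateFlush ENABLED
set lb vserver lb_vsrv_2 -downStateFlush ENABLED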

 

Related to this are the TROFS monitor and the TROFS persistence setting.

You might need to use a TROFS string with the TCP monitor: https://support.citrix.com/article/CTX219926 and then set the vserver's -trofsPersistence value.

Your persistence might be overriding the down state for existing connections, whereas the TROFS flag might help disable the persistence to force failover sooner:  https://docs.citrix.com/en-us/netscaler/12/load-balancing/load-balancing-advanced-settings/enabling-or-disabling-persistence-session-on-trofs-services.html

This value isn't as clearly explained in the admin guide as I would like, but on the lb vserver TROFS persistence can be ON or OFF under the Traffic Settings as well.

First, a TROFS monitor has to be bound to the service (so for you this would require a TCP-ECV). 

If TROFS persistence is enabled, then persistence is honored while the service is in TROFS (transitioning out of service).

If TROFS persistence is disabled, then persistence is ignored and it should fail over immediately when the service is in a TROFS state (again, test first).
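
A rough sketch of the TROFS pieces from the CLI; the monitor/vserver names and the send/recv/TROFS strings are placeholders for whatever your application actually returns, and you should confirm the exact parameters against the articles above before relying on this:

add lb monitor mon_tcp_ecv_app TCP-ECV -send "STATUS" -recv "OK" -trofsString "SHUTTING_DOWN"
bind service svc_1 -monitorName mon_tcp_ecv_app
set lb vserver lb_vsrv_1 -trofsPersistence DISABLED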

 

Personally, I would check your regular monitor first and make sure it is monitoring the right things and the service/vserver is going up/down when you stop the backend application. And verify the easy stuff like persistence type and timeout as these might need to be tweaked first.

 

Then, I would look at the regular down state flush settings on the lb vserver.

And then finally see if the TROFS settings help.

 

If I misinterpreted your intended config, then none of this may be helpful ;)

 

Responder policies can take a request and redirect it to a new location based on some criteria in the request, but they probably wouldn't be used in this case as you've described it without additional information. Content switching can also be used to direct traffic to specific lb vservers; but again, if this is truly a failover scenario like you describe above, it's probably not what you need yet.

 

 

 

 


Mihai / Rhonda,

 

Thank you both for your replies, I'll try and expand a bit more on the config below:

 

LB_INFServer_Primary

VIP: 192.168.29.131

Protocol: TCP (have also tried as *ANY*)

Port: 7357 (have also tried this as *)

 

Protection > Backup VS:  LB_INFServer_Secondary

Protection > Connection Failover: Stateful (Have also tried Disabled)

Traffic Settings > Client Idle Time-out: 9000 (Have also tried with 20)

 

 

LB_INFServer_Secondary

IP: Non Addressable

Protocol: TCP (Again also tried as an *ANY*)

Traffic Settings > Client Idle Time-out: 9000 (Have also tried with 20)

 

 

Service Groups, each have 1 member and other than monitors are the same:

SG_INFServer_Primary // SG_INFServer_Secondary

Member:   

SG_INFServer_Primary: INF01 / 192.168.28.177:7357

SG_INFServer_Secondary: INF02 / 192.168.28.165

 

Monitors:

Both: Standard TCP-7357 Monitor

 

SG_INFServer_Secondary Only

Reverse Monitor pointing to 192.168.28.177:7357
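
For reference, I think the CLI equivalent of the above is roughly the following (typed up from my notes rather than pasted from the box, monitor binds not included, and the port for the secondary member isn't in my notes so I've left it as a placeholder):

add lb vserver LB_INFServer_Primary TCP 192.168.29.131 7357
add lb vserver LB_INFServer_Secondary TCP 0.0.0.0 0
set lb vserver LB_INFServer_Primary -backupVServer LB_INFServer_Secondary -connFailover STATEFUL -cltTimeout 9000
set lb vserver LB_INFServer_Secondary -cltTimeout 9000
add serviceGroup SG_INFServer_Primary TCP
add serviceGroup SG_INFServer_Secondary TCP
bind serviceGroup SG_INFServer_Primary 192.168.28.177 7357
bind serviceGroup SG_INFServer_Secondary 192.168.28.165 <port>
bind lb vserver LB_INFServer_Primary SG_INFServer_Primary
bind lb vserver LB_INFServer_Secondary SG_INFServer_Secondary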

 

 

Background on intent / aim: 

 

We have 2 interface servers that both have a mirror of Windows services running; the intent is that the primary server handles all the load to ensure we have a single place to check logs / track errors etc. We have a secondary server configured purely as a backup, so it only gets used when the primary server is being patched or is down with a failure.

 

This works fine from the VPX point of view (even before I added the reverse monitor). The minor niggle is trying to reduce the delay that seems to come with the service switching, which appears to be between 5-10 mins. I believe this is because the service and/or data stream is sending a 'FIN' (and/or something else - I'm still reviewing logs at the moment) back to the client and shutting it down; then after the 5-10 mins elapses it retries and carries on sending data to the other server.

 

So recovery and re-initialisation are automatic and no intervention is required; it's just a case of working out if we can remove or reduce the delay. I did find I was able to get the Primary > Secondary delay very low (I just need to remember the config that got that result - today's job), but Secondary back to Primary has always been a minimum of 5 mins.

 

I'll start working through both replies and see if I can gather any more data / try new things, and if needed go back to basics, but I wanted to get some additional config noted down (hopefully that helps?).

 

Thanks again, Wayne


Unless this is an actual stateful connection, you probably don't need to enable stateful connection failover, as that requires packet information to be replicated to the HA partner so the partner NS can take over without packet interruption during an HA failover (and it only works if the traffic is actually stateful). But it will not impact the backup vserver/primary vserver fallback you are going for.
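
For example, to turn it off (using your vserver name from above):

set lb vserver LB_INFServer_Primary -connFailover DISABLED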

 

I would still look at which type of monitor you are using and see whether it needs to be tweaked for monitor type, probe frequency, timeouts, and down state timeout.

If it's just a TCP syn/syn-ack on a specific port, you are only checking whether the port responds with a syn-ack during the test. If you changed to a tcp-ecv or another monitor type, then you might be able to make the "FIN" a disallowed condition or react to states other than the port failing outright. This, along with other monitor settings, may help restore lb_vsrv_1 to an UP state more quickly and return traffic from the backup vserver to the primary. There may also be better ways of detecting which backend is active to make the environment respond the way you want.
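
As an illustration, tightening the probe timings on your existing TCP monitor would be something like this from the CLI; the monitor name is a placeholder for whatever yours is actually called, and the values are just a starting point to test:

set lb monitor mon_TCP_7357 TCP -interval 5 -resptimeout 2 -retries 3 -downTime 5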

 

On the lb vserver (primary), under protection methods do you have "disable primary when down" enabled? If so, this will keep traffic on the secondary when the primary recovers instead of failing back when the primary is online, unless an admin manually re-enables the primary.
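
You can check and change that from the CLI too (primary vserver name from your post):

show lb vserver LB_INFServer_Primary
set lb vserver LB_INFServer_Primary -disablePrimaryOnDown DISABLED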

 

Finally, you might want to look at that vserver property I mentioned for "down state flush" being on or off and you may need to set it consistently on both the primary and the backup vserver.  But only if down state flush won't pose a risk to your regular application traffic handling.  

 

If you're getting different behaviors from one to the other, compare the two lb vservers for differences, which you might be able to do more easily from the CLI:

show ns runningconfig | grep <vserver1> -i

show ns runningconfig | grep <vserver2> -i

 

If you are using reverse monitors, show the actual monitor details you are using and how they are bound for more informed recommendations:

show ns runningconfig | grep <monitorname> -i

 

---

Now about your config above:

1) Primary LB is only bound to the primary service group and Backup lb vserver is only bound to the backup service group (so you have an active/passive config).

2) Reverse monitors involve negative logic, but a reverse monitor should never be the only monitor bound to a service or service group.

You need one monitor to confirm that the target destination is working; the reverse monitor is used when it is easier to identify what you don't want than what you do.

If multiple monitors are bound, then any one monitor condition failing will down the service. For reverse monitors: a probe that succeeds is treated as a failed condition, so the service is DOWN; if the probe fails, the condition is treated as TRUE, so the service may be UP (depending on the status of other monitors).

 

Your primary servicegroup should have a condition that clearly identifies if the primary server is working; when this condition fails the servicegroup is down, therefore the primary vserver is down, and traffic will be handled by the backupvserver/backup servicegroup if it is in an UP state. No traffic will be sent to the backup, unless the primary is down.

 

The secondary should usually only need the positive test, because once the primary returns to an UP state, traffic will return to the primary vserver. But make sure persistence is not set on either lb vserver (or you could have issues here). Down state flush may still be needed, but a reverse monitor shouldn't be required.

 

If you still want to use a reverse monitor on the secondary to DOWN the secondary when the primary is back online, you can...but you might have a small window where they are both down due to the timings of the monitors. Down state flush and other settings might still be needed to prevent long-lived connections from sticking to the backup vserver after the primary comes back online. Meaning the reverse monitor alone still may not solve the original problem.

I would keep it simple first and use only one monitor, before using the reverse monitor in this case (this may cause more issues than you intend):

Example:

In this case, on the second service group, you need two monitors (not just one):

mon1_tcp_7357 is a positive test monitor bound to the secondary servicegroup confirming the group actually works.

mon2_rev_checkprimary_7357 would be bound to the secondary servicegroup but have the destination IP of the primary service and the reverse flag enabled. (When the primary is working, down the secondary, to hopefully force traffic off the secondary and back to the primary.)

If mon2's probe succeeds (primary is up), then due to the reverse flag the monitor condition is reported as FALSE and it downs service group 2 (mon1 doesn't matter in this case).

If mon2's probe fails (primary is down), then the reverse flag reports the mon2 condition as TRUE, and as long as mon1 is also true (service group 2 is working), service group 2 is UP and the secondary vserver can be used (because the primary is down).

If mon1 ever fails, then the secondary servicegroup isn't functioning and it will be down anyway.
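
In CLI terms that two-monitor setup would be roughly the following (monitor names are just the examples above; the dest IP/port are the primary member details from your earlier post - verify on a test box before relying on it):

add lb monitor mon1_tcp_7357 TCP -destPort 7357
add lb monitor mon2_rev_checkprimary_7357 TCP -destIP 192.168.28.177 -destPort 7357 -reverse YES
bind serviceGroup SG_INFServer_Secondary -monitorName mon1_tcp_7357
bind serviceGroup SG_INFServer_Secondary -monitorName mon2_rev_checkprimary_7357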

 


 

