
GSLB Dynamic RTT Method: when will the Citrix ADC appliance stop probing the client’s local DNS server?


Wenbin Zheng


In the Dynamic round trip time method of GSLB (https://docs.citrix.com/en-us/citrix-adc/current-release/global-server-load-balancing/methods/dynamic-round-trip-time-method.html), the appliance starts to probe the client’s local DNS server and gather RTT metric information after a client’s local DNS server accesses the site for the first time. I have two questions about the method:

1. Will the IP address of the client's local DNS server be sent to all other participating sites (via MEP message?) immediately after its first access, so that all sites can start probing?

2. When will the probing be stopped?


 

1) Will the IP address of the client's local DNS server be sent to the other participating sites (via MEP)?

Yes, initially. After that, the update frequency depends on the RTT tolerance value in the global GSLB parameters.

On the first resolution, the LDNS IP is communicated to the other sites so that each one can return a value using one of the available probes.

After that, the sites keep probing the LDNS for updated values until its entry times out. A site only communicates a change in its metric to its partners if the change exceeds the tolerance value.

Example: you set a tolerance of 5 ms. Site A originally reported 27 ms latency and site B reported 48 ms, and the two sites exchanged these values.

Site A: itself 27 ms, site B 48 ms. || Site B: itself 48 ms, site A 27 ms.

As site A continues probing, it sees 27 ms, then 25 ms, then 29 ms. Each reading is within 2 ms of the 27 ms it last reported, well inside the 5 ms tolerance, so no update is sent to the partner.

Site A: itself 29 ms, site B 48 ms. || Site B: itself 48 ms, site A 27 ms (old value, not updated).

When site A's measurement changes from 27 ms to 35 ms, the change exceeds the 5 ms tolerance, so site A does update site B.

Raising the tolerance from 5 ms to 10, 15, or 20 ms makes the exchange less reactive, but for sites separated by 30-50 ms of latency this should be fine.
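The tolerance behavior in this example can be modeled with a short Python sketch. It assumes a site compares each new probe result against the last value it reported to its partner (as the 27 ms to 35 ms example implies) and sends an update only when the difference is strictly greater than the tolerance. The `RttReporter` class and its methods are hypothetical and do not correspond to any Citrix ADC API.

```python
from typing import Optional


class RttReporter:
    """Hypothetical model of one site's RTT reporting to its GSLB partners."""

    def __init__(self, tolerance_ms: int) -> None:
        self.tolerance_ms = tolerance_ms
        # Value the partner sites currently hold for this site (None = never sent).
        self.last_reported_ms: Optional[int] = None

    def observe(self, rtt_ms: int) -> bool:
        """Record a probe result; return True if an update would be sent."""
        if self.last_reported_ms is None:
            # The first measurement is always shared (initial exchange).
            self.last_reported_ms = rtt_ms
            return True
        if abs(rtt_ms - self.last_reported_ms) > self.tolerance_ms:
            # Change exceeds the tolerance: partners get the new value.
            self.last_reported_ms = rtt_ms
            return True
        # Within tolerance: partners keep the old value.
        return False


site_a = RttReporter(tolerance_ms=5)
for rtt in [27, 25, 29, 35]:
    sent = site_a.observe(rtt)
    print(f"site A measured {rtt} ms, update sent: {sent}, partner holds {site_a.last_reported_ms} ms")
```

With a 5 ms tolerance, only the first reading (27 ms) and the jump to 35 ms produce updates; the 25 ms and 29 ms readings leave the partner holding 27 ms.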

 

2) When will probing be stopped? Check the ldnsEntryTimeout parameter. The CLI commands are below; in the GUI, it is in the GSLB parameters section. I don't remember the default value.

View parameter:  show gslb parameter 

Change parameter:  set gslb parameter -ldnsEntryTimeout <secs>

 

The ldnsEntryTimeout value is a positive integer, in seconds.

If no further resolution requests arrive from this LDNS, then once the timeout expires, its LDNS entries are "forgotten": they are cleaned from the table and no longer probed.
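A minimal sketch of the aging described above, assuming each LDNS entry records the time of its last resolution request and that a periodic sweep drops entries idle for longer than ldnsEntryTimeout. The `LdnsTable` class and its methods are hypothetical; they only illustrate the lifecycle, not how the appliance stores entries internally.

```python
from typing import Dict, List


class LdnsTable:
    """Hypothetical LDNS table: an entry is probed only while it is present."""

    def __init__(self, ldns_entry_timeout_s: int) -> None:
        self.timeout_s = ldns_entry_timeout_s
        self.last_seen: Dict[str, float] = {}  # LDNS IP -> time of last request

    def record_request(self, ldns_ip: str, now: float) -> None:
        # Every resolution request from the LDNS refreshes (or recreates) its entry.
        self.last_seen[ldns_ip] = now

    def expire(self, now: float) -> List[str]:
        """Remove entries idle longer than the timeout; return the removed IPs."""
        stale = [ip for ip, t in self.last_seen.items() if now - t > self.timeout_s]
        for ip in stale:
            del self.last_seen[ip]
        return stale

    def probe_targets(self) -> List[str]:
        # Only LDNS servers still in the table would be probed.
        return sorted(self.last_seen)


table = LdnsTable(ldns_entry_timeout_s=180)  # 180 is an arbitrary example value
table.record_request("198.51.100.10", now=0)
table.record_request("203.0.113.5", now=100)
print(table.expire(now=200))   # ['198.51.100.10'] (idle 200 s > 180 s)
print(table.probe_targets())   # ['203.0.113.5']
```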

 

 

 


Hi Rhonda,

 

Thank you for your reply. 

The document says that when a client's local DNS server accesses the site for the first time, the Citrix ADC appliance selects a site by using the round robin method rather than the dynamic RTT method. I'm concerned that the dynamic RTT method would not work well in some low-frequency usage scenarios.

Consider the following case:

1. An app client sends a request to its backend service for the first time, which triggers the LDNS to query a GSLB site, and the appliance selects a site by the round robin method, since no RTT metric exists yet;

2. From then on, GSLB starts probing the LDNS for RTT values and exchanging metrics between sites;

3. Later, for a long time (perhaps a couple of days or weeks), no more app requests arrive through the same LDNS, so the LDNS entry eventually exceeds the ldnsEntryTimeout setting; all the LDNS metric entries are cleaned and probing stops as well;

4. After that, a second app request is initiated.

In this case, the dynamic RTT method doesn't work as well as expected. Any advice?


You're correct that the first resolution uses round robin; I forgot that when I was looking up the timeout.

A given LDNS is likely an ISP server acting on behalf of many clients anyway. But you are correct that if the LDNS entry times out, the GSLB decision process has to begin again.
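The scenario discussed above can be sketched as a selection function: when no RTT metrics exist for the requesting LDNS (its first request, or after its entry has timed out), a site is chosen round robin; otherwise the site with the lowest reported RTT wins. This is a simplified model under stated assumptions; `make_selector`, the metrics dictionary, and the use of `itertools.cycle` for round robin are illustrative choices, not the appliance's implementation.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Optional

Metrics = Optional[Dict[str, float]]  # site name -> RTT in ms to this LDNS


def make_selector(sites: List[str]) -> Callable[[Metrics], str]:
    """Return a hypothetical GSLB site selector for the dynamic RTT method."""
    rr: Iterator[str] = itertools.cycle(sites)  # fallback when no metrics exist

    def select(rtt_by_site: Metrics) -> str:
        if not rtt_by_site:
            # No metrics for this LDNS yet (or its entry expired): round robin.
            return next(rr)
        # Metrics available: pick the site with the lowest RTT to the LDNS.
        return min(rtt_by_site, key=lambda site: rtt_by_site[site])

    return select


select = make_selector(["site_A", "site_B"])
print(select(None))                              # site_A (round robin)
print(select(None))                              # site_B (round robin)
print(select({"site_A": 27.0, "site_B": 48.0}))  # site_A (lowest RTT)
```

The second `None` call is the situation described in step 4 of the question: once the entry has expired, the choice is back to round robin regardless of which site was closer before.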

 

Your decision about whether this is acceptable behavior or not is driven by the following:

1) What are the locations of, and distances between, the datacenters involved? What is the risk if the less optimal datacenter is selected?

2) What is your user volume likely to be per LDNS: high enough to keep entries active, or small enough that LDNS entries will time out frequently?

>> These will help you decide whether you need the dynamic RTT method or static proximity.

 

Suppose you have two active/active datacenters that are relatively close, with low variance in latency, and you just want users going to the "best" current datacenter as client-to-datacenter networking changes. If the latencies are close and not badly out of alignment, it is fine for users to hit either one. If the user-to-datacenter latency is 35 ms for one and 45 ms for the other, is that a problem either way? If it matters, don't use RTT.

 

If the datacenters are separated by large latencies and span geographies, then users should only be sent to the other datacenter when a significant change in latency occurs.

 

The trade-off with the dynamic RTT method is that 1) it's easy (no configuration), but 2) you have very little direct control. A second concern is that if an LDNS doesn't respond to the probe mechanisms, you'll be using the fallback method or round robin anyway.

 

Static proximity mapping requires some up-front work (although for public IPs a built-in location database is available), but it gives you more control over datacenter selection.
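To illustrate the extra control static proximity gives, here is a minimal sketch of a hand-maintained mapping from LDNS IP ranges to a preferred site, with a fallback when the address is not in the table. The ranges, site names, and `pick_site` function are hypothetical; on the appliance this mapping comes from the configured location database, not from code like this.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical static proximity table: (client network, preferred site).
PROXIMITY_TABLE: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.IPv4Network("198.51.100.0/24"), "AMER"),
    (ipaddress.IPv4Network("203.0.113.0/24"), "APAC"),
    (ipaddress.IPv4Network("192.0.2.0/24"), "EMEA"),
]


def pick_site(ldns_ip: str, default_site: str = "AMER") -> str:
    """Return the site mapped to the LDNS address, or a default fallback."""
    addr = ipaddress.IPv4Address(ldns_ip)
    for network, site in PROXIMITY_TABLE:
        if addr in network:
            return site
    # Address not covered by any entry: fall back to the default site.
    return default_site


print(pick_site("203.0.113.77"))  # APAC
print(pick_site("100.64.0.1"))    # AMER (fallback)
```

Unlike the RTT sketch, the answer here never depends on probe history, so a quiet LDNS gets the same site every time.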


We have multiple datacenters in the AMER, EMEA and APAC regions, and are building a global streaming platform designed to ingest media at a location near the media source so as to guarantee media transmission quality. Meanwhile, we expect low traffic in the short term. So I guess static proximity would be the better option, based on your comments?
 


Thanks Rhonda!

 

One question about the static proximity method: if one of the GSLB sites is detected as down (for example, because NetScaler heartbeat probing times out on all app server instances within the site), will it be removed from the candidate DNS IP address list, so that only IP addresses of active sites can be selected and returned to the LDNS server?


Your question covers both heartbeat probe behavior and the destination IP (and the wording gets a bit confusing), so I'll tackle the two parts that make sense. If this isn't the question you asked, please clarify.

 

First: what happens when a GSLB service is down. Short answer: that IP is not returned, and the other GSLB service(s) are used instead. There's more to the answer, noted below.

For any GSLB method, the GSLB vserver returns the appropriate DNS resolution at the time the query is made.

If multiple sites are active, by default only one IP is returned: the one for the preferred location.

If that site is down, the next acceptable site is returned, based on the load balancing configuration in use.

If no sites are available, then all IPs are returned (default), which I'll explain later.

 

Settings such as MIR (multiple IP response, when services are up) and EDR (empty down response, when services are down) change this behavior.

 

But first, to answer your question: whatever the method (static proximity, least connection, or any other), when the query is made GSLB selects the best option among the UP services. If that option is not available, the other site is returned. The details depend on exactly how GSLB is configured. Down GSLB services are (usually) not included in responses.

 

Resolutions returned to the LDNS will move to the alternate location as part of failover. If you have more than two locations, how you configure active/active or active/passive affects the final output. If you have just two, location B will be returned (if it is working) when A is unavailable.

 

 

MIR and EDR change behavior in the following ways:

Normally, only one IP is returned per resolution, based on the preferred output. MIR allows all valid IPs in the UP state to be returned, with the preferred location FIRST. That way, if something caches the response, additional IPs are available if the client can use them.

Normally, when all services are down, GSLB returns a list of all possible services. That way, anything that caches the response can keep retrying until one of the locations is back up (everything being down is clearly an outlier). EDR allows the system to return NO IPs if all services are down, so cached resolutions hold no values and a new query is forced to find a valid location.
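The response rules described in this reply (one preferred UP IP by default, all UP IPs with MIR, all IPs when everything is down, and an empty answer with EDR) can be summarized in a small sketch. The `build_response` function, its boolean flags, and the input shape are hypothetical names for illustration; they model the behavior described here, not the appliance's code or its actual configuration parameters.

```python
from typing import List, Tuple

# Each service: (IP address, is_up). The list is assumed to be already
# ordered by the GSLB method's preference (best first).
Service = Tuple[str, bool]


def build_response(services: List[Service], mir: bool = False, edr: bool = False) -> List[str]:
    """Hypothetical model of which IPs a GSLB vserver returns for one query."""
    up = [ip for ip, is_up in services if is_up]
    if up:
        # Default: only the preferred UP IP. MIR: all UP IPs, preferred first.
        return up if mir else up[:1]
    # No UP services. Default: return every IP so cached answers can retry.
    # EDR: return an empty answer to force a fresh query later.
    return [] if edr else [ip for ip, _ in services]


svcs = [("192.0.2.10", False), ("198.51.100.20", True), ("203.0.113.30", True)]
print(build_response(svcs))            # ['198.51.100.20']
print(build_response(svcs, mir=True))  # ['198.51.100.20', '203.0.113.30']
all_down = [(ip, False) for ip, _ in svcs]
print(build_response(all_down))            # all three IPs
print(build_response(all_down, edr=True))  # []
```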

 

Second, regarding the NS heartbeat probe that you mentioned:

GSLB service up/down states can be controlled by MEP, the GSLB appliance-to-appliance communication, which is what I believe you meant by the "probe".

MEP reports remote service up/down state changes to GSLB partner sites so they know when a destination is down and do not hand out the IP that service represents. In this regard it's like traditional load balancing: you only get sent to a service that is UP.

GSLB MEP also exchanges stats, metrics, and GSLB persistence decisions, in addition to remote GSLB site status.

Example: an appliance that is aware of its own GSLB service (gsvc_A) and its partner location's service (gsvc_B) will not direct requests to gsvc_B if that service is down.

 

MEP also allows appliance A to detect that appliance B is down.

 

GSLB services can also be configured with health monitors, like traditional LB services. In that case, appliance A sends a probe to the remote vserver (VIP:Port) that gsvc_B represents on the remote appliance. If monitors are bound to GSLB services, they override MEP for service up/down status (though MEP still determines appliance state and handles the other communications).
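A minimal sketch of the precedence rule just described: a GSLB service's state comes from its bound monitor when one is bound, and from MEP otherwise. The `GslbService` dataclass and its fields are hypothetical, and the single optional `monitor_up` field is a simplifying assumption (it does not model several monitors bound to one service).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GslbService:
    """Hypothetical view of the inputs that decide a remote GSLB service's state."""
    name: str
    mep_up: bool                       # state reported by the partner site over MEP
    monitor_up: Optional[bool] = None  # None when no monitor is bound

    def effective_state(self) -> str:
        # A bound monitor overrides MEP; without one, MEP decides.
        if self.monitor_up is not None:
            return "UP" if self.monitor_up else "DOWN"
        return "UP" if self.mep_up else "DOWN"


print(GslbService("gsvc_B", mep_up=False).effective_state())                   # DOWN
print(GslbService("gsvc_B", mep_up=False, monitor_up=True).effective_state())  # UP
```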

 


Thank you for the detailed info. That's really helpful! Much appreciated!

 

Just a little clarification on my question. In our backend, the NetScaler load balancer monitors the state of the app servers within a site by sending HTTP health-check requests to each of them. If no response is received from an app server, the load balancer marks that server DOWN. If all app servers behind the GSLB service are DOWN, the GSLB service is considered unavailable.

On the other hand, as you mentioned, the states of gslb services, along with other metrics, are exchanged between gslb sites through MEP.

[Image: GSLB Active-active Topology]


I'm not sure what you want clarified, so please restate the question if none of this helps.

 

Regular LB: services represent backend destinations reachable by THIS appliance. Monitors on regular LB services mark them up or down, which changes the possible backend destinations for LB vserver decisions.

 

GSLB: GSLB services are the potential IPs that an FQDN/DNS query can be resolved to. They therefore mostly point to traffic management entities such as LB vservers, CS vservers, and VPN vservers. So a GSLB service's up/down state represents a possible GSLB resolution, that is, an LB/CS/VPN vserver's up/down state. (If all regular services behind an LB vserver are down, the LB vserver is down, and therefore the GSLB service pointing to that vserver is also down.)

 

So GSLB services reflect the up/down states of LB/CS/VPN vservers, either on this appliance or on the remote appliance.

GSLB service up/down states are controlled by MEP by default; if monitors are bound to GSLB services, the monitors determine the up/down state instead.

 

If your site 2 appliance has both LB services down, the site 2 LB vserver is down, so the GSLB service that points to the site 2 LB vserver is DOWN.

That state reaches site 1 either through MEP or, if configured, through site 1's monitor on the GSLB service (which probes the LB vserver in site 2).

If both MEP and monitors are in use, the monitors always win. MEP is usually preferred for the scenario you described above.
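The site 2 example can be traced end to end with a short sketch: the LB vserver is treated as DOWN when none of its bound services is UP, and the GSLB service that represents it inherits that state (however site 1 learns it). The function names and the simple "at least one service UP" rule are simplifying assumptions for illustration; the sketch ignores other factors, such as a vserver being administratively disabled.

```python
from typing import Dict, List


def lb_vserver_state(service_states: List[str]) -> str:
    """Hypothetical rule: the LB vserver is UP if at least one service is UP."""
    return "UP" if "UP" in service_states else "DOWN"


def gslb_service_states(sites: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each site's GSLB service to the state of the LB vserver it represents."""
    return {f"gsvc_{site}": lb_vserver_state(svcs) for site, svcs in sites.items()}


# Site 2 has both LB services DOWN, so its LB vserver and GSLB service are DOWN.
states = gslb_service_states({
    "site1": ["UP", "UP"],
    "site2": ["DOWN", "DOWN"],
})
print(states)  # {'gsvc_site1': 'UP', 'gsvc_site2': 'DOWN'}
# Only GSLB services in the UP state remain candidates for DNS answers.
print([name for name, st in states.items() if st == "UP"])  # ['gsvc_site1']
```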

 

Bottom line: while GSLB uses structures similar to LB, they are built for different end results. An LB service represents a different kind of destination than a GSLB service does.

 

 

 

 

