Issues With XA 7.15 LTSR CU5 Local Host Cache Mode


Joshua Bilsky1709151488

Question

We have a fairly simple XA 7.15 LTSR CU5 environment distributed across 3 locations. Each location has a single NetScaler appliance, a single StoreFront server, and two delivery controllers. The two delivery controllers at each site are configured in their own zones. There are also three delivery groups, with VDAs at each site, for the purpose of localizing traffic. VDAs and delivery controllers are running Server 2016, while the StoreFront systems are on 2012 R2.

 

The NetScalers and StoreFront servers at each site are configured to use both local delivery controllers for XML/STA. The StoreFront configuration lists both delivery controllers in failover order; load balancing is not checked in the StoreFront GUI. The setup effectively allows access into the single farm from our three geographic locations.

 

This setup worked flawlessly until our primary data center, which hosts our SQL cluster, had to be shut down due to a power issue. I expected that when the database, delivery controllers, VDAs, and SF servers at our primary site were taken down, the two satellite sites would still allow access to resources at those sites via the LHC.

 

However, we saw a lot of inconsistency with LHC mode. Sometimes users would log in to SF and see no app/desktop icons listed, and sometimes refreshing the page would cause them to show up. Even if the icons did show up and the user attempted to launch a local resource (I know from the docs that you cannot launch a resource located in another zone with LHC active), sometimes the resource would fail to launch while other times it would succeed. The SF logs would sometimes say there was no XML service available for the request. My understanding is that when the SQL database goes offline, a secondary HA broker should be elected for each zone and used by the StoreFront server for brokering based on the LHC LocalDB, but this just was not working reliably and I'm trying to understand why. Has anybody had experience with LHC mode being activated? What has your experience been? I'm trying to understand what went wrong so that when this happens again (it will), we'll be in a better state.

 

Thanks in advance


2 answers to this question


Our configuration for multiple sites is different because of issues like this, and because it was designed before LHC existed. We have a complete Citrix Site in each data center, with its own DB. This may be overkill for your environment. That being said, I have only had LHC kick in successfully one time (a couple of other times it should have, but it just didn't work at all; those were the very early days of LHC). In that one instance, the DB went down unexpectedly and users couldn't log in for about 10 minutes until LHC took over. Since it wasn't planned, I spent all of my time working on getting the DB back online and not monitoring user connections, but once the VDAs checked in and things were working, I didn't hear of any real issues with connections.

I am interested in hearing others' experiences, and this gets me thinking that I really should plan a test run sometime, if there is ever time available.

Link to comment

Having separate sites and DBs is definitely the way to go if you have the resources to do it. I had waited for XA 7.12 to be released before upgrading our XA 6.5 farm, just for the LHC feature. We have a relatively small environment, so having a single farm with zones makes more sense for us from a manageability and hardware standpoint.

 

When the outage occurred yesterday, it definitely took some time for things to stabilize before users could log in. I read about it taking 10 minutes, but it seemed like a good 30 minutes for us until users could reliably launch sessions. Things stabilized last night, but then this morning we started getting reports of issues again.

 

I tried to piece together a sequence of events from the logs, and this is what I found:

 

4/20 2:24 PM connection to database was lost events 1201/3501 on delivery controllers
4/20 2:25 PM HA becomes active event 3502 on delivery controllers
4/20 ~2:34 PM VDAs register with secondary broker event 1012 on VDAs
The Citrix HA event 1201 on delivery controllers continued to 2:48 PM on 4/20
Confirmed login was successful at 3:23 PM and connections appeared to stabilize throughout the evening of 4/20
4/21 starting 5:31 AM alternating messages 1201 connection to db lost and 1200 connection restored from HA service through 9:54 AM. During this time connections were intermittent: sometimes no icons were presented, other times icons were presented but launching an app would fail, even for a VDA within the same zone. There was no real pattern to it, but some users did manage to get logged on.
4/21 9:55 AM Access to db was restored with events 3500/3503 reporting normal brokering operations resumed on the delivery controllers
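
One way to tell whether the alternating 1200/1201 messages amount to real flapping, rather than a few isolated events, is to count lost/restored state changes in an exported event list. Below is a minimal Python sketch of that idea. It assumes the events have already been exported as (timestamp, event ID) pairs; the function name, the one-hour window, and the sample timestamps (including the placeholder year) are hypothetical and are not part of any Citrix tooling.

```python
from datetime import datetime, timedelta

DB_LOST = 1201      # "connection to db lost", as reported in the timeline
DB_RESTORED = 1200  # "connection restored", as reported in the timeline


def count_flaps(events, window=timedelta(hours=1)):
    """Return the largest number of lost/restored state changes in any window."""
    # Keep only the 1200/1201 events, ordered by time.
    relevant = sorted(
        (ts, eid) for ts, eid in events if eid in (DB_LOST, DB_RESTORED)
    )
    # A state change happens when two consecutive events have different IDs.
    changes = [
        cur_ts
        for (_, prev_id), (cur_ts, cur_id) in zip(relevant, relevant[1:])
        if cur_id != prev_id
    ]
    # Slide a window over the change timestamps and track the busiest one.
    best = 0
    start = 0
    for end, ts in enumerate(changes):
        while ts - changes[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    # Hypothetical sample data; the year 2000 is only a placeholder.
    sample = [
        (datetime(2000, 4, 21, 5, 31), DB_LOST),
        (datetime(2000, 4, 21, 5, 40), DB_RESTORED),
        (datetime(2000, 4, 21, 5, 52), DB_LOST),
        (datetime(2000, 4, 21, 6, 5), DB_RESTORED),
    ]
    print(count_flaps(sample))  # prints 3
```

A single clean outage should yield at most one change per window (lost, then restored much later), so a consistently higher count over several hours would support the suspicion that the controllers kept moving in and out of HA mode.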

 

During the time we had intermittent connection issues, StoreFront reported various errors:

 

The Citrix servers reported an unspecified error from the XML Service at address
All the Citrix XML Services configured for farm failed to respond to this XML Service transaction.
Failed to launch the resource 'Desktop $S3-6' using the Citrix XML Service at address

 

I'm not sure whether the alternating 1200/1201 events are meaningful, in the sense that the delivery controller in HA mode thought the database was live when it clearly wasn't. I would only expect to see those messages during the initial loss of connectivity to the SQL database and then again when connectivity is restored. From what I can tell, the VDAs did register with the elected secondary broker fairly quickly (within 10 minutes). I wondered whether the number of connections overwhelmed the broker, but from what I read, Citrix says it's good for 5k sessions. We only have a couple hundred users, and when the issue started back up this morning there was probably a steady stream of connection attempts, but not so many that it should overwhelm a broker running in HA mode. I'm just puzzled as to what was going on.

 
