
User Layer - Firewall Failover - Re-Attach Problem


Bertug Demir1709155795

Question

Hello everyone,

 

We're having an issue that end users complain about every now and then. We have a cluster of Checkpoint R80.30 firewalls behind the VDI infrastructure (Virtual Apps and Desktops 7.15 CU4, VDA 1903). The App Layering/User Layer version is 19.7.0. User profiles reside on an EMC Isilon DFS share. Checkpoint sometimes fails over to the standby node due to random issues. This drops the active TCP connection responsible for the VHD mount, and the user profile detaches. I believe the failover (and hence the VHD detach) causes the virtual machines to go into an unresponsive state from which the user profiles don't recover. Users can't reconnect to their disconnected sessions unless the virtual machine is rebooted.

 

There's also another side to this story. We have also set up Microsoft FSLogix (version 2.9) containers for Outlook OST and Search Index files. They reside on the same file share. Outlook is also affected by the failover and goes into an unresponsive state until the user manually kills and relaunches it.

 

I can see from the logs that the outage lasted roughly 2-3 minutes. Neither application was able to recover from this event. Relaunching Outlook is easy; rebooting a VM is not.

 

I've seen someone having exactly the same issue here:

 

https://social.msdn.microsoft.com/Forums/en-US/9fb5aad2-41ab-420a-b0f1-add557d6dd8a/fslogix-disk-reattach-causes-session-crash?forum=FSLogix

 

You can find the FSLogix re-attach events below:

 

[23:21:58.228][tid:000010d8.000010dc][INFO]           Configuration setting not found: SOFTWARE\Policies\FSLogix\ODFC\ReAttachRetryCount.  Using default: 60
[23:21:58.228][tid:000010d8.000010dc][INFO]           Configuration setting not found: SOFTWARE\Policies\FSLogix\ODFC\ReAttachIntervalSeconds.  Using default: 10
[23:21:58.228][tid:000010d8.000010dc][INFO]           ===== Begin Session: Volume re-attach
[23:21:58.229][tid:000010d8.000010dc][INFO]            Attempting re-attach of volume: \\?\Volume{009fabab-0f5b-44f5-9553-a3b9e5349e61}\ for SID: S-1-5-21-1639082044-105522085-4547331-180714
[23:21:58.229][tid:000010d8.000010dc][INFO]            Acquiring mutex for reattach
[23:21:58.229][tid:000010d8.000010dc][INFO]            Mutex acquired
[23:21:58.229][tid:000010d8.000010dc][INFO]            VHDPath: \\DFSShare\ODFC_User.VHD
[23:21:58.229][tid:000010d8.000010dc][INFO]            Username: User
[23:21:58.229][tid:000010d8.000010dc][INFO]            Attempting re-attach as the user
[23:21:58.229][tid:000010d8.000010dc][INFO]            Retry Count: 60  Retry Interval (seconds): 10
[23:22:28.273][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:22:38.274][tid:000010d8.000010dc][INFO]            Retrying re-attach (1 of 60)
[23:22:38.305][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:22:48.307][tid:000010d8.000010dc][INFO]            Retrying re-attach (2 of 60)
[23:22:48.336][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:22:58.336][tid:000010d8.000010dc][INFO]            Retrying re-attach (3 of 60)
[23:22:58.362][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:08.363][tid:000010d8.000010dc][INFO]            Retrying re-attach (4 of 60)
[23:23:08.389][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:18.390][tid:000010d8.000010dc][INFO]            Retrying re-attach (5 of 60)
[23:23:18.420][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:28.422][tid:000010d8.000010dc][INFO]            Retrying re-attach (6 of 60)
[23:23:28.448][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:38.448][tid:000010d8.000010dc][INFO]            Retrying re-attach (7 of 60)
[23:23:38.478][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:48.479][tid:000010d8.000010dc][INFO]            Retrying re-attach (8 of 60)
[23:23:48.516][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:23:58.516][tid:000010d8.000010dc][INFO]            Retrying re-attach (9 of 60)
[23:23:58.548][tid:000010d8.000010dc][INFO]            Unsuccessful re-attach attempt.  Retry in 10 seconds.
[23:24:08.550][tid:000010d8.000010dc][INFO]            Retrying re-attach (10 of 60)
[23:24:08.615][tid:000010d8.000010dc][INFO]            Successfully opened VHD file
[23:24:08.894][tid:000010d8.000010dc][INFO]            Volume successfully re-attached
[23:24:08.894][tid:000010d8.000010dc][INFO]           ===== End Session: Volume re-attach
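
To put numbers on a log like the one above, here is a minimal Python sketch that pulls out the retry count, the elapsed time of the re-attach session, and the configured retry budget (retry count times retry interval). The function name `reattach_summary` and the regular expressions are my own, written against the line format shown in this post only; other FSLogix versions may log differently. It also assumes a single re-attach session per log excerpt.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [23:21:58.229][tid:000010d8.000010dc][INFO]   Retry Count: 60  Retry Interval (seconds): 10
# The pattern is an assumption based on the excerpt in this thread.
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\]\[tid:[0-9a-fA-F.]+\]\[\w+\]\s*(.*)$"
)
BUDGET_RE = re.compile(r"Retry Count: (\d+)\s+Retry Interval \(seconds\): (\d+)")


def reattach_summary(log_text):
    """Summarise one 'Volume re-attach' session from FSLogix log text."""
    start = end = None
    retries = 0
    budget = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip anything that is not a log line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        message = match.group(2)
        if "Begin Session: Volume re-attach" in message:
            start = stamp
        elif message.startswith("Retrying re-attach"):
            retries += 1
        elif "Volume successfully re-attached" in message:
            end = stamp
        else:
            found = BUDGET_RE.search(message)
            if found:
                # Total time FSLogix is willing to keep retrying, in seconds.
                budget = int(found.group(1)) * int(found.group(2))
    elapsed = None
    if start is not None and end is not None:
        delta = end - start
        if delta < timedelta(0):
            # The log has no date, so assume the session crossed midnight.
            delta += timedelta(days=1)
        elapsed = delta.total_seconds()
    return {
        "retries": retries,
        "elapsed_seconds": elapsed,
        "retry_budget_seconds": budget,
    }
```

Run against the full excerpt above, this counts 10 retries and about 131 seconds between the start of the session and the successful re-attach, against a retry budget of 600 seconds. So FSLogix itself did get the volume back well within its retry window, which suggests the Outlook hang is not caused by the retries running out.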


Hello Koenraad,

 

You're absolutely correct. The firewalls are between the desktops and the DFS share. Our organization values security over any other functionality, which is why they want firewalls placed between any two nodes. What's even more amazing is that we have as few IP-based firewall rules as possible, to reduce the risk of any user accessing unwanted space. All of the rules are identity based (Identity Aware). Of course, the traffic between the virtual machines and the DFS share is an exception, as every user must be able to access their profile at any cost.

 

The drawback is that there's lots of sensitive data residing on the same share as the user profiles. It was set up that way, and nobody felt the need to migrate the user profiles to an isolated share, where the security concern would be lower.

 

The only solution that I can come up with is to migrate all of the user profiles to a different share and then ask the network engineers to route that traffic so it bypasses the firewalls. It seems to be the only way to avoid being affected by these failovers.

 

Do you think it's possible to not lose the session after a failover? Any fix or a configuration change?

 

 


Hi,

 

Thanks for the explanation, that makes sense from a security perspective.
I think you've made the right analysis yourself: it would make sense to move the profiles off the DFS share and onto something that does not go through the firewalls, to avoid the failover outage.

 

As to your question on not losing the connection to the profile disk during failover: I think it all depends on how long the failover takes. From your initial post and the logs, it seems to take more than two minutes for traffic to start flowing again. If it only took a couple of seconds, the User Personalisation Layer/FSLogix would probably not complain and would not lose the connection to the profile disks.
I'm not sure how the Checkpoint firewalls are set up, but I think that if you used virtual IPs (VIPs), the failover should be almost instantaneous, losing no more than one ping.
It would also make sense to investigate why the failover is occurring in the first place.
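
If you want to measure how long traffic is really interrupted during a failover, something like the following Python sketch could run on one of the desktops. It is only a rough illustration: `probe`, `longest_outage` and `monitor` are names I made up, and it assumes the share is reached over SMB on TCP port 445. Note that it only tests whether new TCP connections get through the firewall. It says nothing about whether an already established connection (such as the one holding the VHD open) survives the failover.

```python
import socket
import time


def probe(host, port=445, timeout=1.0):
    """Return True if a new TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest gap, in seconds, between the first failed probe of an
    outage and the next successful probe.

    samples is a time-ordered iterable of (timestamp_seconds, ok) tuples.
    An outage that is still ongoing at the end of the samples is not counted.
    """
    longest = 0.0
    down_since = None
    for timestamp, ok in samples:
        if not ok and down_since is None:
            down_since = timestamp
        elif ok and down_since is not None:
            longest = max(longest, timestamp - down_since)
            down_since = None
    return longest


def monitor(host, port=445, interval=1.0, duration=600.0):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples
```

For example, `longest_outage(monitor("fileserver.example.local"))` (the hostname is a placeholder) run while the firewall team triggers a manual failover would give a first estimate of the interruption, which you can then compare against the two minutes or so seen in the FSLogix log.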

Best,

 

Koenraad


Losing file servers (or losing access to file servers) is a disaster for containers. Anything in the middle that can get in the way should be squashed!

 

I don't know the exact thresholds, or how long an outage can last before it's game over, in an environment that isn't inspecting traffic. In one that is inspecting traffic, I would assume it's going to be far nastier (think of the connection states and whether they are replicated between the firewalls, etc.).

 
