
NetScaler HA Pair crashing/rebooting constantly


Robert Laing


We're running EOL code 12.1 60.19nc on our VPX instances on our SDX8900s. We were unable to upgrade before it went EOL due to "other issues". Now that those other issues are resolved, we are working to get the bundle upgraded. However, for the past 2 weeks, our NetScalers have been rebooting constantly, at least once per hour, if not more. This code had been running with no issues for 2 years. The only relevant lines I can find are in the messages log:

 

Nov  2 09:21:11 <local0.alert> nsppe: PE 1 (pid 17657) got signal 10 followed by signal 10; signal mask is 0x0 0x0 0x0 0x0
Nov  2 09:21:11 <local0.alert> nsppe: PE 1 (pid 17657) has encountered signal 10 while handling signal 10: return addresses from stack follow: 0x7fffffffffc4 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
Nov  2 09:21:11 <local0.alert> nsppe: PE 1 (pid 17657) is exiting.
Nov  2 09:21:11 <local0.alert> [27]: pitboss Thu Nov  2 09:21:11 2023 proc NSPPE-01 (17657) failure. Therefore initiating nCore NetScaler restart according to policy setting (0x29b4)
Nov  2 09:21:11 <local0.alert> [27]: pitboss Thu Nov  2 09:21:11 2023 NetScaler restart may be delayed if collecting core dump for NSPPE-01 (17657)
Nov  2 09:21:11 <local0.alert> [27]: pitboss Thu Nov  2 09:21:11 2023 Pitboss declaring system failure: NSPPE-01 (17657) exited
Nov  2 09:21:17 <local0.alert> restart[19710]:   Nsshutdown lock released !   
Nov  2 09:21:17 <local0.alert> [27]: pitboss Thu Nov  2 09:21:17 2023 All monitored processes have exited, restarting nCore
Nov  2 09:21:17 <local0.alert> [27]: pitboss Thu Nov  2 09:21:17 2023 Initiating restart cmd (/netscaler/nsrestart.sh)
Nov  2 09:21:19 <local0.alert> restart[19726]: Restart:Shutting down NetScaler processes
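
For anyone wanting to put a number on "at least once per hour", here is a minimal Python sketch that tallies packet-engine failures per hour from lines in the format shown above. It assumes the `Pitboss declaring system failure` line is a reliable marker for each crash. The function name `count_failures_per_hour` is hypothetical, as is the idea of running it against a copy of the messages log taken off the appliance.

```python
import re
from collections import Counter

# Matches the pitboss line that announces a failed process, for example:
# "Nov  2 09:21:11 <local0.alert> [27]: pitboss ... Pitboss declaring system failure: NSPPE-01 (17657) exited"
# Assumption: one such line is logged per crash.
FAILURE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r".*Pitboss declaring system failure:\s+(?P<proc>\S+)"
)


def count_failures_per_hour(lines):
    """Return a Counter keyed by (month, day, hour) with the number of failures seen."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            key = (match.group("month"), int(match.group("day")), int(match.group("hour")))
            counts[key] += 1
    return counts


# Three of the lines quoted in this post, used as sample input.
SAMPLE = [
    "Nov  2 09:21:11 <local0.alert> nsppe: PE 1 (pid 17657) is exiting.",
    "Nov  2 09:21:11 <local0.alert> [27]: pitboss Thu Nov  2 09:21:11 2023 "
    "Pitboss declaring system failure: NSPPE-01 (17657) exited",
    "Nov  2 09:21:19 <local0.alert> restart[19726]: Restart:Shutting down NetScaler processes",
]

if __name__ == "__main__":
    for (month, day, hour), n in sorted(count_failures_per_hour(SAMPLE).items()):
        print(f"{month} {day:2d} {hour:02d}:00  failures={n}")
```

To use it on a real log, read the file into a list of lines (for example with `open(path).readlines()`) and pass that list instead of `SAMPLE`.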

 

Searching gave me 3 possibilities so far. 

Disable AppFlow, disable CallHome, and disable EDT. I have all 3 disabled and the reboots continue. I disabled AppFlow and CallHome under System->Settings->Configure Advanced Features, and I disabled EDT using the command nsapimgr -ys enable_ica_edtinsight=0

 

Sometimes the MGMT CPU gets pegged at 100% prior to the restart; sometimes it's not high at all.

 

Has anyone seen anything like this? I'd like to stop the constant reboots before we upgrade, as it will take some time to convert from 12.1 to 13.1 or 14.1.

 

 

 


45 minutes ago, Carl Stalhood1709151912 said:

Upgrade the VPX firmware to 13.1. I recently saw crash dumps starting in late October and 13.1 fixed it.

Mid-to-late October is when ours started crashing. We're working on getting assistance with the upgrade, as we were told a lot of the expressions and other syntax must be converted to newer formats (something I'm not proficient with), and it will likely be a few weeks before we can get it arranged. Any idea what was causing the crash dumps?


What about converting to nFactor? I was told this is a requirement to upgrade to 13.1. If it's not, then upgrading right away may be the next step.

 

Also, we have 2 HA pairs: one for internal traffic and one for external traffic from outside. One NetScaler from each pair runs on each SDX. Only the external NetScaler is affected by this crash issue. It has Citrix Gateway and AppFW running on it, whereas the internal pair only runs VIPs for load balancing. The internal NetScalers have been running flawlessly.


  • 2 weeks later...
On 11/2/2023 at 12:36 PM, Carl Stalhood1709151912 said:

Upgrade the VPX firmware to 13.1. I recently saw crash dumps starting in late October and 13.1 fixed it.

The NetScalers completely crashed a few days ago. Citrix Support assisted with the upgrade, but they would only go to 13.0. No more reboots or crashes, and we are running a supported version again. We did have to fix our AAA auth policies. We didn't change them, but we did have to remove and re-add them before they would work.
