NSPPE-00 on secondary node keeps crashing after updating to 12.1 build 55.18



After upgrading from 12.1 build 55.13 to the new 12.1 build 55.18 firmware with the patches for CVE-2019-19781, the secondary node in our HA-pair keeps crashing and restarting.

Excerpt of console logging:

 

DATE <kern.info> REDACTED kernel: pid 1261 (NSPPE-00), uid 0: exited on signal 6 (core dumped)

DATE <local0.alert> REDACTED [25]: pitboss DATE All monitored processes have exited, restarting nCore

 

After the nCore restart has completed, NSPPE-00 crashes again and nCore restarts all over again.
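For anyone who wants to check a saved console log for the same pattern, here is a minimal sketch in Python. It assumes the log lines look like the two quoted above; the function names and the loop threshold are made up for illustration and are not part of any Citrix tooling.

```python
import re

# Pattern modeled on the kernel line quoted above. Real logs carry a timestamp
# and hostname where the excerpt shows DATE and REDACTED.
CRASH_RE = re.compile(r"pid (\d+) \((NSPPE-\d+)\), uid \d+: exited on signal (\d+)")
RESTART_MARKER = "restarting nCore"


def summarize_crashes(log_text):
    """Collect packet-engine exits and count nCore restarts in a log excerpt."""
    crashes = []
    restarts = 0
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            pid, process, signal = match.groups()
            crashes.append((process, int(pid), int(signal)))
        if RESTART_MARKER in line:
            restarts += 1
    return {"crashes": crashes, "restarts": restarts}


def looks_like_crash_loop(summary, threshold=2):
    """Hypothetical heuristic: at least `threshold` crashes and restarts."""
    return len(summary["crashes"]) >= threshold and summary["restarts"] >= threshold
```

Feed it the text of the console log; a single crash followed by one restart is not flagged, while the repeating pattern described here is.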

 

So far we have had two cases with this behaviour, one on a VPX instance and one on an MPX appliance. On the MPX appliance we had AppFlow running. We have disabled AppFlow, but the issue persists. On the VPX we weren't using AppFlow.

 

Anyone else seeing this anywhere in the field?

 


 

Yup, same here. I opened a case and it seems to be related to a syslog policy that was bound to appfw global.

 

 

After I removed the policy, everything was fine. Shortly afterwards the primary crashed once. It has now been stable for three hours.

 

 

Here is the information from my case:

  

Quote

However the issue occurs on a secondary box if there are syslog policies bound to “appfw global” 

Remove the HA configuration from Primary node (Node 1). After this step, both the nodes in the HA pair will be in standalone mode.

 

show ha node

rm ha node <node-id>

 

If there are any syslog policies that are bound to “appfw global”, execute this step. On Primary Node, run the following commands

 

unbind appFw global <syslog-policy>

bind system global <syslog-policy>


Note: Unbind all the syslog policies bound to appFw global and bind them to system global. 
This will ensure that the logs are sent to the external syslog server.

 

After Step 1, the cyclic reboot of the Secondary Node (Node 2) should stop, as it will be in standalone mode. Once Node 2 comes up (in standalone mode), execute the following commands.

 

unbind appFw global <syslog-policy> 


Note: Unbind all the syslog policies bound to appFw global

 

Bring both the nodes back in the HA pair. Add the required HA configuration on Node 1

 

add ha node <node-id> <IP> 

 

After executing the above steps, Node 2 will be Primary and Node 1 will be Secondary.
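If several syslog policies are bound to appfw global, typing the unbind and bind lines by hand gets tedious. The sketch below only generates the command text from the quoted procedure so it can be reviewed and pasted into the CLI. The helper name and the policy names are hypothetical, and the command forms are copied from the quote as-is, so check them against the command reference for your build.

```python
def rebind_commands(policies, node_role):
    """Build the CLI lines from the quoted procedure for one node.

    node_role is "primary" (unbind from appFw global, then bind to system
    global) or "secondary" (unbind only), following the support note.
    """
    if node_role not in ("primary", "secondary"):
        raise ValueError("node_role must be 'primary' or 'secondary'")
    commands = []
    for policy in policies:
        commands.append(f"unbind appFw global {policy}")
        if node_role == "primary":
            # Command form copied from the quote; verify any extra
            # arguments against the documentation for your build.
            commands.append(f"bind system global {policy}")
    return commands


if __name__ == "__main__":
    # Hypothetical policy names, for illustration only.
    for line in rebind_commands(["syslog_pol_1", "syslog_pol_2"], "primary"):
        print(line)
```

Here "primary" means Node 1 from the quote and "secondary" means Node 2 after it has come up in standalone mode.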

 


Yesterday we finally managed to upgrade our HA pair to 12.1 build 55.18.

 

According to Citrix support, it might be an issue with HA sync due to the version difference after upgrading the secondary node. They advised us to first save the configuration and then disable HA sync (set ha node -haSync DISABLED), so that the secondary would not sync after the upgrade and reboot. We tried that, but in our case it did not work.

 

In the end we had to break up the HA pair by removing the HA configuration from the primary node (rm ha node <node-id>). After that you end up with two standalone NetScalers.

We could then upgrade the former primary NetScaler, and after the reboot it came up again, this time as the secondary node in the HA pair. Now that both nodes are running the same firmware, the issue is gone.
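Since the version difference between the nodes was the suspected trigger, a quick sanity check is to compare the build strings of both nodes. This is a rough sketch that assumes the strings are written the way they are in this thread ("12.1 build 55.18"); it does not parse the actual output of any NetScaler command, and the function names are hypothetical.

```python
import re

# Matches version strings in the form used in this thread, e.g. "12.1 build 55.18".
BUILD_RE = re.compile(r"(\d+)\.(\d+) build (\d+)\.(\d+)")


def parse_build(text):
    """Return (major, minor, build, patch) as integers."""
    match = BUILD_RE.search(text)
    if match is None:
        raise ValueError(f"no build string found in {text!r}")
    return tuple(int(part) for part in match.groups())


def same_firmware(node_a, node_b):
    """True when both nodes report the same release and build."""
    return parse_build(node_a) == parse_build(node_b)
```

With the builds from this thread, `same_firmware("12.1 build 55.13", "12.1 build 55.18")` is false, which is the mixed state the pair is in while only one node has been upgraded.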

 

 


On 1/26/2020 at 8:26 PM, Benjamin Toelle said:

 

yup, same here. I opened a case and it seems to be related to a syslog policy that was bound to appfw global.

 

 


  

 

 

I had the same issue, and unbinding the syslog policy bound to appfw global solved it for me too. I have not seen any further crashes.

P.S. Just in case, I unbound all the syslog policies, regardless of which entity they were bound to.

 


2 hours ago, KFI Service said:

We have crashes without any syslog policies

 

Then you must be facing something else, I suppose. You can try setting the HA status to "stay primary" and "stay secondary" before upgrading the secondary node and see whether the issue still occurs once the upgrade is done. Citrix Support initially recommended this to me, but I did not test it, because unbinding the syslog policies bound to appfw global solved my issue.

 

On the primary node:

set ha node -hastatus STAYPRIMARY

 

On the secondary node:

set ha node -hastatus STAYSECONDARY

 

 


  • 4 weeks later...

We had the same issue with one HA pair out of four.

On the other three HA pairs the upgrade worked. On the last one, which is the most heavily used (Gateway feature), the VPX crashed continuously after the upgrade to 55.18, and we had to roll back.

We raised a case with Citrix and hope they come back with something soon.

The other three HA pairs have the same config. None of our DEV HA NetScalers had any issue.

 

I believe it is related to AppFlow and syslog, as these are resource-intensive processes, and in my case the crashing VPX is also the most heavily used one.

Increasing the resources could be an option, but I don't have any hard evidence for this; it is just my gut feeling.

I vaguely remember someone on Twitter saying that this version changes the SSL processing and that they got crashes after the upgrade (on a VPX with no dedicated SSL chipset).

 

Has anyone else found any solution or got feedback from Citrix?
