
NetScaler cluster synchronization did not occur correctly


Recommended Posts

Posted

When one of the appliances (Node 2 - 10.254.X.X) was unplugged to move its power cable, node 2 did not synchronize correctly with the cluster after it was reconnected. What procedures can I follow to get my cluster back to normal, active-active?

NETSCALER.png

Posted

From the shell, change to /var/log and run the following command to pull any events related to the cluster.  This should give you a better idea of what is causing the node to be inactive, and from there you can determine what action to take.  Feel free to post a snippet of the log, or a scrubbed copy if you can, and we can figure out what's going on.

 

cat ns.log | grep CLUSTERD
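
If the events you need have already rotated out of the current ns.log, the compressed archives can be searched as well (just a sketch, assuming the default rotation naming of ns.log.*.gz under /var/log):

zcat /var/log/ns.log.*.gz | grep CLUSTERD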

 

Here is a sample from my Cluster instance, where the cause of the node becoming unavailable was a backplane failure (caused by the NIC being disabled):

 

Quote

root@netscaler-b# cat ns.log | grep CLUSTERD
Aug  2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:00 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789260078583 :  "10.10.10.248: CSM: p(ELECTION) n(RECOVERY) View VID(10.10.10.248:13) Leader 10.10.10.248 OVS 8 CVS((10.10.10.248-3)(10.10.10.251-0)(10.10.10.250-2) "
Aug  2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789261626588 :  "10.10.10.248: CSM: p(SERVICE) n(CLAIMING)  "
Aug  2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789261711583 :  "10.10.10.250: DVS: p(10.10.10.248 10.10.10.251 ) n(10.10.10.248 10.10.10.251 10.10.10.250 ) "
Aug  2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789261798591 :  "10.10.10.248: NODE: p(FREE) n(VIEW_MANAGER)  "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.250: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.250: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.250: HB: p(UP) n(BKPLANE_FAIL)  "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.250: HB: p(UP) n(BKPLANE_FAIL)  "

Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.248: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.248: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.248: CSM: p(SERVICE) n(CLAIMING)  "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.248: CSM: p(SERVICE) n(CLAIMING)  "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "CSM: CLAIMING time 148 ms "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "CSM: CLAIMING time 148 ms "
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT  3-PPE-0 : default CLUSTERD Message 0 1564789464715586 :  "10.10.10.248: NODE: p(VIEW_LEADER) n(FREE)  "

 

 

From the CLI (not the shell) you can run show cluster node 2 to see more information about the health status of that particular node, which may also help you narrow down the issue.  See the example below that also shows an issue with the backplane interface on this particular node.

 

[Screenshot: show cluster node 2 output showing a backplane interface issue]
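
From the same CLI you can also run show cluster instance to get an overall view of the cluster and its nodes; the exact fields shown vary a little between firmware versions.

show cluster instance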

Posted

Can you try forcing a configuration sync on node two?

 

From node 2, run force cluster sync

 

If that doesn't help, can you try disabling and then re-enabling the cluster instance on node2 by using the following commands from the NSIP of the node in question?

 

disable cluster instance 1

enable cluster instance 1
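
After re-enabling the instance, give the node a minute or two to synchronize, then check its health again with show cluster node 2.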

 

In the first screenshot I can see a BKPLANE_FAIL message followed by a LB_STATE_SYNC_INPROG message, but the status of node2 in the second screenshot is: 

Quote

 

NOT UP

Reasons: Service state is being synchronized with cluster

 

Did you happen to check the /var/log/ns.log file for any relevant events around the time node2 was brought back online?  Events before or after CLUSTERD events would be of interest if further investigation is required.
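
For example, the following (standard grep options from the shell) shows a few lines of context around each cluster event; adjust the amount of context as needed:

grep -B 5 -A 5 CLUSTERD /var/log/ns.log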

 

If disabling and re-enabling the node doesn't resolve the issue, would it be a problem for you to remove that node from the cluster and then re-join it?  I'm not sure that's necessary, but based on the information you've provided, those would be my next steps; a rough sketch of the commands is below.
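
As a rough sketch only, assuming node ID 2 and a backplane interface of 2/1/1 (substitute your actual node ID, NSIP, backplane interface, and passwords), run the first two commands against the cluster IP (CLIP):

rm cluster node 2

add cluster node 2 10.254.X.X -state ACTIVE -backplane 2/1/1

Then, from the NSIP of node 2:

join cluster -clip <cluster IP> -password <nsroot password>

save ns config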

 

Also, I noticed the following in the output from your second screenshot:

 

Quote

Jumbo frames are not supported. To support jumbo frames, set MTU for backplane switch interfaces to be greater than or equal to 1532 bytes

 

While likely unrelated to your current issue, Citrix recommends that jumbo frames be enabled for the backplane. The following article is provided as a reference for enabling them:

 

https://support.citrix.com/article/CTX210360 - Latency observed when the traffic goes via NetScaler Cluster
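
As an illustration only (assuming the backplane interface is 1/1/1; substitute your own backplane interface and make sure the connected switch ports allow at least that MTU), the interface MTU can be raised from the CLI with something like:

set interface 1/1/1 -mtu 1578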

 

Posted

 

These are the logs I was able to get.

 

I have had this problem since 08/01. According to Citrix support, I have to do the following steps:

 

- We have to change the RPC password on the NetScaler (a sketch of this command is shown after this list)

- Another option would be to remove the IP address .12 and put it back in active mode

- Check Layer 2 connectivity between NetScaler .12, the switch, and the cluster virtual IP

- Force a cluster sync to verify connectivity
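
For the RPC password step, a minimal sketch only (assuming node 2's NSIP is the .12 address mentioned above; follow Citrix support's guidance on where the change needs to be applied):

set rpcNode 10.254.X.12 -password <new password>

save ns config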

 

Erro_SYNC.jpg

Archived

This topic is now archived and is closed to further replies.
