RODRIGO SOUZA Posted August 2, 2019

While disconnecting one of the appliances (Node 2 - 10.254.X.X) to move its power cable, node 2 failed to synchronize correctly with the cluster when it was reconnected. What procedures can I follow to get my cluster back to its normal active-active state?
Jim Grimm Posted August 3, 2019

From the shell, change to /var/log and run the following command to get any events related to the cluster. This should give you a better idea of what is causing the node to be inactive, and from there you can determine what action to take. Feel free to post a snippet of the log (or a scrubbed copy, if you can) and we can figure out what's going on.

cat ns.log | grep CLUSTERD

Here is a sample from my cluster instance, where the cause of the node becoming unavailable was a backplane failure (caused by the NIC being disabled):

Quote
root@netscaler-b# cat ns.log | grep CLUSTERD
Aug 2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:00 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789260078583 : "10.10.10.248: CSM: p(ELECTION) n(RECOVERY) View VID(10.10.10.248:13) Leader 10.10.10.248 OVS 8 CVS((10.10.10.248-3)(10.10.10.251-0)(10.10.10.250-2) "
Aug 2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789261626588 : "10.10.10.248: CSM: p(SERVICE) n(CLAIMING) "
Aug 2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789261711583 : "10.10.10.250: DVS: p(10.10.10.248 10.10.10.251 ) n(10.10.10.248 10.10.10.251 10.10.10.250 ) "
Aug 2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789261798591 : "10.10.10.248: NODE: p(FREE) n(VIEW_MANAGER) "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.250: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.250: HB: p(UP) n(BKPLANE_FAIL) "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.248: DVS: p(10.10.10.248 10.10.10.251 10.10.10.250 ) n(10.10.10.248 10.10.10.251 ) "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.248: CSM: p(SERVICE) n(CLAIMING) "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "CSM: CLAIMING time 148 ms "
Aug 2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.248: NODE: p(VIEW_LEADER) n(FREE) "

From the CLI (not the shell) you can run show cluster node 2 to see more information about the health status of that particular node, which may also help you narrow down the issue. See the example below, which also shows an issue with the backplane interface on this particular node.
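To make the heartbeat (HB) state transitions such as the BKPLANE_FAIL above stand out in a busy log, the grep can be narrowed and the state change extracted with sed. This is a hypothetical sketch, not an official Citrix procedure: the sample log line is copied from the output above, and the field positions in the sed expression are assumptions based on that log format.

```shell
# Sketch: pull CLUSTERD heartbeat (HB) transitions out of a NetScaler
# ns.log so backplane failures are easy to spot. The sample log file
# created here mirrors the format shown in the quoted output above.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Aug  2 23:44:24 <local0.info> 10.10.10.248 08/02/2019:23:44:24 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789464715586 : "10.10.10.250: HB: p(UP) n(BKPLANE_FAIL) "
Aug  2 23:41:11 <local0.info> 10.10.10.248 08/02/2019:23:41:01 GMT 3-PPE-0 : default CLUSTERD Message 0 1564789261626588 : "10.10.10.248: CSM: p(SERVICE) n(CLAIMING) "
EOF
# Keep only heartbeat transitions, then print: node, previous state, new state.
grep 'CLUSTERD' "$sample_log" | grep 'HB:' \
  | sed -E 's/.*"([0-9.]+): HB: p\(([A-Z_]+)\) n\(([A-Z_]+)\).*/\1 \2 -> \3/'
# prints: 10.10.10.250 UP -> BKPLANE_FAIL
rm -f "$sample_log"
```

On a real appliance you would point the grep at /var/log/ns.log instead of the temporary sample file.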
Jim Grimm Posted August 4, 2019

Can you try forcing a configuration sync on node 2? From node 2, run:

force cluster sync

If that doesn't help, can you try disabling and then re-enabling the cluster instance on node 2 by running the following commands from the NSIP of the node in question?

disable cluster instance 1
enable cluster instance 1

In the first screenshot I can see a BKPLANE_FAIL message followed by a LB_STATE_SYNC_INPROG message, but the status of node 2 in the second screenshot is:

Quote
NOT UP Reasons: Service state is being synchronized with cluster

Did you happen to check the /var/log/ns.log file for any relevant events around the time node 2 was brought back online? Events before or after the CLUSTERD events would be of interest if further investigation is required. If disabling and re-enabling the node doesn't resolve the issue, would it be a problem for you to remove that node from the cluster and then re-join it? I'm not sure that will be necessary, but based on the information you've provided, those would be my next steps.

Also, I noticed the following in the output in your second screenshot:

Quote
Jumbo frames are not supported. To support jumbo frames, set MTU for backplane switch interfaces to be greater than or equal to 1532 bytes

While likely unrelated to your current issue, Citrix recommends that jumbo frames be enabled for the backplane. See the following article as a reference for enabling them: https://support.citrix.com/article/CTX210360 - Latency observed when the traffic goes via NetScaler Cluster
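Put together, the sync-then-bounce sequence above, followed by verification, might look like the session below. This is a sketch of a NetScaler CLI session, not output from the poster's environment: the cluster instance ID 1 and node ID 2 are assumptions taken from this thread, so substitute your own IDs.

```
> force cluster sync
> disable cluster instance 1
> enable cluster instance 1
> show cluster instance 1
> show cluster node 2
> save ns config
```

If node 2 still reports NOT UP after this, the remove/re-join path mentioned above (rm cluster node followed by add cluster node and join cluster) would be the next escalation.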
RODRIGO SOUZA Posted August 5, 2019 (Author)

These are the logs I was able to get. To recap: I have had this problem since 08/01. Per Citrix support, I have to take the following steps:
- Change the RPC password on the NetScaler
- Remove the .12 IP address and put it back in active mode
- Check Layer 2 connectivity between the NetScaler .12, the switch, and the cluster IP (virtual)
- Force a cluster sync to verify connectivity
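For readers following along, the RPC password step from Citrix support's list can be sketched as follows. This is a hypothetical NetScaler CLI fragment, not the support team's exact procedure: the placeholder NSIP and password are illustrative, and you should confirm the node addresses with show ns rpcNode before changing anything (the password must be set consistently for each node's RPC entry on every cluster node).

```
> show ns rpcNode
> set ns rpcNode <node2-NSIP> -password <new-rpc-password>
> save ns config
> force cluster sync
```

After the sync, show cluster node 2 should confirm whether the node has come back to the ACTIVE/UP state.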