Network setup lost after restart


di zhang1709161066

Question

We have a XenServer 7.1.2 cluster of 16 Dell R740xd servers, each with two Mellanox ConnectX-4 LX Ethernet cards. We configured bond1 over eth0 and eth2 as the XenServer management network; eth1 and eth3 are configured as the iSCSI multipath network.
We are facing a strange issue: if we add a machine to the cluster and then reboot it, the machine loses its management network after the reboot and cannot connect to the pool master. The network status shows '<Unknown> <Unknown>-' on the XenServer Status Display page on the console. See attached screenshots.
Meanwhile, we can still log in to DOM0 via the IP address on bond1 (the management IP).

 


We checked xensource.log and found many errors:
Jun 19 16:34:46 wxac6016 xapi: [error|wxac6016|0 |bringing up management interface D:94536c65cf05|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
Jun 19 16:32:03 wxac6016 xcp-networkd: [ info|wxac6016|13 |host.signal_networking_change D:0e8151f85b79|network_utils] Looking for inet in [6: xapi1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000#012    link/ether 98:03:9b:61:1b:b8 brd ff:ff:ff:ff:ff:ff#012    inet 10.40.38.237/24 brd 10.40.38.255 scope global xapi1#012       valid_lft forever preferred_lft forever#012]
Jun 19 16:32:14 wxac6016 xcp-networkd: [debug|wxac6016|1 ||network_monitor_thread] Error while sending alert BONDS_STATUS_CHANGED: Server_error(HOST_STILL_BOOTING, [  ])#012Raised at file "lib/client.ml", line 7, characters 38-74#012Called from file "lib/client.ml", line 19, characters 56-109#012Called from file "lib/client.ml", line 2849, characters 6-87#012Called from file "networkd/network_monitor_thread.ml", line 42, characters 18-150#012Called from file "networkd/network_monitor_thread.ml", line 70, characters 5-51
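
For anyone sifting through a large xensource.log, here is a minimal Python sketch for pulling out entries by level. It assumes every entry follows the layout of the lines quoted above (timestamp, host, daemon, then a bracketed header starting with the level) and that "#012" is the syslog escape for an embedded newline. The names `LogEntry`, `parse_line` and `filter_by_level` are made up for this illustration and are not part of any XenServer tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, inferred from the quoted lines only:
#   <Mon DD HH:MM:SS> <host> <daemon>: [<level>|<other fields>] <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s(?P<daemon>[\w.-]+):\s"
    r"\[\s*(?P<level>\w+)\|(?P<fields>[^\]]*)\]\s(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    host: str
    daemon: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    # "#012" is octal 012 (newline) as escaped by syslog; restore it for readability.
    message = match.group("message").replace("#012", "\n")
    return LogEntry(
        match.group("timestamp"),
        match.group("host"),
        match.group("daemon"),
        match.group("level"),
        message,
    )


def filter_by_level(lines: Iterable[str], levels: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed entries whose level is in `levels`; skip unparseable lines."""
    wanted = {level.lower() for level in levels}
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level.lower() in wanted:
            yield entry
```

A typical use would be to open /var/log/xensource.log on the affected host and pass the file object to `filter_by_level(f, ["error"])`, printing each entry's timestamp, daemon and message.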

 

 

If the machine is not joined to the cluster, the network works fine after a reboot.

Any suggestions would be appreciated.

6 answers to this question

Not likely ... that's not a hard enforced limit.  Is NTP properly configured on all hosts, and are they within reasonable tolerances (check with "ntpstat -s")?

 

Not sure if this is related, but I see this happen even when doing updates and rebooting: I have to do an emergency network reset and manually reconfigure the various iSCSI and NFS settings after rebooting, and then the host reconnects to the pool. I've seen this with 7.1 a number of times. It's very frustrating because it happens randomly and is not easy to reproduce, so getting Citrix to open a case on it would probably not lead to much. You could report it to https://bugs.xenserver.org/ and, if others chime in, it might at least encourage them to investigate this internally.

 

When adding a host to a pool, multipathing should be disabled and the same primary management interface network used as on all the other hosts, or confusion can set in. Always configure a single management interface, never a bond, before adding a host to a pool.

 

-=Tobias
