
vm network disconnected


tomeu sastre

Question

Hi there.

I'm testing all sorts of failure combinations.

 

I have a cluster of 3 servers with HA and SAN storage.

I disconnected one host from the LAN, that is, from the network that gives access to the VMs (Core). I was expecting that once XenServer noticed that the host had these NICs disconnected, the host would reboot and free the VMs... but nothing happened. Two days after the test, that host still shows 2 VMs running, but I cannot connect to them and those VMs cannot see the LAN. Should something have happened here to resolve this? If it were a production environment, there would be VMs that were not accessible to users while the other hosts kept working normally.

 

By the way, I did the same test on the management interfaces and the host rebooted... and I really don't know why, because I could still access that host from the other management servers. I expected the behaviour to be the other way around.

 

Could anyone explain this behaviour to me? Thanks.

Link to comment


  • 1

Typically your management, VM, and storage networks are all on separate networks. Your NTP servers are local, I take it?

The management interface is what will determine if HA is working or not, together with the external storage heartbeat.  A failure of VM networking isn't going to be sufficient.

This article explains the process well along with how the different components interact: https://www.citrix.com/blogs/2008/09/17/peeking-under-the-hood-of-high-availability/

and for a general article on this, see https://docs.citrix.com/en-us/citrix-hypervisor/high-availability.html and https://support.citrix.com/article/CTX129721?_ga=2.196434287.239489374.1563134819-1753394139.1511496667

 

What does XenCenter show for your HA configuration status, in the normal as well as partially failed state?

 

-=Tobias

  • Like 2
Link to comment
  • 1

I assume the VM(s) in question have been set to restart if needed. Do you see a warning issued about the host failure? If it's only a partial failure, depending on your redundancy setup or lack of one, it may not be sufficient to trigger the VM being restarted on another host. Is there enough capacity to take on the VMs from the failed host? Are the hosts properly synchronized with each other with NTP (check each with "ntpstat -s" to see the offsets)? Is your heartbeat pointing to some external storage device, and are you licensed for HA?

What version of XenServer is running?

 

-=Tobias

  • Like 1
Link to comment
  • 1

That's not the way HA is designed to work, as I understand it. It has to be an actual host failure, not a failure on the VM's network.

 

HA isn't perfect; your alternative for that situation is to write a script that monitors VM traffic and, if a loss of connectivity is detected, shuts down the VM on the one host and tries starting it on another.
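
A minimal sketch of such a script, under assumptions not stated in this thread: it runs on a machine that sits on the VM network and can also run the `xe` CLI against the pool, it uses a plain ping as the "VM traffic" check, and the UUID, IP address, and target host passed in are placeholders. The `probe` and `run` parameters are hypothetical hooks added only so the logic can be exercised without touching a real pool. How HA reacts to a protected VM being shut down this way is not covered here and should be checked before relying on it.

```python
import subprocess


def vm_reachable(ip, count=3, timeout_s=2):
    """Return True if ping gets at least one reply from ip.

    The -c/-W flags assume the Linux iputils version of ping.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def plan_failover(vm_uuid, target_host):
    """Build the xe commands that force the VM off and start it on target_host."""
    return [
        ["xe", "vm-shutdown", f"uuid={vm_uuid}", "force=true"],
        ["xe", "vm-start", f"uuid={vm_uuid}", f"on={target_host}"],
    ]


def check_and_failover(vm_uuid, vm_ip, target_host,
                       probe=vm_reachable, run=subprocess.run):
    """If the VM does not answer, run the failover commands and return them."""
    if probe(vm_ip):
        return []  # VM is reachable, nothing to do
    commands = plan_failover(vm_uuid, target_host)
    for cmd in commands:
        run(cmd, check=True)  # raises CalledProcessError if xe fails
    return commands
```

In practice this would be called periodically (for example from cron) once per VM, and a single missed ping is probably too aggressive a trigger; requiring several consecutive failures before acting would be a sensible refinement.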

 

-=Tobias

  • Like 1
Link to comment
  • 1

The management interface is critical for inter-communications among all hosts in a pool. They need to be aware of which VMs are running on which host, patch updates, changes to global network and storage settings, etc. Any changes to the configuration on a host have to be communicated to the pool master so that the DBs on each host can be updated, in case another host needs to be promoted to be the new pool master.

 

-=Tobias

  • Like 1
Link to comment
  • 0

hi there.

 

Thank you very much for your answer, Tobias, it's really full of knowledge!

 

I checked everything you said, and I think there is a problem with time synchronization, as each server says:

"unsynchronised
   polling server every 8 s"

 

I checked the NTP configuration and there are now 2 NTP servers on each host. Something also seems to be wrong with the firewall rules or network routes, because I cannot reach the NTP servers (I cannot ping a hostname, but I can ping an IP address), so now I'm trying to solve this DNS problem.
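
The "IP works, name does not" symptom can be separated from routing problems with a small name-resolution check. This is only a sketch using Python's standard `socket` module; the hostname in the usage comment is a made-up placeholder, and the `resolver` parameter is a hypothetical hook so the function can be tested without a network.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the lookup fails (e.g. no reachable DNS server).
        return False


# Example use on a host (placeholder name, not from this thread):
#   can_resolve("ntp.example.org")
# An IP literal needs no DNS lookup, so it resolves even when DNS is broken.
```

If this returns False for the NTP server names on a host while pinging their IP addresses works, the problem is in that host's DNS configuration rather than in routing or the firewall.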

Link to comment
  • 0

Well, I noticed that only the master was able to resolve the NTP servers, so I rotated the master role through the other servers, and now all of them are able to resolve NTP... a strange way of fixing it?

 

Now I'm going to disconnect the VM network (Core) again and see what happens to the VMs that get isolated from the LAN.

 

Something is still wrong... the Core NIC is now disconnected, but the host still says that it is synchronized.

 

My networks:

LAN : 192.168.222.0/24

 

Mgmt NIC : 192.168.222.71-73/24  bonded

Core NIC (VM) : 192.168.223.71-73/24 bonded

SAN : 10.0.2.71-73/24 bonded

 

Maybe Core and LAN should be in the same subnet, and management in a different one?
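
For reference, which of the addresses listed above actually fall inside the LAN subnet can be checked with Python's standard `ipaddress` module. This sketch only uses the first host's addresses (.71) from the list above and says nothing about which layout is the right one.

```python
import ipaddress

# LAN subnet and per-network host addresses as listed in this post.
lan = ipaddress.ip_network("192.168.222.0/24")
addresses = {
    "mgmt": ipaddress.ip_interface("192.168.222.71/24"),
    "core": ipaddress.ip_interface("192.168.223.71/24"),
    "san": ipaddress.ip_interface("10.0.2.71/24"),
}


def on_lan(iface):
    """Return True if the interface address belongs to the LAN subnet."""
    return iface.ip in lan


for name, iface in addresses.items():
    status = "on LAN subnet" if on_lan(iface) else "not on LAN subnet"
    print(name, iface.network, status)
```

With these values, the management bond is the one sharing the LAN subnet, while the Core (VM) bond sits in 192.168.223.0/24.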

Link to comment
  • 0

Well, I tried swapping the subnets:

 

Mgmt NIC : 192.168.223.71-73/24  bonded

Core NIC (VM) : 192.168.222.71-73/24 bonded

 

After a while I can see that the hosts are synchronized, but as before, when I disconnect the Core NIC the VMs remain on the same host in an isolated state; I cannot reach them (via ping, for example).

 

 

If I run ntpstat it shows:

[16:47 H2 ~]# ntpstat -s
synchronised to NTP server (150.214.94.5) at stratum 2
   time correct to within 48 ms
   polling server every 256 s
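
To compare all three hosts without reading each output by eye, the ntpstat text can be parsed into a few fields. This is a sketch that assumes the output always looks like the two samples quoted in this thread; collecting the output from each host (for example over ssh) is left out, and the function name is illustrative.

```python
import re


def parse_ntpstat(output):
    """Extract sync state, error bound (ms) and poll interval (s) from ntpstat text."""
    text = output.strip()
    # "unsynchronised" does not start with "synchronised", so this is unambiguous.
    synchronised = text.startswith("synchronised")
    within = re.search(r"time correct to within (\d+) ms", text)
    poll = re.search(r"polling server every (\d+) s", text)
    return {
        "synchronised": synchronised,
        "within_ms": int(within.group(1)) if within else None,
        "poll_s": int(poll.group(1)) if poll else None,
    }


sample = """synchronised to NTP server (150.214.94.5) at stratum 2
   time correct to within 48 ms
   polling server every 256 s"""
print(parse_ntpstat(sample))
```

A host whose result has `synchronised` set to False, or a `within_ms` value far above the others, is the one to look at first.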

 

Link to comment
  • 0

Ok, thank you for your answers.

 

It is now clear to me how it works and what I should expect in each situation. So at this point XS and the tests are behaving as expected, as you pointed out.

 

About the script: I had thought about it, but I preferred to be sure how HA works first.

 

Thanks again for your time; I now understand much better how I should expect HA to work.

Link to comment
  • 0
4 hours ago, Tobias Kreidl said:

That's not the way HA is designed to work, as I understand it. It has to be an actual host failure, not a failure on the VM's network.

 

HA isn't perfect; your alternative for that situation is to write a script that monitors VM traffic and, if a loss of connectivity is detected, shuts down the VM on the one host and tries starting it on another.

 

-=Tobias

I will try to do something about that... I don't know when, but I think it will be fun to do.

Link to comment
  • 0
8 hours ago, Tobias Kreidl said:

Good luck with the next stage! It's always a learning experience and that makes it also interesting. :)

 

-=Tobias

Thanks. I have a lot of things to do, because I'm building a Disaster Recovery data center (CPD), but it will stay on the to-do list. :)

Link to comment
  • 0

Just another question about HA:

 

Why does XS reboot a server when it loses its management interface, even though its VMs are still able to reach the network?

 

By the way, I've been using XenServer for years, but the first installation was done by another company and I always suspected that something was not configured correctly. Now we're installing new hardware from scratch (3 hosts with Xeon Silver 4110 and SSD-based SAN storage) and I want to check everything to be sure that we won't get strange results when a server goes down.

 

Once we have migrated to the new servers I will do the same with the "old" ones. My guess is that there is a switching problem in the SAN zone, because there are 2 switches but they cannot be stacked.

Link to comment
