
Urgent server down


Nathan Budd

Question

Am in serious need of some insights here!

 

I took the time to update XenServer 6, trying to bring it fully up to date. The most recent update, however, looks to have broken it. This is a super critical machine and I've been on it for 6 hours now.

XenCenter cannot see/connect to the host.

The host seems to be stuck in maintenance mode. When I try to exit maintenance mode from xsconsole, I get ("'NoneType' object has no attribute 'xenapi'",)

The host is pingable and can ping out.

I cannot run any xe commands, it returns: Error: Connection refused (calling connect ).

 

Logs are located here (and attached):

https://drive.google.com/drive/folders/1lheKOEPsv5IavtjcPNXmVrlT20jtxmKF?usp=sharing

 

 

df result:

[root@xenserver ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              4127440   2916964   1000812  75% /
none                    304492        48    304444   1% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
                         57296     57296         0 100% /var/xen/xc-install

 

 

xe-toolstack-restart gives the below:

[root@xenserver ~]# xe-toolstack-restart
Executing xe-toolstack-restart
Stopping xapi: [ OK ]
Stopping the v6 licensing daemon: [ OK ]
Stopping the memory ballooning daemon: [ OK ]
Stopping perfmon: [ OK ]
Stopping the xenopsd daemon: [ OK ]
Stopping XCP RRDD plugin xcp-rrdd-iostat: [ OK ]
Stopping XCP RRDD plugin xcp-rrdd-squeezed: [ OK ]
Stopping XCP RRDD plugin xcp-rrdd-xenpm: [ OK ]
Stopping XCP RRDD plugin xcp-rrdd-gpumon: [ OK ]
Stopping the XCP RRDD daemon: [ OK ]
Stopping the XCP networking daemon: [ OK ]
Stopping the fork/exec daemon: [ OK ]
Stopping the multipath alerting daemon: cannot stop mpathalert: mpathalert is not running. [FAILED]
Starting the multipath alerting daemon: [ OK ]
Starting the fork/exec daemon: [ OK ]
Starting the XCP networking daemon: . [ OK ]
Starting the XCP RRDD daemon: [ OK ]
Starting XCP RRDD plugin xcp-rrdd-gpumon: [ OK ]
Starting XCP RRDD plugin xcp-rrdd-iostat: [ OK ]
Starting XCP RRDD plugin xcp-rrdd-squeezed: [ OK ]
Starting XCP RRDD plugin xcp-rrdd-xenpm: [ OK ]
Starting the xenopsd daemon: [ OK ]
Starting perfmon: [ OK ]
Starting the memory ballooning daemon: [ OK ]
Starting the v6 licensing daemon: [ OK ]
Starting xapi: OK [ OK ]
done.

xensource.zip

Link to comment

Recommended Posts

  • 1

Thanks all for the assists! We had a win in the end via XenServer support, who had some exceptional people get us over the line thusly:

 

"xapi has the ability to read and manipulate the hidden PCI devices. However, xapi expects any hidden PCI device addresses to be specified with the domain, i.e. xxxx:xx:xx.x

The PCI address mentioned in the extlinux.conf file was in the format xx:xx.x; however, it should be in the following format: 0000:xx:xx.x. After making the change and rebooting the server, everything got connected; however, all the information was lost as state.db was moved and recreated.

We tried to move the original state.db; however, that didn't help. We were able to introduce the local SR and create a diskless VM and attach the disk."
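
For anyone hitting the same thing, the address change support describes can be sketched as a small shell helper. This is only an illustration under assumptions: `normalize_pci_addr` is a made-up name, the example address `04:00.0` is hypothetical, and domain `0000` is assumed because that is the prefix quoted above. It does not touch extlinux.conf; it only shows the expected format.

```shell
# Sketch only: convert a short PCI address (bus:device.function) into the
# domain-qualified form (domain:bus:device.function) described by support.
# normalize_pci_addr is a hypothetical helper, not a XenServer tool.
normalize_pci_addr() {
    pci_addr="$1"
    if printf '%s\n' "$pci_addr" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'; then
        # Already domain-qualified: print it unchanged.
        printf '%s\n' "$pci_addr"
    elif printf '%s\n' "$pci_addr" | grep -Eq '^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'; then
        # Short form: prepend domain 0000 (assumption: single PCI domain).
        printf '0000:%s\n' "$pci_addr"
    else
        printf 'unrecognised PCI address: %s\n' "$pci_addr" >&2
        return 1
    fi
}

# Hypothetical example address, not taken from this host.
normalize_pci_addr "04:00.0"   # prints 0000:04:00.0
```

Anything that is neither form is rejected with a non-zero exit status, so a typo is not silently "fixed" into a wrong address.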

 

From here I just ran an export/import into a new VM on a new host and we were golden. The fix was so small, literally just a few characters, but it was buried deep, and even XS support needed a good amount of digging and escalation to get the VHD into a state where we could mount it.

Special shout out to Richa Rastogi at XenServer/Citrix support, whose mad skills were an absolute godsend! No one else got close, until Richa rolled in and knocked it out.

Link to comment
  • 0

For context: This is a standalone host with local storage and 2 vms.

 

XAPI doesn't seem to be running, but when I restart the service it says it is OK:

[root@xenserver ~]# service xapi restart
Stopping xapi:                                             [  OK  ]
Starting xapi: OK                                          [  OK  ]
[root@xenserver ~]#

 

EDIT: When logging into xsconsole, I get the following: "The underlying Xen API xapi is not running. This console will have reduced functionality. Would you like to attempt to restart xapi?" Saying yes changes nothing.
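
An init script reporting OK only shows that xapi was launched, not that it stayed up, which would be consistent with the "Connection refused" from xe. A minimal sketch for pulling the most recent error lines out of the toolstack log follows. `recent_log_errors` is a hypothetical helper, the default path /var/log/xensource.log is assumed to be where this release writes the xapi log, and matching on "error" or "exception" is a rough filter, not an official log format.

```shell
# Sketch only: show the last few lines mentioning an error or exception
# in a log file. recent_log_errors is a hypothetical helper name.
recent_log_errors() {
    rle_log="${1:-/var/log/xensource.log}"   # assumed default log path
    rle_count="${2:-20}"                     # how many matching lines to keep
    if [ ! -r "$rle_log" ]; then
        printf 'cannot read %s\n' "$rle_log" >&2
        return 1
    fi
    # Case-insensitive match, then keep only the newest matches.
    grep -iE 'error|exception' "$rle_log" | tail -n "$rle_count"
}

# On the host, run right after "service xapi restart":
#   recent_log_errors /var/log/xensource.log 30
```

Running it immediately after a restart attempt keeps the output close to whatever made xapi exit.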

 

 

I'm not familiar enough to know what multipathing is.

 

NTP on this host looks good:

[root@xenserver ~]# ntpstat -s
synchronised to NTP server (27.124.125.251) at stratum 3
   time correct to within 348 ms
   polling server every 256 s
 

 

 

Edited by nathanbudd
Link to comment
  • 0
15 minutes ago, Tobias Kreidl said:

That's pretty far off, but maybe not so much of an immediate concern.  It still shows up as in maintenance mode in XenCenter?

 

Is this a pool master or slave? You could try "xe pool-sync-database" to see if that may help.

 

-=Tobias

 

Thank you so much for taking the time, Tobias!

 

Xsconsole tells me it is in maintenance mode, and cannot leave.

XenCenter cannot connect to it at all, I get the following:

image.thumb.png.3fc7cea2b24c16fddea50b2f6ab72bb6.png

 

This is a master; as it is standalone, there are no slaves.

 

 

There are no pools, just local storage (though I am very unfamiliar with Xen).

 

Running the above command gives:

[root@xenserver ~]# xe pool-sync-database
Error: Connection refused (calling connect )

 

image.png

Link to comment
  • 0

I'm connected via SSH (I can also access the physical console).

 

The Network section says it has no config, but now and then it populates with the expected data.

 

image.thumb.png.9ecda425ba035930025cb922828a3875.png

 

I've tried reboots, but it does not help. Unless you mean a reinstall? I really can't risk losing the data between now and the last backup. Unless there is a way to just copy the newer VMs off the system?

 

Correct, the VMs are not available at all at present.

image.png

Edited by nathanbudd
Link to comment
  • 0

The data should not get lost in a reboot (in theory), but you have no network, so you may have to do an emergency network reset and reconfigure your network (hopefully you have notes on this). I have seen this disappear from different XS installations; in some cases a reboot will fix it, in others that information somehow got lost and had to be reconfigured.

 

But before that, you could try to use the "Configure Management Interface" option and reset it to what value it had before.

 

-=Tobias

Link to comment
  • 0
1 minute ago, Tobias Kreidl said:

And what version of XS did you upgrade to? It's always good to at least try to evaluate whether your hardware will be compatible with a new installation, for example, whether the network drivers might not be compatible with your NICs.

 

-=Tobias

This was a (seemingly) simple version 6.5 incremental update provided via the XenCenter "install updates" utility.

Link to comment
  • 0
12 minutes ago, Tobias Kreidl said:

The data should not get lost in a reboot (in theory), but you have no network, so you may have to do an emergency network reset and reconfigure your network (hopefully you have notes on this). I have seen this disappear from different XS installations; in some cases a reboot will fix it, in others that information somehow got lost and had to be reconfigured.

 

But before that, you could try to use the "Configure Management Interface" option and reset it to what value it had before.

 

-=Tobias

I have tried, but it just tells me:

<no interfaces present>

 

Is it worth doing an emergency network reset?

 

I'm really confused how one autoupdate can cause such huge issues.

Link to comment
  • 0
2 minutes ago, Alan Lantz said:

I guess a bigger question is: are your VMs running okay? Do they run over other interfaces than your management interface? Is your management interface bonded?

 

--Alan--

 

 

Thanks Alan,

 

The VMs are not OK. They are completely offline and I need them going any way I can. I'm unfortunately not sure what you mean by a "bonded" interface.

 

Link to comment
  • 0
4 minutes ago, Tobias Kreidl said:

I have unfortunately seen this before. You probably have no other choice at this point but to do an emergency network reset, and it will likely still require a reboot. Your data should still not be affected. I have had to do this several times and did not lose any data in my case.

 

-=Tobias

Thanks Tobias,

 

Do you know if there is any way I can copy the existing data from this host's local SR (even though it's telling me the SR is unavailable)? Are there perhaps raw data files I can copy out or something? A backup that I can run?

 

I really appreciate your help!

Link to comment
  • 0

If you see no NICs, from the CLI, can you run "ifconfig" and see if anything shows up at all? If not, again, I worry that there may be a driver incompatibility. Rebooting may or may not help in this case.
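
As a complement to ifconfig, a rough sketch like the one below lists the interfaces the kernel has bound to an underlying device, by looking for a `device` entry under /sys/class/net. `list_physical_nics` is a hypothetical helper name, and the assumption that loopback and bridge interfaces lack a `device` entry reflects the usual Linux sysfs layout, not something confirmed on this host.

```shell
# Sketch only: print network interfaces that have a backing device in sysfs.
# list_physical_nics is a hypothetical helper; the optional argument lets
# you point it at a different sysfs root.
list_physical_nics() {
    lpn_root="${1:-/sys/class/net}"
    for lpn_dev in "$lpn_root"/*; do
        # Interfaces with a driver-bound device have a "device" entry;
        # loopback and bridges typically do not (assumption).
        [ -e "$lpn_dev/device" ] || continue
        basename "$lpn_dev"
    done
}

# On the host:
#   list_physical_nics
```

An empty result would point toward the driver problem mentioned above rather than a lost network configuration.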

 

If you did an upgrade, you should in principle be able to reinstall and choose the option to revert to the previous version, which might be the fastest way to get you at least up and running again and give you more time to consider your upgrade options.

 

-=Tobias

Link to comment
