
Storage Repository not appearing in /dev after issues with SAN storage


Iacovos Zacharides

Question

Hello,

 

Today, we attempted to upgrade our Nimble CS235 SAN from firmware version 3.9.1.0-619834-opt to 5.0.8.0-677726-opt.
The upgrade started normally: failover from controller A to B was successful, and the installation and reboot on controller A completed. At some point during the upgrade (I believe when the failover back to the updated controller A was supposed to occur), our XenServer 7.1 pool, which uses the volumes on this SAN, became unresponsive. All the VMs on all 3 hosts powered down, and the 3 hosts in the pool rebooted unexpectedly.


After a few minutes we were able to log in to the pool, HA kicked in, and our VMs started to boot as per the HA plan.

In the XenCenter Alerts section, we saw that all 3 hosts were fenced during that time, but the pool recovered after all hosts rebooted.

When the pool recovered, all the iSCSI Storage Repositories recovered, with the exception of the heartbeat SR, for which only 1 session from each host recovered (normally there are 2 sessions from each host to each SR).
We had to run iscsiadm -m node -T <targetname> --login to restore the missing session of the heartbeat SR on each host.
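
For reference, the sequence per host looked roughly like this (the target IQN and portal IPs are placeholders):

# count the sessions currently established to the heartbeat target
iscsiadm -m session | grep <targetname>

# log in to the target against each storage portal explicitly, one per iSCSI NIC
iscsiadm -m node -T <targetname> -p <portal-ip-1>:3260 --login
iscsiadm -m node -T <targetname> -p <portal-ip-2>:3260 --login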

Another thing we noticed is that after the update, no Xen VDIs exist on one of the volumes/SRs. XenCenter reports 4MB used out of 2TB, but on the SAN side the reported usage is 755GB out of 2TB.


However, we are unsure whether we previously had any VDIs on that volume, or whether Xen HA has any mechanism to move VDIs from one SR to another in cases where an SR is unavailable.
We've opened a case with HPE/Nimble to investigate, but we think it's wise to ask the Citrix community as well.

 

It's also worth noting that after the pool recovered fully, the specific volume/SR (Nimble2-01) appears mounted and available on all hosts: we can view it with pvdisplay and vgdisplay, and we can also see the MGT LV on that VG. However, we do not see this volume/SR under /dev on all hosts as we do with the rest of the volumes/SRs. Specifically, this VG only appears under /dev on the pool master, but not on the other 2 pool members, even though vgdisplay lists the VG properly.
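
The checks we ran were roughly the following (on an LVM-over-iSCSI SR the VG is normally named VG_XenStorage-<SR-UUID>, so that placeholder is used here):

# the PV and VG for the SR are visible on every host
pvdisplay
vgdisplay VG_XenStorage-<SR-UUID>

# but the device nodes only exist under /dev on the pool master
ls /dev/VG_XenStorage-<SR-UUID>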

 

So at this point, our questions are:
1: Does Xen HA have any mechanism to move VDIs from one SR to another when HA is enabled and the SR is not fully available?
2: If not, how do we identify the source of the reported capacity discrepancy (755GB vs 4MB used out of 2TB) between the SAN and Xen? (The checks we have in mind are sketched after this list.)
3: Why does the SR not appear as a VG under /dev, even though vgdisplay lists it properly?
4: How safe is it to run a storage reclamation job on the SR at hand?
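
For question 2, the Xen-side numbers we intend to compare against the Nimble volume usage would come from something like this (the SR UUID is a placeholder):

# what Xen believes is allocated on the SR
xe sr-param-list uuid=<SR-UUID> | grep -E "physical-size|physical-utilisation|virtual-allocation"

# rescan the SR and list any VDIs Xen knows about on it
xe sr-scan uuid=<SR-UUID>
xe vdi-list sr-uuid=<SR-UUID> params=uuid,name-label,virtual-size,physical-utilisation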


12 answers to this question

3 minutes ago, Tobias Kreidl said:

This is an iSCSI connection? If so, did you try going through the whole iscsiadm discovery process once again with HA initially disabled?

 

-=Tobias

Hello Tobias,

Yes, this is an iSCSI connection; I forgot to mention that.

We did not go through the discovery process, as we were unsure whether any VDIs exist on the volume at hand, and we were worried that ending up with duplicate VDIs could potentially cause issues with VMs.

It also made sense that if the SR is empty, only the pool master would have it mounted under /dev, since only the MGT LV exists on that VG.
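
If we do give the rediscovery a go, I assume the sequence would be roughly this, with HA disabled first as you suggest (the portal IP and SR UUID are placeholders):

# temporarily disable HA so the heartbeat SR cannot fence the hosts while we work
xe pool-ha-disable

# re-run target discovery against one of the array portals, then log in to the target
iscsiadm -m discovery -t sendtargets -p <portal-ip>:3260
iscsiadm -m node -T <targetname> --login

# re-enable HA afterwards, naming the heartbeat SR
xe pool-ha-enable heartbeat-sr-uuids=<heartbeat-SR-UUID>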


I would think the device would still show up as a mounted area under /dev/VG... even if it contained no VHDs.

 

If concerned about multiple entries, you could always temporarily turn off multipathing if making use of it. An iscsiadm discovery process should then certainly not have any negative effects.

 

-=Tobias

8 minutes ago, Tobias Kreidl said:

I would think the device would still show up as a mounted area under /dev/VG... even if it contained no VHDs.

 

If concerned about multiple entries, you could always temporarily turn off multipathing if making use of it. An iscsiadm discovery process should then certainly not have any negative effects.

 

-=Tobias

Thank you for the suggestion Tobias, I will give it a go tomorrow and post back with results. 

We're also thinking of just rebooting the hosts entirely to see if anything changes. That will probably have the same effect as re-running the iSCSI discovery process.
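
For the multipathing suggestion, I believe the per-host toggle is along these lines (host in maintenance mode first; the host UUID is a placeholder):

# take the host out of service, then turn off multipath handling
xe host-disable uuid=<host-UUID>
xe host-param-set uuid=<host-UUID> other-config:multipathing=false

# ...run the iscsiadm discovery/login, then restore multipathing and re-enable the host
xe host-param-set uuid=<host-UUID> other-config:multipathing=true
xe host-enable uuid=<host-UUID>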

 

Do you know if Xen HA has any such mechanism for migrating VDIs from one SR to another when there are connectivity issues with an SR?

The time it took for the VMs to recover does not point in that direction, but it would definitely be a relief to confirm whether Xen has any such functionality.

In the official documentation from Citrix regarding how HA works, there's nothing that points in that direction, so I assume not, but I'd like to confirm.

 

Thank you


No, HA under XS will only restart a VM on a different server within the pool, and only on a pooled storage repository (SR). You'd have to work out some means of migrating the storage manually, and in fact, if things were as flaky as that, you might not even want to risk moving the storage in case something goes really bad and leads to corruption. Having a good backup/recovery DR plan is always important for that reason. I just had a VM go bad recently and went through a recovery process (the storage was not iSCSI in my case, not that that really matters).
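
If you did have to move a VDI off that SR by hand, a minimal sketch (with the VM shut down, UUIDs as placeholders) would be something like:

# list the VDIs on the suspect SR
xe vdi-list sr-uuid=<suspect-SR-UUID> params=uuid,name-label

# copy a VDI to a healthy SR, then attach the copy to the VM in place of the original
xe vdi-copy uuid=<VDI-UUID> sr-uuid=<healthy-SR-UUID>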

 

-=Tobias


Tobias, thank you for the insight. That's what I figured.

For the record, I put one of the hosts in maintenance mode and rebooted it. I also left a couple of VMs on it powered off.

After the host reboot, no SRs appeared in /dev. As soon as I booted one of the VMs, the SR on which that VM's VDI resides appeared in /dev, so it seems that SRs only show up in /dev when they are in use by the host.
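
That also matches what lvs shows: device nodes only appear for LVs that are active on the host, which can be checked with something like this (VG name placeholder as before):

# an "a" in the fifth position of the Attr string means the LV is active on this host
lvs -o lv_name,lv_attr VG_XenStorage-<SR-UUID>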

 

I did not reboot either of the other 2 hosts in the pool, but I noticed that for our heartbeat SR, the multipathing status reports the following:

  • HOST01: 2 of 2 paths active (1 iSCSI sessions)
  • HOST02: 2 of 2 paths active (1 iSCSI sessions)
  • HOST03: 2 of 2 paths active (2 iSCSI sessions)

 

HOST03 is the one that was rebooted today.

On HOST02, I attempted to login to the heartbeat iSCSI target:

iscsiadm -m node -T <IQN> --login

But nothing changed, so I added the -p (portal) parameter with the IP of each of the two iSCSI NICs on the storage side:

iscsiadm -m node -T <IQN> -p <IP1> --login
iscsiadm -m node -T <IQN> -p <IP2> --login

But again, the number of sessions is still at 1.

iscsiadm -m session lists 14 sessions as it should, and iscsiadm -m node -P 3 and multipath -ll show the heartbeat SR as active and present.

I also ran iscsiadm with the -s option on the target and I see stats on both paths, yet only 1 session is reported present in XenCenter.

 

On the host side it seems that everything is ok when I check the SR status via the terminal, but XenCenter seems to think otherwise.

 

Do you have any input as to why there's a discrepancy between the two, or how to go about restoring the missing iSCSI session without having to reboot the host?
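
One idea we're considering, assuming it's acceptable to disable HA briefly, is re-plugging the heartbeat SR's PBD on the affected host so that xapi re-establishes its iSCSI sessions; roughly (UUIDs are placeholders):

# find the PBD connecting this host to the heartbeat SR
xe pbd-list sr-uuid=<heartbeat-SR-UUID> host-uuid=<host-UUID> params=uuid,currently-attached

# with HA disabled, unplug and re-plug it so xapi re-logs into the target
xe pool-ha-disable
xe pbd-unplug uuid=<PBD-UUID>
xe pbd-plug uuid=<PBD-UUID>
xe pool-ha-enable heartbeat-sr-uuids=<heartbeat-SR-UUID>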

