
What log will show me multipath failure on XenServer 8.1


David Ross1709161745

Question

Hi,

 

Recently I had a storage switch failure. We have multipathing set up on the resource pool, however the failover didn't appear to work properly and numerous VDIs were left in a hung state.

 

Will this be recorded in any of the logs on the XenServer hosts? If so, which one should I look in?

 

My understanding of these logs is basic. Can anyone advise on the types of errors I should look out for?

 

Thanks

 


15 answers to this question


10 minutes ago, Tobias Kreidl said:

Storage issues are generally logged to /var/log/SMlog. You can grep for things in various logs with a command such as

grep -i multipath /var/log/(NAME-OF-LOGFILE)

For examples of errors of at least one variety, see for example https://bugs.xenserver.org/browse/XSO-965?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&showAll=true

 

 

-=Tobias

Thank you Tobias. I'll try this now.
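Building on Tobias's grep suggestion, a sweep across the usual XenServer 8.x log files might look like the sketch below. The log names are the stock ones, and the grep pattern is an assumed starting point rather than an exhaustive list of error strings; adjust both for your setup.

```shell
# Scan a log file for multipath/iSCSI trouble; show the last 20 matches.
# The pattern is a starting point, not an exhaustive list of error strings.
scan_log() {
    grep -iE 'multipath|mpath|iscsi|path.*(down|fail)' "$1" | tail -n 20
}

# Sweep the usual XenServer 8.x logs, skipping any that are not present.
for log in /var/log/SMlog /var/log/kern.log /var/log/daemon.log /var/log/xensource.log; do
    [ -f "$log" ] || continue
    echo "== $log =="
    scan_log "$log"
done
```

SMlog is usually the most informative for storage, but path failures often show up in kern.log (device-mapper messages) first.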


Since multipathing is a per-server action, move all of the VMs off of a server in the pool and work with a test VM. You should be able to take your storage interfaces up/down individually for troubleshooting with ifconfig ethX up/down, without losing storage access to the VM. As Tobias says, SMlog should give you logging.

 

--Alan--
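A small helper can make Alan's test easier to read: count the healthy paths that `multipath -ll` reports, and compare the number before and after taking one storage interface down. The NIC name eth2 below is a placeholder.

```shell
# Count paths that `multipath -ll` reports as "active ready running",
# so you can compare the count before and after dropping an interface.
count_active_paths() {
    grep -c 'active ready running'
}

# Usage on an evacuated host (eth2 is a placeholder NIC name):
#   multipath -ll | count_active_paths     # baseline, e.g. 6 per LUN when healthy
#   ifconfig eth2 down
#   sleep 30 && multipath -ll | count_active_paths   # should drop, not hang
#   ifconfig eth2 up
```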

 

On 2/5/2021 at 8:25 PM, Alan Lantz said:

Since multipathing is a per-server action, move all of the VMs off of a server in the pool and work with a test VM. You should be able to take your storage interfaces up/down individually for troubleshooting with ifconfig ethX up/down, without losing storage access to the VM. As Tobias says, SMlog should give you logging.

 

--Alan--

 

Thank you Alan.

 

We have a DR resource pool on which we are able to replicate the failover, and it works fine: with 3 out of 6 paths active the VMs don't hang. With 3 out of 6 paths on the faulty pool, the VMs hang.

 

We have two other resource pools, three in total including this one. I've run the multipath -v 2 command on each pool. Below is the output from the resource pool that only works on 3 paths.

 

 multipath -v 2
Feb 07 11:42:41 | sdc: couldn't get asymmetric access state
Feb 07 11:42:41 | sde: couldn't get asymmetric access state
Feb 07 11:42:41 | sdg: couldn't get asymmetric access state
Feb 07 11:42:41 | sdi: couldn't get asymmetric access state
Feb 07 11:42:41 | sdk: couldn't get asymmetric access state
Feb 07 11:42:41 | sdm: couldn't get asymmetric access state
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sda
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sdc
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sde
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sdg
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sdi
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sdk
Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sdm

 

 

When I run multipath -v 2 on the other two pools, which work fine, they both return the output below:

 

Feb 07 11:42:41 | Warning: should_multipath() only based on wwids. dev = sda

 

Does that mean anything?

 

Thanks
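The "couldn't get asymmetric access state" warnings mean the ALUA (asymmetric logical unit access) query failed for those devices, which is worth chasing on the faulty pool. As a hypothetical helper, the device names can be pulled out of the warnings so each one can be inspected further, for example with `sg_rtpg /dev/<dev>` from sg3_utils, if that package is available in dom0.

```shell
# Extract the device names from "couldn't get asymmetric access state"
# warnings, one per line, for further per-device inspection.
list_flagged_devs() {
    sed -n 's/.*| \(sd[a-z]*\): couldn.t get asymmetric access state.*/\1/p'
}

# Usage on a host: multipath -v 2 2>&1 | list_flagged_devs
```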

13 hours ago, Tobias Kreidl said:

^^^

That's why it would be helpful to see the multipath status with "multipath -ll" to see which paths are active or not. If it's an iSCSI-based connection, some output from iscsiadm would also be helpful.

 

-=Tobias

The resource pool host members haven't been rebooted since the switch failure. Only a toolstack restart has been performed.

 

Would it be a good idea to log out of the iSCSI connections on each of the hosts in the pool, reboot the servers, then log on to each connection individually when they come back up?

 

When you say "Make sure your IQN entries all match", do you mean the ones on the hosts in the pool?

 

We have two iSCSI volumes connecting to the faulting resource pool. Here are the multipath -ll outputs:

 

36742b0f0000007f10000000000702c26 dm-3 NFINIDAT,InfiniBox
size=2.1T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 2:0:0:13 sdo 8:224 active ready running
  |- 5:0:0:13 sdr 65:16 active ready running
  |- 3:0:0:13 sdp 8:240 active ready running
  |- 6:0:0:13 sds 65:32 active ready running
  |- 4:0:0:13 sdq 65:0  active ready running
  `- 7:0:0:13 sdt 65:48 active ready running
36742b0f0000007f100000000006ffff6 dm-1 NFINIDAT,InfiniBox
size=541G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 2:0:0:12 sdd 8:48  active ready running
  |- 5:0:0:12 sdj 8:144 active ready running
  |- 3:0:0:12 sdf 8:80  active ready running
  |- 6:0:0:12 sdl 8:176 active ready running
  |- 4:0:0:12 sdh 8:112 active ready running
  `- 7:0:0:12 sdn 8:208 active ready running

 

Thanks
 


Looks like you have six paths to each volume -- does that seem right? If you can migrate your VMs off one host and reboot that host with no running VMs to see if it comes back OK, that might help. You can then see whether the iSCSI connections are correctly restored. If not, you can go through the iscsiadm discovery process.

 

-=Tobias
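As a hedged sketch around Tobias's suggestion: the executable part below is a helper that summarises `iscsiadm -m session` output per portal, so after a reboot you can confirm every expected portal is logged back in. The portal address in the comments is a placeholder for your own.

```shell
# Count iSCSI sessions per portal from `iscsiadm -m session` output.
sessions_per_portal() {
    # Session lines look like:
    #   tcp: [1] 10.0.0.10:3260,1 iqn.2009-01.com.example:target (non-flash)
    awk '{ split($3, a, ","); print a[1] }' | sort | uniq -c
}

# Usage on a host:
#   iscsiadm -m session | sessions_per_portal
#
# Rediscovery sequence, if sessions are missing after the reboot
# (10.0.0.10:3260 is a placeholder portal address):
#   iscsiadm -m node -U all                                  # log out of all targets
#   iscsiadm -m discovery -t sendtargets -p 10.0.0.10:3260   # rediscover targets
#   iscsiadm -m node -L all                                  # log back in
#   multipath -ll                                            # confirm paths return
```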

1 minute ago, Tobias Kreidl said:

Looks like you have six paths to each volume -- does that seem right? If you can migrate your VMs off one host and reboot that host with no running VMs to see if it comes back OK, that might help. You can then see whether the iSCSI connections are correctly restored. If not, you can go through the iscsiadm discovery process.

 

-=Tobias

Yes, Tobias, there are 6 paths to each volume.

We plan to check the config on the network and the switches the hosts are attached to, then compare with the working pools. These other pools fail over without issue; however, none of the working hosts are on the switches that don't work.

 

I'll recommend moving the VMs and rebooting. 

 

Thanks

 

18 hours ago, Tobias Kreidl said:

Your paths still look odd to me, as if they are all coming from one connection. A typical multipath output should have the backup connection at a lower priority. See the example in this article, which also contains some helpful info on monitoring your system:

https://www.golinuxhub.com/2018/05/tutorial-cheatsheet-begineers-guide-dm-multipath-rhel-linux/

 

We are using bonded NICs for the storage, along with (obviously) multipathing. We've discussed this with Citrix and they say that's fine, although some of their documentation suggests using one or the other, not both. Nothing specifically says it won't work, but nothing says it will fail either, and our other two resource pools are configured the same way and work.

 

The hypervisor we're using is XenServer 8.1 with no patches, and we're in the middle of upgrading to 8.2. I don't see any fixes or known issues in the docs for either version.

 

I've been looking at the document below; is it still relevant for current hypervisor versions?

 

https://www.yumpu.com/en/document/read/8998291/configuring-iscsi-multipathing-support-for-xenserver-citrix-

 

Thanks


I wouldn't bond on top of multipathing either; I would do bonding or multipathing, but not both. You can multipath multiple interfaces, so bonding really becomes unnecessary. And no, with storage I wouldn't VLAN or route traffic either, and I would use jumbo frames. Keep latency as low as possible.

 

--Alan--
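On Alan's jumbo frames point, a quick way to prove a 9000-byte MTU actually passes end-to-end is a non-fragmenting ping sized to fill the frame. IP header (20 bytes) plus ICMP header (8 bytes) is 28 bytes of overhead, so the payload is MTU minus 28. The NIC name and portal address below are placeholders.

```shell
# ICMP payload size that exactly fills a frame of the given MTU
# (28 bytes of IP + ICMP header overhead).
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage on a host (eth2 and 10.0.0.10 are placeholders for your storage
# NIC and target portal); -M do forbids fragmentation, so the ping only
# succeeds if every hop really supports the jumbo MTU:
#   ip link show eth2 | grep -o 'mtu [0-9]*'
#   ping -c 3 -M do -s "$(payload_for_mtu 9000)" 10.0.0.10
```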

 

