
There was an SR backend failure - Can no longer take snapshots or scan the SR


Jarrod Coombes

Question

Suddenly today I can no longer take snapshots or scan my SR, which is an HP MSA 1040 HBA SR connected to my 2-server pool via multipathed 8 Gb/s fibre. This was working just fine until today's backup ran and threw all of these errors. The hosts run XenServer 7.2 (old, I know, and I haven't had a chance to upgrade yet), but it is patched all the way through XS72E017.

 

Here is the output from the scan command:

 

[root@pool-master ~]# xe sr-scan uuid=SR-UUID
There was an SR backend failure.
status: non-zero exit
stdout: 
stderr: Traceback (most recent call last):
  File "/opt/xensource/sm/LVMoHBASR", line 243, in <module>
    SRCommand.run(LVHDoHBASR, DRIVER_INFO)
  File "/opt/xensource/sm/SRCommand.py", line 351, in run
    sr = driver(cmd, cmd.sr_uuid)
  File "/opt/xensource/sm/SR.py", line 147, in __init__
    self.load(sr_uuid)
  File "/opt/xensource/sm/LVMoHBASR", line 105, in load
    LVHDSR.LVHDSR.load(self, sr_uuid)
  File "/opt/xensource/sm/LVHDSR.py", line 199, in load
    self._undoAllJournals()
  File "/opt/xensource/sm/LVHDSR.py", line 1133, in _undoAllJournals
    self._handleInterruptedCloneOps()
  File "/opt/xensource/sm/LVHDSR.py", line 882, in _handleInterruptedCloneOps
    self._handleInterruptedCloneOp(uuid, val)
  File "/opt/xensource/sm/LVHDSR.py", line 919, in _handleInterruptedCloneOp
    self._undoCloneOp(lvs, origUuid, baseUuid, clonUuid)
  File "/opt/xensource/sm/LVHDSR.py", line 983, in _undoCloneOp
    self.lvmCache.remove(lvs[origUuid].name)
  File "/opt/xensource/sm/lvmcache.py", line 49, in wrapper
    ret = op(self, *args)
  File "/opt/xensource/sm/lvmcache.py", line 118, in remove
    lvutil.remove(path)
  File "/opt/xensource/sm/lvutil.py", line 509, in remove
    _remove(path, config_param)
  File "/opt/xensource/sm/lvutil.py", line 522, in _remove
    ret = cmd_lvm(cmd)
  File "/opt/xensource/sm/lvutil.py", line 157, in cmd_lvm
    stdout = pread_func([os.path.join(LVM_BIN, lvm_cmd)] + lvm_args, *args)
  File "/opt/xensource/sm/util.py", line 189, in pread2
    return pread(cmdlist, quiet = quiet)
  File "/opt/xensource/sm/util.py", line 182, in pread
    raise CommandException(rc, str(cmdlist), stderr.strip())
util.CommandException: Input/output error

Google searches are not turning up anything that seems to fit, so any help would be greatly appreciated.

 

Thanks,

 

EDIT: If I shut down any of my VMs, they won't turn back on, and I get the same generic "SR Failed to complete" error.

 

EDIT2: Forgot to say that this is not a storage space issue. Just over 50% of my SR is free.

 


7 answers to this question

38 minutes ago, Alan Lantz said:

util.CommandException: Input/output error. With a local physical disk this is usually a failed hard drive. You could be having some sort of SR corruption going on, or it could just be that the hosts need to be restarted.

 

--Alan--

 

 

Hardware failure was my first thought, but I confirmed that the MSA is not showing any issues; all the drives show as happy and healthy.

How could I determine if it is SR corruption?


Errors should show up in logs or you could run "fsck -n" on the disk device (the -n prevents any actions from being taken), provided the device can be recognized by fsck.

It's generally better to run fsck when the filesystem is inactive, of course, as otherwise blocks will be changing as it runs. That might not be practical for an active environment.
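As a rough way to do the log check, here is a minimal Python sketch that pulls out lines that look like I/O failures. The pattern list and the function name `find_io_errors` are only illustrative assumptions, not anything shipped with XenServer, so adjust them to whatever your logs actually contain.

```python
import re

# Illustrative patterns only (an assumption); extend them to match your logs.
IO_ERROR_PATTERNS = re.compile(
    r"Input/output error|I/O error|CommandException", re.IGNORECASE
)


def find_io_errors(lines):
    """Return (line_number, text) pairs for lines that look like I/O failures."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if IO_ERROR_PATTERNS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return find_io_errors(handle)
```

You could call `scan_log("/var/log/SMlog")` on each host (the path is an assumption; point it at whichever storage or kernel log you want to check) and compare the timestamps of the hits with the time the backup ran.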
