Question
John Hunt1709152117
Hi,
We have a very old host running XenServer 7.0 with two Windows Server 2012 R2 VMs, one a large DB server and the other a web server, on local LVM storage backed by SSDs. We were deploying Veeam Agent as a backup service on the large DB VM (160GB RAM, 4.5TB of storage across multiple partitions), and it seems to be the cause of the problem. Not long into the initial backup it threw up some errors and the backup failed.
Windows reported event 153 warnings complaining about storage: The IO operation at logical block address 0xcaf0 for Disk 3 (PDO name: \Device\0000003e) was retried.
It reported this against multiple partitions.
Digging further I've found in /var/log/daemon.log lots of errors like this:
May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631470: failed to grant-copy segment 0: -3
May 22 21:07:31 hyp tapdisk[13252]: tap-err:tapdisk_xenblkif_complete_request: 31/51776, ring=0x2480010: req 7449327995633631470: failed to copy from/to guest: Input/output error
May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631471: failed to grant-copy segment 0: -3
May 22 21:07:31 hyp tapdisk[13252]: tap-err:tapdisk_xenblkif_complete_request: 31/51776, ring=0x2480010: req 7449327995633631471: failed to copy from/to guest: Input/output error
May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e0)
May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e1)
May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e2)
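In case it helps anyone looking at similar logs, here is a rough Python sketch for tallying the tapdisk errors by process and function, to see whether they all come from one tapdisk instance or from several. The regex is based only on the lines quoted above. Reading the `31/51776` pair as domain ID / device ID is my assumption, and the names `TAP_ERR`, `summarize` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the tapdisk lines quoted above.
# Assumption: the "31/51776" pair is <domain id>/<device id>.
TAP_ERR = re.compile(
    r"tapdisk\[(?P<pid>\d+)\]: tap-err:(?P<func>\w+): "
    r"(?P<domid>\d+)/(?P<devid>\d+), ring=(?P<ring>0x[0-9a-fA-F]+): "
    r"req (?P<req>\d+): (?P<msg>.*)"
)


def summarize(lines):
    """Count tap-err lines per (tapdisk pid, domid, devid, function)."""
    counts = Counter()
    for line in lines:
        m = TAP_ERR.search(line)
        if m:
            counts[(m["pid"], m["domid"], m["devid"], m["func"])] += 1
    return counts


# A few lines copied from the excerpt above; the qemu-dm line is not a
# tapdisk error and is deliberately left uncounted.
SAMPLE = [
    "May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631470: failed to grant-copy segment 0: -3",
    "May 22 21:07:31 hyp tapdisk[13252]: tap-err:tapdisk_xenblkif_complete_request: 31/51776, ring=0x2480010: req 7449327995633631470: failed to copy from/to guest: Input/output error",
    "May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631471: failed to grant-copy segment 0: -3",
    "May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e0)",
]

counts = summarize(SAMPLE)
for key, n in counts.most_common():
    print(n, *key)
```

To run it against the real log, pass an open file instead of `SAMPLE`, for example `summarize(open("daemon.log", errors="replace"))` on a copy of `/var/log/daemon.log`. It needs Python 3.6 or later, so it is probably easiest to run on a workstation rather than on the host itself.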
The hardware is a Dell R730 with an H730P PERC and some Samsung SM883 SSDs. The disks do not report any issues in SMART and the controller log has not recorded anything. We ran our old backup solution afterwards, which completed without any issues.
Obviously, running outdated software like XenServer 7.0 and the soon-to-be-EOL 2012 R2 isn't ideal, and we'd like to upgrade to something newer. We were hoping that replacing our old backup system with Veeam would give us a path to doing that, with better DR should there be issues, as the current system is quite slow to restore. Veeam Agent runs fine on the small 150GB VM on this host without throwing up any issues.
Any thoughts on the cause of this and what we could do?
The host has been up for 339 days. We could arrange a maintenance window to reboot it, although that makes us nervous about what might go wrong if we do.
Thanks,
John
3 answers to this question