
tapdisk throwing input/output error on Windows Server 2012R2 VM


John Hunt1709152117

Question

Hi,

We have a very old host running XenServer 7.0 with two Windows Server 2012R2 VMs, one a large DB server and the other a web server, on local LVM storage backed by SSDs. We were deploying Veeam Agent as a backup service on the large DB VM (160GB RAM, 4.5TB of storage across multiple partitions), and that seems to be the trigger: not long into the initial backup it threw up some errors and the backup failed.

 

Windows reported event 153 warnings complaining about storage: The IO operation at logical block address 0xcaf0 for Disk 3 (PDO name: \Device\0000003e) was retried.

 

It reported this against multiple partitions. 

 

Digging further I've found in /var/log/daemon.log lots of errors like this:

 

May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631470: failed to grant-copy segment 0: -3

May 22 21:07:31 hyp tapdisk[13252]: tap-err:tapdisk_xenblkif_complete_request: 31/51776, ring=0x2480010: req 7449327995633631470: failed to copy from/to guest: Input/output error

May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, ring=0x2480010: req 7449327995633631471: failed to grant-copy segment 0: -3

May 22 21:07:31 hyp tapdisk[13252]: tap-err:tapdisk_xenblkif_complete_request: 31/51776, ring=0x2480010: req 7449327995633631471: failed to copy from/to guest: Input/output error

May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e0)

May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e1)

May 22 21:07:31 hyp qemu-dm-31[13794]: XENVBD|PdoCompleteResponse:Target[4] : READ BLKIF_RSP_ERROR (Tag 31d80e2)
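
To get a feel for how often these failures occur and which tapdisk process they come from, the log can be tallied with a few lines of Python. This is only a minimal sketch that assumes the line format shown above; the function name and the regular expression are illustrative and not part of any XenServer tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... tapdisk[13252]: tap-err:guest_copy2: ... failed to grant-copy segment 0: -3
# The pattern is an assumption based on the excerpt above.
GRANT_COPY_RE = re.compile(
    r"tapdisk\[(?P<pid>\d+)\]: tap-err:guest_copy2: .*"
    r"failed to grant-copy segment (?P<segment>\d+): (?P<code>-?\d+)"
)


def count_grant_copy_failures(lines):
    """Return a Counter mapping (tapdisk pid, error code) to occurrences."""
    counts = Counter()
    for line in lines:
        match = GRANT_COPY_RE.search(line)
        if match:
            counts[(int(match["pid"]), int(match["code"]))] += 1
    return counts


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, "
    "ring=0x2480010: req 7449327995633631470: failed to grant-copy segment 0: -3",
    "May 22 21:07:31 hyp tapdisk[13252]: tap-err:guest_copy2: 31/51776, "
    "ring=0x2480010: req 7449327995633631471: failed to grant-copy segment 0: -3",
]

for (pid, code), n in count_grant_copy_failures(SAMPLE).most_common():
    print(f"tapdisk[{pid}] error {code}: {n} failures")
```

On the host itself, the same function could be fed an open file handle for /var/log/daemon.log instead of the sample list.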

 

The hardware is a Dell R730 with an H730P PERC and some Samsung SM883 SSDs. The disks do not report any issues in SMART and the controller log has not recorded anything. We ran our old backup solution afterwards and it completed without any issues.

 

Now obviously running outdated software like XenServer 7.0 and the soon-to-be-EOL 2012R2 isn't ideal, and we'd like to upgrade to something newer. We were hoping that replacing our old backup system with Veeam would give us a path to doing that, with better DR should there be issues, as the current system is quite slow to restore. Veeam Agent runs fine on the small 150GB VM on this host without throwing up any issues.

 

Any thoughts on the cause of this and what we could do?

 

The host has been up for 339 days. We could arrange a maintenance window to reboot it, although the prospect has raised some nerves about what might go wrong if we do.

 

Thanks,

 

John

3 answers to this question

Hi Mark,

I narrowed this down to grant table exhaustion after reviewing the logs further. We replicated the problem on different hardware with the same configuration and software. From the logs:

 

May 22 21:07:31 hyp qemu-dm-31[13794]: GNTTAB: MAP XENMAPSPACE_grant_table[33] @ 00000000.f0022000

May 22 21:07:31 hyp qemu-dm-31[13794]: XENBUS|CacheCreateObject: fail2

May 22 21:07:31 hyp qemu-dm-31[13794]: XENBUS|GnttabExpand: added references [00004200 - 000043ff]

May 22 21:07:31 hyp qemu-dm-31[13794]: XENBUS|CacheCreateObject: fail1 (c000009a)

May 22 21:07:31 hyp qemu-dm-31[13794]: XENBUS|GnttabPermitForeignAccess: fail1 (c000009a)
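
Two details in these lines can be checked with a little arithmetic. This is a rough sketch that assumes version 1 grant tables, where each 4 KiB frame holds 512 eight-byte entries; that layout is an assumption on my part, although it does match the reference range logged for frame 33. The helper names are made up for illustration. The status value c000009a is the Windows NTSTATUS code STATUS_INSUFFICIENT_RESOURCES, which fits with the guest failing to allocate what it needs for further grant references.

```python
# Assumption: grant table v1 layout, 4096-byte frames with 8-byte entries.
FRAME_SIZE = 4096
ENTRY_SIZE = 8
REFS_PER_FRAME = FRAME_SIZE // ENTRY_SIZE  # 512 references per frame

# NTSTATUS value that appears in the XENBUS failure lines.
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def grant_refs_for_frame(frame_index):
    """Return the (first, last) grant reference covered by a frame index."""
    first = frame_index * REFS_PER_FRAME
    return first, first + REFS_PER_FRAME - 1


def describe_status(hex_code):
    """Translate a hex NTSTATUS string from the log into a symbolic name."""
    return NTSTATUS_NAMES.get(int(hex_code, 16), "unknown")


first, last = grant_refs_for_frame(33)
print(f"frame 33 -> references [{first:08x} - {last:08x}]")
print(f"c000009a -> {describe_status('c000009a')}")
```

The computed range for frame 33 is the same as the one in the GnttabExpand line, so the guest had just mapped that frame when the allocations began to fail.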

 

Upgrading XenServer to a version with a higher grant table limit didn't fix this on its own, though.

 

There was also apparently a fix in XenTools version 9.3.1 from January 2023 that allows Windows VMs to utilise the larger grant tables required by resource-hungry VMs (the VM itself has 180GB RAM, 16 cores, 3 NICs and 5 disks). This is referred to in:

 

https://support.citrix.com/article/CTX235403/updates-to-citrix-vm-tools-for-windows-for-xenserver-and-citrix-hypervisor

 

Once we had an increased grant table limit AND had uninstalled the old tools and installed the new tools (v9.3.1), the problem was fixed.

 

 

Thanks for the feedback though on the Veeam bug. Is this the one you are referring to or is it something else? https://www.veeam.com/kb4428

 

John
