
8.1 tapdisk/tap-ctl issue


Kyle Peterson

Question

Hello, I figured I would post here and see if anyone has some insight into an issue I am having (I do have a case open with Citrix).

 

I updated a pool from 8.0 to 8.1, and that was mostly fine. Then I tried to do a VM metadata backup; that process got stuck, and now I cannot unplug the VBD tied to the control domain.

The end result is that any VM running on the stuck server is still running, but I cannot start, migrate, shut down, or restart any VMs there.

 

tap-ctl list just hangs, and the following processes are stuck on the main server:

 

root       616  0.0  0.0   6460  1428 ?        S    Feb20   0:00 /usr/sbin/tap-ctl close -p 11858 -m 32
root      4269  0.0  0.0   6460  1372 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/13e75fdd-261c-4f4f-a392-3560fc38e8f8.vhd
root      4279  0.0  0.0   6460  1380 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/ace40025-6aa3-4093-b28f-c6d8e0aadfb3.vhd
root      4281  0.0  0.0   6460  1348 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/fba29f31-6e73-4aa9-aac9-d2bf0c87761c.vhd
root      5668  0.0  0.0   6460  1360 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -m 25
root      5672  0.0  0.0   6460  1408 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -m 23
root      5675  0.0  0.0   6460  1324 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -m 24
root      6902  0.0  0.0   6460  1368 ?        S    09:21   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/2e3d6b27-880b-4b60-b10a-b1eacd55a700.vhd
root      8377  0.0  0.0   6460  1328 ?        Ss   Feb20   0:00 /usr/sbin/tap-ctl list
root      8508  0.0  0.0   6460  1424 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list
root      9657  0.0  0.0 112720  2344 pts/4    S+   09:30   0:00 grep --color=auto tap-ctl
root     10126  0.0  0.0   6460  1344 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/2e3d6b27-880b-4b60-b10a-b1eacd55a700.vhd
root     13861  0.0  0.0   6460  1408 ?        Ss   Feb20   0:00 /usr/sbin/tap-ctl list
root     13978  0.0  0.0   6460  1320 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list
root     16613  0.0  0.0   6460  1364 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/d42dd7e9-19f4-4bbd-aa4d-6f71202e5a02.vhd
root     22087  0.0  0.0   6460  1380 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -m 30
root     26419  0.0  0.0   6460  1344 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/13e75fdd-261c-4f4f-a392-3560fc38e8f8.vhd
root     26435  0.0  0.0   6460  1348 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/fba29f31-6e73-4aa9-aac9-d2bf0c87761c.vhd
root     26440  0.0  0.0   6460  1348 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list -f /var/run/sr-mount/bb728e42-dc5f-e3d9-06d2-19ae61907be6/ace40025-6aa3-4093-b28f-c6d8e0aadfb3.vhd
root     31507  0.0  0.0   6460  1320 ?        Ss   Feb20   0:00 /usr/sbin/tap-ctl list
root     31630  0.0  0.0   6460  1328 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list
root     32146  0.0  0.0   6460  1428 ?        Ss   Feb20   0:00 /usr/sbin/tap-ctl list
root     32366  0.0  0.0   6460  1380 ?        S    Feb20   0:00 /usr/sbin/tap-ctl list

 

That one process, tap-ctl close -p 11858 -m 32, comes from me trying to cancel the stuck metadata backup, and I think it is holding up the whole server.

The scary thing is that if one of the running VMs on the bad server gets shut down, it will not boot on the good server either.

 

Does anyone have any ideas? I'm thinking I should kill that one process, but I don't want to make the situation any worse. Thanks.

 

Also, the tapdisk process it targets (PID 11858) shows the following:

root     11858  0.0  0.0      0     0 ?        Ds   Feb20   0:03 [tapdisk]
 

Is this the cause?
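
For anyone checking for the same symptom: the Ds in the STAT column above means the tapdisk process is in uninterruptible sleep (state D). As far as I know, a process in that state does not act on signals, SIGKILL included, until it leaves the sleep, so killing it may simply not take effect. A minimal sketch for listing every D-state process along with the kernel function it is waiting in (WCHAN), assuming the procps version of ps; the function names filter_d_state and list_d_state are made up for illustration:

```shell
#!/bin/sh
# Keep only rows whose STAT column (field 2) starts with D,
# i.e. uninterruptible sleep. The header row is passed through.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List every D-state process with the kernel function it is sleeping in.
# Assumes procps ps (supports -eo and the wchan:32 width specifier).
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}

list_d_state
```

On a healthy host this normally prints little more than the header. An entry that stays in the list across repeated runs is a candidate for whatever the other commands are blocked behind.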

 


2 answers to this question


Hey Tobias, no, the memory on dom0 is still at 8 GB, but an engineer had a look today and confirmed my findings that a defunct tapdisk process is holding up the tap-ctl commands.

 

So things on the server are running, and rebooting the server should fix it, but we cannot shut down or migrate the VMs running on it. At the moment I would have to cut the power to the server and crash all the VMs running on it, and we are not quite ready to do that. We are going to give Citrix some time to go through all the logs and see if there is anything else we can try first. On the bright side, everything is still running.

 

It's a strange issue for sure; I haven't seen anything like this before.

 

Thanks
