Issue with moving storage from one SR to another on XenServer 7.2


Adnan Slomic

Question

I have an issue with live migration of storage on XenServer 7.2.

 

I tried to move the two 25 GB virtual disks of the same VM in one go (the system and data partitions of the same Linux box), from one Synology NAS to another (both NFS SRs).

 

While it took 1.5 hours to move the first 25 GB (system) disk, the migration of the second one seems to be stuck at 100%.

 

The VM is still in the running state, but it is not responding at all.

I cannot ping it, and I cannot access its console via XenCenter.

I have the options to gracefully or forcefully restart or shut down the box, but I am not sure whether that is a good idea.

 

The hypervisor reports the progress as 100%, but the task has been hanging in XenCenter for more than 20 hours.

 

[root@hypervisor01 log]# xe task-list
uuid ( RO)                : a466bcf4-b7b8-c1ca-a033-d330d7a9b079
          name-label ( RO): Async.VDI.pool_migrate
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 1.000
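
In case it is useful, the full task record can be pulled like this (just a dom0 one-liner, using the task UUID from above):

# show every field of the stuck migration task
xe task-list uuid=a466bcf4-b7b8-c1ca-a033-d330d7a9b079 params=all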

 

This is what I see when I check the vdi-list:

 

#NEW_NAS (23.94GB)
uuid ( RO)                : ab5474a3-5e45-4641-ab46-c6aae57f0f66
          name-label ( RW): data_disk
    name-description ( RW): data storage
             sr-uuid ( RO): 46ef093e-8a23-b764-aa17-ed4bdcd8cd2f
        virtual-size ( RO): 26843545600
            sharable ( RO): false
           read-only ( RO): false

#OLD_NAS (146.3MB)
uuid ( RO)                : 0b461460-9714-4987-b5ad-0e40e26a80fd
          name-label ( RW): data_disk
    name-description ( RW): data storage
             sr-uuid ( RO): 6caa0c9d-f2ba-be49-0369-858293a0c135
        virtual-size ( RO): 26843545600
            sharable ( RO): false
           read-only ( RO): false

 

#NEW_NAS (19.32G)
uuid ( RO)                : 0f3b9732-9b36-4776-9ab3-4d8f6a273163
          name-label ( RW): sys_disk
    name-description ( RW): System disk
             sr-uuid ( RO): 46ef093e-8a23-b764-aa17-ed4bdcd8cd2f
        virtual-size ( RO): 26843545600
            sharable ( RO): false
           read-only ( RO): false
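
If it helps, I believe the same size information can also be read from the hypervisor side; something like this should list what each VDI actually consumes on each SR (physical-utilisation) next to its provisioned size (virtual-size):

# VDIs on the SR on the new NAS, with on-disk usage
xe vdi-list sr-uuid=46ef093e-8a23-b764-aa17-ed4bdcd8cd2f params=uuid,name-label,virtual-size,physical-utilisation

# same for the SR on the old NAS
xe vdi-list sr-uuid=6caa0c9d-f2ba-be49-0369-858293a0c135 params=uuid,name-label,virtual-size,physical-utilisation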

 

The timestamps on the VHD files on the NAS indicate that the copy simply stopped at some point before it finished, and the task has been hanging ever since.

 

So it seems the task almost finished, but for some reason there are still about 150 MB left on the old NAS.

 

There is also what looks like a full copy of the data_disk on the OLD_NAS, in the same folder as the previously mentioned one:

 

ee9e0599-c43a-4357-b71a-d7c3197fd652.vhd (24.95GB).

 

uuid ( RO)                : ee9e0599-c43a-4357-b71a-d7c3197fd652
          name-label ( RW): base copy
    name-description ( RW):
             sr-uuid ( RO): 6caa0c9d-f2ba-be49-0369-858293a0c135
        virtual-size ( RO): 26843545600
            sharable ( RO): false
           read-only ( RO): true

 

But this one does not seem to be associated with my VM in any way?
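
If it helps anyone later, something along these lines should show whether any VBD still references that base copy, and which VHD the active data_disk has as its parent (vhd-util is available in dom0; the path assumes the default NFS SR mount point under /var/run/sr-mount and that the VHD file is named after the VDI UUID):

# any VBDs pointing at the "base copy" VDI? (no output = no VM uses it directly)
xe vbd-list vdi-uuid=ee9e0599-c43a-4357-b71a-d7c3197fd652

# check the parent of the active data_disk VHD on the old SR
vhd-util query -p -n /var/run/sr-mount/6caa0c9d-f2ba-be49-0369-858293a0c135/0b461460-9714-4987-b5ad-0e40e26a80fd.vhd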

 

And when I check the list of disks associated with this VM on the hypervisor, I see this:

 

[root@eqx-hypervisor01 log]# xe vm-disk-list vm=prd-vm-01
Disk 0 VBD:
uuid ( RO)             : 7c85119b-ba33-39fa-0562-bbcae684dccb
    vm-name-label ( RO): prd-vm-01
       userdevice ( RW): 1


Disk 0 VDI:
uuid ( RO)             : 0b461460-9714-4987-b5ad-0e40e26a80fd
       name-label ( RW): data_disk
    sr-name-label ( RO): VM02 @ OLD_NAS
     virtual-size ( RO): 26843545600


Disk 1 VBD:
uuid ( RO)             : 1131dc7b-9eb7-49a0-c7ad-f3955824b382
    vm-name-label ( RO): prd-vm-01
       userdevice ( RW): 0


Disk 1 VDI:
uuid ( RO)             : 0f3b9732-9b36-4776-9ab3-4d8f6a273163
       name-label ( RW): sys_disk
    sr-name-label ( RO): VM01 @ NEW_NAS
     virtual-size ( RO): 26843545600

 

So I am not really sure which of these to trust?

 

I have read posts on different forums saying that cancelling tasks that are moving storage is not a good idea and might result in data loss.

I have also read that restarting the toolstack (xe-toolstack-restart) is a bad idea while storage-migration tasks are running.
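
(For reference, the cancel command would be the one below, using the task UUID from earlier; given those warnings I have not run it.)

# cancel the stuck migration task -- reportedly risky while storage is still moving
xe task-cancel uuid=a466bcf4-b7b8-c1ca-a033-d330d7a9b079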

 

Any ideas where to go from here?

 

I have read in a few places on the forum that the task might time out after 24 hours, but any ideas what I should do if it doesn't?

 

Any help is appreciated.

 

Thank you all in advance.

 

Best regards,

Adnan

 

9 answers to this question


Thank you for the reply, Alan. I did not expect anyone to reply on a Sunday afternoon (depending on the time zone).

 

Here is the list of updates that are installed, since you mentioned it:

(screenshot of the installed updates)

 

From the other posts I read earlier on different forums, and from the documentation available online, I could not figure out what actually happens when the 24-hour timeout is eventually reached.

 

Is the task cancelled and the VDI rolled back to where it was, or something else?

 

I guess I will find out soon.


Wife and kids are busy today, so I was bored and checked a few things online, including the Citrix forums. At 24 hours the task will cancel itself.

I am hoping it's smart enough to do so gracefully, without loss of data. There are situations/combinations where data can be lost, which is why I would not want to interrupt what it is doing.

 

--Alan--

 


Thanks a lot, again, Alan.

 

I let the task reach the timeout, and to my surprise it was marked as completed a few milliseconds after the timeout:

(screenshot of the completed task)

 

But the VM is still hanging and unreachable, and it still shows the same icon in XenCenter as before.

 

Now that all the tasks on the hypervisor are gone (xe task-list returns nothing), I assume it is safe to restart the toolstack (xe-toolstack-restart) on this server. After that, if I understood everything correctly, this VM should be down and I should be able to power it on again?
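
Just so I have the steps straight, this is roughly what I am planning to run (the VM name is the one from the vm-disk-list output above):

# confirm there are no pending tasks left
xe task-list

# restart the XAPI toolstack on this host
xe-toolstack-restart

# check what state the VM ends up in, then start it if it is halted
xe vm-list name-label=prd-vm-01 params=uuid,power-state
xe vm-start vm=prd-vm-01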

 

What do you think?

 

Sorry to bug you with this problem during the weekend, but I am not really experienced with XenServer, and this is a very important production VM for us...

 

Thanks again for helping me out.

 

Best regards,

Adnan


Okay, the good news is that the VM recovered. After restarting the toolstack on the host, the VM was indeed shut down, and after I started it, it is up and running again!

 

What I still find weird is that something still seems to be happening with the disk I was trying to migrate.

It looks like the VM is still using the data disk on the old NAS (as if the migration had not happened at all), and when I look at the NAS itself, the VHD attached to this VM is growing!

In my original post I wrote that it was taking up only 146 MB; since I started the VM it has grown to 1.47 GB.

 


 

I cannot really explain this, except that maybe it is still rolling back the move operation?

 

Maybe the second disk on the list (the "base copy" one) holds all the actual data, and it is now backfilling the disk that is actually attached and in use?
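
In case someone else ends up in the same state: I understand the storage manager logs what it is doing with the VHD chain in /var/log/SMlog on the host, and an SR rescan should kick off the garbage collector/coalesce if it is not already running. Something like this (with the old NAS SR UUID from above) should show whether that is what is happening:

# look for coalesce / GC activity in the storage manager log
grep -iE 'coalesc|gc' /var/log/SMlog | tail -n 50

# rescan the old SR so the VHD chain gets re-checked and cleaned up if possible
xe sr-scan uuid=6caa0c9d-f2ba-be49-0369-858293a0c135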

 

Anyway, I am really happy that I was able to restore the VM. Huh, that was a difficult one!

No more live storage migrations for me. From now on I will copy the VMs, and then shut down and destroy the old ones once I confirm that the copy is good!
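
Roughly what I have in mind for next time (the copy name is just an example, the SR UUID is the new NAS from earlier, and I would do the copy with the VM shut down):

# shut the VM down cleanly first
xe vm-shutdown vm=prd-vm-01

# copy it, together with its disks, onto the SR on the new NAS
xe vm-copy vm=prd-vm-01 new-name-label=prd-vm-01-copy sr-uuid=46ef093e-8a23-b764-aa17-ed4bdcd8cd2f

# boot and verify the copy, then remove the old VM and its disks
# (xe vm-uninstall on the old VM's uuid, once I am sure the copy is good)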

 

Thanks again, Alan, for the moral support and for sharing your knowledge.

 

Have a good week ahead.

 

Best regards,

Adnan

 

 

 

 


That VHD size difference is weird. I've had pretty good luck with storage migrations, but when it goes south, it really goes south.

Personally, I export/import as much as I can, and if I am forced to do a live migration I do a full VM copy first and make sure my backups are good, just so I have a good point to return to. You can never be too cautious/careful.
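
A rough sketch of what I mean by export/import, assuming the VM is shut down and there is space for the export file (the file path is just an example; the SR UUID is the new NAS from this thread):

# export the halted VM to an XVA file
xe vm-export vm=prd-vm-01 filename=/mnt/backup/prd-vm-01.xva

# import it onto the SR on the new NAS
xe vm-import filename=/mnt/backup/prd-vm-01.xva sr-uuid=46ef093e-8a23-b764-aa17-ed4bdcd8cd2f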

 

--Alan--

 
