
Cross-pool Storage XenMotion migration does not work on identical servers


SERGII KRIUCHATOV

Question

Hi there.

I need the community's help before calling Citrix support.

 

Hardware and software description 

 

I have two almost identical HPE ProLiant DL325 Gen10 servers:

 

  • AMD EPYC 7402P 24-Core Processor
  • 128GB DRAM
  • 128GB SATA + 2TB NVMe SSDs
  • HP Ethernet 1Gb 2-port 361T Adapter
  • HP FlexFabric 10Gb 2-port 533FLR-T Adapter

 

The only difference is an NVIDIA Tesla T4 in the second server.

 

  • BIOS and firmware identical
  • Citrix Hypervisor 8.2 + latest patches (upgraded from 8.1 Express)
  • Licensed with a Citrix Virtual Apps and Desktops perpetual license
  • Symmetrically connected to two 10G switches using bonds on the 10G adapters
  • System time is consistent (a CentOS 8 storage appliance is used as the NTP server)
  • Management interface on a single 1G adapter
  • HA and WLB not used
  • Enough free memory and free space on the SSDs

 

The workload is 2 Windows Server 2019 VMs (Domain + CVAD) + 10 Windows 10 VDIs + some auxiliary CentOS VMs, running Microsoft Office + MS SQL.

My experience: 10+ years, since XenServer 5.5.

 

Preface

 

The workload lived successfully on the 1st server (the one without NVIDIA).
Everything went more or less smoothly until the 2nd server arrived.

The idea was very simple: unite the servers into a pool and move the VDIs to the server with the NVIDIA vGPU to offload the CPU.

 

The implementation procedure was quite simple:

  1. Citrix Hypervisor 8.1 was installed on the 2nd server
  2. All VMs were migrated from the 1st server to the 2nd
  3. The 1st server was patched to Citrix Hypervisor 8.2 + the latest patch
  4. All VMs were migrated back to the 1st server
  5. The 2nd server was patched to Citrix Hypervisor 8.2 + the latest patch + the NVIDIA vGPU supplemental pack
  6. And then many attempts to add the 2nd server to the initial pool with the 1st server failed.

 

Problem 1: the 2nd server refused all attempts to join the existing pool, without even a minimal error message.

 

To be honest, I only used XenCenter and didn't try the xe CLI.
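A minimal sketch of what the CLI attempt would look like, run from the joining host (the master address and credentials are placeholders); unlike XenCenter, xe pool-join usually prints a concrete error (version mismatch, CPU mask, licensing) when the join is refused:

[root@epyc2 ~]# xe pool-join master-address=<EPYC1-IP> master-username=root master-password=<password>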

 

Later, not wanting to waste too much time fighting with pools, I decided to just check all the benefits of the NVIDIA vGPUs.

I decided to keep 2 separate pools (one server in each), migrate all the VDI VMs onto the 2nd server with NVIDIA, and check the workload.


In the end, I could move only 5 VMs (3 Windows + 2 CentOS).


Problem 2: after that, all attempts to move VMs between the pools failed.


Nothing happens, neither from XenCenter nor from the xe CLI.
I use the Xen Orchestra Community appliance for everyday work; XOA can't move anything either.

 

The command below, with all the corresponding bells and whistles:

xe vm-migrate uuid=<vm-uuid> --live ...

 
gives only one message:

Performing a Storage XenMotion migration. Your VM's VDIs will be migrated with the VM.

xensource.log contains only one entry:

Aug 17 17:53:10 EPYC1 xapi: [ info||121302 UNIX /var/lib/xcp/xapi||cli] xe vm-migrate uuid=cebcc471-d230-44da-c15f-3f54a4c0a8f7 live=true remote-master=<EPYC2-IP> remote-username=root remote-password=(omitted) vif:33c8eeb0-8490-4e54-9954-179f688e8dec=daf28b9f-1a21-951b-a131-8db2cc0c553b username=root password=(omitted)

SMlog, kern.log, and daemon.log don't contain anything specific.

The same on the target server: nothing about errors.

No move, no traffic on the switches, nothing.
 

Please help me figure out what I could have missed, what else I should check, etc.

Thanks in advance.

Recommended Posts

Finally, I opened a case with Citrix.

Once the SSR was uploaded, their analysis showed the problem: the MTU on the management interfaces.

 

It was definitely my fault: the MTU was 9126, too big.

As soon as the MTU was reconfigured back to 1500, everything started to work.

 

Citrix Recommendations:  

Detected that one or more XenServer Management Interfaces are configured with an MTU higher than 1500. This is unsupported as higher transmit units, such as Jumbo Frames, are specific to storage and or other non-management interfaces.
Reconfigure management interfaces as to use an MTU of 1500. This is the supported configuration.

https://support.citrix.com/article/CTX134880
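For anyone hitting the same wall: the management MTU can be checked and reset from the CLI, roughly like this (UUIDs are placeholders, and the network MTU change only takes effect after a PIF replug or host reboot):

[root@EPYC1 ~]# xe pif-list params=device,MTU,management
[root@EPYC1 ~]# xe network-list params=uuid,name-label,MTU
[root@EPYC1 ~]# xe network-param-set uuid=<management-network-uuid> MTU=1500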

 

Thanks to everyone trying to help.


Are you using local storage? I don't see anything about storage in your description. If you're using local storage, you can't use XenMotion.
When migrating a VM with XenMotion or Storage XenMotion, the new VM and server must meet the following compatibility requirements:

    XenServer Tools must be installed on each VM that you wish to migrate.
    The destination server must have the same or a more recent version of XenServer installed as the source.
    For Storage XenMotion, if the CPUs on the source and destination server are different, the destination server must provide at least the entire feature set as the source server’s CPU. Consequently, it is unlikely to be possible to move a VM between, for example, AMD and Intel processors.
    For Storage XenMotion, VMs with more than six attached VDIs cannot be migrated.
    The target server must have sufficient spare memory capacity or be able to free sufficient capacity using Dynamic Memory Control. If there is not enough memory, the migration will fail to complete.
    For Storage XenMotion, the target storage must have enough free disk space (for the VM and its snapshot) available for the incoming VMs. If there is not enough space, the migration will fail to complete.
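Most of these requirements can be sanity-checked straight from the xe CLI; a rough sketch (the parameter names match the listings shown later in this thread):

xe vm-list params=name-label,PV-drivers-up-to-date               # tools installed and current
xe sr-list params=name-label,physical-size,physical-utilisation  # free space on the SRs
xe host-list params=name-label,software-version                  # same version on both ends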

When you try to join a server to an existing pool and it fails, that's because of some misconfiguration.

Please check: if you are using local storage, the thing you want to do is not possible; if you are using external storage, please re-check the configuration and multipath options.


Thanks for your reply.

 

Sorry I was not precise enough about the storage configuration.

 

Both servers have almost identical storage: a local SATA SR (70GB) + a local EXT3 NVMe SR (2TB) + a shared iSCSI SR (1TB).

All SRs have enough free space (no more than 20% occupied).

 

In my case the XenMotion / Storage XenMotion behaviour doesn't depend on which type of storage the VM resides on; I tried both local and shared.

The result is the same: nothing happens.

 

I've read the manual regarding the Storage XenMotion requirements.

 

Nothing falls outside the listed requirements: the servers are identical with identical software, the VMs have identical XenServer Tools (8.1 — is that the root cause?), and only one VDI is attached.

 

4 hours ago, Nicolás Ventre said:

 

Please check: if you are using local storage, the thing you want to do is not possible; if you are using external storage, please re-check the configuration and multipath options.

Just curious then: how did I do exactly that with all my VMs before the upgrade to 8.2 LTSR?

I moved them freely between the pools, and the VMs were on the local NVMe-based SRs.

 

Maybe this is the root issue: something changed during the upgrade from 8.1 to 8.2.

 

BTW

Quote from documentation:

https://docs.citrix.com/en-us/citrix-hypervisor/vms/migrate.html

 

Quote

Storage live migration also allows VMs to be moved from one host to another, where the VMs are not on storage shared between the two hosts. As a result, VMs stored on local storage can be migrated without downtime and VMs can be moved from one pool to another

 

Edited by skriuch118
UPDATE
6 minutes ago, Nicolás Ventre said:

I think this is the problem or maybe a bug.
If you migrate the virtual machine while shut down, does it work?

No, I can't migrate VMs between the pools at all.

Neither powered on nor off.

Regardless of the storage, local or shared.

 

Inside the pool (intra-pool) I can move VMs between storages freely, either live or powered off.

 

50 minutes ago, Tobias Kreidl said:

Are the times properly synchronized on all these servers? Is there enough free space on both SRs? As mentioned above, migration from local storage may not work. Also make sure all hotfixes are the same on both pools?

 

Time synchronization was my biggest concern.

But it seems consistent now:

[root@EPYC1 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 10.1.0.5                      2   9   377    82    +27us[  +57us] +/- 3210us

[root@epyc2 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 10.1.0.5                      2  10   377     6    +66us[  +90us] +/- 3307us

As I said above, both the local and shared SRs have more than 1TB of free space.

The mean VDI size of the VMs is not above 100GB.

So there is far more than enough space.

 

Both pools have:

 

EPYC1

Fully applied:

CH82 (version 1.0)
XS82E001 (version 1.0)

 

EPYC2

Fully applied:

CH82 (version 1.0)
NVIDIA-vGPU-xenserver-450.55 (version 450.55)
XS82E001 (version 1.0)

 

I moved the test Linux VM to the iSCSI shared SR just to check.

The Migration Wizard got stuck on "Waiting..." and didn't allow me to go to the next step.

 

[Screenshot: Migration Wizard stuck on "Waiting..."]

 

The xe CLI also gets stuck right at the start:

[root@EPYC1 ~]# xe vm-migrate uuid=d26a45e7-cf8d-2c59-e179-c04827b2fec2 --live remote-master=10.1.0.38 remote-username=<> remote-password=<> vif:33c8eeb0-8490-4e54-9954-179f688e8dec=daf28b9f-1a21-951b-a131-8db2cc0c553b
Performing a Storage XenMotion migration. Your VM's VDIs will be migrated with the VM.

[root@EPYC1 ~]# tail -F -n 500 /var/log/xensource.log | grep migrat
Aug 18 18:47:10 EPYC1 xapi: [ info||217957 UNIX /var/lib/xcp/xapi||cli] xe vm-migrate uuid=d26a45e7-cf8d-2c59-e179-c04827b2fec2 live=true remote-master=10.1.0.38 remote-username=root remote-password=(omitted) vif:33c8eeb0-8490-4e54-9954-179f688e8dec=daf28b9f-1a21-951b-a131-8db2cc0c553b username=root password=(omitted)

I really have no idea what else to check.

 

 

4 minutes ago, Tobias Kreidl said:

No update to the status?: xe task-list

AFAIU nothing related to xe vm-migrate :(

[root@EPYC1 ~]# xe task-list
uuid ( RO)                : 23165cfa-b487-aa7a-db99-dfda56c95f81
          name-label ( RO): Connection to VM console
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000


uuid ( RO)                : 524f83df-0436-77c9-32d2-ac01be45add4
          name-label ( RO): VM.assert_can_migrate
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

The VM.assert_can_migrate task is the one hanging, spawned by the Migration Wizard.

It disappears quietly after a few hours.
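If such a stuck task blocks further attempts, it can be cancelled by hand instead of being waited out (uuid taken from xe task-list):

[root@EPYC1 ~]# xe task-cancel uuid=<task-uuid>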

15 minutes ago, Tobias Kreidl said:

Seems stuck! Weird. Sure there is enough free space on both servers and that this isn't with local storage anywhere?

 

All VMs are on local storage.

 

The exact VM under test was recently moved to the iSCSI SR, which has plenty of free space.

 

Please see the listing below to check the whole configuration:

[root@EPYC1 ~]# xe vm-list uuid=d26a45e7-cf8d-2c59-e179-c04827b2fec2 params=all
uuid ( RO)                                  : d26a45e7-cf8d-2c59-e179-c04827b2fec2
                            name-label ( RW): MYVM-XOA.COMMUNITY
                      name-description ( RW): Ubuntu Bionic Beaver 18.04 (1) username: sergii passwd: sergii1!
                          user-version ( RW): 1
                         is-a-template ( RW): false
                   is-default-template ( RW): false
                         is-a-snapshot ( RO): false
                           snapshot-of ( RO): <not in database>
                             snapshots ( RO):
                         snapshot-time ( RO): 19700101T00:00:00Z
                         snapshot-info ( RO):
                                parent ( RO): <not in database>
                              children ( RO):
                     is-control-domain ( RO): false
                           power-state ( RO): running
                         memory-actual ( RO): 4294963200
                         memory-target ( RO): <expensive field>
                       memory-overhead ( RO): 37748736
                     memory-static-max ( RW): 4294967296
                    memory-dynamic-max ( RW): 4294967296
                    memory-dynamic-min ( RW): 4294967296
                     memory-static-min ( RW): 536870912
                      suspend-VDI-uuid ( RW): <not in database>
                       suspend-SR-uuid ( RW): <not in database>
                          VCPUs-params (MRW): weight: 256
                             VCPUs-max ( RW): 2
                      VCPUs-at-startup ( RW): 2
                actions-after-shutdown ( RW): Destroy
                  actions-after-reboot ( RW): Restart
                   actions-after-crash ( RW): Restart
                         console-uuids (SRO): 818f488b-3fe5-1027-5ea9-64631abfeab8
                                   hvm ( RO): true
                               platform (MRW): timeoffset: 2; videoram: 8; hpet: true; secureboot: false; device-model: qemu-upstream-compat; apic: true; device_id: 0001; vga: std; nx: true; pae: true; viridian: false; acpi: 1; cores-per-socket: 2
                     allowed-operations (SRO): changing_dynamic_range; migrate_send; pool_migrate; changing_VCPUs_live; suspend; hard_reboot; hard_shutdown; clean_reboot; clean_shutdown; pause; checkpoint; snapshot
                    current-operations (SRO):
                    blocked-operations (MRW):
                   allowed-VBD-devices (SRO): <expensive field>
                   allowed-VIF-devices (SRO): <expensive field>
                        possible-hosts ( RO): <expensive field>
                           domain-type ( RW): hvm
                   current-domain-type ( RO): hvm
                       HVM-boot-policy ( RW): BIOS order
                       HVM-boot-params (MRW): order: dc; firmware: bios
                 HVM-shadow-multiplier ( RW): 1.000
                             PV-kernel ( RW):
                            PV-ramdisk ( RW):
                               PV-args ( RW):
                        PV-legacy-args ( RW):
                         PV-bootloader ( RW):
                    PV-bootloader-args ( RW):
                    last-boot-CPU-flags ( RO): vendor: AuthenticAMD; features: 178bfbff-f6f83203-2fd3fbff-040005f7-0000000f-219c01a9-00400004-00000000-00001005-00000000-00000000-00000000-00000000-00000000-00000000
                      last-boot-record ( RO): <expensive field>
                           resident-on ( RO): 96c370a0-359a-4671-bf15-52ee9f52cfc5
                              affinity ( RW): <not in database>
                           other-config (MRW): import_task: OpaqueRef:b4acb91a-1882-4b81-b97f-8ac4b23ceb04; auto_poweron: true; base_template_name: Ubuntu Bionic Beaver 18.04; mac_seed: 65b99061-411e-cd9e-8b8b-fd379dbaefc8; install-methods: cdrom,nfs,http,ftp; linux_template: true
                                dom-id ( RO): 21
                        recommendations ( RO): <restrictions><restriction field="memory-static-max" max="1649267441664"/><restriction field="vcpus-max" max="32"/><restriction field="has-vendor-device" value="false"/><restriction field="allow-gpu-passthrough" value="1"/><restriction field="allow-vgpu" value="1"/><restriction field="allow-network-sriov" value="1"/><restriction field="supports-bios" value="yes"/><restriction field="supports-uefi" value="no"/><restriction field="supports-secure-boot" value="no"/><restriction max="255" property="number-of-vbds"/><restriction max="7" property="number-of-vifs"/></restrictions>
                         xenstore-data (MRW): vm-data: ; vm-data/mmio-hole-size: 268435456
            ha-always-run ( RW) [DEPRECATED]: false
                   ha-restart-priority ( RW):
                                 blobs ( RO):
                            start-time ( RO): 20200816T09:16:31Z
                          install-time ( RO): 19700101T00:00:00Z
                          VCPUs-number ( RO): 2
                     VCPUs-utilisation (MRO): <expensive field>
                            os-version (MRO): name: Ubuntu 18.04.4 LTS; uname: 4.15.0-112-generic; distro: ubuntu; major: 18; minor: 04
                    PV-drivers-version (MRO): major: 8; minor: 0; micro: 50; build: 1
    PV-drivers-up-to-date ( RO) [DEPRECATED]: true
                                memory (MRO):
                                 disks (MRO):
                                  VBDs (SRO): bbecac64-8a81-daa0-2b34-9982d1c08cc5; 89296c5a-8435-1411-24cf-e30bc67d0464
                              networks (MRO): 0/ip: 10.1.0.56; 0/ipv4/0: 10.1.0.56; 0/ipv6/0: fe80::c41:deff:fea0:caca
                   PV-drivers-detected ( RO): true
                                 other (MRO): platform-feature-multiprocessor-suspend: 1; has-vendor-device: 0; feature-suspend: 1; feature-poweroff:                                                                                      1; feature-reboot: 1; feature-vcpu-hotplug: 1; feature-balloon: 1
                                  live ( RO): true
            guest-metrics-last-updated ( RO): 20200818T15:43:08Z
                   can-use-hotplug-vbd ( RO): unspecified
                   can-use-hotplug-vif ( RO): unspecified
              cooperative ( RO) [DEPRECATED]: <expensive field>
                                  tags (SRW):
                             appliance ( RW): <not in database>
                     snapshot-schedule ( RW): <not in database>
                      is-vmss-snapshot ( RO): false
                           start-delay ( RW): 0
                        shutdown-delay ( RW): 0
                                 order ( RW): 0
                               version ( RO): 3
                         generation-id ( RO):
             hardware-platform-version ( RO): 0
                     has-vendor-device ( RW): false
                       requires-reboot ( RO): false
                       reference-label ( RO): ubuntu-18.04
                           bios-strings (MRO): bios-vendor: Xen; bios-version: ; system-manufacturer: Xen; system-product-name: HVM domU; system-version: ; system-serial-number: ; baseboard-manufacturer: ; baseboard-product-name: ; baseboard-version: ; baseboard-serial-number: ; baseboard-asset-tag: ; baseboard-location-in-chassis: ; enclosure-asset-tag: ; hp-rombios: ; oem-1: Xen; oem-2: MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d


[root@EPYC1 ~]# xe vbd-list uuid=bbecac64-8a81-daa0-2b34-9982d1c08cc5
uuid ( RO)             : bbecac64-8a81-daa0-2b34-9982d1c08cc5
          vm-uuid ( RO): d26a45e7-cf8d-2c59-e179-c04827b2fec2
    vm-name-label ( RO): MYVM-XOA.COMMUNITY
         vdi-uuid ( RO): fcef9b08-e945-4a1b-9a06-e4dbe78bea0f
            empty ( RO): false
           device ( RO): xvda

[root@EPYC1 ~]# xe vbd-list uuid=89296c5a-8435-1411-24cf-e30bc67d0464
uuid ( RO)             : 89296c5a-8435-1411-24cf-e30bc67d0464
          vm-uuid ( RO): d26a45e7-cf8d-2c59-e179-c04827b2fec2
    vm-name-label ( RO): MYVM-XOA.COMMUNITY
         vdi-uuid ( RO): <not in database>
            empty ( RO): true
           device ( RO): xvdd

[root@EPYC1 ~]# xe vdi-list uuid=fcef9b08-e945-4a1b-9a06-e4dbe78bea0f
uuid ( RO)                : fcef9b08-e945-4a1b-9a06-e4dbe78bea0f
          name-label ( RW): MYVM-XOA.COMMUNITY_10G
    name-description ( RW): MYVM-XOA.COMMUNITY_10G
             sr-uuid ( RO): 52fcd2e1-8223-6e06-ebcb-7b833c61c78e
        virtual-size ( RO): 10737418240
            sharable ( RO): false
           read-only ( RO): false

[root@EPYC1 ~]# xe sr-list uuid=52fcd2e1-8223-6e06-ebcb-7b833c61c78e params=all
uuid ( RO)                    : 52fcd2e1-8223-6e06-ebcb-7b833c61c78e
              name-label ( RW): iSCSI.RAID5
        name-description ( RW): iSCSI SR [10.10.0.5 (*; LUN 0: dac86df8-1bdb-4b70-9f46-68dc13b98047: 1.8 TB (LIO-ORG))]
                    host ( RO): EPYC1
      allowed-operations (SRO): VDI.enable_cbt; VDI.list_changed_blocks; unplug; plug; PBD.create; VDI.disable_cbt; update; PBD.destroy; VDI.resize; VDI.clone; VDI.data_destroy; scan; VDI.snapshot; VDI.mirror; VDI.create; VDI.destroy; VDI.set_on_boot
      current-operations (SRO):
                    VDIs (SRO): fcef9b08-e945-4a1b-9a06-e4dbe78bea0f; bbddfca2-45a5-4924-bd39-3e51d599ca16; 8b6ae75b-cc45-4a69-8553-b4dd6b296a81
                    PBDs (SRO): b149a48a-84bc-3b7d-4927-9028c797fd18
      virtual-allocation ( RO): 118111600640
    physical-utilisation ( RO): 107613257728
           physical-size ( RO): 2000330686464
                    type ( RO): lvmoiscsi
            content-type ( RO):
                  shared ( RW): true
           introduced-by ( RO): <not in database>
             is-tools-sr ( RO): false
            other-config (MRW):
               sm-config (MRO): allocation: thick; use_vhd: true; multipathable: true; devserial: scsi-36001405dac86df81bdb4b709f4668dc1
                   blobs ( RO):
     local-cache-enabled ( RO): false
                    tags (SRW):
               clustered ( RO): false

[root@EPYC1 ~]# xe pbd-list uuid=b149a48a-84bc-3b7d-4927-9028c797fd18 params=all
uuid ( RO)                  : b149a48a-84bc-3b7d-4927-9028c797fd18
     host ( RO) [DEPRECATED]: 96c370a0-359a-4671-bf15-52ee9f52cfc5
             host-uuid ( RO): 96c370a0-359a-4671-bf15-52ee9f52cfc5
       host-name-label ( RO): EPYC1
               sr-uuid ( RO): 52fcd2e1-8223-6e06-ebcb-7b833c61c78e
         sr-name-label ( RO): iSCSI.RAID5
         device-config (MRO): port: 3260; SCSIid: 36001405dac86df81bdb4b709f4668dc1; targetIQN: *; multihomelist: 10.11.0.5:3260,10.10.0.5:3260; target: 10.10.0.5; multiSession: 10.10.0.5,3260,iqn.2020-05.disti.pro:sata-raid5|10.11.0.5,3260,iqn.2020-05.disti.pro:sata-raid5|
    currently-attached ( RO): true
          other-config (MRW): iscsi_sessions: 2; mpath-36001405dac86df81bdb4b709f4668dc1: [2, 2]; multipathed: true; storage_driver_domain: OpaqueRef:9b2a0e6f-d981-4042-8c48-7c29709315eb

The picture below shows the overall storage usage for both pools.

 

[Screenshot: overall storage usage for both pools]

Edited by skriuch118
Added picture of SRs
7 hours ago, Tobias Kreidl said:

Can you at least export it? You can try pushing it to /dev/null so that you don't need to store the actual output.

 

Everything inside the pool works fine.

Export/import, backup, and snapshotting all look good, whether from XenCenter or XOA.

I can export a VM and import it back, regardless of the SR used.
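For the record, the export-to-/dev/null test Tobias suggested would look something like this (VM UUID is a placeholder); it exercises the whole export path without consuming any disk space:

[root@EPYC1 ~]# xe vm-export uuid=<vm-uuid> filename=/dev/null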

 

It seems the problem is in the communication between the pools.

I will check the initial handshake with tcpdump.

 

...for the command

 

[root@EPYC1 ~]# xe vm-migrate uuid=d26a45e7-cf8d-2c59-e179-c04827b2fec2 --live remote-master=10.1.0.38 remote-username=root remote-password=(omitted) vif:33c8eeb0-8490-4e54-9954-179f688e8dec=daf28b9f-1a21-951b-a131-8db2cc0c553b
Performing a Storage XenMotion migration. Your VM's VDIs will be migrated with the VM.

I get the result below:

[root@epyc2 ~]# tcpdump -i xenbr3 ip and host 10.1.0.37
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on xenbr3, link-type EN10MB (Ethernet), capture size 262144 bytes
09:04:06.040550 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [S], seq 4293608450, win 27528, options [mss 9176,sackOK,TS val 310032521 ecr 0,nop,wscale 7], length 0
09:04:06.040624 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [S.], seq 3346915181, ack 4293608451, win 27492, options [mss 9176,sackOK,TS val 2512943826 ecr 310032521,nop,wscale 7], length 0
09:04:06.041028 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [.], ack 1, win 216, options [nop,nop,TS val 310032522 ecr 2512943826], length 0
09:04:06.041476 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [P.], seq 1:167, ack 1, win 216, options [nop,nop,TS val 310032522 ecr 2512943826], length 166
09:04:06.041505 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [.], ack 167, win 224, options [nop,nop,TS val 2512943827 ecr 310032522], length 0
09:04:06.045391 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [P.], seq 1:1209, ack 167, win 224, options [nop,nop,TS val 2512943831 ecr 310032522], length 1208
09:04:06.045585 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [.], ack 1209, win 234, options [nop,nop,TS val 310032527 ecr 2512943831], length 0
09:04:06.049458 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [P.], seq 167:381, ack 1209, win 234, options [nop,nop,TS val 310032530 ecr 2512943831], length 214
09:04:06.052011 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [P.], seq 1209:1523, ack 381, win 232, options [nop,nop,TS val 2512943837 ecr 310032530], length 314
09:04:06.052768 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [P.], seq 381:818, ack 1523, win 253, options [nop,nop,TS val 310032534 ecr 2512943837], length 437
09:04:06.090073 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [P.], seq 1523:2072, ack 818, win 240, options [nop,nop,TS val 2512943875 ecr 310032534], length 549
09:04:06.091648 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [P.], seq 818:903, ack 2072, win 272, options [nop,nop,TS val 310032573 ecr 2512943875], length 85
09:04:06.091897 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [P.], seq 2072:2157, ack 903, win 240, options [nop,nop,TS val 2512943877 ecr 310032573], length 85
09:04:06.091903 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [F.], seq 903, ack 2072, win 272, options [nop,nop,TS val 310032573 ecr 2512943875], length 0
09:04:06.092023 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [R], seq 4293609353, win 0, length 0
09:04:06.092030 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45362: Flags [F.], seq 2157, ack 904, win 240, options [nop,nop,TS val 2512943877 ecr 310032573], length 0
09:04:06.092065 IP EPYC2.disti.pro > EPYC1.disti.pro: ICMP host EPYC2.disti.pro unreachable - admin prohibited, length 48
09:04:06.092096 IP EPYC1.disti.pro.45362 > EPYC2.disti.pro.https: Flags [R], seq 4293609354, win 0, length 0
09:04:06.195522 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [S], seq 1515053284, win 27528, options [mss 9176,sackOK,TS val 310032676 ecr 0,nop,wscale 7], length 0
09:04:06.195593 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [S.], seq 2704587866, ack 1515053285, win 27492, options [mss 9176,sackOK,TS val 2512943981 ecr 310032676,nop,wscale 7], length 0
09:04:06.195768 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [.], ack 1, win 216, options [nop,nop,TS val 310032677 ecr 2512943981], length 0
09:04:06.196296 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [P.], seq 1:167, ack 1, win 216, options [nop,nop,TS val 310032677 ecr 2512943981], length 166
09:04:06.196319 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], ack 167, win 224, options [nop,nop,TS val 2512943981 ecr 310032677], length 0
09:04:06.200434 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [P.], seq 1:1209, ack 167, win 224, options [nop,nop,TS val 2512943986 ecr 310032677], length 1208
09:04:06.200596 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [.], ack 1209, win 234, options [nop,nop,TS val 310032681 ecr 2512943986], length 0
09:04:06.205922 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [P.], seq 167:381, ack 1209, win 234, options [nop,nop,TS val 310032687 ecr 2512943986], length 214
09:04:06.208515 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [P.], seq 1209:1523, ack 381, win 232, options [nop,nop,TS val 2512943994 ecr 310032687], length 314
09:04:06.209593 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [P.], seq 381:738, ack 1523, win 253, options [nop,nop,TS val 310032691 ecr 2512943994], length 357
09:04:06.212230 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [P.], seq 1523:1800, ack 738, win 240, options [nop,nop,TS val 2512943997 ecr 310032691], length 277
09:04:06.212394 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032691], length 9164
09:04:06.212488 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [P.], seq 10964:18269, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032691], length 7305
09:04:06.212555 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 18269:27433, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032691], length 9164
09:04:06.212838 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [.], ack 1800, win 415, options [nop,nop,TS val 310032694 ecr 2512943997,nop,nop,sack 1 {10964:18269}], length 0
09:04:06.212866 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 27433:36597, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032694], length 9164
09:04:06.212875 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [P.], seq 36597:37527, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032694], length 930
09:04:06.213084 IP EPYC1.disti.pro.45364 > EPYC2.disti.pro.https: Flags [.], ack 1800, win 434, options [nop,nop,TS val 310032694 ecr 2512943997,nop,nop,sack 2 {36597:37527}{10964:18269}], length 0
09:04:06.213110 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512943998 ecr 310032694], length 9164
09:04:06.228933 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 18269:27433, ack 738, win 240, options [nop,nop,TS val 2512944014 ecr 310032694], length 9164
09:04:06.424965 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512944210 ecr 310032694], length 9164
09:04:06.860952 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512944646 ecr 310032694], length 9164
09:04:07.692965 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512945478 ecr 310032694], length 9164
09:04:09.356957 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512947142 ecr 310032694], length 9164
09:04:12.844957 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512950630 ecr 310032694], length 9164
09:04:19.500982 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512957286 ecr 310032694], length 9164
09:04:32.812985 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512970598 ecr 310032694], length 9164
09:04:48.941015 IP EPYC2.disti.pro.https > EPYC1.disti.pro.44624: Flags [.], seq 1779472646:1779481810, ack 611411911, win 240, options [nop,nop,TS val 2512986727 ecr 309741631], length 9164
09:04:59.180941 IP EPYC2.disti.pro.https > EPYC1.disti.pro.45364: Flags [.], seq 1800:10964, ack 738, win 240, options [nop,nop,TS val 2512996967 ecr 310032694], length 9164
09:05:27.852990 IP EPYC2.disti.pro.https > EPYC1.disti.pro.44402: Flags [.], seq 1902969576:1902978740, ack 41217107, win 240, options [nop,nop,TS val 2513025639 ecr 309658334], length 9164

Only one thing looks suspicious:

 

09:04:06.092065 IP EPYC2.disti.pro > EPYC1.disti.pro: ICMP host EPYC2.disti.pro unreachable - admin prohibited, length 48

But pings and SSH between the pools definitely work.
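That "admin prohibited" ICMP is the classic signature of an iptables REJECT rule on the receiving host, so it is worth listing dom0's firewall rules on both ends (a quick check, not an exhaustive one):

[root@epyc2 ~]# iptables -L INPUT -n -v --line-numbers | grep -iE 'reject|drop'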

Edited by skriuch118
add TCPDUMP listening
8 minutes ago, Tobias Kreidl said:

XenTools are properly installed? That's needed for migration to work.

 

Yes, updated to the latest Linux Guest Tools from CH 8.2

 

BTW, I have disabled iptables.

A few times I could get further in the xe CLI vm-migrate, but all tasks still hang at 0%.

 

For now both pools are in the same state: no migration task is created after the xe CLI command.

 

At this point the pre-migration checklist looks like this (see the connectivity check below):

 

  1. The VM's VDI is on the shared SR
  2. Time in both pools is consistent
  3. All SRs have enough space
  4. All pool-wide networks are accessible between the hosts
  5. iptables is disabled
  6. XenServer Tools are the latest

Looks like magic.
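One more cross-pool check worth adding to the list: the xe client can talk to the remote pool master directly, which verifies that XAPI over HTTPS (port 443) works end to end (IP and credentials are placeholders):

[root@EPYC1 ~]# xe -s 10.1.0.38 -u root -pw <password> host-list params=name-label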

42 minutes ago, Tobias Kreidl said:

Nothing more in /etc/logs/SMlog that might shed a clue? The license server sees these hosts and is happy?

 

Because no task is created after the vm-migrate, SMlog stays clean.

 

The licenses seem fine, at least from the hypervisor's point of view:

 

[Screenshot: license status as shown by the hypervisor]

 

Honestly, I have no idea how to check the licenses from the license server side (it is part of the CVAD server).

 


This morning started with a clean reboot of both pools.

 

I changed the IPs of the management interfaces just to avoid any possible duplication in the network.

 

ping checked - OK

time checked - OK

DNS resolution checked - OK

Licences checked - OK

 

Then a simple vm-migrate was initiated.

Everything got stuck on

Performing a Storage XenMotion migration. Your VM's VDIs will be migrated with the VM.

 

xe task-list showed nothing

 

A thorough inspection of all possible logs:

 

tail -F -n 2000 /var/log/xensource.log - nothing
tail -F -n 2000 /var/log/SMlog - nothing
tail -F -n 2000 /var/log/daemon.log - nothing
tail -F -n 2000 /var/log/audit.log - nothing
tail -F -n 2000 /var/log/kern.log - nothing
tail -F -n 2000 /var/log/secure - something found

 

Aug 20 09:02:02 is the date and time when the vm-migrate task was started.

 

Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: stunnel 5.56 on x86_64-redhat-linux-gnu platform
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Compiled with OpenSSL 1.1.1c FIPS  28 May 2019
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Running  with OpenSSL 1.1.1d-fips  10 Sep 2019
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,SNI Auth:LIBWRAP
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: errno: (*__errno_location ())
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Reading configuration from descriptor 60
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: UTF-8 byte order mark not detected
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: FIPS mode disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: No PRNG seeding was required
Aug 20 09:02:02 EPYC1 stunnel: LOG6[ui]: Initializing inetd mode configuration
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: Ciphers: ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-SHA256:AES128-SHA256
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: TLSv1.3 ciphersuites: TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: TLS options: 0x02100004 (+0x00000000, -0x00000000)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: No certificate or private key specified
Aug 20 09:02:02 EPYC1 stunnel: LOG4[ui]: Service [stunnel] needs authentication to prevent MITM attacks
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Configuration successful
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Service [stunnel] started
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting local socket options (FD=0)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY not supported on local socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting local socket options (FD=1)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY not supported on local socket
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: Service [stunnel] accepted connection from unnamed socket
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: s_connect: connecting 10.1.0.39:443
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: s_connect: s_poll_wait 10.1.0.39:443: waiting 10 seconds
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: FD=6 events=0x2001 revents=0x0
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: FD=3 events=0x2005 revents=0x0
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: s_connect: connected 10.1.0.39:443
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: Service [stunnel] connected remote server from 10.1.0.41:37614
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting remote socket options (FD=3)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option SO_KEEPALIVE set on remote socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY set on remote socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Remote descriptor (FD=3) initialized
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: SNI: sending servername: 10.1.0.39
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Peer certificate not required
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): before SSL initialization
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client hello
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client hello
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server hello
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Certificate verification disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Certificate verification disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server certificate
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server key exchange
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Client certificate not requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server done
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client key exchange
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write change cipher spec
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server session ticket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read change cipher spec
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: New session callback
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Peer certificate was cached (1054 bytes)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Session id: BC4F6AF4EB38482B2325149B0F18BF253253B3D38004982453E4AFFF260508AE
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      1 client connect(s) requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      1 client connect(s) succeeded
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      0 client renegotiation(s) requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      0 session reuse(s)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: TLS connected: new session negotiated
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: TLSv1.2 ciphersuite: ECDHE-RSA-AES256-SHA384 (256-bit encryption)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Read socket closed (readsocket)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Write socket closed (write hangup)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Sending close_notify alert
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS alert (write): warning: close notify
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: SSL_shutdown successfully sent close_notify alert
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: Connection closed: 354 byte(s) sent to TLS, 470 byte(s) sent to socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Remote descriptor (FD=3) closed
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Service [stunnel] finished (0 left)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Deallocating section defaults
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: Clients allowed=500
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: stunnel 5.56 on x86_64-redhat-linux-gnu platform
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Compiled with OpenSSL 1.1.1c FIPS  28 May 2019
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Running  with OpenSSL 1.1.1d-fips  10 Sep 2019
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,SNI Auth:LIBWRAP
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: errno: (*__errno_location ())
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Reading configuration from descriptor 60
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: UTF-8 byte order mark not detected
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: FIPS mode disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: No PRNG seeding was required
Aug 20 09:02:02 EPYC1 stunnel: LOG6[ui]: Initializing inetd mode configuration
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: Ciphers: ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-SHA256:AES128-SHA256
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: TLSv1.3 ciphersuites: TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: TLS options: 0x02100004 (+0x00000000, -0x00000000)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: No certificate or private key specified
Aug 20 09:02:02 EPYC1 stunnel: LOG4[ui]: Service [stunnel] needs authentication to prevent MITM attacks
Aug 20 09:02:02 EPYC1 stunnel: LOG5[ui]: Configuration successful
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Service [stunnel] started
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting local socket options (FD=0)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY not supported on local socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting local socket options (FD=1)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY not supported on local socket
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: Service [stunnel] accepted connection from unnamed socket
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: s_connect: connecting 10.1.0.39:443
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: s_connect: s_poll_wait 10.1.0.39:443: waiting 10 seconds
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: FD=6 events=0x2001 revents=0x0
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: FD=3 events=0x2005 revents=0x0
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: s_connect: connected 10.1.0.39:443
Aug 20 09:02:02 EPYC1 stunnel: LOG5[0]: Service [stunnel] connected remote server from 10.1.0.41:37616
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Setting remote socket options (FD=3)
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option SO_KEEPALIVE set on remote socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Option TCP_NODELAY set on remote socket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Remote descriptor (FD=3) initialized
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: SNI: sending servername: 10.1.0.39
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Peer certificate not required
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): before SSL initialization
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client hello
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client hello
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server hello
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Certificate verification disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Certificate verification disabled
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server certificate
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server key exchange
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Client certificate not requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server done
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write client key exchange
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write change cipher spec
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS write finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read server session ticket
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read change cipher spec
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: TLS state (connect): SSLv3/TLS read finished
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: New session callback
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]: Peer certificate was cached (1054 bytes)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: Session id: 17FE5F5F56F1C3FAA028D743B565C7EA33B0784810A8783416364439EA140BDB
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      1 client connect(s) requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      1 client connect(s) succeeded
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      0 client renegotiation(s) requested
Aug 20 09:02:02 EPYC1 stunnel: LOG7[0]:      0 session reuse(s)
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: TLS connected: new session negotiated
Aug 20 09:02:02 EPYC1 stunnel: LOG6[0]: TLSv1.2 ciphersuite: ECDHE-RSA-AES256-SHA384 (256-bit encryption)
Aug 20 09:02:03 EPYC1 stunnel: LOG5[1954]: Service [xapi] accepted connection from ::ffff:10.1.0.56:50712
Aug 20 09:02:03 EPYC1 stunnel: LOG5[1954]: s_connect: connected 127.0.0.1:80

I can't clearly interpret the above results, but it seems there is some problem with the TLS 1.2/1.3 certificates.

Because after 09:02:02 nothing happens: no task, no traffic, no meaningful messages in the other logs.

 

Could anybody help decode the above sequence: is it OK or not?

Thanks in advance.

 

One more thing was found on the target (EPYC2) server. Not sure it is related, but...

About 4.5 minutes after vm-migrate was initiated, error messages appeared in xensource.log:

Aug 20 09:06:23 epyc2 xapi: [ info||4327 INET :::80|event.from D:8e37bed87a3e|xapi_event] trackid=829a6842bc3edff86499372ec1f4dad9 raising SESSION_INVALID *because* subscription is invalid
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] event.from D:8e37bed87a3e failed with exception Server_error(SESSION_INVALID, [ OpaqueRef:ad0a1808-ff2c-4597-8a57-6c589466539c ])
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] Raised Server_error(SESSION_INVALID, [ OpaqueRef:ad0a1808-ff2c-4597-8a57-6c589466539c ])
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 1/13 xapi Raised at file ocaml/xapi/xapi_event.ml, line 434
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 2/13 xapi Called from file ocaml/xapi/xapi_event.ml, line 606
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 3/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 4/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 5/13 xapi Called from file ocaml/xapi/xapi_event.ml, line 587
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 6/13 xapi Called from file ocaml/xapi/xapi_event.ml, line 698
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 7/13 xapi Called from file ocaml/xapi/rbac.ml, line 223
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 8/13 xapi Called from file ocaml/xapi/rbac.ml, line 231
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 9/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 103
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 10/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 11/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 12/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace] 13/13 xapi Called from file lib/backtrace.ml, line 177
Aug 20 09:06:23 epyc2 xapi: [error||4327 INET :::80||backtrace]

 

Edited by skriuch118
add more information
6 minutes ago, Tobias Kreidl said:

I assume a full migration within the same pool works OK?  Also, how about if you try migrating VM from your original target host instead to your host of origin just to see if you see the same problem in reverse?

 

Intra-pool motion works fine.

But each pool consists of only one server, so I can check only storage motion.

That works.

I can freely move any VM between storages.

 

The XenCenter cross-pool migration wizard does not work from either end (neither EPYC1 => EPYC2 nor vice versa); I will try the same from the xe CLI.

7 hours ago, Tobias Kreidl said:

Might then be a CPU compatibility issue which would preclude live migrations? I wonder if it would work with a shut-down VM?

 

-=Tobias

 

What kind of incompatibility do you mean?

 

Between the servers, or between the servers and the software?

 

The servers are identical and have the same platform and CPU: AMD EPYC 7402P 24-Core Processor.

 

The software (Citrix Hypervisor) claims full support for the EPYC 2 family since CH 8.1.

 

Migration doesn't start at all, neither for online nor offline VMs.

It seems the communication between the servers/pools is broken.

 

The row below in the log file might give a clue:

 

Aug 20 09:02:02 EPYC1 stunnel: LOG7[ui]: errno: (*__errno_location ())

A quick googling of that message turns up a lot of stunnel communication problems, but nothing Citrix-specific.

 

I worry about stunnel and TLS 1.2/1.3.

 

Because the CH 8.2 documentation states a cipher change, to TLS 1.3.

Maybe the upgrade didn't update something correctly?

 

/etc/stunnel/xapi.conf contains direct support for TLS 1.2 only:

 

; autogenerated by xapi
fips = no
pid = /var/run/xapissl.pid
socket = r:TCP_NODELAY=1
socket = a:TCP_NODELAY=1
socket = l:TCP_NODELAY=1
socket = r:SO_KEEPALIVE=1
socket = a:SO_KEEPALIVE=1
; no idle timeout
debug = authpriv.5

[xapi]
accept = :::443
connect = 80
cert = /etc/xensource/xapi-ssl.pem
ciphers = ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-SHA256:AES128-SHA256
curve = secp384r1
TIMEOUTclose = 1
options = CIPHER_SERVER_PREFERENCE
sslVersion = TLSv1.2
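To see whether the TLS handshake itself is the problem, the remote stunnel endpoint can be tested directly with openssl (a diagnostic sketch; the IP is the remote master):

[root@EPYC1 ~]# openssl s_client -connect 10.1.0.39:443 -tls1_2 </dev/null 2>/dev/null | grep -E 'Protocol|Cipher'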

 

I just don't know what to do next with the certificates and stunnel :(

6 hours ago, Tobias Kreidl said:

What TLS versions are enabled that show up in XenCenter? See also https://support.citrix.com/article/CTX262795

 

Since 8.2 there is no Security tab in the pool properties.

 

[Screenshot: pool properties dialog in XenCenter, with no Security tab]

The documentation seems obsolete:

https://docs.citrix.com/en-us/xencenter/current-release/pools-properties.html#security

It still mentions switching between TLS 1.2 and the Legacy Security option.

 

[Screenshot: documentation excerpt describing the Security tab]

 

The Citrix Hypervisor documentation states support for TLS 1.2 only:

 

https://docs.citrix.com/en-us/citrix-hypervisor/whats-new/whats-new-since-7-1.html#security-improvements

 

This passage in the documentation worries me:

 

As part of this feature the legacy SSL mode and support for the TLS 1.0/1.1 protocol have been removed. Ensure you disable legacy SSL mode in your pools before upgrading or updating them to Citrix Hypervisor 8.2. If you have any custom scripts or clients that rely on a different protocol, update these components to use TLS 1.2.

 

I did this after the upgrade, when everything had already gone wrong.

 
