Posts posted by Usuario Sistemas
-
On 10/15/2021 at 3:39 PM, Alan Lantz said:
what is the output of xe host-cpu-info of both systems ? It could just be some difference
enabled/disabled in bios as well.
--Alan--
Hello.
The pool master's xe host-cpu-info is as follows:
flags: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ht syscall nx lm constant_tsc arch_perfmon rep_good nopl nonstop_tsc pni pclmulqdq monitor est ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat dtherm
features: 029ee3ff-bfebfbff-00000001-2c100800
features_pv: 17c9cbf5-82b82203-2191cbf5-00000003-00000000-00000000-00000000-00000000-00001000-8c000000
features_hvm: 17cbfbff-82ba2223-2d93fbff-00000003-00000000-00000000-00000000-00000000-00001000-9c000000
The one of the host I want to add to the pool has the following:
flags: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ht syscall nx lm constant_tsc arch_perfmon rep_good nopl nonstop_tsc pni pclmulqdq monitor est ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat dtherm
features: 029ee3ff-bfebfbff-00000001-2c100800
features_pv: 17c9cbf5-82b82203-2191cbf5-00000003-00000000-00000000-00000000-00000000-00001000-8c000000
features_hvm: 17cbfbff-82ba2223-2d93fbff-00000003-00000000-00000000-00000000-00000000-00001000-8c000000
How can I pinpoint exactly where the problem comes from? You can see that the last word of features_hvm differs: 9c000000 on the master versus 8c000000 on the new host. At BIOS level I use default values, only enabling virtualization.
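The mismatch can be narrowed down mechanically. The only difference between the two outputs is the last 32-bit word of features_hvm (9c000000 vs 8c000000); XORing the two words isolates the differing bit. A small sketch, using plain shell arithmetic and no XCP-ng tooling:

```shell
# Last word of features_hvm on each host (taken from the outputs above)
MASTER=0x9c000000
NEWHOST=0x8c000000

# XOR isolates the bit(s) that differ between the two masks
DIFF=$(( MASTER ^ NEWHOST ))
printf 'differing bits: 0x%08x\n' "$DIFF"    # prints 0x10000000

# Locate the set bit position within this 32-bit word
BIT=0
while [ $(( (DIFF >> BIT) & 1 )) -eq 0 ]; do
    BIT=$(( BIT + 1 ))
done
echo "bit $BIT of the last features_hvm word differs"   # bit 28
```

Which CPUID leaf that word maps to depends on the XAPI version, so the bit index alone doesn't name the feature; it does confirm the difference is a single flag rather than a genuinely different CPU.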
On the other hand, comparing the "/proc/cpuinfo" files in more depth, I noticed that the pool master shows the following:
bugs: l1tf
While in the new host it does not appear:
bugs:
Could this be the problem? Do you know any way to fix it in the current pool without affecting availability? It is a production pool.
Thank you very much Alan and Tobias for your answers!
-
18 hours ago, Alan Lantz said:
Is the firmware the same as well ?
--Alan--
Good morning Alan. First, thank you for your reply.
Yes, all firmware is up to date and on par with the other hosts in the pool. The information at Dell OMSA level is exactly the same. The CPUs are the following:
Greetings.
-
Good morning.
When I try to add a host to a pool I get the following error:
You are attempting to add the server 'test' to a pool that is using newer CPUs. VMs starting on the pool in future will only use the reduced set of CPU features common to all the servers in the pool. VMs already running in the pool will not be able to migrate to the new server until they are restarted. Do you want to do this?
The problem is that the CPUs are exactly the same; I have even checked that the microcode matches between the new host's CPUs and those already present in the pool.
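One way to double-check that claim is to diff the normalized flag lists from both hosts rather than eyeballing them. A sketch, assuming each host's `xe host-cpu-info` output has been saved to a file (the file names here are made up):

```shell
# Compare the "flags:" line of two saved `xe host-cpu-info` dumps.
# Sorting the flags first makes the comparison order-independent.
same_flags() {
    a=$(grep '^flags:' "$1" | tr ' ' '\n' | sort)
    b=$(grep '^flags:' "$2" | tr ' ' '\n' | sort)
    [ "$a" = "$b" ]
}

if same_flags /tmp/master-cpuinfo.txt /tmp/newhost-cpuinfo.txt; then
    echo "CPU flags match"
else
    echo "CPU flags differ"
fi
```

The same comparison can be repeated on the features_pv and features_hvm lines, which is what the pool-join check actually evaluates.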
Do you know any solution to this problem?
Thanks in advance!
-
8 hours ago, Tobias Kreidl said:
You may need to destroy that old VM if the host still thinks it exists. This KB article may help: https://support.citrix.com/article/CTX215974
Hello Tobias and thank you for the reply.
Since I ran xl destroy, the log has stopped showing the error:
[10:47 XXX ~]# list_domains | grep 2c1a728b-6061-bae3-f71a-d88a67786720
19 | 2c1a728b-6061-bae3-f71a-d88a67786720 | B H
[10:50 XXX ~]# xl destroy 19
Best regards,
David.
-
Hello,
Last month we updated the servers to XCP 8.2.
Today when I was revising the /var/log/xensource.log I saw this:
Oct 4 00:41:16 XXX xapi: [error||54 |Updating VM memory usage D:96dd0a69967d|monitor_mem_vms] Unable to update memory usage for VM 2c1a728b-6061-bae3-f71a-d88a67786720: Db_exn.Read_missing_uuid("VM", "", "2c1a728b-6061-bae3-f71a-d88a67786720")
The machine mentioned in the log doesn't exist:
[15:57 XXX ~]# xe vm-list | grep -A3 2c1a728b-6061-bae3-f71a-d88a67786720
This log repeats approximately every two seconds.
I don't know what to do about this machine: the log says the memory usage can't be updated, but the machine doesn't exist.
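Since the error repeats with the same UUID, a first step is to extract it from the log line and confirm whether xapi or Xen still knows about it. A sketch; the checking commands are XCP-ng-specific, so they are shown as comments:

```shell
# Sample log line (copied from xensource.log above)
LINE='Db_exn.Read_missing_uuid("VM", "", "2c1a728b-6061-bae3-f71a-d88a67786720")'

# Pull out the UUID that xapi cannot find
UUID=$(printf '%s\n' "$LINE" | sed -n 's/.*Read_missing_uuid("VM", "", "\([^"]*\)").*/\1/p')
echo "$UUID"

# Then, on the host:
#   xe vm-list uuid="$UUID"          # empty output => not in the xapi database
#   list_domains | grep "$UUID"      # a hit => leftover Xen domain; note its domid
#   xl destroy <domid>               # remove the leftover domain
```

A VM that appears in list_domains but not in xe vm-list is a leftover domain, which matches the resolution reported later in this thread.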
XCP Version:
[16:03 XXX ~]# cat /etc/redhat-release
XCP-ng release 8.2.0 (xenenterprise)
Thank you in advance.
Regards,
David.
-
2 hours ago, Tobias Kreidl said:
At this point, do you just want to get rid of the volume (meaning you don't care about trying to salvage any of the content)?
Hello Tobias,
I want to delete the volume, but without causing any impact. This is a production server and there are machines running on it.
Is there any way to check whether the volume is associated with any machine disk, snapshot, or anything else?
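One way to answer that from the CLI: a VDI that no VM or snapshot references should show empty vbd-uuids and snapshots fields in `xe vdi-param-list`. The helper below only parses a saved copy of that output (the file name is hypothetical); the commands that produce it are in the comments:

```shell
# Produce the input on the host:
#   xe vbd-list vdi-uuid=<uuid>                      # empty => no VM attaches it
#   xe vdi-param-list uuid=<uuid> > /tmp/vdi.txt     # full parameter dump

# Succeeds if the dump shows no VBDs and no snapshots referencing the VDI
vdi_unreferenced() {
    vbds=$(sed -n 's/^ *vbd-uuids (SRO): *//p' "$1")
    snaps=$(sed -n 's/^ *snapshots ( RO): *//p' "$1")
    [ -z "$vbds" ] && [ -z "$snaps" ]
}

vdi_unreferenced /tmp/vdi.txt && echo "no VBDs or snapshots reference this VDI"
```

This is only a sketch of the check; a "base copy" VDI can still be the coalesce parent of a child VHD, so the vhd-parent chain should be inspected as well before deleting anything.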
Thank you again for your time,
Best regards,
David.
-
Hello Tobias,
I answer your questions:
3 hours ago, Tobias Kreidl said:
Do you see the volume mounted, or I assume still not?
The volume appears "INACTIVE" as previously:
[root@xxx backup]# lvscan | grep "ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0"
inactive '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0' [2.90 GiB] inherit
If I check that path, the system says it doesn't exist:
[root@xxx ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory
3 hours ago, Tobias Kreidl said:
You could try to restore from the backup folder, using the most recent (April of this year) if you believe it to be current enough. I'd make sure all VMs are properly backed up, first, before taking such a drastic step.
As for restoring the volume from the backup folder, the most recent backup is from April 26th:
[root@xxx backup]# ls -la
total 108
drwx------ 2 root root  4096 Apr 26 08:57 .
drwxr-xr-x 7 root root  4096 Mar 25  2019 ..
-rw------- 1 root root  1403 May 27  2020 VG_XenStorage-06ee4789-8c9a-6489-ff1b-85dd39aeb645
-rw------- 1 root root 90077 Apr 26 08:57 VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
-rw------- 1 root root  1838 Mar 29 12:27 VG_XenStorage-413867b7-fe7a-66a7-3c93-6af274a59280
However, I think the problem would persist even after restoring this copy, since the backup is very recent and probably already contains the corrupted state.
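For reference, restoring LVM metadata from one of those files is done with vgcfgrestore. On a shared, in-production SR this is risky, so the sketch below only prints what would run and keeps the actual commands as comments (paths taken from the listing above):

```shell
VG=VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
BACKUP=/etc/lvm/backup/$VG

# Dry-run first (--test makes no on-disk changes), then the real restore:
#   vgcfgrestore --test -f "$BACKUP" "$VG"
#   vgcfgrestore -f "$BACKUP" "$VG"
echo "would restore $VG from $BACKUP"
```

Even with --test, this should only be attempted after confirming every VM on the SR is backed up, as Tobias advises above.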
There are only 15 machines on this host; I could move them to another host (such as yyy, zzz, or aaa) and then restore the backup. But this is a big change and I would like to consider all the possibilities first.
Do you have any other ideas? Such as deleting the volume (though I don't think that would work either, because the system can't find it).
Thank you again,
Regards.
David.
-
Hello,
Now SMLog says:
Apr 26 09:01:27 xxx SMGC: [32260] SR 1189 ('PS405_DCP') (156 VDIs in 114 VHD trees): no changes
Apr 26 09:01:27 xxx SMGC: [32260] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 26 09:01:27 xxx SMGC: [32260] ***********************
Apr 26 09:01:27 xxx SMGC: [32260] * E X C E P T I O N *
Apr 26 09:01:27 xxx SMGC: [32260] ***********************
Apr 26 09:01:27 xxx SMGC: [32260] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0']
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 2712, in gc
Apr 26 09:01:27 xxx SMGC: [32260] _gc(None, srUuid, dryRun)
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 2597, in _gc
Apr 26 09:01:27 xxx SMGC: [32260] _gcLoop(sr, dryRun)
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 2561, in _gcLoop
Apr 26 09:01:27 xxx SMGC: [32260] sr.updateBlockInfo()
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 2239, in updateBlockInfo
Apr 26 09:01:27 xxx SMGC: [32260] if not vdi.getConfig(vdi.DB_VHD_BLOCKS):
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 477, in getConfig
Apr 26 09:01:27 xxx SMGC: [32260] config = self.sr.xapi.getConfigVDI(self, key)
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 341, in getConfigVDI
Apr 26 09:01:27 xxx SMGC: [32260] cfg = self.session.xenapi.VDI.get_sm_config(vdi.getRef())
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 473, in getRef
Apr 26 09:01:27 xxx SMGC: [32260] self._vdiRef = self.sr.xapi.getRefVDI(self)
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 316, in getRefVDI
Apr 26 09:01:27 xxx SMGC: [32260] return self._getRefVDI(vdi.uuid)
Apr 26 09:01:27 xxx SMGC: [32260] File "/opt/xensource/sm/cleanup.py", line 313, in _getRefVDI
Apr 26 09:01:27 xxx SMGC: [32260] return self.session.xenapi.VDI.get_by_uuid(uuid)
Apr 26 09:01:27 xxx SMGC: [32260] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Apr 26 09:01:27 xxx SMGC: [32260] return self.__send(self.__name, args)
Apr 26 09:01:27 xxx SMGC: [32260] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Apr 26 09:01:27 xxx SMGC: [32260] result = _parse_result(getattr(self, methodname)(*full_params))
Apr 26 09:01:27 xxx SMGC: [32260] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Apr 26 09:01:27 xxx SMGC: [32260] raise Failure(result['ErrorDescription'])
Apr 26 09:01:27 xxx SMGC: [32260]
Apr 26 09:01:27 xxx SMGC: [32260] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 26 09:01:27 xxx SMGC: [32260] * * * * * SR 1189b3ac-a362-3dd1-e657-4f11794238dd: ERROR
Apr 26 09:01:27 xxx SMGC: [32260]
The UUID is the same one that is supposedly corrupted.
Thanks in advance.
David.
-
Hello Tobias and thank you for the reply.
I didn't try to reactivate the volume, because I assumed that if the volume is inactive, it is for a reason (such as a deleted snapshot or a deleted VM).
I searched under the /etc/lvm/backup and I don't see the volume:
[root@xxx ~]# ls -l /etc/lvm/backup
total 100
-rw------- 1 root root  1403 May 27  2020 VG_XenStorage-06ee4789-8c9a-6489-ff1b-85dd39aeb645
-rw------- 1 root root 87743 Apr 23 17:05 VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
-rw------- 1 root root  1838 Mar 29 12:27 VG_XenStorage-413867b7-fe7a-66a7-3c93-6af274a59280
The second file corresponds to the UUID of the main storage.
Under /etc/lvm/archive there's nothing:
[root@xxx ~]# ls -l /etc/lvm/archive
total 0
Thank you,
Sincerely,
David.
-
Hello,
Since 04/21/2021 we have been seeing a problem with the Garbage Collection process. It shows the following error:
Apr 23 14:20:35 xxx SMGC: [13659] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 23 14:20:35 xxx SMGC: [13659] ***********************
Apr 23 14:20:35 xxx SMGC: [13659] * E X C E P T I O N *
Apr 23 14:20:35 xxx SMGC: [13659] ***********************
Apr 23 14:20:35 xxx SMGC: [13659] coalesce: EXCEPTION <class 'util.SMException'>, VHD ee40bf51[VHD](50.000G//2.898G|a) corrupted
Apr 23 14:20:35 xxx SMGC: [13659] File "/opt/xensource/sm/cleanup.py", line 1542, in coalesce
Apr 23 14:20:35 xxx SMGC: [13659] self._coalesce(vdi)
Apr 23 14:20:35 xxx SMGC: [13659] File "/opt/xensource/sm/cleanup.py", line 1732, in _coalesce
Apr 23 14:20:35 xxx SMGC: [13659] vdi._doCoalesce()
Apr 23 14:20:35 xxx SMGC: [13659] File "/opt/xensource/sm/cleanup.py", line 1179, in _doCoalesce
Apr 23 14:20:35 xxx SMGC: [13659] self.parent.validate()
Apr 23 14:20:35 xxx SMGC: [13659] File "/opt/xensource/sm/cleanup.py", line 1172, in validate
Apr 23 14:20:35 xxx SMGC: [13659] VDI.validate(self, fast)
Apr 23 14:20:35 xxx SMGC: [13659] File "/opt/xensource/sm/cleanup.py", line 673, in validate
Apr 23 14:20:35 xxx SMGC: [13659] raise util.SMException("VHD %s corrupted" % self)
Apr 23 14:20:35 xxx SMGC: [13659]
Apr 23 14:20:35 xxx SMGC: [13659] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
As you can see, it says that VHD ee40bf51 is corrupted. I read this article -> https://support.citrix.com/article/CTX201296 but I couldn't resolve the problem. In case anyone wants to analyse it, I attach the entire log.
These are the tests I performed:
Find the VHD path:
[root@xxx ~]# lvdisplay | grep -B2 -A11 "ee40bf51"
--- Logical volume ---
LV Path                /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
LV Name                VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
VG Name                VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
LV UUID                IJe1sI-ia1U-jzK1-dgZ7-SEXj-uinb-6lR4LX
LV Write Access        read only
LV Creation host, time ,
LV Status              NOT available
LV Size                2.90 GiB
Current LE             742
Segments               1
Allocation             inherit
Read ahead sectors     auto
Check the path on all of the hosts:
[root@xxx ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory
[root@yyy ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory
[root@zzz ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory
[root@aaa ~]# ls -l /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
ls: cannot access /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: No such file or directory
Check if the VHD is active or not:
[root@xxx ~]# lvscan | grep "ee40bf51"
inactive '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0' [2.90 GiB] inherit
Try to repair the volume:
[root@xxx ~]# vhd-util repair -n /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
error opening /dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0: -2
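That -2 is -ENOENT: the device node is absent because the LV is inactive, as the lvscan above shows, so vhd-util never even gets to read the VHD. Activating the LV first should create the node so the check can run. A cautious sketch (commands commented, since this is a live SR):

```shell
VG=VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd
LV=VHD-ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0

# Activate the LV so /dev/$VG/$LV exists, check the VHD, then deactivate:
#   lvchange -ay "$VG/$LV"
#   vhd-util check -n "/dev/$VG/$LV"
#   lvchange -an "$VG/$LV"
echo "device node would be /dev/$VG/$LV"
```

Only after the node exists can vhd-util tell whether the VHD itself is really corrupted or merely unreadable.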
Check if the volume is associated with any VM:
[root@xxx ~]# xe vbd-list vdi-uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
[root@xxx ~]#
Do a vdi-forget:
[root@xxx ~]# xe vdi-forget uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
Search more information about the volume:
[root@xxx ~]# xe vdi-param-list uuid=ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
uuid ( RO) : ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
name-label ( RW): base copy
name-description ( RW):
is-a-snapshot ( RO): false
snapshot-of ( RO): <not in database>
snapshots ( RO):
snapshot-time ( RO): 19700101T00:00:00Z
allowed-operations (SRO): generate_config; force_unlock; update; forget; destroy; snapshot; resize; copy; clone
current-operations (SRO):
sr-uuid ( RO): 1189b3ac-a362-3dd1-e657-4f11794238dd
sr-name-label ( RO): PS405_DCP
vbd-uuids (SRO):
crashdump-uuids (SRO):
virtual-size ( RO): 53687091200
physical-utilisation ( RO): 3112173568
location ( RO): ee40bf51-f6e4-4935-a3e7-b4fe3b0066f0
type ( RO): User
sharable ( RO): false
read-only ( RO): true
storage-lock ( RO): false
managed ( RO): false
parent ( RO) [DEPRECATED]: <not in database>
missing ( RO): false
is-tools-iso ( RO): false
other-config (MRW):
xenstore-data (MRO):
sm-config (MRO): vhd-blocks: eJztlEs....NGVLUI; vhd-parent: 4d5f02ca-46ac-4a5c-ab77-940362440fc3; vdi_type: vhd
on-boot ( RW): persist
allow-caching ( RW): false
metadata-latest ( RO): false
metadata-of-pool ( RO): <not in database>
tags (SRW):
cbt-enabled ( RO): false
Here I see the "vhd-parent" -> 4d5f02ca-46ac-4a5c-ab77-940362440fc3
Then I check whether the VHD parent is active and look for its path (no results again):
[root@xxx ~]# lvscan | grep "4d5f02ca"
inactive '/dev/VG_XenStorage-1189b3ac-a362-3dd1-e657-4f11794238dd/VHD-4d5f02ca-46ac-4a5c-ab77-940362440fc3' [1.55 GiB] inherit
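The vhd-parent field can be followed upward to map the whole chain before deciding what to delete. Extracting the parent from the sm-config string is plain text parsing; the loop that would query each ancestor is XCP-ng-specific, so it stays in comments:

```shell
# sm-config value as reported by `xe vdi-param-list` above
SMCONF='vhd-blocks: ...; vhd-parent: 4d5f02ca-46ac-4a5c-ab77-940362440fc3; vdi_type: vhd'

parent_of() {
    printf '%s\n' "$1" | sed -n 's/.*vhd-parent: \([0-9a-f-]*\).*/\1/p'
}
parent_of "$SMCONF"    # prints 4d5f02ca-46ac-4a5c-ab77-940362440fc3

# On the host, repeat until no parent remains:
#   P=$(xe vdi-param-get uuid=<uuid> param-name=sm-config param-key=vhd-parent)
#   ...then inspect each ancestor with lvscan / vdi-param-list as above
```

Mapping the full chain shows whether any active child still depends on these inactive parents before anything is removed.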
And I repeated all the same tests on it, with no results either...
Finally, looking at the logs again, I found this line:
Apr 23 14:20:35 xxx SM: [19295] Failed to lock /var/lock/sm/mpathcount1/host on first attempt, blocked by PID 19232
But when I look for that PID, nothing is returned:
[root@xxx ~]# ps -p 19232
PID TTY TIME CMD
I don't know what to do with these volumes. I haven't tried an lvremove; I don't think it would work anyway, since according to the system the volume doesn't exist. And before doing anything like that, I want to make sure I don't delete any virtual machines or client snapshots.
XCP Version:
[root@xxx ~]# cat /etc/redhat-release
XCP-ng release 7.5.0 (xenenterprise)
Thank you in advance.
Regards,
David.
NFS randomly timing out
in Storage
Posted
Hello everyone,
We are facing a problem with a new pool. The pool is entirely made up of Dell R720 servers, all fully up to date, with exactly the same firmware and hardware across all hosts. The OS is the latest stable version of XCP-ng, and the VM OS is the latest stable version of Debian 12.
The method we use is the following: we deploy a virtual machine from a custom template previously standardized by us (basic Debian 12 + cloud-init and cloud-initramfs-growroot). Once the machine is deployed, we install and configure some additional software of ours and configure an IP. All of this is done through a custom ISO that contains the user-data and meta-data files for cloud-init. This ISO is provided by an NFS server connected to the pool through an SR configured as an NFS ISO library. As soon as the cloud-init process is done, the same script unmounts the ISO (the "eject" command) and reboots (the "/sbin/reboot" command). This is where our error appears.
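For context, the seed ISO in this kind of workflow is typically a cloud-init NoCloud datasource: two files, user-data and meta-data, on an ISO whose volume label is cidata. A minimal sketch assuming that layout (the file contents are illustrative, not our real site configuration):

```shell
# Build the two NoCloud files (contents are illustrative only)
mkdir -p /tmp/seed && cd /tmp/seed

cat > meta-data <<'EOF'
instance-id: vm-001
local-hostname: vm-001
EOF

cat > user-data <<'EOF'
#cloud-config
runcmd:
  - eject /dev/cdrom
  - /sbin/reboot
EOF

# cloud-init's NoCloud source requires the volume label "cidata":
#   genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
```

The runcmd entries mirror the eject-then-reboot step described above, which is exactly the point at which the failure occurs.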
We have found that the following error affects all VMs on one random host in the pool.
The moment the VM's OS issues the reboot, XCP-ng shows a blank (white) console, preventing any further operation through the console. The only actions available (using the buttons above the console) are "Force Reboot", "Force Shutdown", and "Pause", but these buttons don't execute any action.
On the other hand, VMs on other hosts work just fine (the process described above finishes correctly and the VM is operational).
It should be noted that this same method is being used for other pools that are working correctly, so it is an error that we are only experiencing in this pool.
While investigating the issue, we noticed that timeouts appear repeatedly in the affected host's logs, but only when the ISO needs to be ejected:
When this log appears, if we run "df -h" to check the mount point, the command gets stuck and never finishes. Running "strace df -h" shows that it hangs while trying to reach the mount point itself:
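A hung NFS mount blocks any stat() on the mount point, which is why df never returns. Wrapping the probe in a timeout avoids hanging the shell; the SR mount path below is a guess at the usual XCP-ng location, not taken from our logs:

```shell
# Probe a mount point without risking a hang: stat() on a dead NFS mount
# blocks indefinitely, so give it a deadline.
probe_mount() {
    if timeout 5 stat "$1" >/dev/null 2>&1; then
        echo "responds: $1"
    else
        echo "stuck or missing: $1"
    fi
}

probe_mount /run/sr-mount    # hypothetical SR mount parent on XCP-ng
```

Probing each host's SR mount point this way would let us confirm whether only the affected host has the dead mount while the SR still looks attached in Xen Orchestra.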
Through Xen Orchestra, we can see that the SR is correctly attached, no errors whatsoever.
We also checked connectivity between the host itself and the NFS server, and they see each other correctly.
The thing is, before we start installing VMs, we can see the SR mounted:
Thanks in advance,
Regards.