
Richard Buffone

Internal Members

Posts posted by Richard Buffone

  1. 14 minutes ago, Sergio Masone1709161115 said:

    I tested the patch quickly. I still find a 200 KB discrepancy between the file size of perfstringbackup.ini in 2211 or 2304 with the registry key workaround I was provided. I haven't yet checked the differences inside the file to find what's missing, but I'm guessing it will be the same as you posted. I will check to confirm this later today and report my findings here.

     

    You will need to remove the registry workarounds set on 23.4.5 before the fix will run.

  2. On 6/9/2020 at 5:18 AM, Akinola Oke said:

     

    I am facing the same issue, and the procedure does not seem to work in my case.

     

    I actually upgraded my OS layer from build v1809 to v1909. The upgrade went successfully and I was able to finalize the version.

    However, any new version created from the upgraded "v1909" version shows this symptom, and I have not been able to find a way around it.

    Windows was able to shut down via "shutdownForFinalize" successfully without any error message.

     

    The first command runs fine, but the second always generates an error message that the partition is already the requested size, so nothing can be done in that respect.

     

    As it stands now, I am not able to update my OS layer in any way.

     

    Any other option please?

     

    Thanks

     

     

    The most recent occurrence of this I have seen was resolved by running chkdsk on the OS layer.

     

    The cleanest way to do this is to scan the OS layer disk in an offline state.

     

    -Shut down the packaging machine that is exhibiting the problem.

    -Attach the OS layer disk to a Windows VM that is not part of App Layering (a maintenance VM).

    -If needed, set the disk online in the maintenance VM through the Windows disk manager. Assign a drive letter to the OS disk.

    -Open an administrator command prompt and run:

    chkdsk /f X:

     

    -When done, note whether any errors were corrected and shut down the maintenance VM.

    -Disconnect the disk and power the packaging machine back on. If it boots up fine, then select "Shutdown to finalize" again and verify the task completes.

     

    If this does not work, please open a support case so we can review what the specific error messages are in the logs.

  3. It's possible the network is the bottleneck. Please connect to the ELM with two SSH sessions. On the first session, please run the following:

    sudo dstat -d --disk-tps

     

    On the second please run:

    sudo iptraf-ng

    Select "Detailed interface statistics" and then "eth0".

     

     

     

    If iptraf-ng is not installed, please install it using:

    sudo yum install iptraf-ng

     

     

    Now start a layering process; adding a version to an OS layer would be a good test. Pay attention to the current step shown in the App Layering web console and watch the disk read/write rate from dstat. When the disks are being copied out to the storage account or copied back in from storage, watch the rate in iptraf-ng and compare it to the disk rate. If the disk copying steps are fast but the network copy steps are slow, then we know it is a network bottleneck. If the disk steps are slow no matter which step is running, then I would suggest opening a case with Microsoft to understand why the VM is not performing up to the specs of the underlying hardware. We have no internal optimizations that would need to be applied; the ELM will simply try to run as fast as the hardware allows.
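
    A rough way to turn those two readings into a verdict is sketched below. It assumes you have noted the rates by hand from dstat and iptraf-ng and converted both to whole MB/s. The function name classify_bottleneck and the 50 MB/s default cutoff are illustrative assumptions, not Citrix or Azure figures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the slowdown from two hand-recorded rates.
# Usage: classify_bottleneck <disk_MBps_during_local_copy> <net_MBps_during_storage_copy> [slow_below_MBps]
classify_bottleneck() {
  local disk_rate="$1" net_rate="$2"
  local slow_below="${3:-50}"   # assumed cutoff for "slow"; adjust to your VM size

  if [ "$disk_rate" -lt "$slow_below" ]; then
    echo "disk"      # disk steps are slow regardless of the network
  elif [ "$net_rate" -lt "$slow_below" ]; then
    echo "network"   # disk is fast, copies to or from storage are slow
  else
    echo "neither"   # both rates are above the cutoff
  fi
}

# Example: 130 MB/s on disk steps, 12 MB/s while copying to the storage account.
classify_bottleneck 130 12   # prints "network"
```

    The cutoff is the weak point of this sketch: pick it from what the VM size is supposed to deliver rather than from the default used here.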

     

    One side note: I assumed you installed the ELM from a recent version (1910, 1911, or 2001). If this ELM was originally installed at 4.12 or an earlier version (whether or not you have upgraded the appliance since), it will have the older, slower network interface. This can be swapped out for the Azure accelerated networking interface for better throughput, or you can simply install a new ELM based on a more recent version.

     

    Please see here for how to swap the network interface on an App Layering appliance installed at 4.12 or earlier:

    https://support.citrix.com/article/CTX237606

  4. My test ELM has a 1TB disk as Rob suggested and I see very good throughput at about 120-150 MB/s. Is there a specific message you see when the task times out? Are you seeing a Server Busy message? If so, then you are likely hitting the disk I/O limits and Azure is throttling it. Increasing the repository disk size to 1TB or larger should take care of that, as it will allow for much more I/O at any given time.

     

    For the page file, Azure handles this on its own. I would suggest leaving the default settings to use the D: drive. This will not affect the layering process.

  5. 7 minutes ago, BFCS IT said:

     

    Would an acceptable workaround be to logon the app layer to the domain, install the application, test kerberos, logoff the domain, ngen update and finalize?

     

    This might work if no other SSO objects are being layered; however, this would be against best practices and is not how we QA test this. On the technical side, the SSO entries in the registry all need to be in one place, the platform layer, so that each application changing this key can see that there are other applications with SSO functionality. If each of these apps is installed in a different layer, each will only see its own registry entry at the time of installation. Whichever layer has the highest priority will override that key from any layer below it. The platform layer always has a higher priority than any other layer, which helps eliminate this issue.
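
    A toy model of that override behaviour is sketched below. It assumes each layer carries its own complete copy of the shared registry value and that nothing is merged between layers. The function name, layer names, and provider values are all made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model: each argument is "layer=value", listed from lowest to highest priority.
# The value from the highest-priority layer replaces the others wholesale.
effective_value() {
  local entry winner=""
  for entry in "$@"; do
    winner="${entry#*=}"   # keep only the text after the first "="
  done
  echo "$winner"
}

# Two SSO apps installed in separate app layers: only the higher layer's entry survives.
effective_value "app_layer_a=ProviderA" "app_layer_b=ProviderB"

# Both installed in the platform layer: the single key lists both providers.
effective_value "os_layer=" "platform_layer=ProviderA,ProviderB"
```

    The first call prints only ProviderB, which is the lost-entry case described above; the second prints both providers because they were written into one layer.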

     

    Depending on your environment, you will likely still need to domain join your platform layer. When doing this, please log in as a domain user once. Then reboot, log back in as the local admin, and delete the domain user's profile from the layer. Then you can finalize it. I am not sure if this is needed for this specific case, but if you do need to perform a domain join, please do so using this process and only on the platform layer.

  6. Just an ELM logs export would be enough. Please do this shortly after you reboot the ELM when you encounter the problem.

     

    The logs can be exported from: System -> Manage Appliance -> Export Logs

     

     

    Thank you

  7. That looks like the Silverlight error I was thinking of. We would need more logs from the ELM to get a better idea of what is going on there. As I mentioned before, check for network filters or firewalls that might be breaking the connection. Could something be limiting or conflicting with the ELM's IP address? One possible test is to power down the ELM for a while and then ping its IP address. If something responds, then you have an IP conflict.
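
    The ping test can be scripted roughly as follows. This is a sketch that assumes a Linux machine on the ELM's subnet (on Windows the equivalent flag is ping -n). The function name check_ip_conflict and the PING_CMD override are hypothetical, and the address shown is a placeholder from the documentation range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run this while the ELM itself is powered off.
# PING_CMD is an assumed override so the function can be exercised without a network.
check_ip_conflict() {
  local ip="$1"
  # -c 4: send four echo requests; -W 2: wait up to 2 seconds per reply (Linux iputils ping)
  if "${PING_CMD:-ping}" -c 4 -W 2 "$ip" >/dev/null 2>&1; then
    echo "conflict"   # something else answered at the ELM's address
  else
    echo "no-reply"   # nothing answered; ICMP could still be filtered
  fi
}

# Example (substitute the ELM's real address):
# check_ip_conflict 192.0.2.10
```

    A "no-reply" result does not fully rule out a conflict, since the other device may simply drop ICMP.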

     

    If you are not doing so already, please try connecting to a VM that is on the same subnet as the ELM and be sure the data path does not get rerouted anywhere else first. See if you have the same problem from that VM as it shouldn't have the same policies applied. If that works then there is something between your management PC and the ELM appliance.


     

    Quote


    The "CIFS VFS......" messages are shown directly on the Appliance console on the hypervisor.

    I would have to ignore these or make changes as necessary.


     

     

    As for making the SMB version changes: from the hypervisor console, just press Enter a few times until you see a login prompt. Those messages will clear as you log in. Alternatively, connect over SSH and the messages won't be in your way.

  8. 3 hours ago, Akinola Oke said:

    "CIFS VFS: Dialect not supported by server. Consider specifying vers=1.0 or vers=2.0 on mount for accessing older servers", "cifs_mount failed w/return code = -95.

     

    Are you seeing this from the hypervisor console of the ELM? Or are you seeing this in an error message or from a log while connected to the ELM with ssh?

     

    This message would only occur when the ELM establishes a connection to the SMB share. It seems that the share reports that it supports a specific SMB version but then cannot speak the CIFS dialect of that protocol version. As the message states, you can set the SMB version to something lower to ensure compatibility. To do this, please see this solution, which should be carried out from an SSH session to the ELM:

     

    https://support.citrix.com/article/CTX225742

     

    As noted in the article we did see a brief time when NetApp shares would not work with SMB version 3.02 and using only SMB 2.0 or 1.0 would work. I have not seen this issue reported for some time now and it is possible this was related to an older NetApp firmware version or a setting on the share itself.
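
    Separately from the article's procedure, one way to see which dialects the share will accept is to try a manual mount with a pinned version from any Linux machine that has cifs-utils installed. The sketch below only prints the candidate commands rather than running them. The server, share, mount point, and account names are placeholders, and cifs_mount_cmd is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print (do not run) a mount command pinned to one SMB dialect via the vers= option.
cifs_mount_cmd() {
  local unc="$1" mountpoint="$2" vers="$3" user="$4"
  printf 'sudo mount -t cifs %s %s -o vers=%s,username=%s\n' \
    "$unc" "$mountpoint" "$vers" "$user"
}

# Try the newest dialect first and step down until one mounts.
for v in 3.0 2.1 2.0 1.0; do
  cifs_mount_cmd //fileserver/layers /mnt/smbtest "$v" svc_layering
done
```

    If only the lower versions mount, that points at the share or its firmware rather than the ELM.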

     

     

    Crashing:

    As for your ELM crashing, I would be surprised if the SMB share is causing this, but please let us know if the SMB version change helps. The worst that should happen would be the inability to access the file share for layer imports / exports and elastic layer updates. The ELM logs would show if we are getting hung up reconnecting to the share or something like that, but we should be failing gracefully if it cannot connect. For us to analyze the logs you would need a support case.

     

    Is the error message you are seeing a Silverlight exception? This means the Silverlight app running in your browser lost connection to the web server on the ELM. Does refreshing your web browser session allow it to reconnect? Do you have a proxy or network filter that touches the network traffic between the production ELM and the computer you are using to log into the App Layering web console? We have seen some network filters allow communication for a period of time and then cut us off with an ECONNRESET message.

  9. The layer repository uses XFS for its file system. XFS does not have a tool that will shrink the file system safely, and the general suggestion when using XFS is to back up the data, recreate the volume at the size you want, format it, and then copy the data back.

     

    The easy workaround for this is to export all layers you want to keep from your existing ELM to a file share. Then deploy a new ELM and import those layers. Once you have confirmed the new ELM is set up correctly and all layers have been restored, then you can delete the original ELM.

     

    To export your layers, please select the Layers tab and then click on any of the three sub tabs. You will see the Export and Import options on the right side menu. Select Export and you will be prompted to specify which file share to export to (it does not need to be the one used for elastic layers) and then you can select multiple layers to export at once.

     

     

    Thank you

  10. Chrome and Firefox no longer support Silverlight and Edge never had Silverlight support, so only IE can be used. Do you see any additional popups or any sign that it has been blocked? Typically, when I use a system where I have never logged into the UMC before, it will prompt me to confirm I want to allow Silverlight to run. It will appear as if it is not installed until you explicitly allow it.

  11. I would suggest testing out the layer without adding the AlwaysOnBoot keys and see if any problems occur on the provisioned machine.

     

    Did you set this key on the OS layer from the prior time you set this up? I find it odd that the issue returns if you leave the keys in place. The AlwaysOnBoot key basically tells our filter driver not to virtualize those folders, in which case it should work as if our filter wasn't even installed.

  12. Are you using Citrix Federated Authentication Service in your environment? I saw one instance of FAS being attributed to elastic layer disconnects. In that case they did not need this feature, so it was disabled and no other troubleshooting was done. I opened a ticket for this feature to be QA'd with elastic layers to see if there is a real problem or if this was a one-off configuration issue.

     

    Otherwise, I agree with the tests Rob suggested above. Another possibly useful test would be to set up a server with elastic layering disabled. Log in with one user and manually attach one of the elastic layer VHDs to the VM using the Windows Disk Manager (Action -> Attach VHD). This will simulate what our service is doing but will remove App Layering from the equation. This could help uncover any SMB-specific issues.

  13. I have not seen any other reports yet about wmiprvse.exe running at 100% CPU on an app layer. Googling around a bit, it seems to be something that is common outside of a layering or virtualized environment. If you let the packaging machine sit for, let's say, 30 minutes to an hour, do you see the CPU activity stop? It could be a situation where Teams is looking for some other application, feature, or service already installed and then needs to perform some configuration or clean up that takes a bit of time.

     

    If the CPU usage does die down on the layer but jumps back up on the layered image, then we could take a look at this through a support case to see if it has anything to do with us. We would want to see if the problem is present with and without elastic layering enabled on the image. If it occurs on an image without elastic layers enabled, then it could still be outside of our control since we would have no filter drivers running at that time. If it occurs only with elastic layering enabled, then we have good reason to believe we could be causing this. In either case we would have a look to see if we are part of the problem.
