
Netscaler VPX (200) updating to version 13.0 58.32 not working



First off, I'm not well versed in NetScaler/Linux, so I hope that won't be too much of a problem :)

 

We have a pair of NetScalers (NS13.0 47.24 nc) for which Citrix recently released an update.

To install this update I accessed the secondary node of the pair through the GUI, went to the System Upgrade section, pointed it to the locally downloaded nCore update file build-13.0-58.32_nc_64.tgz, and ticked the option to reboot if successful.

 

The install went ahead, but seemed to stall while compiling some Python. After watching that for 10 minutes (seemingly stalled), I decided to look at what the server was actually doing on its console. It turned out it had silently rebooted, was aware that it needed the HDD to start from, and stopped before actually loading the OS. After letting that sit for another 10 minutes (hoping it would start eventually), I pulled the power on the VM and rebooted it, only to end up at the same point. The server would not complete its boot.

 

Reverted to a snapshot, and the server came up with the old 47.24 build. Unsure whether this was caused by the wrong update file, I grabbed the KVM download and tried to use that to upgrade (assuming the nCore download might have been the wrong file for our situation). That one stopped the process midway by itself, noting that a directory didn't exist, so that update is probably not the right one either.

 

Out of further options, I re-ran the nCore update on the node, only to have it stall during the extraction of data. After watching it try to extract the same RPM file for 10 minutes, I hit the stop and close button. No changes appeared to have been made, and knowing the first run of the update had worked, I kicked off the upgrade again, only to find it now stalling on nsvpnc_setup64.exe during the extraction process. After letting that sit for another 10 minutes, I again stopped and closed the process.

 

So at this point the secondary node is up with version 47.24, the /var folder has lost a fair bit of free space, and I'm unsure how to actually get version 58.32 installed. Any thoughts?


I would do the upgrade via the CLI only and make sure it's not just a GUI bug; if the issue persists you may need to involve support.

 

Make the necessary config and VM backups before proceeding.

1. Create a directory for build_13.0-58.32 in the /var/nsinstall/ directory.

2. Upload the build (.tgz) to the /var/nsinstall/build_13.0-58.32/ directory.

3. Extract the build in this directory (for convenience):

shell
cd /var/nsinstall/build_13.0-58.32/
tar xzvf <build name>

4. Run the install: ./installns
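
Put together, a minimal sketch of the full sequence from a shell session on the node (assuming the exact file name build-13.0-58.32_nc_64.tgz mentioned above; adjust the name to whatever you actually uploaded):

shell
mkdir /var/nsinstall/build_13.0-58.32
# copy build-13.0-58.32_nc_64.tgz into that directory first (SCP/WinSCP)
cd /var/nsinstall/build_13.0-58.32
tar xzvf build-13.0-58.32_nc_64.tgz
./installns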

 

If the issue is just with the GUI upload/timeouts or unpacking, this should take care of the problem.

If the CLI is also failing, something else is going on.


Using the CLI is a possibility, but my unfamiliarity with Linux is somewhat of a problem. I'd mostly be going in blind and feeling around to see what does what.

 

Sure, I managed to switch to the nsinstall folder... The build folder already exists there.

It still contains data, including the TGZ file from the GUI upload. Getting that file onto the primary node of the pair is something I'm not yet sure how to pull off.

Ran the extract... It seems to have gone through without too much trouble.

Kicked off the installation, which ran through without an issue and then asked me to restart.

 

Looking at the log, I found the Python compilation step, which was the last thing noted in my first GUI install, just prior to the install completing. So I'm assuming that first install actually ran to completion and simply didn't update the screen:

  • installns: [9530]: Compiling python modules... done
  • Creating before PE start upgrade script ...
  • installns: [9530]: Creating before PE start upgrade script ...
  • Creating after upgrade script ...
  • installns: [9530]: Creating after upgrade script ...
  • installns: [9530]: prompting for reboot
  • installns: [9530]: END_TIME 1594385484 Fri Jul 10 12:51:24 2020
  • installns: [9530]: total_xml_requests.db renamed to total_xml_requests.db
  • Installation has completed.
  • Reboot NOW? [Y/N] y
  • Rebooting ...
  • installns: [9530]: Rebooting ...

Post reboot, again the same error... The system doesn't start...

 

Shows the following on screen:

 

  • BTX loader 1.00  BTX version is 1.02
  • Consoles: internal video/keyboard
  • BIOS drive C: is disk0
  • BIOS 639kB/3144680kB available memory
  •  
  • FreeBSD/x86 bootstrap loader, Revision NS1.2
  • Loading /boot/defaults/loader.conf
  • \

And that's it... Just sits there... 

 

My colleague has apparently seen this before (and back then, after much cursing and a massive amount of internet browsing, managed to track down a cause), and pointed at something regarding a required COM port in loader.conf that kills off the boot process, mostly because we don't have a COM port on that VM:

  • Modify /boot/defaults/loader.conf
  • (/flash/boot/loader.conf)
  • find the line with console="vidconsole,comconsole"
  • remove comconsole from the console option.
  •  
  • Remember to modify the loader.conf file each time you upgrade/downgrade NS firmware.
  • Use CLI to update the NS firmware & do not reboot immediately after running /var/nsinstall/build_<targetbuildnumber>/installns

It seems that as long as comconsole is in loader.conf, the boot will fail... And once it hangs there, you have no way to edit loader.conf to remove that bit of text and allow the boot to proceed.
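
If you can still get to a shell before rebooting, a minimal sketch of that edit (assuming the path /flash/boot/loader.conf noted above; sed -i '' is the FreeBSD form, and a backup copy is taken first):

cp /flash/boot/loader.conf /flash/boot/loader.conf.bak
sed -i '' 's/,comconsole//' /flash/boot/loader.conf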

 

Also, it seems the procedure for an HA pair of NetScalers is a bit more involved than I was led to believe: https://www.markbrilman.nl/2012/09/netscaler-in-ha-command-line-upgrade-or-downgrade-procedure/

 

So I have now once again trashed the server and gone back to its snapshot... I found that WinSCP allows for file transfers, so I'm using that to at least get the TGZ files onto the systems.

 

Doing this upgrade is going to be a lot more involved than I originally anticipated...


Yes, WinSCP or any SCP transfer can load files to the correct directory on both ADCs.
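
For example, a rough sketch of an SCP upload from a workstation (the nsroot account and <NSIP> placeholder are assumptions; substitute your own management address and credentials):

scp build-13.0-58.32_nc_64.tgz nsroot@<NSIP>:/var/nsinstall/build_13.0-58.32/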

I would delete the existing upload and do a fresh upload/extraction.

Previous instances of the Python issue were caused by the GUI process. Switching to the CLI but reusing the previous upload may still be the problem.

 

Upgrades are relatively straightforward, so if you are still having a command issue like that, you may have to contact support (or try a different build; check the build release notes for compatibility issues). An engineer might respond to the forum, but no guarantee.

 

What platform is your VPX on (which hypervisor)?

And what exact firmware build file do you have (just in case you have the wrong file)?

If the kernel hangs during the install, there is a kernel recovery process, but you can roll back the VM via the snapshot too.

 

For an HA pair, the procedure is essentially:

Back up all necessary configs on both systems.

Upgrade the secondary and reboot. Then fail over. Upgrade the primary and reboot.

There are a few more steps... but that's the basic gist.
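
In rough CLI terms it looks something like this (a sketch only; the exact sequence is in the upgrade documentation):

save ns config     (on both nodes first)
show ha node       (confirm which node is currently secondary)
... run installns on the secondary, reboot it ...
force failover     (triggers the failover so the upgraded node becomes primary)
... run installns on the old primary, reboot it ...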

 

Is /flash full (instead of /var)?

https://support.citrix.com/article/CTX133587
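
A quick way to check from the shell (just df over the standard mount points):

shell
df -h /flash /var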

 

As a last-case scenario you could deploy a NEW VPX instance with a fresh install and migrate the config to get around the upgrade issue (but that may be more complicated if you are unfamiliar).
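
If it ever comes to that, the running configuration is the ns.conf under /nsconfig, so a rough sketch of pulling it off for review and migration (same nsroot/<NSIP> placeholders as before):

scp nsroot@<NSIP>:/nsconfig/ns.conf .

The relevant lines can then be reviewed and replayed on the new instance.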

 

 


Both NetScalers did NOT have the TGZ file (the one I worked on that died was reverted to a snapshot taken before the file ever existed).

Both now have the folder and a newly uploaded TGZ file on them via WinSCP, so kicking off the updates is feasible at this point. Given how involved this seems to be, however, I'm not going to do it just before leaving for the weekend.

 

The issue I ran into wasn't specifically about the HA configuration, but comes down to a problem in the loader.conf file... And THAT problem will apparently remain with every update, since I assume loader.conf is simply recreated during the update process. As the recreated file has the comconsole reference hardcoded in, and that entry needs to be removed on our hypervisor, it will cause a problem on every update, making the update process very much NOT straightforward.

 

I was under the impression it wouldn't be more than updating the secondary node from the GUI, and once that one is back from its reboot, updating the primary node (again from the GUI), which should kick off an HA failover as part of its update process. Once the primary is back, you either select a failback or leave things as-is... The document I linked, however, goes through the whole process via the CLI, which is probably needed due to the loader.conf problem.

 

The hypervisor we use is based on KVM (it's a hyperconverged platform); the firmware file I'm now using is build-13.0-58.32_nc_64.tgz, which clocks in at 812,485,030 bytes.


Ah... so this may be a KVM-specific issue (which might be having an impact) that you don't see on other hypervisors.

https://docs.citrix.com/en-us/citrix-adc/13/deploying-vpx/install-vpx-on-kvm/prerequisites-installation-on-kvm.html

 

See if anything here might be relevant:

https://discussions.citrix.com/topic/394112-netscaler-gateway-vpx-kvm-acropolis-hypervisor-no-boot/

It talks about a loader.conf issue on AHV (which is KVM-adjacent).

 

 

Otherwise, I would reach out to support, as the install should not be hitting a conflict like this, and they may know of a workaround.


Thanks for this, I found it quite helpful as I'm looking into upgrading Citrix ADC to this version as well.

One thing I can add is that Citrix ADC runs on FreeBSD, which is similar to but not the same as Linux, so when using the command line be aware there can be differences if you are only familiar with Linux. I had issues trying to get the Azure extension(s) to work with this as a VM, as the extensions seem to be geared more towards Linux OSes. *BSD variants are quite different (they come from a different code base), and I hope it's okay to spread a bit of BSD-world info here; they are underrepresented. :)
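
For instance, dropping to the shell on the ADC and checking the kernel makes that obvious (a trivial check; the output names FreeBSD rather than Linux):

shell
uname -sr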


Seeing as I had a day off on Friday, I did not want to update the system last Thursday. If anything had acted up, I'd have 'wrecked' it, and someone else with no knowledge of what exactly I did would have had to clean it up.

 

Didn't have time yesterday, but took the time today. Installed the update via a PuTTY session, and did NOT reboot at the end.

Connected to the system with WinSCP post-update, went to /flash/boot/loader.conf, and edited the file.

 

The contents of the file (pre-update) were:

 

autoboot_delay=3
boot_verbose=0
kernel="/ns-13.0-47.24"
vfs.root.mountfrom="ufs:/dev/md0c"
#console="vidconsole,comconsole"

console="vidconsole"

 

The default contents of the file (post-update) are:

 

autoboot_delay=3
boot_verbose=0
kernel="/ns-13.0-58.32"
vfs.root.mountfrom="ufs:/dev/md0c"
console="vidconsole,comconsole"

 

So that file is definitely changed as part of the update! I edited that to:

 

autoboot_delay=3
boot_verbose=0
kernel="/ns-13.0-58.32"
vfs.root.mountfrom="ufs:/dev/md0c"
#console="vidconsole,comconsole"

console="vidconsole"

 

Then I hit reboot... And the system went down and properly rebooted again. Our secondary node (once I log in to its web page) shows the 58.32 version as active. So any update resets loader.conf to the default, which will NOT work on all hypervisor systems. I suspect XenServer is assumed to be the hypervisor of choice for the NetScalers, and I also assume XenServer has no issues handling that comconsole bit of the loader, whereas different hypervisors kind of 'choke' on it.
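
Since the file gets reset like that, a quick check before each post-upgrade reboot (just grepping the same file edited above) confirms whether comconsole has crept back in:

grep console /flash/boot/loader.conf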

 

Because the failover causes a complete disconnect of all users currently logged in through the NetScaler, I will not force a failover right now. My colleague, who will be in really early tomorrow, is going to look at that. Once that is done, connections should run through the now-updated node, leaving us free to update the other node tomorrow by the same means.

