Posted November 6, 2014 We monitor hundreds of websites with HTTP monitors that have the secure option enabled on port 443, essentially checking each website over HTTPS and looking for a 200 response code. Pretty typical stuff. For several of these sites the monitor fails consistently, and when we look at the service group to see why, the monitor says "Last response: failure - Time out during SSL handshake stage". This is cropping up more and more, and we can't figure out why. We've checked the usual settings, such as whether the response time-out was too low and whether there were SSL certificate errors, and we tried "GET /" instead of "HEAD /" as well, all without success. To dig deeper, I SSH'd into the Netscaler, dropped to the shell and ran curl -IL -k "https://www.example.com" (example.com is not the real website, of course) in an effort to mimic what the Netscaler's monitor might be doing. I got back this: "curl: (35) Unknown SSL protocol error in connection to www.example.com:443". Okay, this is somewhat helpful, although not entirely. But for other sites that give the same "Time out during SSL handshake stage", curl is no help because it shows "HTTP/1.1 200 OK" in the response, which makes it even more perplexing that the monitor is down. I'm posting today in the hope that someone has run into a similar issue and has ideas or troubleshooting strategies that would help us get to the bottom of this. For now we've had to use simple PING monitors for these sites instead, but that is far from accurate, since PING can succeed when Apache/IIS is down, as we all know. Thanks!
November 7, 2014 Why not run a TCP trace on the Netscaler and see what the actual monitors are doing, rather than trying to mimic them? Not only will you be able to see the packet details, you'll see the timings. If you load your SSL key into Wireshark, it will (if recent) also decrypt the SSL packets. Also worth remembering that HTTP 1.1 says you should have a Host header, which the default monitors don't actually have...
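If a trace is not immediately practical, the handshake and the HTTP request can at least be timed separately from any ordinary machine. The following is a minimal Python sketch, not a reproduction of what the Netscaler monitor actually sends: the function names are made up for illustration, certificate verification is switched off (roughly what curl -k does), and unlike the default monitor it sends both SNI and a Host header.

```python
import socket
import ssl
import time


def build_probe_request(host, method="HEAD", path="/"):
    """Build an HTTP/1.1 request that includes the Host header."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_code(response):
    """Return the numeric status code from the first line of a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def probe(host, port=443, timeout=5.0):
    """Connect, time the TLS handshake on its own, then send the probe request."""
    context = ssl.create_default_context()
    context.check_hostname = False       # assumption: skip verification, like curl -k
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        started = time.monotonic()
        # wrap_socket performs the handshake on an already-connected socket
        with context.wrap_socket(raw, server_hostname=host) as tls:
            handshake_seconds = time.monotonic() - started
            tls.sendall(build_probe_request(host))
            response = tls.recv(4096)
            return handshake_seconds, tls.version(), parse_status_code(response)
```

Calling probe("www.example.com") returns the handshake duration, the negotiated protocol version and the status code. A handshake that succeeds here but fails from the Netscaler points at a difference in what the two clients offer or accept, not at the web server being down.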
November 7, 2014 We are having the same problem. We have found that the monitors work fine with SHA-1 certificates, but they fail with SHA256 certificates. Are you using SHA256 certificates? We are having the issue when attempting to monitor two Web Interface servers running on Windows Server 2008 R2. We are using VPX 3000s on 10.5. Did you find a solution? We have been working with Citrix support, thus far to no avail. I am on the line with them now and just found your post.
November 7, 2014 Author John, I get the error with SHA1 certs too, so that doesn't seem to be the cause for me. I also have successful HTTP monitoring with both types of certs, so it is something specific to these sites. It could be an IPS, firewall or something else on the other end too, for all I know. I'm working on setting up a test VPX device so I can run a trace like Paul suggests above. My production Netscaler appliance is so busy that running a trace for 1 minute produces gigs and gigs of data. I'll post any results.
November 11, 2014 Re live box: can't you use a filter on the trace to pre-limit the data you capture? But yeah, a lab VPX would work well (although do remember that VPX doesn't do TLS 1.1 or 1.2, which could be part of the issue!) I'm assuming that you have "normal" timings for the monitor, so it's not a REAL timeout issue?
November 11, 2014 Possible reasons: 1) If a firewall between these servers has been patched to block SSLv3 (the POODLE mitigation), it may drop the packets when the Netscaler uses SSLv3 for the SSL handshake. Better to disable SSLv3 on the services, which forces the service monitors onto TLSv1. 2) The backend servers are short on resources and are rejecting some SSL connections. 3) The backend servers have multiple interfaces, and some return traffic is not routed back to the Netscaler because it leaves through a different interface and loops in your network.
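One way to test the protocol-version theory in option 1 is to attempt a handshake pinned to each version in turn and see which ones the backend (or anything in the path) accepts. The sketch below is a hedged illustration using Python's standard ssl module, with hypothetical function names. It covers TLS 1.0 through 1.3 only, since most current Python/OpenSSL builds cannot speak SSLv3 at all, and a local OpenSSL policy may also refuse the older TLS versions regardless of what the server supports.

```python
import socket
import ssl


def context_for_version(version):
    """Return a client context pinned to exactly one TLS protocol version."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # assumption: we only care about the handshake
    context.verify_mode = ssl.CERT_NONE
    context.minimum_version = version
    context.maximum_version = version
    return context


def try_version(host, version, port=443, timeout=5.0):
    """Attempt one handshake; return (succeeded, detail)."""
    try:
        context = context_for_version(version)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except (OSError, ValueError) as exc:
        # ssl.SSLError and socket timeouts are both OSError subclasses;
        # ValueError covers versions the local OpenSSL build rejects outright
        return False, f"{type(exc).__name__}: {exc}"


def survey(host, port=443):
    """Try each TLS version separately and collect the results by name."""
    versions = [
        ssl.TLSVersion.TLSv1,
        ssl.TLSVersion.TLSv1_1,
        ssl.TLSVersion.TLSv1_2,
        ssl.TLSVersion.TLSv1_3,
    ]
    return {v.name: try_version(host, v, port) for v in versions}
```

If survey() shows that the backend only accepts versions the load balancer cannot offer (or the reverse), the "Time out during SSL handshake stage" message is a version mismatch and not a real timeout.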
February 11, 2015 Author I can't believe that 3 months have gone by, but I finally had a couple of hours to spare today, so I ran a trace and captured what the monitor was doing. This was the result: TLSv1 Record Layer: Alert (Level: Fatal, Description: Unsupported Certificate), Content Type: Alert (21). So I dug further to find the difference between this monitored device and the others, and found that this device has a certificate with a 4096-bit key. I did some more testing and indeed, 4096-bit keys are not supported. 2048, no problem. Maybe if I have extra time in the next week, I'll try a 3072 to see how that goes. So the next question to the folks at Citrix is WHY!!! This will become a huge problem in a very short amount of time. Hopefully support for it is on the roadmap.
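For anyone reading a raw capture without a dissector handy, the alert quoted above is a short plaintext record: content type 21, a two-byte protocol version, a two-byte length of 2, then one byte for the level and one for the description. Here is a small Python sketch that decodes such a record. The function name is made up for illustration, the description table follows the TLS 1.2 alert registry, and the sketch only handles unencrypted alerts, which is what you get when the handshake is rejected before keys are established.

```python
import struct

ALERT_LEVELS = {1: "warning", 2: "fatal"}

# Alert descriptions as numbered in the TLS 1.2 specification (RFC 5246)
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    22: "record_overflow",
    30: "decompression_failure",
    40: "handshake_failure",
    42: "bad_certificate",
    43: "unsupported_certificate",
    44: "certificate_revoked",
    45: "certificate_expired",
    46: "certificate_unknown",
    47: "illegal_parameter",
    48: "unknown_ca",
    49: "access_denied",
    50: "decode_error",
    51: "decrypt_error",
    70: "protocol_version",
    71: "insufficient_security",
    80: "internal_error",
    90: "user_canceled",
    100: "no_renegotiation",
    110: "unsupported_extension",
}


def decode_alert_record(record):
    """Decode a plaintext TLS alert record into (level, description) names."""
    if len(record) < 7:
        raise ValueError("record too short to be a TLS alert")
    content_type, major, minor, length = struct.unpack("!BBBH", record[:5])
    if content_type != 21:
        raise ValueError(f"content type {content_type} is not an alert (21)")
    if length != 2:
        raise ValueError("alert body is not 2 bytes; it may be encrypted")
    level, description = record[5], record[6]
    return (
        ALERT_LEVELS.get(level, f"unknown({level})"),
        ALERT_DESCRIPTIONS.get(description, f"unknown({description})"),
    )
```

As a usage example, the bytes 15 03 01 00 02 02 2b (a TLSv1 record carrying level 2, description 43) decode to a fatal unsupported_certificate alert, matching the result in the trace.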
February 12, 2015 Author For what it is worth, I spun up an older Netscaler VPX with 9.3 - 54.4.nc and it is happy to check SSL certs with a 4096-bit key. I then upgraded that same Netscaler to 9.3 - 68.3.nc and it fails. So it seems Citrix removed this functionality at some point. :(
February 12, 2015 http://support.citrix.com/proddocs/topic/ns-faq-map/ns-faq-ssl-ref.html On an MPX, 4096-bit keys are supported on backend servers, but on a VPX, only 2048-bit keys are supported on backend servers.
August 2, 2016 Quoting the "Possible reasons" reply from November 11, 2014: "1) If a firewall between these servers has been patched to block SSLv3 (the POODLE mitigation), it may drop the packets when the Netscaler uses SSLv3 for the SSL handshake. Better to disable SSLv3 on the services, which forces the service monitors onto TLSv1." The 1st option fixed my issue, thanks.
March 25, 2019 Same problem: an HTTPS service to Exchange shows a status of DOWN. I tried everything: using TLS 1.2, disabling the firewall, and several different certificates. The error is still: Failure - Time out during SSL handshake stage. I have Netscaler NS12.0.56.20.nc and Exchange 2016.
June 26, 2019 Hi, I have ADC 13.0 36.27 and it also seems that 4096-bit certificates are not supported on backend servers. I got the same error. Then I changed to a 2048-bit certificate and everything is OK. Is there still a known issue with 4K certificates? br, Patrick
This topic is now archived and is closed to further replies.