Question about limits for URL length?


Ross Helfand

Hello all,

 

I'm still investigating this, but I figured I'd post a question to see if anyone has run into this or has any clues to point me in the right direction.

 

In our environment, we route requests to different lb_vservers based on the contents of the URL.  So for example, we might have:

https://www.example.com/app1/STUFF -> routes to app1_lb_vs

https://www.example.com/app2/STUFF -> routes to app2_lb_vs

https://www.example.com/static/STUFF -> routes to static_lb_vs

etc.

 

We have responder policies/actions that use regex to parse each request and send it to the correct place (basically it uses HTTP.REQ.URL.PATH.GET(1) + "_lb_vs").

 

We also have a "Default" lb_vs where any request ends up that doesn't match any of the above.

 

The issue we're seeing is that we have a very long GET request coming in (it's a base64 encoded SVG file that is about 12k in length), and although it matches the regex, it's not getting parsed correctly.  The URL looks something like:
https://www.example.com/static/url(data:image/svg+xml;base64,LOTS_AND_LOTS_OF_CHARACTERS=")

 

Could this have anything to do with the TCPProfile?  I see bufferSize, but I'm not sure if I'm in the right ballpark here.  My other thought is that it could be trying to parse the entire URL during the regex in the responder action, and is not able to parse it?

 

I'm just looking for some ideas.  Any help would be appreciated!

Link to comment
Share on other sites

Also, I hope you are doing the above with content switching.

 

There's a difference between the string-based policy evaluators and the regex evaluators, in terms of what type of things can go wrong.  Which are you using? 

It may not be a character limit; it might be a syntax error or something else going on.

However, per this article: https://docs.citrix.com/en-us/citrix-adc/current-release/appexpert/policies-and-expressions/regular-expressions/regular-exp-operations.html the regex_match operator evaluates an argument of up to 1499 characters. How long is your URL? If length is part of the problem, we might be able to write a better expression than the one you are using.

Also, the "evaluator" can choke on data that the ADC itself parses properly, so a test result in the evaluator doesn't always tell you whether the expression will work on live traffic. But I don't think that's the case here.

 

http.req.url.path.get(1) is not a regex-based evaluator. It retrieves the element between the first and second "/" in the path part of the URL, so it shouldn't have issues unless the URL is too long to be read at all (for example, because it exceeds legitimate URL lengths, or because of the limit noted above). The regex operators might in fact have issues, but your example above doesn't look like that should be the problem.

 

So in a URL like:

https://www.example.com/static/url<stuff>

 

http.req.url.path.get(1) is targeting "static"

http.req.url.path.get(2) is targeting "url<stuff>"
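
To make the indexing concrete, here is a rough Python approximation of the element lookup described above. The helper name `path_get` is hypothetical, and this is only a sketch of the behaviour as described in this thread; the real ADC implementation may differ in edge cases such as URL decoding or empty elements.

```python
def path_get(path: str, n: int) -> str:
    """Approximate HTTP.REQ.URL.PATH.GET(n): the n-th '/'-separated element.

    Because the path starts with '/', element 0 is the empty string,
    so element 1 is the first visible directory name.
    """
    parts = path.split("/")
    return parts[n] if 0 <= n < len(parts) else ""


first = path_get("/static/urlSTUFF", 1)   # "static"
second = path_get("/static/urlSTUFF", 2)  # "urlSTUFF"

# A data URI contains '/' itself (for example in "image/svg+xml"),
# so under this approximation it is split into further elements.
data_uri_part = path_get("/static/url(data:image/svg+xml;base64,AAAA", 2)
```

One thing this sketch suggests: if a data URI ends up in the path, the "/" inside "image/svg+xml" (and any "/" in the base64 text) would start a new element, so get(2) may see less than you expect.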

 

But if the "get" element is longer than 1499 characters (the limit mentioned for the regex operators; I have no confirmation whether get or contains has the same problem), switch to starts_with, or we can try to shorten the comparison to something else that might work.

 

If you want to see whether the policy engine can handle what you are asking, you could use a custom responder respondwith action to hit on that traffic and return a test page response.

The quotes are needed in the GUI for this output in the respondwith response field:

"HTTP/1.1 200 OK\r\n" + "Server:unknown\r\n" + "Connection:close\r\n\r\n" + "What ADC sees:  " + http.req.url.path.get(2) + "\r\n\r\n"

 

You could then compare what your expression (or the regex variant) "sees" with what you pass in, to check whether truncation is occurring.
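
For a 12k to 16k string, comparing by eye is not practical. A small sketch like the following could do the comparison, assuming you have copied the value you sent and the value echoed back by the test response into two strings. The function name `first_difference` is hypothetical.

```python
from typing import Optional


def first_difference(sent: str, seen: str) -> Optional[int]:
    """Return the index where `seen` stops matching `sent`, or None if identical."""
    for i, (a, b) in enumerate(zip(sent, seen)):
        if a != b:
            return i
    if len(sent) != len(seen):
        # One string is a prefix of the other: the shorter length is the cut point.
        return min(len(sent), len(seen))
    return None


# Illustrative data only: pretend the echoed value was cut after 1499 characters.
sent_value = "x" * 2000
seen_value = sent_value[:1499]
cut_at = first_difference(sent_value, seen_value)
```

If the result is None, nothing was truncated; otherwise the index shows where the two values diverge.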

 

If it's not length, it could be something else:

Supply an example of your expression in use and the URL format, so we can see whether your expression is just wrong for what you are evaluating. (I know you may want to obscure parts, but it could just be a policy issue.) If you are in fact doing regex comparisons, there may be something wrong in your expression.

 

Also remember that the regular string-based evaluations are case-sensitive unless you make them case-insensitive, so your expression may simply be wrong for the value you want to look at.

 

 

 

Link to comment
Share on other sites

Hi Rhonda,

 

Thank you so much for the detailed answer!  Yes, I was trying to dumb things down a bit to obscure the details.  Let me be a bit more detailed.

 

I hadn't thought about the policy evaluator, just because I was nervous about pasting in a 16k URL.  I figured it wouldn't be able to parse that.  But that is a great idea, and I will try it out using a smaller part of the URL.

 

The request that is "falling through" to our default_lb_vs is a base64 encoded SVG file and it's huge (about 16,000 bytes long).  It shouldn't really even be hitting our application.  When we look at Developer Tools in Chrome, the image is rendered directly by the browser.  However, in Safari, the request is sent back to our application.  It only happens a few times per day, out of many million requests per day, so this is not anything urgent; it's just causing some sporadic alerts and it's bugging me.  It also seems to only happen to a couple of the images, the ones that are larger SVGs when encoded.

 

We do indeed use CS policies and actions. 

 

Our URLs actually have a couple of different parts, so they look more like:

https://www.example.com/NUMBER_1/NUMBER_2/rest_of_url

Where NUMBER_2 corresponds to the lb_vserver we want to send to.

 

Earlier in the list of rules we have something like this (still trying to dumb it down a little but this is more accurate):
 

add cs policy cs_static_content_pol -rule "HTTP.REQ.URL.PATH.REGEX_MATCH(re#/static[^/]*/.*#)"

bind cs vserver ourapp_cs_vs -policyName cs_static_content_pol -targetLBVserver static_lb_vs -priority 100

Then later:

add cs action cs_NUM_act -targetVserverExpr "HTTP.REQ.URL.PATH.GET(2) + \"_lb_vs\""

add cs policy cs_NUM_pol -rule "(Just verify that the URL here is valid, we do some things with checking against cookies and a patset)" -action cs_NUM_act

bind cs vserver ourapp_cs_vs -policyName cs_NUM_pol -priority 200

 

The crazy URL coming through looks like:

https://www.example.com/NUMBER_1/NUMBER_2/static/build/url("data:image/svg+xml;base64,xxxxxxxxxxxxxxxxETC

 

So I don't really understand how it's missing BOTH of the content switching policies.
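
As a sanity check outside the ADC, the pattern and URL shapes above can be tried in Python. This is only a sketch: Python's `re` module is not the ADC's regex engine and has no 1499-character limit, and using `re.search` assumes that REGEX_MATCH looks for the pattern anywhere in the path rather than anchoring it at the start.

```python
import re

# Same pattern as in cs_static_content_pol, without the re#...# delimiters.
STATIC_RE = re.compile(r"/static[^/]*/.*")


def matches_static(path: str) -> bool:
    """True if the static-content pattern is found anywhere in the path."""
    return STATIC_RE.search(path) is not None


short_path = "/NUMBER_1/NUMBER_2/static/build/app.css"
long_path = (
    '/NUMBER_1/NUMBER_2/static/build/url("data:image/svg+xml;base64,'
    + "x" * 16000  # stand-in for the base64 payload
)

short_ok = matches_static(short_path)
long_ok = matches_static(long_path)
```

Under Python's engine both paths match, which suggests the pattern itself is fine for this URL shape and that a miss on the ADC is more likely related to length or engine limits.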

Link to comment
Share on other sites

Interesting!  I did some digging, and that regex for "static" is greedy.  I don't know why exactly.  Then I went and tried the policy editor.  As I added more and more characters to the URI, it took longer and longer to parse, and eventually the policy editor popped up a big red ERROR with "Response status: Request-URI Too Long (414)."

 

I guess my next question would be, would this kick the response back to the client as a 414 or would it just fall through the CS policies down to our default lb_vserver and allow the web server to process the request?

Link to comment
Share on other sites

I had drafted a different response last night but waited to post. So I'm going to grab the relevant bits for your update.

 

1) 12000 characters is way too long for a valid URL anyway. It might be something that was trying to embed an SVG, like an image or CSS that was supposed to be in the body content, but it's not generating a valid URL when requested.  And the regex engine won't handle more than 1499 characters in that URL.

 

2) The greediness problem is something I thought was happening but didn't have the phrase for. But yes, the .* in that expression is looking for the longest possible match, so the longer the URL, the more exhaustive the search becomes, and the regex engine is giving up if nothing else is.

 

3) It's possible this is also generating, if not a flat-out traffic DROP, an UNDEFINED result in the policy evaluation. CS doesn't have any granular way to handle UNDEF, so I **believe** it would result in a DROP as well, which might also be why you are getting no hits for this traffic. (I'm assuming other URLs are working.)

Either way, whether the URL being too long means a) the ADC drops it for its own good, b) the regex engine chokes on evaluation, or c) something else results in UNDEF, you have a problem.

 

I think this URL is going to be a problem no matter what, but it's possible that a non-regex expression *might* handle it, since the regex engine specifically incurs more processing overhead than the string comparison operators.

 

For this example:

https://www.example.com/NUMBER_1/NUMBER_2/static/build/url("data:image/svg+xml;base64,xxxxxxxxxxxxxxxxETC

 

See if simplifying the expression to these would actually process:

http.req.url.path.get(3).set_text_mode(ignorecase).contains("static")

--or--

http.req.url.path.get(3).set_text_mode(ignorecase).eq("static")
 

If the Number_1, Number_2 are fixed strings and not dynamic, just do:

http.req.url.path.starts_with("/Number_1/Number_2/static/")
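
For comparison, here is roughly what those three checks amount to, sketched in Python with plain string operations. The helper `segment` is hypothetical, the payload is a stand-in, and this only approximates the ADC operators as described in this thread.

```python
def segment(path: str, n: int) -> str:
    """Approximate PATH.GET(n): the n-th '/'-separated element of the path."""
    parts = path.split("/")
    return parts[n] if 0 <= n < len(parts) else ""


path = (
    '/NUMBER_1/NUMBER_2/static/build/url("data:image/svg+xml;base64,'
    + "x" * 16000  # stand-in for the base64 payload
)

# get(3).set_text_mode(ignorecase).contains("static")
contains_static = "static" in segment(path, 3).lower()

# get(3).set_text_mode(ignorecase).eq("static")
equals_static = segment(path, 3).lower() == "static"

# starts_with("/Number_1/Number_2/static/"), case-sensitive as written
starts_with_fixed = path.startswith("/NUMBER_1/NUMBER_2/static/")
```

None of these needs to scan the long tail of the URL with a pattern matcher; each looks at one short element or a fixed prefix.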

 

The total length may still be a problem here, but I think you're in invalid-content territory.

Link to comment
Share on other sites

  • 10 months later...

This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information. The HTTP 414 URI Too Long response status code indicates that the URI (Uniform Resource Identifier) requested by the client is longer than the server is willing to interpret.

To resolve this problem:

 

  • By POST request: Convert the query string to a JSON object and send it in the body of a POST request to the API.
  • By GET request: The maximum request length depends on both the server side and the client side. Most web servers have a configurable limit of about 8k. On the client side, different browsers have different limits; commonly cited figures are about 2k for IE and Safari, 4k for Opera, and 8k for Firefox, though these vary by version. This means a GET request that has to work everywhere should stay under about 2k, and about 8k is the practical ceiling.
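
The POST option could look something like this minimal sketch, which uses only the Python standard library and builds the request without sending it. The endpoint `/api/search`, the parameter names, and the function name `get_to_post` are all hypothetical, and it assumes the API accepts a JSON body.

```python
import json
import urllib.request
from urllib.parse import parse_qsl, urlsplit


def get_to_post(url: str) -> urllib.request.Request:
    """Move the query string of a GET URL into a JSON body on a POST request."""
    parts = urlsplit(url)
    payload = dict(parse_qsl(parts.query))        # query string -> dict
    body = json.dumps(payload).encode("utf-8")    # dict -> JSON bytes
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


long_get = "https://www.example.com/api/search?q=" + "x" * 5000 + "&page=2"
req = get_to_post(long_get)
# The URL is now short; the long value travels in the request body instead.
```

Calling `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without a network.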

 

If a request exceeds the maximum length, the web server or browser may truncate it at the limit without any warning. Some servers truncate the request data, while others reject the request because of the data loss and return the response code 414 Request-URI Too Long.

 

Under Apache, the limit is a configurable value, LimitRequestLine. If you want to set the URL limit to 5000 characters (bytes; the default is 8190), add the following line to your server configuration or virtual host file.

LimitRequestLine 5000

If you want to set the maximum header field length supported by Apache to 3000 characters, add the following line.

LimitRequestFieldSize 3000

 
