12 Apr 2007 12:00 AM EDT

Hi, I am Prasanna Padmanabhan. I am a software developer in Citrix Engineering. This is my first blog entry in the Citrix Community blogs, so I am a newbie.

I am working with a small engineering team looking into some ideas for enhancing Load Balancing (LB) in Citrix Presentation Server (CPS), something we are investigating as part of the Constellation set of technologies here at Citrix. I am writing this blog not only to get your thoughts on these new ideas, but also to understand how you typically configure LB in CPS.

Before I start to describe the new ideas, let me tell you this. I or another one of our team members may be able to write a separate blog post describing in detail how LB works in CPS today. If you would like me to do that, do let me know, since LB is a somewhat complex subject for most of us.

There are a couple of ideas that we are exploring.

  1. End User Experience (EUE) based load balancing
  2. User-Application preferential load balancing

The first has to do with load balancing based on the end user experience. That is, load balancing to the server that at that instant gives the best user experience. This is a separate topic in itself and it deserves its own discussion.

In this post, I talk about User-Application preferential load balancing, which is about preferring certain applications and users to others. Often, administrators want to provide a certain level of service to certain users based on their job functions, their position within the company, etc. In the same way, administrators may want to assign published applications a level of importance based on how critical each application is to their business. This could also change during different times of the year. Accounting and finance applications become all the more mission critical during the end of each quarter!

Today, administrators can do this in several ways, but we think they are not too straightforward.

  • 1. Manual isolation or siloing - manually assigning mission critical applications to their own servers and grouping many normal applications together on other servers.

This is acceptable but could sometimes be an administrative hassle: if you by mistake publish an important application along with other important applications on the same server, users will begin to notice performance issues.

  • 2. CPU priority levels - You can assign CPU priority levels for published applications (via the Access Management Console). This uses the operating system's CPU scheduler, which could potentially starve a lower priority application if a higher priority application does not relinquish the CPU for a long time (this does not happen very often, but when it does users might start to notice slowness). Another problem is that you might publish the same application twice, with a different importance level setting, and end up with Outlook_2 and Outlook_3.
  • 3. Using the CPS policy mechanism to set limits on virtual channels such as bandwidth. But unfortunately, these are hard limits. Even if someone is not using that much bandwidth, a needy user cannot get it.

I am looking at this from a designer/developer point of view. As our customers you use the admin tools much more than I do. Do you

  • a) Agree that these are indeed problems? If so how do you currently work around it or
  • b) Think that it is not a huge problem but that a better solution would make your life easier, or
  • c) Feel that this is not an issue at all?

We have heard a lot of (a) and (b), but not many (c). But if you feel that this is not a problem, I would like to hear why you think so.

User-Application preferential load balancing is an idea that we believe will solve a good portion of these issues, and that is what I am going to talk about.

Central to the idea of User-Application preferential load balancing is the idea of resource shares. Resource shares are simply numbers or quantifiers that denote how important a user or application is. The more resource shares you have, the more CPU you get. The more resource shares an application gets, the more CPU it gets. For example, assume that there are two ICA sessions, S1 and S2, currently running on a server, with 4 shares and 8 shares respectively. Then, if they were at any point competing for CPU at the same time, they would get roughly 33% and 67% of the CPU respectively (4/12 and 8/12).
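The arithmetic behind the S1/S2 example can be sketched in a few lines of Python. This is only an illustration of proportional shares; the function name `cpu_fractions` is made up for this sketch and is not part of any Citrix product.

```python
from fractions import Fraction


def cpu_fractions(shares):
    """Map session name -> fraction of CPU when every session is competing."""
    total = sum(shares.values())
    return {name: Fraction(count, total) for name, count in shares.items()}


# S1 has 4 shares and S2 has 8 shares, as in the example above.
fractions = cpu_fractions({"S1": 4, "S2": 8})
for name, fraction in fractions.items():
    print(f"{name}: {float(fraction):.1%}")
# prints "S1: 33.3%" then "S2: 66.7%"
```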

The clause "if they were at any point competing for CPU at the same time" is important to understand. These are soft-caps. So if S2 wasn't doing anything (i.e., it was in an idle state), then S1 could get more than its share of one third of the CPU if it needed it. But the moment S2 started to do something CPU intensive (e.g., a search operation), the CPU share enforcer would kick in, take away the extra CPU cycles that S1 was temporarily enjoying, and hand those off to S2, the more important session.

Typically, applications never consume large portions of the CPU for very long periods of time. They usually do so in short bursts (e.g., a macro is being run in an application like Word, or a search is being performed on a document). Without the CPU rebalancing feature, users might suddenly see longer response times or general slowness when other users perform these CPU intensive tasks. Thus, the CPU rebalancer effectively shields important users and applications from these kinds of situations.
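To make the soft-cap behaviour concrete, here is a minimal Python sketch. It assumes the rebalancing can be modelled as weighted max-min fairness: a session never gets more CPU than it asks for, and whatever it leaves unused is re-split among the busy sessions in proportion to their shares. That model, the function `allocate_cpu`, and the demand numbers are assumptions made for illustration; this is not the actual enforcer.

```python
def allocate_cpu(shares, demand, capacity=100.0):
    """Split `capacity` percent of CPU among sessions with soft caps.

    shares: session -> share count.
    demand: session -> percent of CPU the session wants right now.
    """
    allocation = {name: 0.0 for name in shares}
    active = {name for name in shares if demand.get(name, 0.0) > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_shares = sum(shares[name] for name in active)
        # Sessions whose demand fits inside their proportional slice.
        satisfied = {
            name for name in active
            if demand[name] <= remaining * shares[name] / total_shares
        }
        if not satisfied:
            # Everyone wants more than their slice: split strictly by shares.
            for name in active:
                allocation[name] = remaining * shares[name] / total_shares
            break
        # Give modest sessions what they asked for, then re-split the rest.
        for name in satisfied:
            allocation[name] = demand[name]
            remaining -= demand[name]
        active -= satisfied
    return allocation


shares = {"S1": 4, "S2": 8}
s2_idle = allocate_cpu(shares, {"S1": 100, "S2": 0})    # S1 may use the whole CPU
s2_busy = allocate_cpu(shares, {"S1": 100, "S2": 100})  # back to a 1/3 : 2/3 split
```

With S2 idle, S1 is free to take everything; once both sessions are busy, the split falls back to the share ratio.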

Some readers might be able to relate this feature to the fair-share CPU scheduling feature (codenamed Maloo CPU) and the Server User load rule in CPS 4.5 Enterprise.

| | CPS 4.5 Enterprise | User-Application preferential load balancing |
|---|---|---|
| CPU scheduling feature | Codenamed Maloo CPU, this allowed administrators to ensure fair share (equal share) CPU usage amongst user sessions on a server. | Instead of each user session getting equal importance, sessions are given a numerical importance level based on who the user is and what applications they run within that session. Sessions with more shares get more importance within a server. This denotes inter-session competition within a CPS server. |
| Server User load rule | This load rule tries to load balance sessions such that each server has approximately an equal number of sessions running at any given point. | Uses the notion of shares described above (session importance levels) to load balance between servers, so that important sessions are made to occupy more of a server, thus preventing other important sessions from running on the same server. This denotes inter-server competition within a CPS farm. |

How many shares a session gets is a function of the user and the applications that he or she is running. It is a product of your importance and the max of all the apps running in your session. Here is an example with a matrix to explain this better.

In a hospital, doctors and nurses may want to be assigned more shares for mission critical applications such as medical imaging applications (X-rays, CAT scans, etc.) or patient records, since they deal with people's lives. Doctors also have several patients to visit during ward rounds, so patient record information must be quickly available to them when they are by the patient's side, without any delays or bottlenecks in launching or using the application.
Compare this with a clinical lab technician or an administrative assistant at the same hospital. He or she might also have to pull up the same patient records once in a while, but small delays might be acceptable in this case. So they could simply be considered normal users, which is of course the default setting. In the same way, a normal application could be something like a standard home-grown application, which gets the default number of shares (2).
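
As a rough sketch of the rule that a session's shares are the product of the user's importance and the most important application in the session, consider the Python below. Only the default of 2 shares comes from this post; the other share values, the assumption that normal users also default to 2, and the name `session_shares` are hypothetical.

```python
DEFAULT_SHARES = 2  # default for a normal application, per the post


def session_shares(user_shares, app_shares_in_session):
    """User importance times the most important app running in the session."""
    if not app_shares_in_session:
        raise ValueError("a session needs at least one running application")
    return user_shares * max(app_shares_in_session)


# Hypothetical share values chosen for illustration only.
DOCTOR, TECHNICIAN = 4, DEFAULT_SHARES
PATIENT_RECORDS, HOME_GROWN_APP = 4, DEFAULT_SHARES

doctor_session = session_shares(DOCTOR, [PATIENT_RECORDS, HOME_GROWN_APP])
technician_session = session_shares(TECHNICIAN, [PATIENT_RECORDS])
print(doctor_session, technician_session)  # prints "16 8"
```

Under these made-up numbers, a doctor's session running patient records carries twice the weight of a technician's session running the same application.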
Thus, User-Application preferential load balancing gives users a more predictable and consistent user experience. The workload management technology that I described comes from Aurema, a company that Citrix recently acquired.
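
The inter-server side of the idea can be sketched the same way. The Python below contrasts picking a server by session count (the spirit of the Server User load rule) with picking it by the sum of session shares. The farm layout, share values, and function names are all hypothetical, and the real load evaluator may weigh things quite differently.

```python
def least_loaded_by_count(farm):
    """Session-count idea: the server with the fewest sessions wins."""
    return min(farm, key=lambda server: len(farm[server]))


def least_loaded_by_shares(farm):
    """Share-aware idea: the server with the smallest sum of shares wins."""
    return min(farm, key=lambda server: sum(farm[server]))


# Hypothetical farm: server name -> shares of the sessions it hosts.
farm = {
    "server-a": [16, 16],   # two important sessions
    "server-b": [2, 2, 2],  # three normal sessions
}
by_count = least_loaded_by_count(farm)    # "server-a": it has fewer sessions
by_shares = least_loaded_by_shares(farm)  # "server-b": it carries fewer shares
```

Counting sessions would send the next important session to the server that already hosts two important ones, while counting shares would steer it to the other server.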
That's about all I have to say in this article. I shall talk about EUE based load balancing in a separate post. In the meantime, I am looking for actionable feedback on this idea. Also, don't forget to tell me whether you want to see another post on how load balancing, in general, works in Citrix Presentation Server.

Comments (11)

Thanks for this interesting article. Most of our customers won't need this kind of special treatment for "more important" persons. In fact, most CEOs, doctors, and so on would feel bad if they were treated as someone special, slowing down performance for everyone... This is the situation with our customers in Switzerland and might/will be different in other countries. But in some cases this might still be a good solution for special situations (e.g. big Excel sheets with huge calculations that slow down a terminal server for all users for minutes). More information on standard load balancing of CPS today would be much appreciated. Regards Ecki

I wonder how many dimensions really apply - you've pulled out two 'who' and 'what' (application), but are there other important factors - time of day, location, 'other thing that Citrix will never think of'... It's great to have a mechanism out of the box for stand-alone CPS, but is there a case for also making the number of shares something that Web Interface or Access Gateway can influence? I'd have thought that tying in to AAC conditions could be useful - or perhaps it's something that could be open for scripting at Web Interface, allowing Web Interface to specify the number of shares a session deserves - based on any logic that makes sense for that deployment?

Ken, good point. What you have suggested is something that we are already planning on doing for the most part. The idea is to tie the shares for a user to the policy mechanism. That is, it would be determined by the existing set of inputs to the policy system such as AAC filters, IP address etc. What about time of the day or year? That is something useful that we haven't given much thought yet, since the existing policy mechanism does not support time of day as one of its inputs.

Hi Prasanna, I work in our Consulting group, so I am always "recommending" and configuring custom load evaluators for our large and enterprise customers. I have a pretty good understanding of LB, but I would like to see you post how it "actually" works from the developer's standpoint - I think we would find that very valuable (and it would help me make better recommendations and Citrix best practices in the field). Let me tell you what I typically do for customers before I address your ideas. I always immediately come into a customer site and get them off the default load evaluator (LE). I copy the Advanced LE and then modify it by setting up the custom LE using the following rules:
  • CPU - and drop it from 90/10 to 75/10
  • Memory - and drop it from 90/10 to 75/10
  • Server User Load - and drop it from 100 to a reasonable value (50, 75, etc - I find this out after using TLoad or LoadRunner) - this ensures we have a cap just in case.
  • Load Throttling - new in Ohio - I also leave this at High or medium-high typically. But this depends on how hard the boxes get hit in the morning, etc.
And that is it - I always (always, always) delete the Page Swaps rule since that is based off a hard value (100/sec). So there are really 2 things I would like to see changed going forward:
  • Don't apply the default load evaluator out of the box - apply something that we actually recommend as a best practice. I can't tell you how many times I come into customer sites for assessments and the default LE is being used and they have no clue as to why certain servers are "slow".
  • Delete the page swaps rule from the advanced LE and make it the default. CPU and Memory are nice rules because they are based on percentages - not hard values like page swaps.
Now - onto your post (but I hope we take some action on those "recommendations" I provided above). I like your ideas...Aurema has some cool technology indeed. But I'm not sure if it will be worth the time to implement some of these "nice-to-haves" in my opinion. This does make sense for some customers I've come across - for example in the retail vertical. Customers like Home Depot and Macy's could probably benefit from these new rules. But 99% of the customers I go to are completely satisfied with using CPU, Memory and Server User Load (again - server user load is just for setting a cap). With that being said, I think we need to enhance our LB technology and put in more AI. We need intelligence now - and I think that is what you are ultimately getting at with your ideas. The existing rules work just fine - but it would be nice if the LB technology could "learn" and respond as necessary to the overall health of the servers, certain users and times of year, etc. Execs will probably love this rule and your ideas - so they can get a nice fat server that is 10% utilized to check their email on - but this doesn't make sense for a lot of our customers. I foresee a lot of under-utilization with that...don't you? Anyway - there you have it - that is my experience at about 100 big customers of ours over the past 3 years. Let me know what you think. And good luck to you and your team - you certainly have made it easy for us to distribute load with the LB technology. Please post on how the LB technology is actually implemented - I am looking for more of what we cannot find in the ACG. Cheers, Nick.

Thanks for your detailed feedback Nick. I will go over your comments with the rest of the team and see if we can fix existing issues. Regarding your point about the possibility of under-utilization of servers, this is not quite true. The caps on CPU that we enforce are called "soft-caps". What that means is that, although an important user might have a large number of shares, these shares are not enforced if the user is idle or not doing enough computation to require the large CPU percentage he/she has been allotted. For example, if an exec has 8 shares, and there are two more user sessions on the server with 4 shares each, the exec does not really get 50% of the CPU unless he needs that much. If the exec's session wasn't doing anything, then the other two users could enjoy all or most of the CPU power among themselves, not just 25% each.

You make the following statement in relation to MalooCPU in CPS 4.5 "Ohio": "Maloo CPU allowed administrators to ensure fair share (equal share) CPU usage amongst user's sessions on a server." That statement is not completely true. Firstly, by default MalooCPU enforces fair share CPU usage among users, not sessions. That is a subtle distinction but worth noting, given that the same user could be logged in multiple times to the same server and therefore have multiple sessions. Secondly, MalooCPU may already be configured, via the registry, to assign non-default shares or reservations to users, thereby allowing one or more users to be favored in preference to others. The key details are as follows. Key: HKLM\SOFTWARE\Citrix\CTXCPU. ValueName: Policy, where Policy is a multistring containing CPU share/reservation directives to MalooCPU. I presume that this information is buried somewhere in the CPS documentation. And the reason that I know all this? Well, I was one of the MalooCPU developers at Aurema, and joined Citrix in Feb 2007 when Citrix acquired Aurema. Regards, Willie

Prasanna, I've got a number of comments to make on this topic and I'll post them one at a time. Regarding Ken's comment: "I wonder how many dimensions really apply - you've pulled out two 'who' and 'what' (application), but are there other important factors - time of day, location, 'other thing that Citrix will never think of'..." Considering your example of a healthcare environment, location can be very important. Take a pediatrician, for example: she requires a higher priority of service in an exam room of an out-patient clinic than she would when she is in her office. That same doc providing care for a child in the emergency department would require higher prioritization still. Same person, same application, same time of day, different location, different level of priority. And that is a really interesting challenge.

Hi Simon, You've got a good point - there will always be some policy inputs which we might never be able to envision, that will in fact be a great use case in some situations. With regard to "location": along with the user, IP address/client name are already inputs to our policy system. So, assuming that in your example the ER and child wards are in different subnets, you can assign a different priority level to a user based on the IP address range.

can i know how Access Gateway works...

inay, not sure if you wanted to know how AG works in conjunction with CPS load balancing or simply how AG works in general. I'll try to answer both. 1. Preferential LB does not affect how AG works. All load balancing decisions for servers in a Presentation Server farm are made by the servers themselves (each server has the logic built in to resolve a session to the least loaded server) and not by AG. When connecting via AG using Web Interface or PN Agent, the ICA file that is sent down to the client already has the information to instruct the client to connect to the least loaded server (AG simply acts as a proxy). 2. There is plenty of documentation about how AG works. Please check out: http://www.citrix.com/English/ps2/products/product.asp?contentID=15005

Not only is it useful to get to the 'how' in defining an LB strategy, but we would also need to be able to back that data up in production using EdgeSight Reporting Services. Currently there is no XML template which allows even basic published application reporting metrics such as user session statistics where there are no process faults or errors. Where can we find a template that can be uploaded and adjusted to report against these? Thanks!