The SmartAuditor system is by far the most scalable session recording system on the market. Competitors' products define scalability in terms of hundreds of sessions whereas SmartAuditor deals in thousands or tens of thousands of sessions. This makes it the only true enterprise scalable recording solution available, and in most cases, this is achieved without any additional hardware or administration costs.
This article explains how SmartAuditor achieves high scalability and how an IT manager or administrator can get the most out of their recording system at the lowest cost.
Why SmartAuditor scales so well
There are a number of reasons SmartAuditor scales so well when compared to competing products. The main reason is file size. A SmartAuditor recorded session file is highly compact, far smaller than an equivalent video recording as used by solutions that screen scrape. The network bandwidth, disk space and disk I/O required to transport and store each file are typically at least ten times lower than for an equivalent video file.
The other major reason is the low processing overhead in generating each file. A SmartAuditor recording file is the ICA protocol data for a session extracted virtually in its native format. This means the same ICA protocol data stream used to communicate with the XenApp client is what's captured in the recording file. There is no need to run expensive transcoding or encoding software components to change the format of data in real-time. This low processing overhead is also very important for XenApp scalability ensuring the end user experience is maintained when many sessions are being recorded from the same server.
These scalability benefits are gained without any loss of functionality, and in many cases there are gains. Small file size means low overhead during playback, allowing for faster and smoother rendering of video frames. SmartAuditor recordings are also completely lossless and have none of the pixelation common to most compact video formats. This makes text appearing in session recordings as easy to read during playback as it was for the original user. To maintain the small file size, SmartAuditor does not record key-frames within the file. Patent pending technology implemented in the SmartAuditor Player provides the benefits of fast seeking without the need for key-frames. Seeking is in fact typically a lot faster than in other video players.
Another optimization used is to only record the ICA virtual channels that are capable of being played back. For example, the printer and client drive mapping channels are not recorded as they can generate high volumes of data without benefit to the player.
Recording Sessions
The SmartAuditor Server is the central collection point for recorded session files. Each computer running XenApp with SmartAuditor enabled is configured to send recorded session data to this central collection point. Although the SmartAuditor system is designed to deal with high volumes of data and can tolerate bursts and faults, there are physical limits on how much one server can handle.
The most frequently asked questions regarding SmartAuditor are, "how large are recorded session files?" and, "how many session recordings can one centralized SmartAuditor Server handle?" The answer to the second question is dependent on the first. If the recording of a session generates a large file with a high data rate, the number of sessions that can be recorded by one SmartAuditor Server will be low. However, if the recording data rate per session is low, the server will be able to handle a large number of sessions.
Data rate per session varies greatly depending on what is being recorded. An ICA session running Microsoft Outlook as a published application, where the end user occasionally sends and receives an email, is likely to generate a small recording; 20MB over an 8-hour work day is not unusual. A session running a CAD design package where graphic activity is constantly high will generate a much larger recording, possibly many hundreds of megabytes of data over the same period.
Many companies have run their own tests in the past and already know how much network bandwidth ICA sessions consume. As each SmartAuditor recorded session file is essentially the ICA protocol data captured in a file, the rate of data sent to the SmartAuditor Server for storage will closely resemble the data generated using the XenApp client.
How many recordings one SmartAuditor Server can handle is dependent on how quickly it can process and store the incoming data. You need the rate at which you can store data to be higher than the input rate. For example, if you need to record 5,000 sessions simultaneously running Outlook over an 8 hour work day, this equates to 100,000MB in total (that is, 5,000 x 20MB). Calculating this as the data rate per second, the result is approximately 3.5MB/s (that is, 100,000 divided by 8 hours, divided by 3600 seconds). This represents your input rate. A typical SmartAuditor Server connected to a 100Mbps LAN with sufficient disk space to store the recorded data would be capable of processing data at around 5.0MB/s, based on the physical limits imposed by disk and network I/O. Since this processing rate is higher than the input rate (3.5MB/s in versus 5.0MB/s out), recording these 5,000 Outlook sessions is feasible. Recording the same number of CAD sessions would generate an extremely high input rate and would require the use of more SmartAuditor Servers.
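The feasibility check above is simple arithmetic and can be sketched in a few lines of Python. This is only an illustration using the figures from the example; the function names are hypothetical, and the 300MB per CAD session figure is an assumed stand-in for "many hundreds of megabytes".

```python
SECONDS_PER_HOUR = 3600

def input_rate_mb_per_s(sessions, mb_per_session, hours):
    """Average rate of recorded data arriving at the server, in MB/s."""
    return sessions * mb_per_session / (hours * SECONDS_PER_HOUR)

def is_feasible(input_rate, processing_rate):
    """The server keeps up only if it can store data faster than it arrives."""
    return processing_rate > input_rate

PROCESSING_RATE = 5.0  # MB/s, the typical server from the example

outlook = input_rate_mb_per_s(5000, 20, 8)
cad = input_rate_mb_per_s(5000, 300, 8)  # 300MB per session is an assumption

print(f"Outlook: {outlook:.2f} MB/s, feasible: {is_feasible(outlook, PROCESSING_RATE)}")
print(f"CAD:     {cad:.2f} MB/s, feasible: {is_feasible(cad, PROCESSING_RATE)}")
```

With the example figures the Outlook workload comes out at roughly 3.47MB/s, under the 5.0MB/s processing rate, while the assumed CAD workload exceeds it many times over.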
Bursts and Faults
The previous example assumed a very simple uniform throughput of data but does not explain how the system would deal with short periods of higher activity, or bursts. A burst might occur when everyone logs on at the same time in the morning, sometimes known as the 9 o'clock rush, or when everyone in the company receives the same email in their Outlook inbox at once. The 5.0MB/s processing rate we have determined the SmartAuditor Server is capable of will be drastically inadequate to deal with this immediate demand.
The SmartAuditor Agent running on each XenApp server sends recorded data to the Storage Manager running on the central SmartAuditor Server via Microsoft Message Queuing, more commonly known as MSMQ. This causes data to be sent in a store-and-forward manner similar to how email is delivered between sender, mail servers and receiver. If the SmartAuditor Server or the network cannot handle the high rate of data, the recorded session data is temporarily stored until the backlog of data messages can be cleared. The data message might be temporarily stored in the outgoing queue on the XenApp server if the network is congested, or stored on the SmartAuditor Server's receiving queue if the data has traversed the network but the Storage Manager is still busy processing other messages.
MSMQ also serves as a fault tolerance mechanism. If the SmartAuditor Server goes down or the link is broken, recorded data will be held in the outgoing queue on each XenApp server. When the fault is rectified, the queued data will be sent en masse. The use of MSMQ also allows an administrator to take a SmartAuditor Server offline for upgrade or maintenance without interrupting the recording of existing sessions or losing data.
The main limitation of MSMQ is that disk space for the temporary storage of data messages is finite. This limits how long a burst, fault, or maintenance event can last before data is eventually lost. The overall system can continue after data loss, but individual recordings will have chunks of data missing. A file with missing data is still playable but only up to the point that data was first lost. Adding more disk space to each server, especially the SmartAuditor Server, and making it available to MSMQ can increase the tolerance to bursts and faults.
It is also important to set the Message Life setting for each SmartAuditor Agent to an appropriate level. This is configured on the Connections tab in the SmartAuditor Agent Properties application.
The default value of 7,200 seconds (2 hours) means that each recorded data message has 2 hours to reach the Storage Manager. After that it is discarded, and the affected recording files will be damaged. With more disk space available (or fewer sessions to record), the administrator may choose to increase this value. The maximum value is 365 days.
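As a rough planning aid, the queue space a XenApp server would need to ride out an outage lasting the full Message Life can be estimated from its average recording rate. This is a sketch under stated assumptions: the data rate is treated as uniform, any per-message overhead added by MSMQ is ignored, and the figure of 50 sessions per server is hypothetical.

```python
def queue_space_needed_mb(sessions, mb_per_session, work_hours, message_life_s):
    """Data (in MB) produced during one full Message Life window.

    Assumes a uniform data rate and ignores MSMQ message overhead.
    """
    work_seconds = work_hours * 3600
    return sessions * mb_per_session * message_life_s / work_seconds

# Hypothetical server: 50 Outlook sessions at 20MB each per 8-hour day,
# with the default Message Life of 7,200 seconds.
needed = queue_space_needed_mb(50, 20, 8, 7200)
print(f"Outgoing queue space for a 2 hour outage: {needed:.0f} MB")
```

Under these assumptions the space required grows in proportion to the Message Life, which is why longer values call for more disk space or fewer recorded sessions.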
The other limitation with MSMQ is that when data backlogs, there is extra disk I/O on the queue to write and read data messages. Under normal conditions, the Storage Manager receives and processes data from the network directly without the data message ever being written to disk. Storing the data involves a single write to disk to append the recorded session file. When data is backlogged the disk I/O is tripled; each message must be written to disk, read from disk and finally written to file. As the Storage Manager is heavily I/O bound, the processing rate of the SmartAuditor Server drops until the backlog of messages is cleared. To mitigate the effects of this extra I/O it is recommended that the disk where MSMQ stores messages is different from the recording file storage directories. Even though I/O bus traffic is tripled, the drop in the true processing rate is never as severe. The overall best practice is to have planned outages at off-peak times only. Depending on budget constraints, recognized approaches to building high availability servers should be followed. This includes the use of UPS, dual NICs, redundant switches, and hot swappable memory and disks.
Design for Spare Capacity
The data rate of recorded session data is unlikely to be uniform, bursts and faults will occur, and the clearing of message backlogs is expensive in terms of I/O. For this reason it is recommended that each SmartAuditor Server be designed with plenty of spare capacity. Extra capacity can always be gained by adding more servers or improving the specification of existing servers, as described in later sections. The general rule of thumb is to run each SmartAuditor Server at a maximum of 50% of its total capacity. From the earlier example, if the server is capable of processing 5.0MB/s, then target the system to run at 2.5MB/s. Instead of sending 5,000 Outlook sessions, which generate around 3.5MB/s, to one SmartAuditor Server, reduce this to 3,500 sessions, which generate only around 2.5MB/s.
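The 50% rule of thumb can be turned into a session ceiling with a short calculation. The following is a minimal sketch using the figures from the example; the function name is hypothetical and a uniform data rate is assumed.

```python
import math

def max_sessions(capacity_mb_per_s, target_utilization, mb_per_session, hours):
    """Largest session count whose average data rate stays within the target."""
    target_rate = capacity_mb_per_s * target_utilization
    return math.floor(target_rate * hours * 3600 / mb_per_session)

# 5.0MB/s server run at 50%, Outlook sessions of 20MB over an 8-hour day.
limit = max_sessions(5.0, 0.5, 20, 8)
print(limit)
```

With the example figures this gives a ceiling of 3,600 sessions, which is consistent with the suggested target of 3,500.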
Backlogs and Live Playback
When a reviewer opens a session recording for playback while the recording of that session is still active, this is called live playback. During live playback the SmartAuditor Agent responsible for the session will switch into a streaming mode for that session where recording data is sent immediately to the Storage Manager without internal buffering. As the recording file is being constantly updated, the player can continue to be fed with the latest data for the live session. Note however, data sent from Agent to Storage Manager is still via MSMQ and the same queuing rules apply. The problem that can occur in this scenario is when MSMQ is backlogged; the new recorded data available for live playback is queued like all other data messages. The reviewer will still be able to play the file but there will be some delay in getting the latest live recorded data. If live playback is an important feature to reviewing users, it is recommended that the system be deployed to ensure a low probability of backlog. Designing for spare capacity and reducing the impact of faults is important.
XenApp Scalability
SmartAuditor will never slow or halt XenApp session performance as a method of dealing with recorded data backlogs. Maintaining the end user's experience and single server scalability was paramount in the design of the SmartAuditor recording system. If the recording system becomes irreversibly overloaded, recorded session data will simply be discarded. In extensive testing performed at the scalability test labs at Citrix, the impact of recording ICA sessions on XenApp performance and scalability was very light. The actual measure depends on the platform, the memory available and the graphical nature of the sessions being recorded. The following configuration can expect a single server scalability impact of between 1% and 5%. If the server was capable of hosting 100 users before installing SmartAuditor, it would now be able to host between 95 and 99 users.
- 64-bit server with 8GB RAM running XenApp.
- All sessions running office productivity applications, such as Outlook or Excel.
- Usage of applications by users is active and sustained.
- All sessions are being recorded as configured by the SmartAuditor recording policies.
If fewer sessions are recorded or session activity is less sustained and more sporadic, the impact will be less. In many cases, scalability impact will be negligible and user density per server will remain the same. As mentioned earlier, the low impact is due to the very simple processing requirements of the SmartAuditor components installed on each XenApp. Recorded data is simply extracted from the ICA session stack and sent verbatim to the SmartAuditor Server via MSMQ. There is no expensive encoding of data.
There is a very minor overhead in using SmartAuditor even when recording no sessions on a server. Although the impact is light, if it is known that no sessions will ever be recorded from a particular server then an administrator may wish to disable recording on that server altogether. Uninstalling SmartAuditor is one solution. A less invasive approach is to uncheck the Enable session recording for this Presentation Server check box on the Recording tab in the SmartAuditor Agent Properties application. If session recording is ever required in future, the option need only be re-enabled.
Measuring Throughput
There are a number of ways to measure throughput of recorded session data, both from the sending XenApp side and the receiving SmartAuditor Server side. One of the simplest and most effective approaches is to observe the size of files being recorded and the rate at which disk space on the SmartAuditor Server is being consumed. The volume of data written to disk will very closely reflect the volume of network traffic being generated. The Windows Performance Monitor tool (perfmon.exe) has a range of standard system counters that can be observed as well as some SmartAuditor provided counters. Performance counters can be used to measure throughput and to identify bottlenecks and system problems. The following table outlines some of the most relevant performance counters to watch.
| Performance Object | Counter Name | Description |
| --- | --- | --- |
| Citrix SmartAuditor Agent | Active Recording Count | This counter indicates the number of sessions that are currently being recorded on a particular XenApp computer. |
| Citrix SmartAuditor Agent | Bytes read from the SmartAuditor Driver | The measure of the number of bytes read from the kernel components responsible for acquiring session data. This is useful to determine how much data a single XenApp server is generating for all sessions recorded on that server. |
| Citrix SmartAuditor Storage Manager | Active Recording Count | Similar to the SmartAuditor Agent counter, but measured on the SmartAuditor Server. This indicates the total number of sessions currently being recorded for all servers. |
| Citrix SmartAuditor Storage Manager | Message bytes/sec | The measure of throughput of all recorded sessions. This counter can be used to determine the rate at which the Storage Manager is processing data. If MSMQ is backlogged with messages, the Storage Manager will be running at full speed and thus this value can be used to indicate the maximum processing rate of the Storage Manager. |
| LogicalDisk | Disk Write Bytes/sec | This counter can be used to measure disk write-through performance. Disk performance is important in achieving high scalability for the SmartAuditor Server. Performance of individual drives can also be observed. |
| MSMQ Queue | Bytes in Queue | This counter can be used to determine the amount of data backlogged in the CitrixSmAudData message queue. If this value increases over time, the rate of recorded data received from the network is greater than the rate at which the Storage Manager can process data. This counter is useful for observing the effect of data bursts and faults. |
| MSMQ Queue | Messages in Queue | Similar to the Bytes in Queue counter but measured in the number of messages. |
| Network Interface | Bytes Total/sec | This counter can be measured on both sides of the link to observe how much data is generated when recording sessions. When measured on the SmartAuditor Server this counter indicates the rate at which incoming data is received. This is in contrast to the Citrix SmartAuditor Storage Manager/Message bytes/sec counter, which measures the processing rate of data. If the network rate is greater, messages will build up in the message queue. |
| Processor | % Processor Time | Although the CPU is unlikely to be a bottleneck, it is worth monitoring CPU usage. |
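To illustrate how the Bytes in Queue counter might be interpreted, the following sketch estimates whether a backlog is growing from a series of readings. It is an assumption-laden example rather than part of SmartAuditor: the readings and the queue capacity are made up, the function names are hypothetical, and how the samples are collected (for example, from a Performance Monitor log) is left out.

```python
def backlog_growth_rate(samples):
    """Average change in queue size, in bytes per second.

    `samples` is a list of (seconds, bytes_in_queue) pairs in time order.
    A positive result means data is arriving faster than it is processed.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (b1 - b0) / (t1 - t0)

def seconds_until_full(samples, queue_capacity_bytes):
    """Rough time until queue storage is exhausted, or None if not growing."""
    rate = backlog_growth_rate(samples)
    if rate <= 0:
        return None
    return (queue_capacity_bytes - samples[-1][1]) / rate

# Made-up readings taken every 60 seconds.
readings = [(0, 0), (60, 90_000_000), (120, 180_000_000)]
print(backlog_growth_rate(readings))
print(seconds_until_full(readings, 1_000_000_000))
```

A steadily positive growth rate corresponds to the situation described for Bytes in Queue: data is arriving from the network faster than the Storage Manager can process it.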
SmartAuditor Server Hardware
When considering how best to increase the capacity of the overall system from a hardware perspective, the administrator has two choices: scaling up, which increases the capacity of each server, or scaling out, which adds more servers. The overall aim is to increase scalability at the lowest cost.
Scaling Up
In looking at a single SmartAuditor Server, there are a number of best practices to ensure optimal performance for the available budget. As mentioned earlier, the system is very I/O bound, with the aim of getting a high throughput of recorded data from the network onto the disk. Investment in appropriate network and disk hardware is of most importance. For a high-end SmartAuditor Server, a dual CPU or dual core CPU is recommended but little is gained from any higher specification. 64-bit processor architecture is recommended but an x86 processor type is also suitable. 2 to 4GB of RAM is recommended but again there is little benefit from adding more.
Network
From a network perspective, a 100Mbps link is suitable but some gains can be made by using a gigabit Ethernet connection. Despite gigabit indicating a ten times improvement over 100Mbps in name, in practice the gain in throughput will be significantly less. Ensure network switches are not being shared with third party applications that might compete for available network bandwidth. Ideally switches should be dedicated for use with the SmartAuditor Server. If network congestion proves to be the bottleneck, a network upgrade is a relatively inexpensive way to increase the scalability of the system.
Storage
Investment in disk and storage hardware is the single most important factor in server scalability. The faster data can be written to disk, the higher the performance of the overall system. In selecting a storage solution, pay more attention to the write performance characteristics than to the read performance. Storage can either be a set of local disks, possibly controlled as RAID by a local disk controller, or a SAN (Storage Area Network). Storing to a NAS (Network Attached Storage) based on file-based protocols such as SMB, CIFS or NFS has serious performance and security implications and should never be used in a production SmartAuditor system.
For a local drive setup, aim for a disk controller with built-in cache memory. Caching allows the controller to use elevator sorting during write-back, thus minimizing disk head movement and allowing write operations to complete without waiting for the physical disk operation to finish. This can improve write performance significantly at minimal extra cost. Caching does however raise the problem of what happens to data after a power failure. To ensure the integrity of data and the file system, the caching disk controller must come with a battery backup facility. If power is lost, the cache will be maintained and data will be written to disk when power is eventually restored.
The use of a suitable RAID storage solution must also be considered. There are a number of RAID levels available depending on requirements of performance and redundancy. The table below specifies each of the RAID levels and how applicable each standard is to SmartAuditor.
| RAID Level | Type | Minimum number of disks | Description |
| --- | --- | --- | --- |
| RAID 0 | Striped set without parity | 2 | Provides high performance but no redundancy. Loss of any disk destroys the array. This is a low cost solution for storing recorded session files where the impact of data loss is low. Easy to scale up performance by adding more disks. |
| RAID 1 | Mirrored set without parity | 2 | No performance gain over having one disk, making it a relatively expensive solution. Only use if a high level of redundancy is required. |
| RAID 3 | Striped set with dedicated parity | 3 | Provides very high write performance with redundancy characteristics similar to RAID 5. RAID 3 is recommended for video production and live streaming applications. As SmartAuditor is a form of this type of application, this RAID level is the most recommended despite it not being very common. |
| RAID 5 | Striped set with distributed parity | 3 | Provides high read performance with redundancy but at the cost of slower write performance. RAID 5 is the most common for general purpose usage but due to the slow write performance is not recommended for SmartAuditor. RAID 3 can be deployed at similar cost but with significantly better write performance. |
| RAID 10 | Mirrored set and striped set | 4 | Provides performance characteristics of RAID 0 with redundancy benefits of RAID 1. An expensive solution that is not recommended for SmartAuditor. |
RAID 0 and RAID 3 are the most recommended RAID levels that should be considered. RAID 1 and RAID 5 are popular standards but are not recommended for SmartAuditor. RAID 10 does provide some performance benefits but is too expensive for the additional gain.
The next decision to be made is the type and specification of disk drives. IDE/ATA drives and external USB or Firewire drives are not suitable for use in SmartAuditor. The main choice is between SATA and SCSI. SATA drives provide reasonably high transfer rates at a reduced cost per megabyte when compared with SCSI drives. SCSI drives however provide better performance and are more common in server deployments. Server RAID solutions mostly support SCSI drives but some SATA RAID products are now available. When evaluating the specifications of disk drive products consider the rotational speed of disk, seek performance and other write performance characteristics.
As the recording of thousands of sessions per day can consume significant amounts of disk space, a choice between overall capacity and performance must be made. From an earlier example, recording 5,000 Outlook sessions over an 8-hour work day will consume around 100GB of storage space. To store 10 days' worth of recordings (that is, 50,000 recorded session files), 1,000GB or 1TB is required. This pressure on disk space can be eased by shortening the retention period before archiving or deleting old recordings. If 1TB of disk space were available, a 7 day retention period might be reasonable, ensuring disk space usage remains around 700GB with 300GB left over as a buffer for "busy" days. In SmartAuditor, the archiving and deleting of files is supported with the icldb command line utility and has a minimum retention period of 2 days. This can be scheduled to run as a background task once a day at some off-peak time. For more information on the icldb command and archiving see Citrix SmartAuditor for Presentation Server 4.5.
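The storage and retention figures above follow from two small calculations, sketched here in Python. The function names are hypothetical and the daily volume is assumed to be constant at the 100GB from the example.

```python
def storage_needed_gb(days, daily_gb):
    """Disk space required to keep the given number of days of recordings."""
    return days * daily_gb

def retention_days(capacity_gb, daily_gb, buffer_gb=0):
    """Whole days of recordings that fit while leaving buffer_gb free."""
    return int((capacity_gb - buffer_gb) // daily_gb)

print(storage_needed_gb(10, 100))      # 1000
print(retention_days(1000, 100, 300))  # 7
```

Changing the buffer or the daily volume shows quickly how the retention period needs to shrink as the recorded workload grows.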
The alternative to using local drives and controllers is to use a SAN storage solution based on block-level disk access. To the SmartAuditor Server, the disk array appears as a local drive. SANs are more expensive to set up, but as the disk array is shared, SANs do have the advantage of simplified and centralized management. There are two main types of SAN: fibre-channel and iSCSI. iSCSI is essentially SCSI over TCP/IP and has been gaining popularity over fibre-channel since the introduction of gigabit Ethernet.
Scaling Out
Even with the best scaling up practices, there are limits to performance and scalability that can be reached with a single SmartAuditor Server when recording a very large number of sessions. It may be necessary to add extra servers to meet the load. Each server would have its own dedicated storage, network switches and database. The collection of computers running XenApp with the SmartAuditor Agent is split to point at different servers thus dividing the load. This is discussed in SmartAuditor Deployment Scenarios.
Database Scalability
The SmartAuditor Database is hosted on any edition of Microsoft SQL Server 2005 including the freely available Express Edition. Since the physical recording files are written to separate disk files, the volume of data sent to the database on the SQL Server is very small. The database only stores metadata about recorded sessions which typically equates to only about 1KB of data per recording. Recording 5,000 sessions per 8 hour day will only consume about 5MB of database space. SQL Server 2005 Express Edition imposes a database size limit of 4GB; at 1KB per recording this means the database is capable of cataloguing up to around 4 million recordings. The other editions of SQL Server 2005, such as Standard or Enterprise Editions, do not have this size restriction; available disk space is the only limitation. Performance of the database and speed of searches will only degrade at a negligible rate as the number of records increases.
The exception to the 1KB per recording figure is when the SmartAuditor Event API is used to inject searchable events into recording files. The amount of space required to store event metadata in the database depends on the frequency and size of each event. Non-searchable events have no effect on the database as the event metadata is only recorded within the recording file itself.
In terms of transaction rates, assuming the Event API is not used, each recording file will generate four database transactions: two when the recording starts, one when the user logs onto the session being recorded and one when the recording ends. For 5,000 sessions this equates to around 20,000 transactions over an 8-hour day. If the Event API is used, each searchable event recorded will generate one transaction. Since even the most basic database deployments can typically handle hundreds of transactions per second, the database is never likely to be stressed by this processing load. As the impact is so light, the SmartAuditor database can co-exist with existing third-party databases (including the XenApp Data Store database) on the same SQL Server instance.
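The database sizing and transaction figures in this section can be reproduced with a short sketch. It assumes the 1KB per recording and four transactions per recording stated above, no use of the Event API, and binary units for the 4GB limit; the function names are hypothetical.

```python
KB_PER_GB = 1024 * 1024

def max_recordings(db_limit_gb, kb_per_recording=1):
    """Recordings that can be catalogued before the size limit is reached."""
    return db_limit_gb * KB_PER_GB // kb_per_recording

def avg_transactions_per_s(sessions, tx_per_recording, hours):
    """Average database transaction rate over the working day."""
    return sessions * tx_per_recording / (hours * 3600)

print(max_recordings(4))                            # 4194304
print(f"{avg_transactions_per_s(5000, 4, 8):.2f}")  # 0.69
```

An average of well under one transaction per second illustrates why the database load is light compared with the hundreds of transactions per second a basic deployment can handle.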
For larger databases where many millions of recordings are stored, Microsoft guidelines on setting up and configuring SQL Server 2005 for scalability should be followed.
I find it a pity that printer and client drive mapping channels cannot be recorded - for monitoring remote access to a farm by admins from other companies I would need the capability to check scripts that were started via client drive mapping ...
I know it is not easy to implement; but if a complete file is copied, maybe you find some way of "keeping" this file?
"Keeping" a print job (we use UPD III only) is maybe easier, but not that important in my scenario.