
XenServer Windows 10 VM is dropping files with write errors in Blue Iris IP Cam Software: How do I eliminate RAID read/write lag to optimize VM for Blue Iris Surveillance Software?


Xen See

Question

I'm fairly new at this hypervisor stuff, and not particularly experienced at hardcore IT. But my roommate bought a crappy Nest Cam, and I was so moved by the awful experience of the cloud-connected wifi surveillance cam subscription service racket that I built a Dell R710 homelab surveillance server running XenServer 7.6 and Blue Iris in a Windows 10 Pro VM. I realize this all might be easier if I just ran W10 bare metal, but I don't intend to max out this hardware with just cameras, hence why I'm trying to virtualize and leave some headroom for other fun stuff.
 

Initial Specs:

  • Dell R710 w/ 2x Xeon 5660 2.80 GHz CPUs, 128GB ECC RAM (16x8GB), PERC H700, 6x 3.5" hotswap bays
  • 6x Seagate 15k.7 3.5" 450GB SAS drives
  • XenServer installed on Disks 0 and 1 in RAID-0 (I'm aware there is no redundancy or failover protection with this setup; it's a risk I'm willing to take until I have more spinning disks)
  • Windows 10 VM and, importantly, the camera footage both stored in the same 4 disk RAID-10 array on Disks 2, 3, 4, and 5


My problem is that every so often Blue Iris drops a video clip with a write error, specifically the Blue Iris error code: Clip: write error 80000005, undefined. I'm not sure whether that BI error code is useful information here, but also of note is that Blue Iris lists a negative value for the size of the storage volume in its Clip Storage tab. I picked a very low size of 24GB for the VM initially, with the intent to increase its allocation of the logical volume as needed, and, if I'm not mistaken, it is this 24GB size that, when surpassed by the VM due to increasing video clip storage requirements, causes a negative gigabyte value to be displayed within Blue Iris. It's like W10 isn't sure how big any files are, let alone the logical volume on which they reside.

I have a Hikvision 4k 2.8mm H.265+ PoE camera, which is the sole input to my Blue Iris software so far. Pretty heavy duty camera for consumer grade home surveillance server, but the load from one of these things even at full resolution should not be terribly taxing to a RAID 10 array on four 15k RPM SAS drives, I wouldn't think, especially if I've got H.265+ configured properly, which I believe I do. It would surprise me if that one camera is enough to max out the system as I have it configured, but there are obvious RAID issues that I need to work out, and hopefully doing so will fix this issue of dropping video clips.
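To put rough numbers on the load from one camera, here is a back-of-the-envelope sketch in Python. The 16 Mbit/s bitrate is an assumed figure for illustration only, not a measured value from this camera; substitute the bitrate shown in the camera's own video settings. The function names are hypothetical.

```python
def write_rate_mb_per_s(bitrate_mbit_s):
    """Sustained write rate in megabytes per second for a given stream bitrate."""
    return bitrate_mbit_s / 8.0


def hours_to_fill(volume_gb, bitrate_mbit_s, cameras=1):
    """Hours of continuous recording until volume_gb of free space is used up."""
    gb_per_hour = write_rate_mb_per_s(bitrate_mbit_s) * cameras * 3600 / 1000.0
    return volume_gb / gb_per_hour


ASSUMED_BITRATE_MBIT_S = 16  # hypothetical value, check the camera's settings

print(write_rate_mb_per_s(ASSUMED_BITRATE_MBIT_S))          # 2.0
print(round(hours_to_fill(24, ASSUMED_BITRATE_MBIT_S), 1))  # 3.3
```

Under that assumption the stream itself is only a couple of megabytes per second, but a 24GB volume would fill within hours of continuous recording, and sooner in practice because Windows occupies part of the same volume.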

Number 1 is that my clips are being stored to the same logical volume as my Windows 10 VM. Number 2 is that either my RAID-0 XenServer volume or my 4-disk RAID-10 VM+Clip volume (or both, but probably more so the RAID-10) are insufficient for the level of performance required.  I had hoped that with just one VM, and also just one or at least not many cameras, the file I/O of the RAID array would not be an issue. Maybe it shouldn't be. Maybe there is some other aspect that could be better configured, without re-configuring the RAID array, that would fix this write error. I'm very interested to learn of other ways to increase performance without adding spinning disks. But yeah for this first run of testing Blue Iris, my clips were stored directly within the Windows 10 VM's own file system on the 4 spinning disk RAID-10 array. Does the fact that I had issues with this configuration indicate that there is something more subtle going on or that I should just store the VM and clips on separate volumes and use better RAID config? What I mean is, does this sound like it should work ok and I'm maybe missing some other aspect than the RAID config?

 

I've been debating how to "add more spinning disks" to increase my RAID performance, working from the assumptions that fixing issues 1 and 2 will fix the write errors, and it seems to me that putting the Windows 10 VM on a USB flashdrive in my R710 might be a viable solution without buying any more hardware to add spinning disks. This introduces the question of whether the USB file I/O speeds will slow things down, and that is definitely a concern since it's USB 2.0 only, not 3.0, and even then it might still be a concern. Windows 10 USB VM file I/O is one thing, but the bigger concern seems to be ensuring the RAID array can handle writing the clips without fail. To further improve performance, I am going to change the 4 spinning disk RAID-10 volume to a RAID-0 volume. So both XenServer and my video clips will be stored on RAID-0 volumes with no redundancy or failover protection. I will add a USB drive storage repository to save recent clips/.jpgs as a backup in the event of drive failure. I'll have my VM on a USB, also backed up somewhere, allowing easy redeployment. If a drive fails, I'll have to replace it and either reinstall XenServer or re-initialize the clip storage RAID volume, and I lose all my clips if a drive in the clip storage array fails. Also, the server fails if a drive fails. There's no limp mode here. These are acceptable risks to me, at least while I am learning and testing. Eventually, I will decide how I want to spend my money to add more spinning disks for redundancy and failover protection.

So my solution is: move Windows 10 VM to a USB flashdrive and use 4-disk RAID-0 array for clip storage. Figure out redundancy and failover later. Does this seem like it will correct my write errors?

Also, I used the default options in both of the PERC H700 RAID-0 configurations, namely: Adaptive Read, Write Back, and I left Force WB w/ no battery UNCHECKED. How do the Read/Write policies affect RAID performance for my use case? I am a bit confused about Read/Write policies. I am also unsure of the correct setting for the "Copy host BIOS strings to VM" option that must be checked or unchecked prior to launching a VM for the first time...

 

And one last thing, I am getting an error when booting up XenServer:
 

[ some numbers] mce: Unable to init device /dev/mcelog (rc: -16)
[ some numbers] systemd[1]: Failed to insert module 'autofs4'
[FAILED] Failed to start Load Kernel Modules.
see 'systemctl status systemd-modules-load.service' for details.
		Starting Apply Kernel Variables...

And then everything proceeds with status: [ OK ]

According to this thread, though, it appears to be just a benign bug:
https://bugs.xenserver.org/browse/XSO-712

 

Link to comment

8 answers to this question


A 4-disk RAID array is probably not going to be able to handle the I/O very well, especially if you have your other VMs running on it as well. A USB drive is not going to be very fast, either. Is there any way you could get a small SSD dedicated to your system, as long as a local SR would be OK?

Do you have sufficient resources on dom0 (your XS host configuration) to handle loads? Run top and see if either your CPUs are saturated or you're out of swap space because of insufficient memory. Does your VM running this service have sufficient resources, as well?

Another issue might be network overload; have you monitored for dropped packets? Check with ethtool and/or ifconfig. Also, storage I/O errors are generally written to /var/log/SMlog, and you ought to look for specific errors there.
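As one way to watch for dropped packets over time, here is a minimal Python sketch that reads the per-interface drop counters from the text of /proc/net/dev, the standard Linux counter file. It assumes Python is available where you run it; the function name and the sample counters are made up for illustration.

```python
def parse_drops(proc_net_dev_text):
    """Return {interface: (rx_dropped, tx_dropped)} from /proc/net/dev text."""
    drops = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 12:
            continue
        # Eight receive columns come first (drop is the 4th), then the
        # transmit columns start at index 8 (drop is the 4th of those).
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


# Invented sample in the /proc/net/dev layout, for illustration only.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000 10 0 3 0 0 0 0 2000 20 0 1 0 0 0 0
    lo: 500 5 0 0 0 0 0 0 500 5 0 0 0 0 0 0
"""

print(parse_drops(SAMPLE))
```

On a real host you would pass `open("/proc/net/dev").read()` instead of the sample and compare two readings taken a few minutes apart; counters that keep climbing while the camera streams would point at the network rather than the disks.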

 

-=Tobias

Link to comment
On 9/2/2019 at 4:55 PM, Tobias Kreidl said:

A 4-disk RAID array is probably not going to be able to handle the I/O very well, especially if you have your other VMs running on it as well. A USB drive is not going to be very fast, either. Is there any way you could get a small SSD dedicated to your system, as long as a local SR would be OK?

Do you have sufficient resources on dom0 (your XS host configuration) to handle loads? Run top and see if either your CPUs are saturated or you're out of swap space because of insufficient memory. Does your VM running this service have sufficient resources, as well?

Another issue might be network overload; have you monitored for dropped packets? Check with ethtool and/or ifconfig. Also, storage I/O errors are generally written to /var/log/SMlog, and you ought to look for specific errors there.

 

-=Tobias

Ok, yeah I'm cutting lots of corners just trying to see what it takes to get this done cheaply. Trying to keep costs down, but I've already messed that up by going with a server that only holds 6 spinning disks, whereas I probably could have gotten it done with one R510 with 12x 2.5" bays. So let's just see if I can get it performing with the one VM and one camera. Step by step. I can't figure out every last detail in one try. I sort of need to fail to understand why important settings matter. Once I've got a better understanding of all the details and the hardware requirements, I'll probably add several more spinning disks.

So my new configuration has XenServer installed on a single 15k RPM disk in RAID 0. The remaining five disks now have my W10 VM (with video clips stored within W10 file system).

I made a template out of a base Windows 10 Pro install, ran XenServer Tools and sysprep, made that sysprepped VM into a template, then made a fresh copy of the template. This new VM has 6 cores on 2 sockets, 96 GB / 2000+ GB for storage, 36 GB RAM.

Before this, I remade both my RAID volumes and changed the strip size to the largest setting allowed for my PERC H700, which I think was 1 MB. This may be too high. Certain aspects of the UI are benignly laggier; for example, loading bars are jumpier. They move in larger jumps between percentages when installing XenServer or Windows, which would make sense if the minimum read or write is larger. Short video clips take longer to appear in Blue Iris after triggering and clipping a video. My goal with increasing the strip size was to better accommodate the file sizes of video clips and .jpg captures. Neglecting the prospect of other VMs with different needs, this surveillance server VM seemed to me like it would do better with a larger strip size. I think if I do this over again, I'll redo my RAID volumes with a lower strip size, larger than the default for sure, but not 1 MB. Maximum strip size is probably overkill and not best for performance.

One thing I didn't think about until too late is that I made BOTH volumes have the maximum strip size of 1 MB, but this doesn't seem smart since the XenServer volume probably has different I/O needs than the surveillance volume, right? So what is the ideal strip size for a single disk RAID-0 XenServer installation volume? Default is 64 KB, which seems absurdly low. Both of these volumes would probably be better with larger strip sizes right? I mean I'm not indexing millions of tiny files.
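To make the strip size question more concrete, here is a simplified Python sketch of how a RAID-0 layout maps a request onto member disks. It assumes the textbook layout (strip k lives on disk k mod n), treats the five-disk volume as RAID-0, and ignores the controller cache and the file system, so it illustrates the idea rather than modelling the PERC H700.

```python
KB = 1024
MB = 1024 * KB


def disks_touched(offset_bytes, length_bytes, strip_bytes, n_disks):
    """Return the sorted member-disk indexes a single request touches."""
    first_strip = offset_bytes // strip_bytes
    last_strip = (offset_bytes + length_bytes - 1) // strip_bytes
    return sorted({s % n_disks for s in range(first_strip, last_strip + 1)})


# A 256 KB write on a 5-disk array:
print(disks_touched(0, 256 * KB, 1 * MB, 5))   # [0]          one disk does all the work
print(disks_touched(0, 256 * KB, 64 * KB, 5))  # [0, 1, 2, 3] spread over four disks

# On a single-disk RAID-0 every request lands on disk 0 whatever the strip size.
print(disks_touched(0, 4 * MB, 1 * MB, 1))     # [0]
```

In this model a large strip keeps small and medium requests on one spindle, while a small strip spreads them across several. For the single-disk XenServer volume the strip size makes no difference to which disk is hit, since there is only one.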

Anywho, the server works much better now. I was still getting some dropped clips from the same Blue Iris Write Error as before, but I seemed to have found a sweet spot with videos that clip themselves every 3 minutes. With this particular setting, the video files are being written reliably. I've been testing in continuous mode instead of motion activated mode, and there are no write errors with the exception of when I was doing something like starting or stopping Blue Iris or changing the camera's settings. I've yet to see it have a write error on its own, without me causing it. Every 3 minutes it clips a new video.

Problem is that's only at 1080p and 20 fps, but my camera maxes out at 4k at 20 fps. Tried max video quality with H.265+ direct to disk, no re-encoding. Results not good. But there are so many variables, I have no idea which matters most. I'm just twiddling knobs and seeing what happens. The VM lags really hard. Maybe I should have given it more resources. But I suspect the 1 MB strip size wasn't helping me as RAID I/O started maybe bottlenecking me.

As far as insufficient memory, storage, or network... I'm pretty sure none of those are at fault here. I have 128 GB of RAM in the R710 and 2000+ GB of storage available to the VM at max. Settings are currently 36 GB RAM and 96 GB storage, neither of which I'm anywhere close to touching. Write errors were coming right out of the gate before, and now only while I'm working in the VM/Blue Iris, like changing the camera's settings. Network capacity should not be an issue, yet. I have a Zyxel GS1900-8HP managed PoE ethernet switch. I changed the speed setting on the Hikvision camera to be "1000M full duplex" instead of "Auto", which is what it was on previously. As far as I know, this switch, Cat-5 cables, and Broadcom NICs in the Dell server are all gigabit compatible, which should crush H.265+, right?
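A quick way to sanity-check the network side is to compare the camera bitrate with the link speed. The Python sketch below does that; the 16 Mbit/s bitrate and the 70% usable fraction are assumptions for illustration, not measurements from this setup, and the function names are hypothetical.

```python
def link_utilization(bitrate_mbit_s, cameras, link_mbit_s=1000.0):
    """Fraction of the link consumed by the camera streams."""
    return bitrate_mbit_s * cameras / link_mbit_s


def max_cameras(bitrate_mbit_s, link_mbit_s=1000.0, usable_fraction=0.7):
    """Whole cameras that fit in the usable share of the link."""
    return int(link_mbit_s * usable_fraction // bitrate_mbit_s)


ASSUMED_BITRATE_MBIT_S = 16  # hypothetical value, check the camera's settings

print(link_utilization(ASSUMED_BITRATE_MBIT_S, cameras=1))  # 0.016
print(max_cameras(ASSUMED_BITRATE_MBIT_S))                  # 43
```

With those assumed figures a single camera uses under 2% of a gigabit link, so raw bandwidth is unlikely to be the limit; dropped packets or a bad negotiation would have to be confirmed from the interface counters instead.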

Link to comment

Many manufacturers suggest an ideal strip size to use for their arrays, and you can test this with programs like Iometer and see for yourself what works best. You're probably correct that for camera data, which typically streams in or out, a larger strip size than the default may work better. You're also correct that identifying bottlenecks and successively dealing with the weakest component is the way to get to the point where things will hopefully work sufficiently well.

Finally, I/O depends heavily on the number of spindles hence more, smaller disks will win. I have seen an array of 20 spinning disks beat a 4-disk SSD array in writes by a factor of two (naturally, the SSD array beats the spinning disk array easily in reads by a factor of ten or more!). With storage, you have to take both reads and writes into account and remember that they will always be fighting against each other!
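The spindle-count point can be illustrated with the usual rule-of-thumb estimate for random I/O: one operation costs roughly an average seek plus half a rotation, and the array multiplies that by the number of disks, divided by a write penalty (1 for RAID-0, 2 for RAID-10). The Python sketch below applies that rule; the 3.5 ms seek time is an assumed figure, not taken from the drive's datasheet, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution."""
    return 0.5 * 60000.0 / rpm


def disk_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spinning disk."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


def array_write_iops(n_disks, per_disk_iops, write_penalty):
    """Rough random-write IOPS for an array (penalty 1 = RAID-0, 2 = RAID-10)."""
    return n_disks * per_disk_iops / write_penalty


ASSUMED_SEEK_MS = 3.5  # hypothetical average seek time for a 15k drive

per_disk = disk_iops(15000, ASSUMED_SEEK_MS)
print(round(per_disk))                          # per-disk estimate
print(round(array_write_iops(4, per_disk, 2)))  # 4-disk RAID-10
print(round(array_write_iops(5, per_disk, 1)))  # 5-disk RAID-0
```

This only describes random I/O. Continuous video recording is mostly sequential, so the estimate matters most when the VM's own disk activity competes with the clip writes on the same spindles.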

 

-=Tobias

Link to comment

It's working pretty good now. I feel like I'm ready to add more cameras and see what this beast can really do. I may end up starting with an entirely fresh installation of XenServer and my W10 VM to fix the 1MB strip size issue. Couple questions here:

1. Apparently it's possible to change the strip size of a functional RAID array, is that recommended or no? I guess that's more of a Dell question than a Citrix one. It may require the trick of deleting and reconfiguring the array without initialization...

2. I may start from scratch one more time because I have this habit of digital self-flagellation. When I start a new VM from my W10 template, in which I've already run sysprep and XenServer Tools, I am forced to go through all the Windows settings again. All the initial privacy settings during installation, as well as many of the other Windows Settings, have to be re-done. All the software that Windows includes, which I previously deleted before making the template, has to be re-deleted. Is there no way to start a VM from a template without having to go through the Windows installation procedure again? How can I make copies or templates of a base VM that I might want to start over with in the future without having to re-do all this Windows crap?

I'm strongly considering purchasing this Dell R510 that TechMikeNY configured for me to be a camera clip file server for this Blue Iris system I'm building. What do you think: if I add this to my setup, it would probably be plenty of spinning disks and resources, right? I'm thinking one R510 configured like this would be enough to do the job. Why did I buy the R710 with 6 drive bays? I could have done this so much cheaper if I went with the R510 to begin with, assuming it would be ok by itself.
https://www.amazon.com/dp/B07T9NFXJ1/

Link to comment

Strip size is a question of both the manufacturer's recommendations and the type and access of data that are primarily going to be used. With large-format videos I'd guess that a larger strip would be beneficial, but again, the manufacturer may advise against it. For general-purpose Nexenta storage, for example, the recommendation is 32k, which seems small but the majority of our files are not that large.

 

Have you tried to just clone your existing VM and tweak it to make changes to correspond to the new VM? That's one option. Or you convert an existing VM into a new template and use it.

 

-=Tobias

Link to comment
On 9/5/2019 at 12:10 PM, Tobias Kreidl said:

Have you tried to just clone your existing VM and tweak it to make changes to correspond to the new VM?

Yeah I tried copying the VM by following the official Citrix instructions, which include running sysprep first. Running sysprep has always caused my VM's to go through the Windows setup process again, as if I'm installing for the first time. I lose all customizations. Total blank slate. I struggle to see how this is considered a "copy."

But subsequently I realized that I can just copy a VM in XenCenter without ever running sysprep, and it boots up fine, with everything intact, including all my settings changes, installed software, deleted programs, etc. This is a full copy, not a fast clone or snapshot.

Why do the official Citrix instructions for copying a XenServer VM call for running sysprep when this is apparently not necessary and, in fact, actually does not do what I require??!!!

I am able to launch multiple instances of the same VM with no issues. I thought there might be some issue since I've not "generalized" the VM with sysprep, but the VMs booted just fine and threw no errors. Am I doing this right, or am I ignoring some kind of best practice?

Link to comment

Oh I think I'm confusing terminology. Is cloning different than copying? Like you are saying I can create a cloned VM which is dependent on the state of the original? If I make settings changes or add/remove programs, these changes propagate to the clone? That would certainly be better than editing the original, making a full copy, and then redeploying every time I want to change the base VM from which my production VM is derived.

Link to comment

It can be different, yes, depending if you do a full or fast clone. You'd ideally want to do a full clone for better performance and the ability to move the VM around with more agility; a fast clone just copies and tracks the differences from the original VM. You can make any number of copies of a clone, if you want the same fundamental base and would not have to keep duplicating the original VM. See also: https://support.citrix.com/article/CTX224040
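For anyone scripting this from the host console rather than XenCenter, the two operations correspond to the `xe vm-copy` (full copy) and `xe vm-clone` (fast clone) commands. The Python sketch below only builds the command lines; the VM names are hypothetical, the helper functions are not part of any Citrix tooling, and it assumes the source VM is shut down before copying.

```python
def full_copy_cmd(vm_name, new_name, sr_uuid=None):
    """Command line for a full copy; optionally target a specific SR."""
    cmd = ["xe", "vm-copy", "vm=" + vm_name, "new-name-label=" + new_name]
    if sr_uuid:
        cmd.append("sr-uuid=" + sr_uuid)
    return cmd


def fast_clone_cmd(vm_name, new_name):
    """Command line for a fast clone that shares disk blocks with the source."""
    return ["xe", "vm-clone", "vm=" + vm_name, "new-name-label=" + new_name]


# Hypothetical VM names, for illustration only.
print(" ".join(full_copy_cmd("w10-base", "w10-blueiris")))
print(" ".join(fast_clone_cmd("w10-base", "w10-test")))
```

To actually run one of these on the XenServer host you could pass the list to `subprocess.check_call`; printing the command first makes it easy to check before anything is created.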

 

Sounds to me that you may want to make a snapshot of a VM and create a template out of it which can then be easily duplicated to one or more new VMs.

 

-=Tobias

Link to comment
