My "build a smallish render farm" article


#1

I wrote a piece on building a mixed-platform render farm for freelancers:

http://arstechnica.com/information-technology/2014/05/how-to-network-lots-of-dumb-computing-muscle-in-a-fast-efficient-render-farm/

I point out that this isn’t designed to be a one-stop reference for scalable farms so please don’t think I’m telling companies to go this route for a huge VFX production. It’s just meant as a reference and Linux primer for those who want to get started with a small render farm but don’t know where to start.


#2

Nice article. It's a good intro piece, and I'm sure many people will have critiques about what they consider better practices.

It's a complicated subject that's more in the IT realm than CG creation, but many of us have begun to dip our toes into Linux as it's become more viable for smaller studios.
    
    
My only comment about the Linux setup: though disabling SELinux is probably the simplest way to make Linux permanently leave you alone about network sharing issues, you can alternatively enter (as root):

    setsebool -P samba_enable_home_dirs 1

to enable home directories to be shared, and

    chcon -R -t samba_share_t /your-directory

to label a directory for sharing. These commands make SELinux allow those directories to interface with Samba while still letting SELinux do its job.
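One caveat (a side note on my part, standard RHEL/CentOS practice rather than anything from the article): chcon labels don't survive a filesystem relabel, so to make the change permanent you'd record it in policy instead:

    # register the label in SELinux policy, then apply it (path is a placeholder)
    semanage fcontext -a -t samba_share_t "/your-directory(/.*)?"
    restorecon -R -v /your-directory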
    
Maybe I missed it, but was there any mention of how to configure and set up a mounted directory so all your render nodes can grab textures, etc. from a server when rendering? That becomes a fundamental thing once you have more than just a handful of machines.
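For the curious, what I have in mind is a plain NFS share, something like this (a rough sketch of my own, not from the article; the server IP and paths are made up):

    # on the file server: export the project directory (line in /etc/exports)
    /srv/projects 192.168.1.0/24(rw,sync,no_subtree_check)

    # reload the export table
    exportfs -ra

    # on each render node: mount it at boot (line in /etc/fstab)
    192.168.1.10:/srv/projects  /mnt/projects  nfs  defaults,_netdev  0 0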
    
Unless I didn't see it, there's also no mention of tools that control multiple machines at once, like Cluster-SSH (cssh) or, as that one VFX IT guy mentions, Salt. IMO this is important for after you've deployed your machines and you need to, say, uninstall an old version of Maya and install the latest service pack. No one wants to go through that process machine by machine as they maintain their render farm.
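Even a dumb shell loop over SSH covers a lot of that ground (just a sketch; the node and package names are made up):

    # run the same maintenance commands on every render node in parallel
    for node in rnode01 rnode02 rnode03; do
        ssh root@"$node" "yum -y remove oldapp && yum -y install newapp" &
    done
    wait   # block until every node has finished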
    
Also, I just want to point out there is a definite difference between Linux and Windows renders, in mental ray at least. It's not always present, but if you make use of really intensive AO shading, for instance, the sampling noise patterns differ between platforms. If the sampling is cranked beyond normal sane usage, though, I'm sure it's not very noticeable.

A sixth of the article was dedicated to configuring Linux to put machines to sleep, but most people run their render farm 24/7 with multiple projects, test renders, etc. Anyway, I'm sure some people really want that feature, but I'd think most people set up a render farm because they don't have enough rendering performance, so the times when the farm isn't floored are rare.

Power and air conditioning weren't mentioned anywhere. Most CG artists have no idea how much power a rendering machine will draw, how many machines each outlet circuit can handle, or how much A/C you need to cool them. It might even be worth mentioning a few of the companies that make USB temperature monitors that can send emails or phone calls if the temperature goes above a given threshold in case the A/C fails.
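As a rough worked example (made-up but typical numbers, not from the article): a dual-CPU render node can pull around 500 W under full load, and a standard 15 A / 120 V circuit tops out at 1,800 W, of which a continuous load shouldn't use more than about 80%, i.e. roughly 1,440 W. That's only two or three nodes per circuit before breakers start tripping.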

Anyway, good article. I sure wouldn't have attempted to write it myself given how deep the subject can go. I feel like I barely know enough to handle my own work with my own people on our relatively small 20-computer farm.

#3

I’ll respectfully disagree with this. Everywhere I’ve worked, and that ranges from 20 nodes all the way to 3,000, power management was a huge deal: the farm is hardly ever floored all year, and considerable time has been spent refining the most efficient ways to offline and online units transparently.
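In practice that often boils down to something like this (a sketch, not any particular studio's setup; the hostname and MAC address are made up):

    # offline an idle node (needs root on the node)
    ssh root@rnode07 "pm-suspend"

    # bring it back when the queue fills up
    # (wake-on-LAN must be enabled in the node's BIOS/NIC first)
    wakeonlan 00:11:22:33:44:55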

Farms in CGI are almost everywhere textbook examples of high-delta loads. The workload in commercials, film, and animation is naturally super spiky and urgent, which produces bursts, unlike scientific or rented computational power, which tends to be long-queued or just generally even and scaled to its client base.

I think he did well to delve into it. I’ve only skimmed sections of the article, but it seems like a pretty good overview. I could probably nitpick a few things here and there, but it’d verge on the personal and matters of perspective. As it is, I think it has a good enough audience that will find it interesting.


#4

Yeah, I think I read the other day that Lucasfilm only uses a portion of their total render farm. I’m sure it becomes an even bigger deal with the large render farms. I obviously can’t speak for every studio, but I guess I was assuming large studios have overlapping projects going around the clock all year.

On our small 20-node farm, it’s pretty much always floored, and we make an effort to have low-priority test animations ready to render on the rare occasions when our urgent projects are done. IMO the extra money spent on those renders more than makes up for the power costs by saving us from future mistakes we might have made on urgent projects if they weren’t tested first. I’m sure it’s a different case for large studios, but then again this article isn’t targeting seasoned VFX IT pros.


#5

Keeping the farm busy at all times, if there’s reason to, even in a pre-emptive fashion, is good practice, but you start bleeding back into production decisions (when and what to produce, and how to pre-empt with machine power, which is generally cheaper than man power).
I don’t find that mutually exclusive with good power management, though. I believe you when you say you can keep it floored all year round; you probably have the turnover and managerial foresight to do so efficiently if you choose to, and you like the results. But many places, even leaving out large centres (I have one only metres away from me :slight_smile: ), are in different circumstances and experience workloads that are much, much spikier than that. Often that’s not a production deficiency; it’s simply tied to how they circulate work and expand/contract the workforce.

Even at the super-small end, like the rendergarden in the article, many freelance artists who might want to build one have a varied enough array of deliverables that many nodes will often be left idling.

Not everybody will need it, I agree with that; all I was saying is that I find it an important part of managing computational centres of any scale, and I find the space allotted to it more than appropriate. Hell, at any scale it probably warrants its own article if you want to investigate more than one option and the managerial implications of the choices :wink:


#6

Thanks for this article, Dave. I’ll certainly look to improve power efficiency on my small farm :slight_smile:


#7

Ya, I realize there’s a lot to cover, and I was already about 4,000 words over budget, so I couldn’t get into the real large-scale environment stuff like cooling and Salt, etc. It was just meant as a primer for those looking to get into network rendering and to take some of the fear out of Linux, render managers, power management, etc. for the average user. The article got a lot more traffic than I thought it would, so I think I was smart to give it mass appeal like this. Obviously there’s a level of superficiality to that, but there’s only so much you can do in ~10,000 words.


#8

Nice article with lots of detail. You’re a bit harsh on Linux though, don’t you think? Thanks for plugging my video! :thumbsup:


#9

Had the time to read it, and I agree on both fronts: good article, and a bit harsh on Linux.
Then I considered that that’s possibly how a large part of the target audience might feel, and the article gives “fair warning”, so I can’t quite get myself to criticize the harshness too harshly :wink:


#10

Ya, it’s harsh if you’re accustomed to how insane troubleshooting can be in Linux. But it’s not harsh if you’re a guy who’s been using Macs his entire professional life and doesn’t know that you have to manually add execute permissions to a downloaded installer. From a usability standpoint that’s madness in any other OS, but to a Linux person it’s simple logic. Ditto for having to be root to put the machine to sleep or change the screen resolution; even Linus Torvalds said these distros should die in a fire for stupid decisions like that.

I don’t think Dark Souls II was a bad comparison. I love that game, but I’ve never visualized my controller hitting the screen so many times before, and that’s after playing both Demon’s Souls and the first Dark Souls twice through each. That’s basically Linux: you only master it after years of dealing with mind-boggling issues you’d never face on other OSes. Telling someone to put that at the heart of their small work environment as a do-all workstation, without someone trained and knowledgeable to handle the issues, would be irresponsible on my part. It’s fine for a render slave, though, but still not “easy”.
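Concretely, that’s stuff like this before a downloaded installer will even run (the filename is made up):

    cd ~/Downloads
    # the file arrives without execute permission; add it by hand
    chmod +x SomeApp-installer.run
    sudo ./SomeApp-installer.run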

I have a lot more experience with Bash and shell scripting than 99% of the readers of that article, and it still took me tons of Googling to wrap my head around simple Linux stuff because of how Wild West it is.


#11

My only objection is that you keep attributing to Linux, which at the end of the day is little more than a kernel, the faults of a distro, and that’s after picking a distro that’s known to be meant for anything but addressing your usability/accessibility complaints (it’s simply not meant for that; it just doesn’t care).
Were you using Mint, and had you spent your entire life using that instead of Macs, you would find Macs absolutely and utterly unmanageable and inscrutable, which, in the eyes of many sysadmins, they actually are in certain domains.

CentOS is a free, marketing-stripped, lagging repack of RHEL. It’s about as far as you can get from being intended as user-friendly, and RPM in general, and the enterprise-level RPM managers in particular, are horrendous by design.

CentOS isn’t Linux though, and I’ll be the first to agree that “clunky” doesn’t even begin to describe the experience (and I’ve used it a minimum of 40 hours a week for the last six years).


#12

Yeah, but the article was focused on setting up a CG render farm, which means you’re more likely to use the Linux distros those rendering apps support best.

Mint and Ubuntu are perfectly fine for standard desktop usage, but probably few people are going to go to the trouble of recompiling their Red Hat-based software for them. Those people already know their way around Linux anyway.

I do wish certain aspects of CentOS were more user-friendly. Stella is a neat repackage of CentOS that comes bundled with extra stuff to make it more desktop-friendly. Probably too many extras for a lean render farm machine, though.


#13

If you are setting up a render farm, though, the deployment mechanism, clunky as it might seem, is actually quite good, and friendliness isn’t really that coveted an attribute.
All that said, I don’t know why you think apps would need to be recompiled to work on Mint; they don’t.
I’ve had no issues whatsoever getting everything I needed running on it with hardly any work at all (no work at all, in fact, in most cases).

Anyway, as I said, I thought the article was pretty good, and I don’t feel it’s unduly harsh towards CentOS’ (many) faults given the audience context. I do disagree with blanketing them as Linux issues, but as I said before, that’s nitpicky at best.


#14

I actually said in the article that it’s not the fault of Linux itself when something like an installer is badly done; it gets through because developers assume its users will try plan B, like setting the working directory to the installer folder or fixing permissions. I agree that it has nothing to do with the kernel, but you can make certain blanket statements like “it’s not nearly as easy to install all your graphics programs and get them up and running on Linux as it is on a Mac or Windows”. Even the most die-hard Linux fanboy wouldn’t argue with that. Before recent releases, Deadline was a PITA to get running on Linux because of Mono, which wasn’t a straightforward install on Linux the way it was on OS X. They ended up bundling a version with the Deadline installer, but you still have to hack at yum repos manually.
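For the uninitiated, “hacking at yum repos” means dropping a file like this into /etc/yum.repos.d/ by hand (a generic sketch only; the repo name and URL are placeholders, not the actual Mono repo):

    # /etc/yum.repos.d/example.repo
    [example-repo]
    name=Example third-party repo
    baseurl=http://repo.example.com/centos/6/x86_64/
    enabled=1
    gpgcheck=0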