Renderfarm bottlenecks?

  07 July 2013

For those that have renderfarms, where do you typically find your speed bottlenecks? Is it disk i/o? Is it memory i/o? Is it the processors? Is it the job handling configuration? etc?

Thanks.
 
  07 July 2013
I think RAM is always the most important thing, with CPU second.

I don't think networking becomes a bottleneck until you get into really large numbers of nodes or start dealing with massive scene files or textures.

So to summarize, I'd say it depends specifically on how large your farm, scenes, and textures are.
 
  07 July 2013
Originally Posted by cgs-john: For those that have renderfarms, where do you typically find your speed bottlenecks? Is it disk i/o? Is it memory i/o? Is it the processors? Is it the job handling configuration? etc?

Thanks.

Memory I/O is never really a problem. Even on 4-CPU configs with banks far apart, NUMA is hardly ever that impactful in offline rendering, if at all.

Local disk I/O is, again, just not that big of a deal within the context of a frame, but planning early for local caching of key items on some nodes will spare you a lot of headaches.
Network I/O with poor or no caching and pre-fetch can be disastrous on comp jobs. For 3D it depends on the assets and the target time per frame (there is a big difference between facilities doing episodic TV work that requires under 6 minutes per frame and a film VFX budget of 30-120 minutes per frame).
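
To put rough numbers on the frame-time point, here is a back-of-the-envelope sketch. The 10 GB of uncached assets per frame and the 1 Gbit/s link are purely hypothetical figures, the function name is made up for illustration, and protocol overhead and contention are ignored, so treat the result as a best case.

```python
def fetch_share_of_frame(asset_gb, link_gbit_per_s, frame_minutes):
    """Fraction of the per-frame time budget spent pulling assets over the network.

    Idealized: assumes the full link speed is available and nothing is cached.
    """
    fetch_seconds = asset_gb * 8 / link_gbit_per_s  # GB -> Gbit, then divide by Gbit/s
    return fetch_seconds / (frame_minutes * 60)


# Hypothetical: 10 GB of uncached assets per frame over a 1 Gbit/s link.
for minutes in (6, 30, 120):
    share = fetch_share_of_frame(10, 1.0, minutes)
    print(f"{minutes:>3} min/frame: {share:.1%} of the budget is network fetch")
```

With these assumed figures the same 80 seconds of fetching is over a fifth of a 6-minute frame but only about 1% of a 120-minute one, which is why caching matters so much more at TV frame times.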

Job handling has always been tricky and is only going to become more so, and great policies and management go a long way if the workload is considerable. The interface between production and wrangling, together with the job management itself, can easily halve or double throughput without laying a finger on any given blade.
__________________
Come, Join the Cult http://www.cultofrig.com - Rigging from First Principles
 
  07 July 2013
Just to clarify my post, I meant the amount of RAM, not its speed.

If a scene needs 2x as much memory as the machine has and relies on virtual memory to get the job done, the CPU is often going to be waiting for the hard disk to feed it.
 
  07 July 2013
Originally Posted by sentry66: Just to clarify my post, I meant the amount of RAM, not its speed.

If a scene needs 2x as much memory as the machine has and relies on virtual memory to get the job done, the CPU is often going to be waiting for the hard disk to feed it.

I think that was obvious, but I also believe it is becoming too commonplace a mantra.

Yes, having large amounts of RAM is great insurance, and its current cost makes it an obvious choice to have some, but not everybody renders the same things the same way.

If you have low concurrency per blade and a small memory footprint (I have friends who work on local TV shows who deal with exactly that, actually), then RAM takes a distant second place to CPU.

If you run high concurrency with PRMan because of its poor resource management across width, and work to feature quality, then you can't possibly ever have enough RAM to truly tap out all the CPUs properly.
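
A minimal sketch of that trade-off, assuming a simple model where each job needs a fixed number of threads and a fixed amount of memory. The function name, the 2 GB OS reserve, and all the example figures are assumptions for illustration, not measurements from any real farm or renderer.

```python
def concurrent_jobs(cores, ram_gb, threads_per_job, gb_per_job, os_reserve_gb=2):
    """Return (jobs that fit on one node, limiting resource).

    Assumed model: every job takes a fixed thread count and memory footprint.
    """
    by_cpu = cores // threads_per_job
    by_ram = max(0, int((ram_gb - os_reserve_gb) // gb_per_job))
    if by_ram < by_cpu:
        return by_ram, "RAM"
    return by_cpu, "CPU"


# Low concurrency, small footprint: 16 GB is plenty, the CPU is the limit.
print(concurrent_jobs(cores=16, ram_gb=16, threads_per_job=8, gb_per_job=3))   # (2, 'CPU')

# High concurrency, heavy scenes: even 64 GB cannot keep all 16 cores busy.
print(concurrent_jobs(cores=16, ram_gb=64, threads_per_job=2, gb_per_job=12))  # (5, 'RAM')
```

Plugging in your own per-job footprint is a quick way to see which side of the argument a given blade falls on before buying memory for it.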

It's usually cost that makes RAM no-brainer advice (CPU prices have a big jump in a stepped curve at one point, while RAM tends to be uniformly and linearly cheap), but there are plenty of situations where you don't necessarily need to deck out every one of your 1U single-CPU units with 64 GB right away, and can leave it for later, or even never.

I'm not a huge fan of blanket advice, and "zomg put moar raaam in!" is becoming one. It might be valid for a large number of cases, maybe even the majority, but not every case benefits from it, and the exceptions are far more than a fringe: some blades will more than happily plod along with 8 or 16 GB, believe it or not.

Last edited by ThE_JacO : 07 July 2013 at 04:13 AM.
 
  07 July 2013
Originally Posted by cgs-john: For those that have renderfarms, where do you typically find your speed bottlenecks? Is it disk i/o? Is it memory i/o? Is it the processors? Is it the job handling configuration? etc?


The question sounds simple, but there's a tremendous number of variables that vastly affect the outcome. Help us help you: why are you asking this question? Are you running into bottlenecks? Which application or task, what have you tried, how many nodes, what kind of network? The more information you can share, the more insightful feedback we'll be able to give.
__________________
http://www.whenpicsfly.com
 
  07 July 2013
Originally Posted by olson: The question sounds simple, but there's a tremendous number of variables that vastly affect the outcome. Help us help you: why are you asking this question? Are you running into bottlenecks? Which application or task, what have you tried, how many nodes, what kind of network? The more information you can share, the more insightful feedback we'll be able to give.


While we are between student projects, we are going to be expanding our small renderfarm. It will still be small, but this is an opportunity for us to start from scratch on maybe half a rack of render nodes (maybe 192 cores' worth). We typically render with RenderMan Pro Server with the jobs controlled by Tractor, or with Mantra with the jobs controlled by HQueue. The file system is ext3 on a server on the same subnet as the render nodes, each connected by a gigabit link. I think that if we gave more time to tweaking the job/resource management in the job handling programs, we could improve our performance. However, I did not want to overlook some important consideration that someone may be experiencing now and that we might not run into until later, e.g. if we were to experiment with 4K or stereography. After this, it might be a while before we can upgrade again.

Thanks.
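
For a setup like the one described, one thing that is easy to estimate up front is how a single file server link gets divided when several nodes start frames at once. The sketch below assumes the server itself has one gigabit link (the post does not say), an even split between readers, and a hypothetical 2 GB scene; the function name is made up for illustration.

```python
def load_seconds(scene_gb, readers, node_link_gbit=1.0, server_link_gbit=1.0):
    """Idealized time for each node to read a scene when `readers` nodes
    pull from the same file server simultaneously.

    Assumes the server link is shared evenly and ignores disk speed and
    protocol overhead, so real numbers will be worse.
    """
    per_node_gbit = min(node_link_gbit, server_link_gbit / readers)
    return scene_gb * 8 / per_node_gbit  # GB -> Gbit, then divide by Gbit/s


# Hypothetical 2 GB scene, with 1, 4, and 12 nodes reading at the same time.
for readers in (1, 4, 12):
    print(f"{readers:>2} readers: about {load_seconds(2, readers):.0f} s each")
```

If the result is a noticeable slice of your target frame time, that points at the filer link or local caching rather than at the render nodes.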
 
  07 July 2013
I think, all that considered, you'll find the bottlenecks, in the literal sense of the word, are usually enabling/disabling factors:
RAM being sufficient, storage being sufficient, file proximity when I/O is intense, and so on.

Unlike in large data centers or web farms, machines tend not to be specialized much at the scale you work at.
Render nodes like the ones you seem to be going for are largely uniform mules: you pack in as much RAM as you can (since you suggest 4K stereo, I imagine in Nuke, you will need some, badly), and then work out the best price per core that you can license, power, and cool within budget.

Of course you DO have specialization and bottlenecks elsewhere (on the filer, routing, job management, and so on), but that's too much and too specific to the facility to discuss, and I'm under the impression you're asking about the mule nodes only anyway.

The software, configuration, and management make a tremendous difference: the difference between a job literally not moving along for days versus a uniform, constant churn of shots that keeps every core melting. That, I'd say, is the one factor often overlooked by small farms with many users. Other than that, there is hardly anything exotic these days at that scale.

P.S.
If you have more comp-sci-centric courses, or are allowed some level of experimentation with software, I would consider a (minimal) expenditure on one GPU node of sorts. Mind, minimal, very minimal, but if this will have to last you years and you have people who wish to be on the modern side of the work they do, or to have a shot at some of the GPU engines, a cheap node to test the waters and at least emulate the workflow found in bigger facilities might not be a bad idea.
If you tend to be software-locked for years and have no comp-sci or other scientific or graphics programming courses, ignore this P.S. altogether.

Last edited by ThE_JacO : 07 July 2013 at 09:31 PM.
 