Trouble on the farm


Linds
05-12-2006, 12:39 AM
I have a few questions about render farms.

I follow this forum, and the C4D forum on Postforum pretty closely, and rarely see posts from folks having problems with Cinema net rendering. Now either hardly anyone is using render farms, or they are so reliable, as to be not worth mentioning.

If the latter is true, then it certainly hasn't been my experience...

I've been running a small render farm for Cinema for several years now - can't remember if we were on version 7 or 8 at the time. We started with four Dual Athlon boxes custom built for us by our IT provider, and have more recently added another four Intel boxes - off the shelf HP models.
The original Dual Athlons were nothing but trouble, crashing relentlessly, a problem we put down to overheating. After building them their own special comfy air conditioned room (constant at a chilly 18 degrees) with little improvement, we eventually swapped out the boards and chips for single Pentiums.

So we now have eight render boxes, and when they all work, it's great.
Trouble is, they hardly ever do.

I usually set off big renders when I leave the studio in the evenings. Typically two or three multi pass renders, 750 frames PAL. Maybe six or eight hours of rendering.


On a good morning I'll get in and only one or two of the clients will have quit, or shut down. This is cause for celebration, because more usually, the server will have crashed and/or shut down part way through the job.

Sometimes everything seems to be normal, except the job has just dropped off the render queue at (say) 73% and the nodes are all just sitting there doing nothing while we have to ring our clients and explain that there will be "no WIP to see today after all sorry - maybe tomorrow..."

I've given up trying to spot any pattern. Sometimes a file will fail to render one night, but go through fine the next. One node will crash every night for a week, then run for days with no problem.

This has now been losing me sleep for several years. We've spent endless hours checking version numbers, swapping out plug-ins - we've changed power supplies, fans, suspect memory DIMMs. Short of human sacrifice to the render god, I'm at a loss to know what to try next.

Now, I know there are folks out there who run render farms much bigger than ours, Dan Stubbs, Steve in Sydney - there must be others, you can't do commercial animation without distributed rendering - so what is the secret to stable, reliable rendering?

Exactly how do you set up your nodes? What's the cleanest minimum installation of the MAXON folder, for the server machine and the nodes?

Does anyone else have these kinds of issues?



A little info.
Four nodes are 2.8 GHz Pentium 4s
Four are 3.2 GHz Pentium 4s
All running Windows XP Pro + SP2

All connected via a Belkin 8 way KVM switch

These are all dedicated render machines with no other software installed. I monitor and control the farm from a spare mac in the studio via web browser.

Thanks for listening, I'm off for a stiff drink and a lie down.

Linds

neonghost
05-12-2006, 04:27 AM
Hi Linds,

I have been using a renderfarm since v8. Due to the wild processor market it is now a mix of AMD and Intel machines. I have never experienced the sort of erratic behaviour you've described, but I can go through the issues I've had previously.

Clients quitting > often the result of out of memory errors, or QuickTime not installed
Artifacts/glitches > plugin mismatches, TP not pre-rolling, FPU differences between the AMD & Intel chips
Stalls > Camera animation radiosity

I also do not use a dedicated server; it runs concurrently with a client on a dual Athlon MP. This has never seemed to cause any issues. All my clients have a minimum of 2 GB of RAM.

99% of the time I have been content with stability.

Vozzz
05-12-2006, 07:54 AM
Firstly, I recommend (if possible) installing Windows without service packs on the PCs. Maybe with SP1, but in no case SP2. It's just a bitch. The only reason I have it on one of my PCs is because I want to run 64-bit, and it doesn't come without it.

Now, having said the above, here comes some common sense.
1) Do not plug them into the common network or the internet. Transfer files on memory sticks, CDs or DVDs, whatever. Scan them for viruses before transfer.

2) Now if step one is obeyed, there will be no need for firewalls or anti-viruses, or any other crap on those PCs.

3) Formatting and defragmenting the PCs once in a while is a very nice idea.

4) Unscrew the side panels from the PCs; it helps with airflow if that's the issue.


OK, and I know this might sound dodgy, but it does decrease my render times surprisingly. Once you've got everything open, press Ctrl-Alt-Del and terminate explorer.exe. It should in theory also prevent anything unnecessary from launching.

Oh yeah, and don't forget to disable Windows Update.

I think that should get you covered.

If not, well then I guess you should probably sacrifice something to the render gods. Some people just have a negative influence on PCs, my little brother being one of them. Not only is the computer that he uses laggy and prone to freezing (I used that computer for about a week or so and had no problems), but even the computer I run stuffs up when he comes into the room. The rest of the time it works flawlessly.


Anyway have fun :D

Rantin Al
05-12-2006, 11:04 AM
Is your brother by any chance called Damien?:curious:

tcastudios
05-12-2006, 11:49 AM
Since we started using NET for real around R8.5, it has been rock solid using dual 2 GHz Macs.
However, with the newer Quads, we have encountered problems with rendering that is too fast for NET, typically when many of the machines are rendering parts of the scene that are close together (timewise) and have very little geometry. Dann has said that it happens for him as well.

It looks like there is no "brake" in the Server to handle the traffic.

Other than that, we usually do as mentioned above and keep the setup not connected to the web.


Cheers
Lennart

dann_stubbs
05-12-2006, 12:27 PM
I follow this forum, and the C4D forum on Postforum pretty closely, and rarely see posts from folks having problems with Cinema net rendering. Now either hardly anyone is using render farms, or they are so reliable, as to be not worth mentioning.



sorry to hear your troubles, i can only suspect your IT is not very good - that or poor power to the computers (i had to have commercial electrical lines run to my farm) as well as commercial grade cooling (which you've said you have)

as for Vozzz - i do everything the opposite of him. i have SP2, i keep up to date with updates and my farm is rock solid. the NET clients routinely run for weeks to months before i restart them - and then it is usually because i want to do an update (i only run updates every now and then). taking the sides off the PC can be bad, as proper airflow is important and manufacturers don't recommend doing that.

good quality power supplies are very important and usually the consumer grade PCs (like off the shelf ones) don't have a high enough quality PS to run 24/7, at least not for very long.

my farm is just one month away from turning 4 years old and i've been running NET 24/7 since v7 and other than out of memory errors (which are an unfortunate side effect of windows itself) i've not had any problems.

dann

Linds
05-13-2006, 01:26 AM
Hi guys, thanks for taking the time to respond.

Hmm. Well, as usual, quite a lot of conflicting suggestions there... What to do?

I'm beginning to suspect internet connections. The render machines do have access to the net, but it is never used except on the server machine for downloading Cinema updates etc.

I have disabled Windows Update, but Norton is installed on the server. Could this be the culprit?

The machines all have a gig of RAM installed. Maybe I need to think about more. Coming from a Mac background I don't fully understand the way PCs manage virtual memory (or page files as they seem to be called) but I don't seem to get any error messages that would suggest this is a memory related issue.

Exactly what files should be in the Maxon folder for the server machine, and on a typical client machine?

Linds

Vozzz
05-13-2006, 01:50 AM
Is your brother by any chance called Damien?:curious:

Wasn't last time I checked. Who's Damien?

To Dann: Well hey, it's all up to the user, but seeing as he's doing everything like you are right now, maybe he should try it the other way around? I just don't see the point of updates if you only use the PCs for rendering; most of the updates are user-end (WMP9) and security related. Why clutter your OS with them if it's not on the net and only used for rendering?

Windows XP was a lot more stable before all the service packs. That's 100%.

And yes, the power supplies do have to be pretty decent. I forgot to mention that, but I think Linds said he did replace them.

JIII
05-13-2006, 03:45 AM
Damien = the son of Lucifer

-John

jackb602
05-13-2006, 03:51 AM
The machines all have a gig of RAM installed.

With my limited experience using render farms, I think that's your most likely problem right there. You can get 1GB RAM chips for around $100, if not less. Add at least 1 GB to each of your nodes, and I bet you'll have far fewer crashes.

Jack

Linds
05-13-2006, 04:01 AM
My main workstation is a Mac, also with 1 GB of RAM. I guess I've always assumed that if I can render a frame of animation on that machine OK, it should work OK on the PCs down at the farm...

That's one more thing on my list of things to try though. Thanks.

Linds

dann_stubbs
05-13-2006, 11:55 AM
i have virus protection on my PC server(s). i also keep virus protection on one render client (mostly for testing and to be a litmus test of the render slaves' exposure). i have my farm on my network and they can see the internet for updates (granted i have a pretty good firewall between my LAN and WAN). i ran a PC server for the first two years and, just because of my own preference for unix, moved it to an OS X server two years ago.

i leave windows update on - but i have it set to let me decide when to install - just so it can keep the updates downloaded and ready for when i want to - it is no fun waiting for 30+ computers to download those huge updates when you want to do an upgrade.

i ran with 512 MB for two years with no crashes (other than out of memory due to windows). the past year i had 1 GB and now i have 2 GB - no changes in stability, the main speed increase being the flushing of VM between big render jobs.

my installs are the basic maxon install - i don't go taking stuff out since it really is just a few MB saved and that is pretty irrelevant in these days of GB. i've also got twenty or more plugins in each one (list is on my FAQ) - which is my biggest concern for conflicts but so far so good - have you tried going over your plugins, in case you have one that is causing it? (that has been a big cause of C4D instability in the past for some)

i don't know what to tell you - has anybody messed with windows or the settings? that is the number one thing to cause issues - these things are commodities now, like a toaster - plug them in and for the most part they will work - but there are people (some users, some IT) who LOVE to mess around with settings without knowing what they do, and eventually they create the problem - i've got a couple of users at my day job who do this constantly and they are the only two who ever have system problems... despite multiple different computer and OS upgrades over the years the "troubles" stay with them. and after discussing it enough i can get them to "fess up" and say they've been messing with settings of this or that. one even locked out his monitor controls just by pressing around all the buttons on it - yet he called all panicked one day about it, totally forgetting that HE did it.

i'd say do a clean install on one of the PCs - don't mess with anything you don't need to and as time goes on keep a notebook next to it where you write down every change you make (not all at once) - if it goes from stable to unstable you should have a record of where it went wrong.

but i suspect plugins right now - so maybe you should write a list of them and all the versions and do some checking to make sure they are all ok. (but i've also heard not so many good things about norton - so i'm suspicious there, but that would only affect your server in that case)
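
A minimal sketch of that plugin check, assuming the plugin list and version numbers for each machine have already been written down by hand. The node names and version strings below are made up for illustration; only Lumen and ZBlur are plugin names that appear in this thread, and nothing here talks to Cinema or NET itself.

```python
from typing import Dict, Optional

# Hypothetical inventory: node name -> {plugin name: version string}.
# Node names and version numbers are invented for illustration.
Inventory = Dict[str, Dict[str, str]]


def find_plugin_mismatches(inventories: Inventory) -> Dict[str, Dict[str, Optional[str]]]:
    """Return plugins that are missing or differ in version on any node."""
    all_plugins = set()
    for plugins in inventories.values():
        all_plugins.update(plugins)

    mismatches = {}
    for name in sorted(all_plugins):
        # None marks a node where the plugin is not installed at all.
        versions = {node: plugins.get(name) for node, plugins in inventories.items()}
        if len(set(versions.values())) > 1:
            mismatches[name] = versions
    return mismatches


farm = {
    "server": {"Lumen": "1.0", "ZBlur": "2.1"},
    "node1": {"Lumen": "1.0", "ZBlur": "2.0"},
    "node2": {"Lumen": "1.0"},
}

for plugin, versions in find_plugin_mismatches(farm).items():
    print(plugin, versions)
```

Any plugin that shows up in the output is installed in different versions, or missing, somewhere on the farm, which makes it a reasonable first candidate to check.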

dann

lllab
05-13-2006, 01:47 PM
i also have several pcs as a render farm, they were always very, very stable.
all are winxp sp2, no matter how long they run, no crashes.

haven't used AR for months, but at least until then it was rock solid under all conditions. even if they didn't have enough ram, they just gave an appropriate message but no crash.

maybe it is a plugin?
or a wrong setup for the network?
maybe very bad machines again?
bad ram? - this is very critical, only use brand name high-end ram!!! 1gb is enough, better 2 though.

i guess it has nothing to do with the net render setup, you can't really do anything wrong there, but rather with one of the things listed above.

hope you find a solution, but i would get a better IT consultant, this sounds very abnormal.

cheers
stefan

Linds
05-15-2006, 11:04 AM
A progress report.
(Mainly because I'm sitting here at midnight on my own, nurse-maiding the render machines because I have a bunch of clients coming in tomorrow for final approvals on a TVC, and I have nothing else to do but wait and watch...)

Anyway, thought I'd spend the morning having a real good look at the server machine.
I went through everything and deleted absolutely anything, including apps, that wasn't required for rendering.

I went through Norton Internet Security and disabled Automatic Updates, and anything else I could find that accessed the net.

I deleted the whole Maxon folder, then defragged the disk. (It was badly fragmented, and there was not much free space - maybe this could be it, I thought.)

Did a clean install of Net Server. First V9, then 9.5, then 9.512.

Removed all plugins except Lumen and ZBlur.

All started great. All clients showed up, and I set a scene file rendering. After seven hours, no crashes or shutdowns, and I'm starting to congratulate myself when...

At 70%, the job just drops off the render queue, into Inactive Jobs. Nothing else wrong, no error messages, no "Out of Memory/Missing Texture". Nothing in the event log. Just stopped.

WHY WOULD IT DO THAT???

Sorry...sorry..quietly...

Why would it do that?

I've set it going again, and it seems to be fine, but I daren't leave it. I know the moment I turn my back it will fall over again.

I don't really expect anyone to solve this for me BTW. Just venting.

Night all.

neonghost
05-15-2006, 11:11 AM
perhaps try sending a file that is known to knock out the server to Maxon for analysis.

AdamT
05-15-2006, 02:33 PM
In case this hasn't been mentioned, do you have QT--and the same version of QT--on all machines?

JeremyW
05-15-2006, 03:26 PM
I believe multi-pass is known to be problematic on NET.

AdamT
05-15-2006, 04:06 PM
I believe multi-pass is known to be problematic on NET.
I've not had any problems with it.

Curugon
05-16-2006, 12:00 AM
Anyone having problems with the Client after upgrading to 9.6? After I upgraded, the Client won't launch anymore - it instantly quits itself. Have tried this on multiple systems with the same result.

This is just on Macs by the way - on PCs the upgrade seems to work fine.

Linds
05-16-2006, 02:09 AM
In case this hasn't been mentioned, do you have QT--and the same version of QT--on all machines?

Hmm. good point Adam, pretty certain but I'm off to check now.

BTW, been meaning to ask, but do you have any plans to change your headgear for something a bit more seasonal, it being May and all? Or are you just in a permanently festive state of mind?

Linds

AdamT
05-16-2006, 03:49 AM
Well, with hurricane season rolling around I might replace it with a blue tarp. :)

sketchbook
05-16-2006, 02:00 PM
i had these types of issues on some animations i was rendering a while ago. bought a brand new quad G5, all 4 of my macs had lots of ram (>6GB), and the new quad actually was crashing the client almost exclusively. it was very sporadic. on macs, to be on the safe side, i always restart before a big render, and that is the safest way to go, but the quad has been less stable for whatever reason.

the other 3 machines were solid.

i hope maxon has bucket rendering in the next release for net render. something easier than tiled camera. might just make myself a farm.

Linds
05-17-2006, 01:41 AM
I don't have experience of any other render admin software, but I always wish NetRender would give a bit more information as to what is going on, and in particular, more comprehensive error reporting and logging.

NetRender hasn't really been touched since it was launched, except for the reset buttons, which were added for 9.5. Perhaps we will see a bit of a makeover for version 10.
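
In the meantime, a rough way to get at least some logging from outside NetRender is to poll each node over the network and write a timestamped line whenever one appears or disappears. The sketch below is only an illustration under assumptions: it does not use any NetRender interface, it only tests whether a TCP port answers, and the host names and port number you would pass in are hypothetical values to be replaced with your own.

```python
import socket
from datetime import datetime
from typing import Dict, List, Optional


def node_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def status_changes(previous: Dict[str, bool],
                   current: Dict[str, bool],
                   now: Optional[datetime] = None) -> List[str]:
    """Return one timestamped log line per node whose state changed.

    A node that was not in `previous` is also reported, so the first
    poll records the starting state of every machine.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    lines = []
    for node in sorted(current):
        if previous.get(node) != current[node]:
            state = "UP" if current[node] else "DOWN"
            lines.append(f"{stamp} {node} {state}")
    return lines


def poll_once(hosts: List[str], port: int, previous: Dict[str, bool]) -> Dict[str, bool]:
    """Check every host once, print any state changes, return the new state."""
    current = {host: node_is_reachable(host, port) for host in hosts}
    for line in status_changes(previous, current):
        print(line)
    return current
```

Calling `poll_once` every minute or so from a small loop, and keeping the printed lines in a text file, would at least show when a machine dropped off overnight, which can then be compared with the time the job left the queue.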

Linds

krelnarb
04-13-2007, 11:17 PM
Sorry to bump such an old thread but it fits a question I had.

Shouldn't you be using a hub instead of a KVM?

I'm setting up a home (ie hobby) farm.

Did you ever sort yours out Linds?

Cheers

sonic-blade
04-15-2007, 01:57 PM
Hi guys, thanks for taking the time to respond.

I have disabled Windows Update, but Norton is installed on the server. Could this be the culprit?

Linds

sorry for being rude but Norton is a bitch and i had problems related to it on every machine where it was installed.

beside that point (which is just my personal opinion/experience) it could be worth taking a day to stress-test all render nodes (running cinebench, memtest etc. multiple times)
jm2c
Volker
