Old 12-05-2012, 03:07 PM   #1
Cryptite
Speeds by Surface
 
Tom Miller
Technical Director
Dallas, USA
 
Join Date: Jun 2003
Posts: 513
Switch Traffic Theory

This is a call for advice on switch traffic theory for a small (~13 artist) VFX studio. For the most part we do fine, but we have a couple of problems (read: one big one) that cause bandwidth choking for us, and I'd love to get some ideas about how your studios run and what has and hasn't worked for you in the past.

Here's the setup for us:
  • Roughly 5 file servers (shortened to FS for the rest of the post). One main one serves all of the production files; the other 4 are rarely used and can legitimately be forgotten about for the time being.
  • 60 render farm blades; our 20 fastest render our Fusion/Nuke comps (which is where the problem lies, as I'll get to in a moment).
  • ~13 artist workstations, presumed equal in speed; this is a bandwidth issue, though, so workstation speed doesn't much matter here.

The switch setup is the following:
  • 40 render blades (all 3D rendering, no Fusion rendering) are on Switch A.
  • The other 20 aforementioned comp/3D rendering blades are on Switch B.
  • Switch B is wired to Switch A with 2 CAT6 cables.
  • Switch A then routes traffic from both itself and B (daisy-chain, if you will) to the "Main Switch" through 4 CAT6 cables.
  • The "Main Switch" hosts all of our file servers and workstations as well as all of the traffic from the render farm switches.

The main bandwidth choke point comes when we launch Fusion jobs to the farm. We only have 10 render licenses for it, but when 10 of our fastest machines (or probably any 10, for that matter) start rendering Fusion, the constant pull of very large EXRs, plus the way Fusion handles EXRs (poorly, but that's a topic for another day), means nearly everybody on the network takes a noticeable hit (our AE guys complain the most) and pretty much just has to work through it until those jobs finish. No other jobs seem to have this problem, as all of our 3D scene files and assets are copied locally before they render.

If it needs mentioning, we use Royal Render as our render farm manager.

Also, Nuke is new to our pipeline and we're slowly introducing it. We know it handles EXRs better, but is it so much better that this problem may not even exist anymore once we've fully transitioned?

My question to ye VFX types is: how does your company route its network traffic? We've had many internal theories about separating switch connections so that the farm and the main FS can talk to each other directly, without going through the same switch the workstations use. Either way, we know something needs to change; we're just not 100% certain what that is.

Also, for extra credit: how do you host your files in terms of production assets and renders for compositing? Do you keep them all in the same project folder? Do you keep your renders on an entirely different FS so they can be accessed without affecting the production FS? Do you all use SSDs? Do your computers sit atop wireless unicorns?

Thanks in advance!
-Crypt
__________________
Solitude: $.changethe****ingstartframe = 1600
Solitude: fun fact: that's the script for everything
 
Old 12-05-2012, 03:41 PM   #2
DePaint
Banned
 
Emre M.
Istanbul, Turkey
 
Join Date: Jun 2008
Posts: 1,156
You may get better help asking on a forum dedicated to IT networking, like here:

http://www.daniweb.com/hardware-and...CFYqvzAodITEAHg

Google for "Networking Forum" or "IT Networking Forum" and you'll get a bunch of places where the Networking Pros hang out...
 
Old 12-05-2012, 07:19 PM   #3
cojam
Banned
chris
varible, Afghanistan
 
Join Date: Nov 2012
Posts: 177
Quote:
Originally Posted by DePaint
You may get better help asking on a forum dedicated to IT networking, like here:

http://www.daniweb.com/hardware-and...CFYqvzAodITEAHg

Google for "Networking Forum" or "IT Networking Forum" and you'll get a bunch of places where the Networking Pros hang out...

Or hire a professional?
 
Old 12-05-2012, 08:24 PM   #4
MDuffy
Trained Monkey
 
Michael Duffy
Prog. Dept. Manager
McKinney, USA
 
Join Date: Aug 2002
Posts: 761
Maybe have a cache server on the same switch/subnet as your renderfarm, and update the cache as the first operation of the render job. Some caches can be set up so they refresh on file request, making it a bit more transparent at the expense of a few more calls back to the main file server.

That way you will only pull new/changed content over the users' switch, and the farm can hammer its own cache and switch as much as it likes without the normal humans noticing. You can also even out network access by staggering the start times of the jobs a bit so they aren't all requesting big files at the same time, and some renderers/compositors can internally stagger the order they process nodes (if possible) to spread out file access as well.
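If it helps to make that concrete, here's a minimal sketch of the "sync the cache as the job's first operation" idea, written as a Python pre-render task. The hostname, paths, and the use of rsync are assumptions for illustration only, not a description of Royal Render's own mechanism:

Code:
#!/usr/bin/env python3
"""Hypothetical pre-render task: pull a shot's input frames from the main
production file server onto a cache volume that lives behind the render
farm's switch, so the comp job reads from the farm-side cache, not the FS."""
import subprocess
import sys

MAIN_FS = "mainfs:/projects"      # main production file server (hypothetical name)
FARM_CACHE = "/cache/projects"    # cache volume mounted on the farm blades (hypothetical)

def sync_to_cache(shot_path: str) -> None:
    """Copy only new or changed files for one shot onto the farm-side cache."""
    src = f"{MAIN_FS}/{shot_path}/"
    dst = f"{FARM_CACHE}/{shot_path}/"
    # -a preserves file attributes; --update skips files the cache already has
    subprocess.run(["rsync", "-a", "--update", src, dst], check=True)

if __name__ == "__main__":
    # e.g. submitted as the first task of the comp job:
    #   python sync_cache.py showX/seq010/sh020/renders
    sync_to_cache(sys.argv[1])

The comp job would then read its EXRs from the FARM_CACHE path, so the heavy pulls stay behind the farm's own switch and only the initial sync crosses the main switch.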

Cheers,
Michael
 
Old 12-05-2012, 11:29 PM   #5
sentry66
Expert
 
node crazy
USA
 
Join Date: May 2008
Posts: 1,980
Switches that support 10 gigabit, network cards faster than 1 gigabit, and possibly faster hard drives would help.
 
Old 12-06-2012, 12:46 AM   #6
olson
Houdini|Python|Linux
Luke Olson
Dallas, USA
 
Join Date: Jan 2007
Posts: 2,889
A cache server like Duffy has suggested could reduce the hit felt by the rest of the artists. There are specialized caching servers available that can do this transparently, or commodity hardware could be used if the pipeline handles the cache updates and path changes.

I don't think that's the best solution, though, because it sounds like the production needs have simply outgrown the network and file servers. Even if you have the ideal caching setup, there's still only so much bandwidth on the network (assuming we're talking gigabit Ethernet here). All of the compositing render nodes on switch B are getting less than 250 MB/s between all of them (two gigabit Ethernet connections from switch B to A). If the file server is on a single gigabit Ethernet connection, it will be able to offer less than 125 MB/s.
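Spelling out that arithmetic (theoretical ceilings only, before protocol overhead; the counts are the ones given in the original post):

Code:
# 1 Gbit/s tops out at roughly 125 MB/s
GIGABIT_MBS = 125

uplinks_b_to_a = 2                 # two CAT6 runs from switch B to switch A
comp_nodes = 20                    # comp/3D blades behind switch B

switch_b_total = uplinks_b_to_a * GIGABIT_MBS   # 250 MB/s shared by all 20 nodes
per_node = switch_b_total / comp_nodes          # 12.5 MB/s each if they all pull at once

fs_link = 1 * GIGABIT_MBS                       # 125 MB/s if the FS has a single GigE port

print(per_node, fs_link)   # 12.5 125

So even before the farm uplinks saturate, a handful of Fusion blades pulling large EXRs can fill a single gigabit port on the file server by themselves, which would line up with the hit everyone else is feeling.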

It might be time to get a new switch with some 10 gigabit Ethernet ports, and a new file server or file server cluster with 10 gigabit Ethernet as well. If you look at the numbers, no matter how you configure things with gigabit Ethernet and multiple switches there will always be severe bottlenecks on the network. If the budget won't allow for any major upgrades, the best option would probably be to put a cache server on the switch of each group of nodes as needed; for example, put a cache server on the switch with the compositing nodes.

If nobody has done this yet, look into monitoring the network traffic and file server traffic. In the past I've used Cacti for this, which supports many switch models and, when configured properly, all workstations and servers. It will keep track of what uses bandwidth and when, along with lots of other useful information, so you can see exactly where the bottlenecks are coming from. It logs everything too, so you can look back over the last month or two and correlate things with data from the render queue.
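If a full Cacti install isn't in place yet, a quick-and-dirty sanity check is to sample the interface counters on the file server itself. This rough sketch assumes a Linux host and an interface named eth0 (adjust for the real NIC); it's only a stopgap until proper monitoring is set up:

Code:
#!/usr/bin/env python3
"""Rough throughput check: sample /proc/net/dev twice and report MB/s."""
import time

def rx_tx_bytes(iface="eth0"):
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[0]), int(fields[8])   # rx bytes, tx bytes
    raise ValueError("interface %s not found" % iface)

if __name__ == "__main__":
    rx1, tx1 = rx_tx_bytes()
    time.sleep(5)
    rx2, tx2 = rx_tx_bytes()
    print("rx %.1f MB/s, tx %.1f MB/s"
          % ((rx2 - rx1) / 5 / 1e6, (tx2 - tx1) / 5 / 1e6))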
__________________
http://www.whenpicsfly.com
 
Old 12-06-2012, 05:00 AM   #7
tswalk
Lord of the posts
 
Troy Walker
USA
 
Join Date: Jan 2012
Posts: 708

It sounds to me like you are using a single VLAN, so no matter how you daisy-chain those switches, the broadcasts are going to kill you. I'm guessing on that, though, because you didn't really describe your segmentation, only how you cabled them, and by default everyone will be on a single VLAN.

If it were me, I would segment the backend traffic onto a different VLAN, assuming you have multiple NICs on the blades (one NIC for VLAN1 and another for VLAN2). That way, when you launch a job, it runs on VLAN2 (which in theory can be on the same switch). But optimally, since you have 3 switches, you could dedicate that VLAN2/switch to this "backend", which will keep your client and other server segments clear of that traffic.

Also, if you choose to (and have the capability), do the same with that file server, bridging between the VLANs.

I would also segment the client workstations onto another VLAN on your "main" switch.
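For what it's worth, here is one hypothetical pipeline-level way the dual-NIC idea could be consumed once the VLANs exist: blades resolve the file server by a backend-VLAN hostname when one is available, so job traffic never touches the workstation segment. The hostnames and the detection rule are invented for the example:

Code:
#!/usr/bin/env python3
"""Hypothetical helper: prefer the file server's backend-VLAN name on
machines that can resolve it, fall back to the normal name elsewhere."""
import socket

FS_FRONTEND = "mainfs"           # name the workstations use (hypothetical)
FS_BACKEND = "mainfs-render"     # same server's interface on the render VLAN (hypothetical)

def fileserver_host() -> str:
    """Return the backend name if this machine can resolve it, else the front name."""
    try:
        socket.gethostbyname(FS_BACKEND)
        return FS_BACKEND
    except socket.gaierror:
        return FS_FRONTEND

if __name__ == "__main__":
    print("//%s/projects" % fileserver_host())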

I'm just guessing a lot; I have no idea what kind of switches you have, or blades/servers, etc.

So, if this fixes the problem... does it mean a free lunch at Sonny's or just a beer at the WestEnd Pub? And if you really want, I've got a CCNA in Keller and another living in Flowermound; I'm out here in Mansfield, however. They're great guys I used to work with up at Nokia.

Here's a diagram (of what I think you've described versus the proposed segmentation):

http://sdrv.ms/11XIn3S

Right now, I'd probably take the beer.

Last edited by tswalk : 12-07-2012 at 12:43 AM.
 
Old 12-06-2012, 05:00 AM   #8
CGTalk Moderation
Lord of the posts
CGTalk Forum Leader
 
Join Date: Sep 2003
Posts: 1,066,481
Thread automatically closed

This thread has been automatically closed as it remained inactive for 12 months. If you wish to continue the discussion, please create a new thread in the appropriate forum.
 