Vray - multi-threading


#1

I have several hundred frames of hair to render soon, so I’ve been looking into optimizations. One of them: if I use half my available threads (12 physical / 24 virtual cores) on a render, it does not take twice as long. Therefore, running 2 copies of Maya/V-Ray, each using 12 threads, finishes more rendering in the same amount of time. I’ve done this a few times, and all the render times show a benefit.

But I didn’t realize how much of a benefit there could be.

My results on the latest scene tests:

24 threads = 7min.

12 threads = 8min. 40 sec.

8 threads = 11min. 50sec.

So even using just 1/3 of the available threads did not take twice as long, much less 3 times - so 3 copies of Maya/V-Ray will render 3 frames in about 12 min., where one copy with 24 threads would only finish a little more than one and a half frames.

4 threads = 21min.

At 4 threads it’s even more drastic; that’s 1/6 the threads, but it only takes 3 times as long to render a frame. If I had enough RAM to run 6 separate renders, my machine would literally get twice as much rendering done.
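For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the 24 threads are split evenly into 24/threads concurrent copies and that each copy still renders at the single-copy time I measured; that second part is an assumption, since concurrent copies can compete for RAM and disk. The function names are made up for this example.

```python
# Measured single-copy render times from the tests above: threads -> seconds per frame.
MEASURED = {24: 7 * 60, 12: 8 * 60 + 40, 8: 11 * 60 + 50, 4: 21 * 60}
TOTAL_THREADS = 24


def frames_per_hour(threads, seconds_per_frame, total_threads=TOTAL_THREADS):
    """Frames finished per hour if the machine runs total_threads // threads copies.

    Assumption: each concurrent copy renders as fast as it did when run alone.
    """
    copies = total_threads // threads
    return copies * 3600 / seconds_per_frame


def throughput_table(times=MEASURED):
    """Map each thread count to its estimated frames per hour, rounded to 2 places."""
    return {t: round(frames_per_hour(t, s), 2) for t, s in times.items()}


if __name__ == "__main__":
    baseline = frames_per_hour(24, MEASURED[24])
    for threads, fph in throughput_table().items():
        copies = TOTAL_THREADS // threads
        gain = frames_per_hour(threads, MEASURED[threads]) / baseline
        print(f"{copies} x {threads} threads: {fph} frames/hour ({gain:.2f}x)")
```

Under those assumptions, six 4-thread copies come out at exactly twice the throughput of one 24-thread copy.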

But obviously it depends on your scene and your hardware.

This is using vray 2.4, I hear 3.0 is much faster, maybe he’s improved the multi-threading. But in the meantime I’ll be splitting up my renders as much as my ram allows-


#2

Most big studios run 4 render jobs per machine in parallel for this exact reason. There’s often a lot of single-threaded calculations at the start of each frame.


#3

I know, I just didn’t know how drastic an effect it had, or that it would be so practical for hobbyists or independent artists; even more practical than for big studios… if I had a little more RAM I could double my render throughput! :keenly:


#4

This is cool so I took a bit of time to make a script for V-Ray Tuner that would help with this. It splits the command line render jobs up with a prompt on how many jobs to split it up into for the current machine – any recommendations on the math there?

Once I’ve worked out the last bit that cleans up leftover frames for non-evenly divisible frame counts, I’ll post it.
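The leftover-frame math could look something like this. It is a sketch in Python rather than MEL, not the actual V-Ray Tuner code, and `split_frames` is a hypothetical name. It gives every job the same number of frames and hands any remainder to the last job.

```python
def split_frames(start, end, jobs):
    """Split the inclusive frame range start..end into per-job (first, last) ranges.

    Every job gets total // jobs frames; leftover frames from a non-evenly
    divisible count are appended to the last job.
    """
    total = end - start + 1
    if total < 1 or jobs < 1:
        raise ValueError("need at least one frame and one job")
    jobs = min(jobs, total)  # never create a job with no frames
    per_job = total // jobs
    ranges = []
    first = start
    for i in range(jobs):
        # The last job runs to the final frame, picking up any stragglers.
        last = end if i == jobs - 1 else first + per_job - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    for first, last in split_frames(1, 100, 3):
        print(f"job renders frames {first}-{last}")
```

The last job ends up slightly longer than the others. Spreading the remainder one frame at a time over the first few jobs would balance better, at the cost of slightly more bookkeeping.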


#5

So I just realized, after writing out this script :argh: , that it’s best to use a render manager like Deadline in this case to populate your machine’s jobs since you can never be sure that all frames will take the same amount of time, so you will have idle cores. If you were rendering 100 frames of the exact same thing – unlikely – then it would make sense but otherwise, scripting this intelligently involves basically writing a job manager from scratch. With Python 3.2’s thread pooling, that would be pretty easy though. I made an image converter I use from some basic template code I got online: http://polygonspixelsandpaint.tumblr.com/post/15187344510

Unfortunately, Maya doesn’t have Python 3.x as an option and V-Ray Tuner is written in MEL, so that’s the end of this one for me. Here it is in action:

https://vimeo.com/115947513
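For what it’s worth, the thread-pool idea would look roughly like this in Python 3.2 or later, run outside Maya. It is only a sketch, not a replacement for a real render manager: a fixed number of workers each pull the next frame as soon as they finish one, so a slow frame does not leave the other slots idle. The `Render -r vray -s -e` command line and the `scene.mb` file name are assumptions about a typical Maya command-line setup, and all function names are made up for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def render_command(frame, scene="scene.mb"):
    """Build the command for one frame.

    Assumption: Maya's command-line renderer is on PATH as "Render" and
    V-Ray is installed; "scene.mb" is a placeholder scene file.
    """
    return ["Render", "-r", "vray", "-s", str(frame), "-e", str(frame), scene]


def dry_run_command(frame):
    """Stand-in command that exits immediately, for trying the queue without Maya."""
    return [sys.executable, "-c", "pass"]


def run_frames(frames, concurrent_jobs, make_command=render_command):
    """Run one process per frame, at most concurrent_jobs at a time.

    Returns a dict mapping each frame number to its process exit code.
    """
    def run_one(frame):
        return frame, subprocess.call(make_command(frame))

    # Each worker thread just waits on its child process, so the pool size
    # is the number of renders running at once.
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        return dict(pool.map(run_one, frames))
```

Calling `run_frames(range(1, 101), 4)` would keep four renders going until all 100 frames are done; passing `make_command=dry_run_command` exercises the queue without launching Maya.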


#6

I didn’t mention it before, haven’t tested it yet, but, who says you really have to set the jobs to fewer threads?! It should be even better to leave them all at the max thread count, so if any job is using less than its maximum, the other jobs will take advantage of it right away.

What’s Vray Tuner?


#7

(So I think your splitting tool would still be awesome!)


#8

well, I can’t imagine that it would be good to have CPU cores fighting over CPU resources. That would incur a performance hit I think. Then you have to think about distributed rendering. So a job manager that dispatches an ideal amount of jobs with correct threads, would be the best way to do this.

V-Ray Tuner is my script toolbox. It adds a panel to the Maya UI that gives instant access to a ton of V-Ray features and custom scripts:

http://www.creativecrash.com/maya/script/v-ray-tuner-for-maya


#9

I used to overlap full core render jobs all the time on the same machine based on OS priority. IMO it works better than having segregated core counts for each job. Say one frame/pass gets done before the other, why should the longer render have to sit there and only use a few cores when it could use the remaining cores and finish sooner?


#10

ya I get that part. Anyway, I can still post this update to V-Ray Tuner once I get the remaining frames jobs added to the batch.


#11

ok - I wrote the final part of the script where it appends the straggler frames to the last job script so it should be ready for testing. It also uses all available threads for rendering each job so there will never be any idle threads. It’s only a variation of a script I know works on all three platforms so it should work but let me know if there are problems on Windows or Linux. Here’s the beta of the script:

http://www.can-con.ca/tumblrpics/vraytuner.mel

To access the render script, it’s in the Render menu at the top: “Split animation batch render into x concurrent jobs”


#12

Bueller? Bueller? Bueller? Bueller? Bueller? Bueller?


#13

I didn’t get any notification of your post - maybe no one else did either - I just decided to check back.

Can’t wait to try this, thanks!! I’ll let you know how it goes (OSX)


#14

thanks - I wrote it and tested it on OS X so I don’t think you’ll have any issues there.


#15

So I did a test with 8 frames of animation with BF/LC rendered scene (SSS character) and here’s the result:

  • 1 job (8 frames): 59m 36s
  • 2 jobs (4 frames each): 1h 2m 50s
  • 4 jobs (2 frames each): 1h 3m 32s

So I think this split-job thing needs more data to back up that it’s beneficial, because it isn’t beneficial here. This was V-Ray 3 on OS X 10.10.1 with Embree enabled.
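To put numbers on the difference, here is a quick Python sketch that turns the three totals above into a percentage slowdown relative to the single job. The helper names are made up for the example.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def slowdown_percent(baseline_seconds, other_seconds):
    """How much longer other took than baseline, in percent, rounded to 1 place."""
    return round((other_seconds / baseline_seconds - 1) * 100, 1)


# Total time to finish all 8 frames in each configuration, from the test above.
ONE_JOB = to_seconds(minutes=59, seconds=36)
TWO_JOBS = to_seconds(hours=1, minutes=2, seconds=50)
FOUR_JOBS = to_seconds(hours=1, minutes=3, seconds=32)

if __name__ == "__main__":
    print("2 jobs:", slowdown_percent(ONE_JOB, TWO_JOBS), "% slower")
    print("4 jobs:", slowdown_percent(ONE_JOB, FOUR_JOBS), "% slower")
```

So on this scene, splitting cost roughly 5 to 7 percent instead of saving time.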


#16

Ah well like I said, maybe he’s improving multi-threading for 3.0. I’m using 2.4.

I bet it also depends a lot on how much pre-processing it needs to do. If you have no sss, hair, deformation, or irradiance/light caches, just raw geo and brute force GI, vray probably goes to full multi-thread almost immediately. I’ve been meaning to test a scene like that. My test from above had hair, motion blur, deformation and light cache.

About your plugin - I ran it, no errors, but I can’t find anything that it did. Does it create a menu or button somewhere? It didn’t seem to change any vray settings -


#17

with it in your scripts (not plug-in) folder, just type vraytuner; and it will load a new V-Ray Tuner panel at the left of your Maya UI.

It doesn’t change anything by default. It’s not one of those “1-click fix” type scripts. It also doesn’t create any dependencies like a plug-in can. I suggest you read the read me and look at the vimeo links in there to understand the features.


#18

Ah, got it. I was confused when you said “read the read me and vimeo links in there” because you only linked to your .mel, not the .zip, which I’ve since found on your site.

Very nice!! I don’t even know enough about vray to use all the features you have for it, and I’m no amateur. Are you already aware that docking the render view makes the main window resize, whenever you hit ‘render’? I’m betting that you’re tackling that annoying bug that keeps vray from closing the render view, bravo, it’s such a pain.


#19

you need to disable Auto Resize in the Render window to fix that. But V-Ray 3 automatically removes the Maya render window so you don’t need that anymore.


#20

Bit of a misleading statement.
Rendering Engines with inefficient threading aren’t the only reason, or even the chief reason, why that’s done.
Peaking memory and keeping as many nodes dark as possible is a much bigger factor in making that choice.

It’s also a bit of old news, mostly coming from when everybody and their dog used RenderMan, which has always had, and seems to keep having, the poorest and most wasteful multithreading possible.

Studios that moved to more modern and better scaling models and engines don’t do it nearly as frequently.

The scaling outlined in the original post is actually quite poor, so it’s either a particularly tricky scene, due to its nature or possibly even poor settings choice, or there’s a bottleneck somewhere (which might be anything from thrashing the memory to storage-bound issues), or V-Ray has work to do (and apparently they have done a lot of work to scale better in recent times).

Let’s not keep propagating relatively dated knowledge as gospel. The most modern models and engines don’t have “a lot” of single-threaded overhead any longer, or not very often at least in the context of rendering most frames of a movie :wink: