lower Titan performance on MBO - looking for advice


#1

Hello,
I have had this problem for a while now but was too busy to properly test and address it.
So.

Configuration 1 has an Asus P9X79-E WS motherboard, an i7 3930K and 4x EVGA Titan SC.
Configuration 2 has an Asus Maximus VI Hero motherboard, an i7 4770 CPU, one Asus 780 card and a Zotac 770.

I’ve noticed in Cinebench, and using a testing script in Softimage, that the 780 outperforms the Titan by a lot! Not in Redshift (RS) rendering, but in viewport performance.

Cinebench R15 (ignore SLI, as it is not used in these tests at all)
Titan: score around 90
780: score around 140

Softimage:
Titan: around 200fps
780: around 250fps

Finally today I’ve got some time and moved 1 Titan into the 2nd configuration.
On that MBO the Titan got a Cinebench score of 140 as well, while in the Softimage viewport it hit around 310fps.

Now it seems that the MBO in configuration 1 is the limiting factor, as I really don’t think that the CPU alone can make such a huge difference in these cases.

To make things weirder, on configuration 1 the MBO runs all 4 Titans at PCIe 3.0 x16 (confirmed with GPU-Z).
Configuration 2 has an additional 770 card for GPU rendering, but that lowers PCIe down to x8 on both slots.

All in all I would expect a WS MBO that costs twice as much, combined with a Titan, to outperform or at least match a simple gaming board.
But I may be wrong?
Any ideas what could be happening here?
Anyone else around with a P9X79-E WS MBO to compare results?

Now, when GPU rendering they perform great, the Titans I mean, but I wouldn’t spend so much money on both motherboard and cards just to take such a big performance hit.

Would appreciate any help or ideas.
Thank you


#2

Now it seems that the MBO in configuration 1 is the limiting factor, as I really don’t think that the CPU alone can make such a huge difference in these cases.

The CPU is making all the difference. The GPU can only render frames as fast as the CPU gives it the data. With cards as high end as the 780 and Titan, the CPU will almost always be the limiting factor. Your computer is just a series of bottlenecks, and the best upgrade is always to open up whichever component is causing the biggest slowdown. For 3D applications, even when testing realtime viewport performance, that’s almost always the CPU when using any gfx card over £300.

The PCIe lanes will make little difference, as the CPU isn’t flooding the GPU with new data; it just can’t send enough to keep the GPU busy.

In a nutshell, the 4 core CPU is 2 years newer and is running at a faster speed, so it will be faster at single threaded stuff.


#3

You are right about the CPU, BUT in this case the 3930K is not that much slower.
Forgot to mention that even OCing the 3930K up to 4.5 GHz made so little difference that I don’t wanna risk stability over such a poor performance increase. It is water cooled so no problem with that, but still…
So even with the 3930K OCed there is still a huge performance difference.
Still, I wonder if one of the newer 4960X or 4930K would make such a big difference, but after OCing the 3930K and still seeing such low performance I don’t think that is the biggest issue here?


#4

I would be surprised in the extreme if a 3930K bottlenecked a Titan at all compared to a 4770k.
There are plenty of benches of 3930Ks clocked silly running SLI Titans at 85% effectiveness vs 75% (SLI) non OCed, and 95%+ for a single Titan without OCing or odd hacks (or without OC and with a bus hack in case the mobo is one of the many wonky models).

More likely a combination of revision, bios version and bios settings is at play.
The X79 is also known to have seen a ton of mobos plagued by the most varied and oddest timings, but also to be safely hackable, with a patch from NVIDIA itself no less:
http://nvidia.custhelp.com/app/answers/detail/a_id/3135/session/L3RpbWUvMTM0MDIyMzU2OC9zaWQvaDEzbE45X2s=

Mirko, you have to check the load and find more precisely where the bottleneck is.
Just those two tests with a frame rate attached don’t tell much of a story. Have you monitored the videocard during those tests?

Anyway, look at updating and tweaking the bios and apply the pci bandwidth hack, you might simply be bus starving the card.


#5

Thanks, will check that patch. I saw someone mention the patch before, but I thought it was an old thing and solved by now.
The BIOS on the MBO is the latest version, updated recently.
Also the GPU-Z test shows that all 4 Titans are working at PCIe 3.0 x16.
When using Auto in the BIOS for PCIe it was down to PCIe 2.0; manually setting it to GEN3 shows full PCIe 3.0 x16.
Now figuring out if it is safe to just run the patch and see what happens…
Also I’ve noticed that other bench tests show a bit lower than expected performance for the Titans, like across all other gaming benchmarks: 3DMark, Valley etc… Getting OK results thanks to all 4 of them, but I was still expecting a bit more…


#6

Oh wait a second, I hadn’t noticed you listed 4x Titans. Sorry, but do you mean you have a quad SLI on that box?
In that case, are you aware that it will most likely perform slightly slower than a single one on practically everything except GPU rendering (where ideally they should be decoupled and not SLI anyway)?
To compare apples to apples you should test with just the one card on all of them.

What in a blazing hell do you need a quad SLI Titan for, man? :slight_smile:

You are very likely to bus starve them on practically any CPU.


#7

force-enable-gen3 didn’t do anything :slight_smile:

Yes, 4 of them for GPU rendering, using Redshift, and it is amazing.
For that the Titans eat everything and outperform the 780. (single GPU used for render test)

But both Cinebench and the Softimage viewport are using only 1 card.
Just did some tests, and during Cinebench the GPU is used only like 30%.
In Softimage it is approx 70% GPU usage.
Well, I guess the only way to know is to pull out all the other Titans, leave one and test it…
OK, let’s do that just to get that out of the way as well :slight_smile:
Be right back… just letting these cool down a bit
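
As a rough sanity check on those GPU load readings, here is a minimal Python sketch. It rests on a simplifying assumption, not a measurement: if the GPU is only busy for some fraction of the time, the frame rate it could reach when fully fed is roughly the observed fps divided by that fraction. The fps values are the approximate config 1 figures from the first post, pairing them with the load readings above is also an assumption, and `gpu_bound_ceiling` is a hypothetical helper name.

```python
def gpu_bound_ceiling(observed_fps: float, gpu_utilization: float) -> float:
    """Rough fps estimate if the GPU were the only limit (heuristic, not a measurement)."""
    if not 0.0 < gpu_utilization <= 1.0:
        raise ValueError("gpu_utilization must be in the range (0, 1]")
    return observed_fps / gpu_utilization


# Approximate readings: (observed fps, GPU load as a fraction).
# The pairing of these fps values with these load figures is an assumption.
readings = {
    "Cinebench R15": (90.0, 0.30),
    "Softimage viewport": (200.0, 0.70),
}

for name, (fps, load) in readings.items():
    ceiling = gpu_bound_ceiling(fps, load)
    print(f"{name}: {fps:.0f} fps at {load:.0%} GPU load, "
          f"GPU-bound ceiling roughly {ceiling:.0f} fps")
```

Under that assumption both tests leave the card well short of full load, which points at something upstream of the GPU (CPU, driver or platform) rather than at the Titan itself.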


#8

No luck.

Removed everything but 1 Titan and ran the same tests… Results were only marginally better, like 10-20 frames faster in Softimage and maybe 5fps higher in Cinebench.
GPU % usage identical.
If it was due to the number of cards I would expect at least a bit bigger difference :frowning:

It is not that it isn’t good, but it bugs me not knowing what is going on.


#9

And now after OC (Asus auto tune, 4.5 for the CPU) I’m getting somewhat better results, but still not as good as on the other MBO. So I need to push the 3930K to 4.6GHz to get slightly lower results than the 4770K at default clock on the other MBO :slight_smile: Still a bit suspicious, but could it be…
Upgrading from the 3930K to a 4930K or even a 4960X would be pretty costly, especially as I would be left with a perfectly good 3930K that I can’t use anywhere, and I’m still not sure it would bring out the maximum potential of the setup.
So, after OC:

Cinebench: 96 fps (it was 45 with SLI enabled, 96 with SLI disabled :))
Softimage: ~250 fps

Again, it was around 300 fps on the Maximus VI Hero and 4770.

So the question is how much difference, if any, there would be if the 3930K is replaced with a 4930K or 4960X…


#10

[quote="mirkoj"]
So it is a question how much if at all difference would be if 3930k is replaced with 4930k or 4960x…
[/quote]

Do not do that bro’, they are pretty much the same procs, even in CPU-only tasks, so in the viewport the difference will be even smaller.

I can not test it, as it is not in my PC anymore, but if a Quadro 4000 /non-K/ fits your needs you can PM me. The only way is to find some test on the internet, or to find some user with Softimage + Quadro 4000.


#11

The difference isn’t the CPUs themselves, which, as far as dealing with the videocards goes, don’t really have much of a gap between them. The difference is largely down to chipset and bridge bus handling, but it still doesn’t make a lot of sense.
You might very well find changing the mobo might fix it even on a lower end CPU (for the sake of argument, I’m not recommending you do).
I got to ask though, since you have a 4770 around already, why not convert that into your GPU rendermule (it will also consume a chunk less power) and turn the 3930k into a gaming rig?

If you were to upgrade solely for the sake of unthrottling the GPUs, there would be no point in going for a premium piece like the 49s.
That aside, I remain somewhat puzzled by those results. Have you compared your GPU rendering times as well to see to what extent the bottlenecking is actually affecting you?


#12

I’ve found Cinebench OpenGL can easily vary ±15fps based on drivers. Are both updated to the latest?
My 3930K @ 4.4GHz with a GTX 680 gets 116fps with the latest drivers.
At work I have a 4930K @ 4.2GHz on the basic P9X79 board with a GTX 770 and get similar scores.
But both have had less than 100fps recently on different drivers.


#13

Both computers are fully updated, with the latest BIOS and drivers for everything.
The reason for not transferring the cards to the 4770 computer is simple:
the Asus Maximus VI Hero can’t support all 4 Titans :slight_smile:
The P9X79-E WS is… or should be, the perfect MBO for GPU rendering, as it supports all 4 PCIe slots at x16.

https://www.asus.com/Motherboards/P9X79E_WS/specifications/

7 x PCIe 3.0/2.0 x16 (single x16 or dual x16/x16 or triple x16/x16/x16 or quad x16/x16/x16/x16 or seven x16/x8/x8/x8/x16/x8/x8, black+blue) *2

Maximus hero can only do SLI.

In rendering the Titan outperforms the 780 by the expected margin. Rendering using a single GPU for comparison gives the expected results. Rendering using all 4 of them also scales great, so no issues there either.

I don’t have problems in the viewport, like it being slow or something; it is just getting lower results than the 780 and I’m wondering what is holding it back. Not that it would make much of a difference really, but I’m curious to see what is going on.
I had come to a similar conclusion, that a new CPU itself wouldn’t help much and would be too big an investment… I would rather add more GPUs to a couple of my other computers and add them to rendering as well.

Replacing 1 of the Titans with a Quadro 4000, even the K version, would be a huge hit to rendering performance, as the Titans are waaay faster.

Waiting for someone on the Redshift forum with the same MBO to check results.
But still strange…


#14

Just for the fun of testing I’ve disabled hyper threading.
And funny, it does give a slightly better result…

The Softimage viewport test is now at ~240, Cinebench around 108fps, and the Arion render bench got a slightly better score, but nothing groundbreaking.
The CPU is still OCed, so overall HT on or off doesn’t make any real difference.


#15

Did some more tests and it seems that PCIe 3.0 is not enabled at all, no matter what I set in the BIOS.
Here are a couple of tests done with Cinebench and the Arion GPU renderer, with a screenshot of the GPU-Z full screen render test to give info about PCIe.

First I went to the BIOS and turned all PCIe slots to Auto.
After restart here are the results:
https://www.dropbox.com/s/7g23r4vwye6vjiv/PCI_auto.jpg

Then I used that old NVIDIA PCIe 3.0 patch and rebooted. It says that new cards support everything already, but just in case:
https://www.dropbox.com/s/z7ofyls6blk2kn0/PCI_auto_nvidiaPCI3patch.jpg

Finally, I went back to the BIOS, selected GEN3 for the PCIe slots with Titans in them, and the results are here:
https://www.dropbox.com/s/yw7h5hjv0jklseh/PCI_GEN3.jpg

So it really seems that there are no changes at all, with even worse results with PCIe 3.0. I wonder if it is actually turned on and used at all?
Also to point out that the BIOS is the latest version for the MBO, and the graphics drivers are the latest as well.
Any ideas around??


#16

OK, finally got some time to do this test.
1st configuration: P9X79-E WS, 4 Titans, 3930K, 32GB RAM

2nd configuration: Maximus VI Hero, Asus GTX 780 DirectCU II, 4770K, 16GB RAM

The 1st one is there as extra info, as I have problems with PCIe 3.0 on it, so I can compare it with the results from the 2nd one.

I will run Cinebench R15, the Softimage viewport test, and the Arion render test.
Finally, for RS I found an old restaurant scene someone shared here.
On the 2nd config I will do a round of tests with PCIe 3.0 and then turn it down to GEN2 in the BIOS.
Also, the SI tests are run on both configs at the same resolution, 1440p, so no differences there.
Everything is at default stock clocks.
And I couldn’t make Arion use only 1 Titan even after using the right strings, so the Arion result for the 1st config is with all 4 Titans.

1st config:
Softimage viewport: ~220fps
Cinebench 1st run: 95.27 fps
Cinebench 2nd run: 94.09 fps
Cinebench 3rd run: 92.98 fps
Arion bench: 7862.46
RS 4 Titans: Rendering time: 317.579 s (4 GPU(s) used)
RS 1 Titan: 1122.88 s (1 GPU(s) used)

2nd config PCIE 3.0:
Softimage viewport: ~258 fps
Cinebench 1st run: 133.32 fps
Cinebench 2nd run: 131.26 fps
Cinebench 3rd run: 126.44 fps
Arion bench: Bench value 1844.73, 02m42sec
RS 1 780: Rendering time: 1272.96 s (1 GPU(s) used)

2nd config PCIE 2.0:
Softimage viewport: ~258 fps
Cinebench 1st run: 134.66 fps
Cinebench 2nd run: 128.86 fps
Cinebench 3rd run: 125.75 fps
Arion bench: Bench value 1840.14, 02m43sec
RS 1 780: Rendering time: 1297.49 s (1 GPU(s) used)

With results nearly identical I went to GEN1 as well
2nd config PCIE 1.0:
Softimage viewport: ~254 fps
Cinebench 1st run: 137.59 fps
Cinebench 2nd run: 135.29 fps
Cinebench 3rd run: 133.96 fps
Arion bench: Bench value 1827, 02m44sec
RS 1 780: Rendering time: 1286.73 s (1 GPU(s) used)

Sooo, the PCIe GEN1, GEN2 and GEN3 setups in the BIOS were each confirmed with GPU-Z. We can safely say that there are no differences whatsoever. Now, any ideas what is going on? Also this puts on hold my idea to upgrade from the 3930K to a 4930K, as another 500 EUR for no improvement whatsoever… no thank you.
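
To put a number on "no differences", here is a small Python sketch that averages the three Cinebench runs listed above for each setup and reports the run-to-run spread. The figures are copied from the results in this post; the dictionary keys and the `summarize` helper are just illustrative names.

```python
from statistics import mean

# Cinebench R15 OpenGL results (fps), three runs per setup, as listed above.
cinebench_runs = {
    "config 1": [95.27, 94.09, 92.98],
    "config 2 PCIe 3.0": [133.32, 131.26, 126.44],
    "config 2 PCIe 2.0": [134.66, 128.86, 125.75],
    "config 2 PCIe 1.0": [137.59, 135.29, 133.96],
}


def summarize(runs):
    """Return {setup: (mean fps, max - min spread)} for each list of runs."""
    return {name: (mean(values), max(values) - min(values))
            for name, values in runs.items()}


for name, (avg, spread) in summarize(cinebench_runs).items():
    print(f"{name}: mean {avg:.2f} fps, spread {spread:.2f} fps")
```

The config 2 means land at roughly 130, 130 and 136 fps for GEN3, GEN2 and GEN1, while individual runs within one setup differ by up to about 9 fps, so the gap between generations sits inside the run-to-run noise. Config 1 averages about 94 fps.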

One more test that I will do is to again take 1 Titan, put it into the Maximus board with the 4770K and do a new round of tests with PCIe 3.0.
I have an 875W PSU in that computer, so I’m not sure I would wanna play with 2 Titans in there to test the difference between PCIe 3.0 x16 and x8.
Keep in mind that I had the same Titan results as above regardless of whether there was 1 Titan in the computer or all 4, as I tested that a couple of days ago (talking about the Cinebench and SI viewport tests ofc, rendering is a different pair of socks). I’m not sure if there will be any difference between PCIe 2.0 and PCIe 3.0 with 4 cards, as I obviously can’t test that here unless I get a new CPU :smile:

Wondering if there is anyone else with a P9X79-E WS or similar 4 GPU board who can do the same BIOS GEN setting test.
I recently ran into a forum where a guy was testing the difference with 4 cards, and there was a pretty big improvement with PCIe 3.0 over 2.0 at higher resolutions. But so far I can’t confirm that, at least when testing with 1 GPU only.
So that is all for now. As soon as I get a Titan into the 2nd config I will do another round of tests, just to see how big a performance hit the cards take in config 1 compared to config 2.
And yes, I do wanna use them to the maximum. Paying a premium price for all components only to get subpar performance in the end is not cool at all :frowning:
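
For the rendering side, the Redshift timings listed earlier in this post can be turned into a scaling figure. This is a minimal Python sketch; `scaling` is a hypothetical helper, and the Titan vs 780 ratio compares timings taken on two different machines, so treat it as indicative only.

```python
def scaling(single_gpu_s: float, multi_gpu_s: float, gpu_count: int):
    """Return (speedup, efficiency) of a multi-GPU render vs a single-GPU render."""
    speedup = single_gpu_s / multi_gpu_s
    return speedup, speedup / gpu_count


# Redshift render times in seconds, copied from the results above.
titan_x1 = 1122.88       # config 1, 1 Titan
titan_x4 = 317.579       # config 1, 4 Titans
gtx780_pcie3 = 1272.96   # config 2, 1x 780, PCIe 3.0

speedup, efficiency = scaling(titan_x1, titan_x4, 4)
print(f"4 Titans vs 1 Titan: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")

# Cross-machine comparison (different CPU and motherboard), indicative only.
print(f"1 Titan vs 1x 780: {gtx780_pcie3 / titan_x1:.2f}x faster")
```

That works out to roughly a 3.5x speedup on 4 cards, or about 88% scaling efficiency, with a single Titan rendering about 13% faster than the 780. So the shortfall really is confined to the viewport and Cinebench tests.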


#17

I’m not surprised the PCIe generation makes no difference. Remember, that’s just affecting the bandwidth when sending data to or from the card. Seeing as most apps will send the data to the card pretty fast anyway, the card will run these tests almost entirely self contained; very little data will be sent back and forth between the CPU and GPU.

If you like, you can practically think of the PCIe gen as a bottleneck for loading times, only making a second or two of difference at the start of a test.
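
For a sense of scale, here is a short Python sketch of the theoretical per-direction link bandwidth for each PCIe generation, using the published transfer rates and line encodings (2.5 and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b). These are upper bounds before protocol overhead, and `link_bandwidth_gb_s` is an illustrative name, not part of any tool mentioned in this thread.

```python
# generation -> (transfer rate in GT/s per lane, line-encoding efficiency)
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),      # 8b/10b encoding
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}


def link_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    transfer_rate, efficiency = PCIE_GENERATIONS[generation]
    return transfer_rate * efficiency / 8 * lanes  # 8 bits per byte


for generation in PCIE_GENERATIONS:
    for lanes in (8, 16):
        bandwidth = link_bandwidth_gb_s(generation, lanes)
        print(f"PCIe {generation} x{lanes}: {bandwidth:.2f} GB/s")
```

At x16 that gives 4, 8 and roughly 15.75 GB/s for GEN1, GEN2 and GEN3. Even the slowest link can move a scene of a few GB onto the card in about a second, which fits the idea that the generation mostly shows up in load time rather than in steady-state frame rate.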


#18

That does make sense.
But it still leaves the question of why my Titan, or all of them, perform a lot slower than in most online tests I saw, and also slower than when I put one of them into this other computer :slight_smile:


#19

That, I have no idea.

All I could suggest is different driver versions, different software versions, different driver settings, something else taking away GPU time (Windows Aero interface, bitcoin mining trojan…) or something else lagging out the load times, such as missing motherboard drivers (these often don’t show up in the hardware manager; run the motherboard driver install disc to be sure).


#20

Performance can vary for lots of reasons: benchmark app versions, GPU driver versions, hardware BIOS versions etc etc…

Also, not all GPUs are equal. Oftentimes vendors give cherry-picked hardware to reviewers, cards that are more stable or are OCed and whatnot. It’s a show-off business after all.

Your own test scenes may also not be a reliable measure if they were created and tested with older versions of your renderer, on hardware with older drivers. Scenes created with older Octane renderer versions render slower with version 2.0, for example.

Another thing: AFAIK there are mainboards with 4 PCIe x16 slots, but none of them run all of them at full x16 speed at the same time.