
View Full Version : New Coprocessors = Desktop Renderfarms?


mustique
11-13-2012, 12:56 PM
Nvidia, AMD and Intel recently announced their new coprocessors, bringing more than 1 teraflop of double-precision (DP) compute power to our desks.

We know that there are new render engines out there for the GPU, but Intel has announced the Xeon Phi, which has about 50-60 CPU cores on one PCIe card. And they claim minimal effort for code optimisation thanks to the x86 architecture advantage.

Does that mean that we could get almost real-time raytracing for MR, V-Ray, Maxwell(!), Arnold etc. soon?

mister3d
11-13-2012, 01:09 PM
Can you provide a source? Anyway, they will be priced very high.

mustique
11-13-2012, 01:19 PM
There are lots of sources from semiaccurate.com to theregister.co.uk, actually most tech sites cover them.

As for prices, it's around:
$2000-2700 for the Xeon Phi cards (add the price of a normal Xeon CPU to that)
$3200-4000 for Nvidia Tesla K20 & K20X

At semiaccurate.com, the talk is that Xeon Phi needs only a single line of code for multithreaded software to take advantage of massive speedups, and that with no modification at all your application would already run 2-3 times faster. If that's true, this is huge news and could mean a desktop renderfarm for current offline raytracers.

AJ1
11-13-2012, 01:23 PM
These look pretty cool. The concept was announced a while ago, but it looks like the brand just rolled out today.

I don't see these having a huge impact on the CG world. They're meant for the HPC sector, where high single-node performance is a must. I would imagine they fill the same niche market role as the Tesla cards.

Looks like the cheapest one is about 3k.

http://www.pcmag.com/article2/0,2817,2412048,00.asp

-AJ

pokoy
11-13-2012, 01:26 PM
The 50-60 cores per card are a bit misleading. They're running at ~1 GHz, so it's not the same kind of GHz-per-core ratio we know from desktop CPUs.
Additionally, according to the numbers here (http://www.tomshardware.com/reviews/xeon-phi-larrabee-stampede-hpc,3342-5.html), the speedup factor is somewhere around 2-3x compared to a dual E5 setup running at 2.6 GHz. Plus, and that's something I hadn't been aware of until today, they come with on-board memory, around 6-8 GB; it appears they won't be able to use your system's RAM. Price tag is around 2500 USD.
I've only skimmed the article, so correct me if I'm wrong...

ambient-whisper
11-13-2012, 01:29 PM
By the time one of those is within a decent price range (maybe even used?), a desktop CPU will be faster.

I usually spend money with the long-term investment in mind. For example, I could buy a Quadro card today for $2000 (just a hypothetical number), or I could buy a few GeForce cards today for a few machines, or maybe spend that same $2000 over time to buy and upgrade new cards over the next 5 years. Within that 5-year period, that $2000 Quadro will be useless, while a new $300 GeForce will run circles around that old Quadro. Same with CPUs and other expensive parts.

The one question you have to answer is: how fast will those 50 cores be?! If they are running on a single PCIe card, I can't imagine they would be very fast per core, otherwise the temperatures would run extremely hot.

I'm not saying I don't like the idea, and who wouldn't want faster rendering... but realistically speaking, this tech isn't made for the average joe.

Because the RAM is fixed as well, your bucket size will be limited on huge scenes. If you have 50 cores and 6 GB of RAM, each core will get approximately 120 MB per bucket/core. Currently on my machine I get approximately 5333 MB of RAM per bucket/core.

mustique
11-13-2012, 01:33 PM
...I don't see these having a huge impact on the CG world. They're meant for the HPC sector, where high single-node performance is a must. I would imagine they fill the same niche market role as the Tesla cards...

Isn't a renderfarm HPC too? I assume that, with GPU renderers making good use of Tesla, the same could apply to Intel's solution. Well, I hope so at least.

Would be cool to know what software developers think about this, from a CG POV.

mustique
11-13-2012, 01:50 PM
@ambient-whisper

I have a similar habit when buying hardware, but what got me excited about this tech is that we could potentially get speedups for rendering with ordinary offline renderers that would otherwise take maybe 10 years to come to our desks.

Because, AFAIK, current top Xeon CPUs deliver just 125 GFLOPS of DP compute power, versus 1.1 TFLOPS (almost 10 times more).

ambient-whisper
11-13-2012, 01:59 PM
@ambient-whisper

I have a similar habit when buying hardware, but what got me excited about this tech is that we could potentially get speedups for rendering with ordinary offline renderers that would otherwise take maybe 10 years to come to our desks.

Because, AFAIK, current top Xeon CPUs deliver just 125 GFLOPS of DP compute power, versus 1.1 TFLOPS (almost 10 times more).

That's when you learn to...

http://www.e-onsoftware.com/products/vue/vue_9_infinite/videos/cameramapping.html

Though obviously for things like fluid solving and such, there's less you can do to optimize. It will take a long time either way.

For most things, however, optimizing and breaking shots up into smaller elements can really take you a long way.

AJ1
11-13-2012, 02:13 PM
Isn't a renderfarm HPC too?

Yes they are. But render farms aren't typically funded by taxpayers, research grants, or a large corporation desperately trying to secure a beefy defense contract. :sad:

Looks like they've been using these things to upgrade existing supercomputers that are running on Xeon E5 boards. From what I've been reading, they are designed to crunch the type of computations needed for financial, defense, and astrophysics work. No mention of trying to make my renders go faster. :sad:

-AJ

DePaint
11-13-2012, 02:16 PM
I personally believe that Intel is aiming Xeon Phi not just at high-performance computing, but also at accelerating DCC applications like 3D rendering software and editing/post-production tools.

So the notion that your favorite CG renderer will soon run 5-10 times faster on a Xeon Phi coprocessor card is not terribly far-fetched.

Neither is being able to color-grade 8K-resolution footage in real time, or being able to compress 1080p HD video 5-10 times faster than was possible on a Core i7 or similar CPU.

As for "automatic multithreading" or "intelligent multithreading", Microsoft's .NET software architecture has had that option for some time.

You can run a PARALLEL FOR loop in .NET languages like C++/C# for graphics/pixel operations, scanning a say 800 x 600 pixel large image left-to-right, top-to-bottom in the process.

PARALLEL FOR will automatically farm different parts of the image/pixel operation out to available CPU threads, thereby causing an automatic speedup of the operation using multiple CPU cores.
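For illustration, here is roughly the same "parallelize the outer pixel loop" pattern sketched in C++ with OpenMP instead of .NET's Parallel For. This is a minimal sketch: the image size and the brightness offset are made-up values, and the function name is hypothetical.

// A minimal sketch of the idea above, in C++ with OpenMP instead of .NET:
// split an image operation by scanline and let the runtime farm the rows
// out to the available cores. Image size and brightness offset are
// made-up illustration values.
#include <cstdint>
#include <vector>

void brighten(std::vector<std::uint8_t>& pixels, int width, int height, int offset)
{
    // Each scanline is independent, so rows can run on different cores
    // in any order (this one pragma is the whole parallelization).
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int v = pixels[y * width + x] + offset;
            pixels[y * width + x] = static_cast<std::uint8_t>(v > 255 ? 255 : v);
        }
    }
}

int main()
{
    const int width = 800, height = 600;  // the 800 x 600 example from the post
    std::vector<std::uint8_t> image(width * height, 100);
    brighten(image, width, height, 40);
    return 0;
}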

Xeon Phi probably has something similar to this. When it detects a function that can be parallelized across many cores, it will do that automatically.

This is not as efficient as writing properly optimized multithreaded code by hand.

But yes, in some cases automatic multithreading can give you a 2 - 4 times speedup.

If you want to use the full parallel-processing potential of the coprocessor card, though, you need to break functions up into proper threads by hand, which means the imaging operation works on multiple scanlines or render tiles to get the job done.


Of course, Intel could have done something really clever with Xeon Phi, such as intelligently detecting virtually any scenario where x86 code can be parallelized, and having the Xeon Phi board parallel-execute it nearly as fast as though a programmer had written multithreaded code to begin with.

If this is the case, Xeon Phi boards may be able to speed up even legacy x86 applications without a programmer having to wade through thousands of lines of performance-sensitive code and optimize it manually for multithreaded execution.

A feature like this would definitely separate Xeon Phi from Nvidia's and AMD's GPU-based solutions.

Just imagine: you write regular, single-threaded x86 code, and Xeon Phi automatically parallelizes it during execution!

If it does this, and does it efficiently, that would be a HUGE benefit to have over the Nvidia/AMD solutions, where you have to rewrite ALL of your code in CUDA/OpenCL just to get it running on the GPU at all...

Lomax
11-13-2012, 02:25 PM
There's always going to be something to keep render times slow. Hardware and software are like an arms race, with each constantly trying to catch up with and outdo the other. :shrug:

pokoy
11-13-2012, 02:37 PM
If it does this, and does it efficiently, that would be a HUGE benefit to have over the Nvidia/AMD solutions, where you have to rewrite ALL of your code in CUDA/OpenCL just to get it running on the GPU at all...

QFA. Still, if you have to invest in another ecosystem that doesn't fully use your system resources (RAM), it's not going to change my world. Why would I want to spend money on a card that uses only its on-board memory when my system already has 24 or 32 GB, and the card won't kick in on almost all of the stuff I'm rendering?

It's this either-or that makes me hesitant about investing in it. Be it CUDA or Phi, they'll only help you with the easier stuff.

sentry66
11-13-2012, 02:37 PM
A new article just came out yesterday:
http://www.tomshardware.com/reviews/xeon-phi-larrabee-stampede-hpc,3342.html

It talks about the supercomputers that are making use of the Phis.

According to Intel's research, the Phi can accelerate raytracing by up to 1.88x on a 2.7 GHz, 16-core dual Xeon E5-2680 system when compared to the dual E5-2680 alone. This makes me think real-world performance would be more like 1.5-1.6x for most renderers on a dual Xeon system.

In January the price of their initially released Phi will be around $2700, and then they'll release a faster, higher-end version later in the year.

Since the Phi's performance is somewhat fixed, and that estimate was made on an almost top-of-the-line dual Xeon system, I'd guess that with a regular 6-core i7 workstation you could possibly get around 2x the rendering performance, and with a 4-core i7 almost 3x. That's an interesting prospect considering an i7 workstation usually costs around $2-4k depending on what you get. At $2700, a Phi might be worth considering for an i7 workstation, and absolutely worth buying for a $7000-14000 dual Xeon workstation.
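A rough back-of-the-envelope behind those estimates. The ratios below are assumptions for illustration only: the 1.88x figure is read as the Phi adding about 0.88 of a dual E5-2680's ray-tracing throughput, derated to about 0.55 for real-world renderers, with a 6-core i7 taken as roughly 0.45 and a 4-core i7 as roughly 0.30 of a dual E5-2680.

// Back-of-the-envelope only, not a benchmark. All ratios are assumptions
// made for illustration (see the note above).
#include <cstdio>

int main()
{
    const double phi = 0.55;                     // Phi throughput, derated from the 1.88x claim
    const double host[] = { 1.00, 0.45, 0.30 };  // dual E5-2680, 6-core i7, 4-core i7 (assumed)
    const char*  name[] = { "dual E5-2680", "6-core i7", "4-core i7" };

    for (int i = 0; i < 3; ++i)
        std::printf("%-13s + Phi: ~%.1fx\n", name[i], (host[i] + phi) / host[i]);
    return 0;
}

Under those assumptions the printed estimates land around 1.6x, 2.2x and 2.8x, which is in the same ballpark as the figures quoted above.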

IMO these aren't worth the money for small renderfarms unless you're buying $7000+ render nodes and building a high-density supercomputer or a pro renderfarm to compete with Pixar and Lucasfilm. Most of us probably get much more renderfarm CPU performance for the money buying cheap $1500-2000 i7 boxes and possibly overclocking. Even considering the price of render software licenses, I'm not sure the price/performance of the Phi makes it completely attractive for small renderfarms.

I'm also not sure how its onboard 6 gigs of memory plays into rendering. Maybe for heavy scenes it just won't work. Or maybe each processor uses a few megs for caching data for render tiles and it'll render heavy scenes just fine? I don't know.

DePaint
11-13-2012, 02:51 PM
The very first release of something new - like Xeon Phi - is always going to be "pricey" for what it actually does.

Give it 3-5 years for Intel to make their solution smaller and more efficient, and we may very well see Xeon Phi coprocessor boards with, say, 500 cores and 32 GB of RAM at $3,000 or so.

At some point in the game, Xeon Phi will probably become efficient enough in terms of price/performance for multiple software 3D renderers to be ported to it.

The fact that the architecture is based on plain old x86 is a HUGE plus for Xeon Phi.

One of the overlooked advantages of this is that existing, mature, advanced x86 compilers can be modified quite easily for use with the Xeon Phi board.

Also, the hand-optimized x86 assembly code found in the really speed-sensitive parts of almost all render engines should be easier to port to Xeon Phi than to, say, Nvidia's CUDA or AMD's OpenCL.

mustique
11-13-2012, 02:53 PM
http://semiaccurate.com/2012/11/12/what-does-it-take-to-code-for-a-xeon-phi/

For those who are interested in what it takes to code for Xeon Phi.

The bottom line is that any kind of multithreaded code (hence every raytracer) can already be accelerated by this tech, and for "substantial speedups" a single line of code would be enough optimisation.

I don't know if there's some sort of catch here. Real-world tests will clarify things soon.

darthviper107
11-13-2012, 03:09 PM
Do we even have any renderers that can use this hardware? Besides the GPU renderers, since those cards don't do very well that way.

DePaint
11-13-2012, 03:31 PM
Do we even have any renderers that can use this hardware? Besides the GPU renderers, since those cards don't do very well that way.

Nobody has publicly announced Xeon Phi support so far.

But knowing Intel, they probably gave a few of the important render engine manufacturers Xeon Phi sample boards well in advance of mass production starting.

Or maybe they gave them just a "Xeon Phi Software Compiler", so they can start getting used to Phi programming without having an actual Xeon Phi hardware board at hand.

Time will clarify all this...

axezine
11-14-2012, 06:12 PM
http://semiaccurate.com/2012/11/12/what-does-it-take-to-code-for-a-xeon-phi/

For those who are interested in what it takes to code for Xeon Phi.

The bottom line is that any kind of multithreaded code (hence every raytracer) can already be accelerated by this tech, and for "substantial speedups" a single line of code would be enough optimisation.

I don't know if there's some sort of catch here. Real-world tests will clarify things soon.

Well, that article really is quite simplistic. You can definitely use OpenMP to parallelize a for() loop like that, but this has been true for many years (as in, you might use OpenMP or some other threading library to parallelize code on CPUs as well, and it might be similarly easy to do).
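To make that concrete, the "single line" in question is essentially an OpenMP pragma on a tight, independent loop. A minimal C++ sketch follows; the second variant uses Intel's compiler-specific offload pragma for the Phi, so treat its exact clause syntax as approximate, and the function names and the shading math are hypothetical.

#include <cmath>

// The whole "porting effort" for a loop like this is the one pragma line.
void shade(const float* src, float* dst, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        dst[i] = std::sqrt(src[i]) * 0.5f;
}

// Same loop pushed to the coprocessor via Intel's offload pragma
// (Intel compiler only; syntax approximate, other compilers ignore it).
void shade_on_phi(const float* src, float* dst, int n)
{
    #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        dst[i] = std::sqrt(src[i]) * 0.5f;
}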

But this fails to take into account a lot of things. There's a reason getting good threaded performance (i.e., scaling) out of many applications is hard, and it has a lot to do with the fact that most applications are not in fact collections of tight, independent loops that you could simply parallelize without thinking about it. On the software side, sometimes the unit of parallelization is large (think of a renderer that parallelizes on the ray tracing for each pixel; there's a ton going on for that one pixel in any production renderer). Even if the workload is what some would term "embarrassingly parallel" (again, rendering is a typical example of this), you run into hardware and software limitations - memory bandwidth, cache coherency, shared data structures whose access must be serialized, portions of the code that can't be parallelized, etc.
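A toy C++/OpenMP illustration of one of those limitations (not from the article): a loop that looks embarrassingly parallel but funnels every iteration through a shared accumulator, compared with the same sum done as a per-thread reduction.

#include <cstdio>
#include <mutex>
#include <vector>

int main()
{
    const int n = 1 << 20;
    std::vector<double> samples(n, 0.5);

    // Contended version: every iteration takes the same lock, so the
    // threads spend most of their time waiting on each other.
    double total_slow = 0.0;
    std::mutex m;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(m);
        total_slow += samples[i];
    }

    // Scalable version: each thread accumulates privately and the
    // partial sums are merged once at the end.
    double total_fast = 0.0;
    #pragma omp parallel for reduction(+ : total_fast)
    for (int i = 0; i < n; ++i)
        total_fast += samples[i];

    std::printf("%f %f\n", total_slow, total_fast);
    return 0;
}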

What the Xeon Phi enables is the use of tools and paradigms that have been worked on for a very long time in a "compute" (as in the GPGPU sense) setting. No need to mess around with a new(-ish) programming language like CUDA.

The best analogy is to consider a Phi card inside a computer to be a rack of servers. Phi internally runs a Linux distro, and you can "log in" to it as if it were a remote computer and execute your code there. It will look like a massively parallel server, of course - 60 cores! But there's no "secret sauce" in Phi that auto-magically parallelizes code; if you run a single-threaded application on Phi, it will still run on a single thread (i.e., one Phi core). If your code is poorly threaded, it will still run poorly on Phi - probably more so when compared to a desktop CPU, where each core is probably much, much faster.

Cheers!

Jorge

AJ1
11-14-2012, 06:26 PM
Cool, thanks for the explanation Jorge.

Is it fair to say that these work like the Tesla cards?

-AJ

DePaint
11-14-2012, 06:54 PM
Is it fair to say that these work like the Tesla cards?

Yes and no.

Yes, it's a many-core coprocessor card aimed at accelerating computations, like Tesla.

No, it doesn't run CUDA or OpenCL code. It uses regular x86 code/instructions.

So while it is aimed at the same use as Tesla, it functions more like many x86 processors jammed together on a card.

raffo
11-15-2012, 02:03 AM
No, it doesn't run CUDA or OpenCL code. It uses regular x86 code/instructions.

OpenCL already runs on a normal x86 CPU. For instance, V-Ray RT can already utilize the CPU for OpenCL rendering.

I heard you can also run CUDA on standard x86 architecture...

ThE_JacO
11-15-2012, 03:47 AM
It's not strictly like a Tesla, nor would the similarity to a Tesla be predicated on what library you use to write for it anyway.
It does try to shift the market that right now is looking at Teslas, but the offering is different.

It's a lot more similar to a networked renderfarm with the networking bottleneck reduced significantly by the connectivity being internal instead of cabled. It even has discrete unit management through TCP/IP.

The breakthrough isn't in how fast it is or any of that; the breakthrough is the sheer number of units it can pack into a certain volume, and at what power cost.

It's clearly aimed at computational centres first, and at the potential future push for cloud resourcing, which might one day move from super-global, far-away large aggregations to a much denser cloud of smaller units in closer reach.
I.e.: every building gets one, much like ISPs and telcos now provide some buildings with what are basically fully fledged servers and route smaller-bandwidth users inside that building.

Also, unlike the Tesla, this isn't a very large set of small and relatively specialised units; it's a smaller set of more general-purpose units that can take on more, be virtualised and managed more efficiently, and try to make the main CPU the coordination centre and shoulder unit for a larger network of other units contained in these cards.

In a way it's Larrabee making a comeback.
Its consequences will at some point reach end users, I'm sure, but I doubt the first couple of years are intended to produce anything you'd feel the need for at home.
On top of that, once enough development is done for it in the form of infrastructure and actual software, that also allows home CPUs to keep scaling horizontally in width rather than trying to hit higher clock speeds, which are fast approaching the physical limitations of the materials involved and become harder every year to beat and manufacture competitively.

tswalk
11-15-2012, 06:44 AM
I tried to Google up some photos of the old 486 DX-66 parallel rendering expansion cards we used to use (mid '90s) with Digital Arts (DGS), without success... I saw my LinkedIn profile listed, which is kinda scary (guess I should get rid of that old crap).

This all reminds me of those days when you could stick a bunch of them in a box to offload rendering... wouldn't that be fun to do again?

CGTalk Moderation
11-15-2012, 06:44 AM
This thread has been automatically closed as it remained inactive for 12 months. If you wish to continue the discussion, please create a new thread in the appropriate forum.