Haswell not as slow as thought, just misunderstood!


#1

The non-programmers might balk at the text, but some of the graphs should be plain enough:
http://www.phoronix.com/scan.php?page=article&item=intel_core_avx2&num=1

AVX2 turns out to be much better than originally expected when code is properly optimized for it.
Intel’s latest compiler (which has always been ridiculously strong on numerical work) already hinted at that, but it was hard to get past the rumours that Intel had made it that way on purpose, just to make its chips appear better than they are. When GCC pulls it off too, though, you know it’s beyond any marketing scheme and well into ā€œit is soā€ territory.
GCC 4.9 might do an even better job still.

Turns out that if the compiler does a good job of autovectorization and is told to specifically target AVX2 (the newest SSE-like instruction set on Haswell), there is a very real benefit to be had on the cheap.
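
Just to make the mechanics concrete (this is my own toy loop, not something from the article): the gain comes from the compiler, not the source. A trivially vectorizable loop like the one below stays on 128-bit SSE2 in a generic build, but with something like `g++ -O3 -mavx2 -mfma` GCC is free to emit 256-bit AVX2/FMA code for it.

```cpp
#include <cstddef>

// Plain scalar C++: a multiply-accumulate over two arrays.
// Built with something like `g++ -O3 -mavx2 -mfma`, GCC can
// auto-vectorize this into 256-bit instructions (8 floats per op);
// a generic x86-64 build falls back to 128-bit SSE2 (4 floats per op).
void saxpy(float* __restrict__ y, const float* __restrict__ x,
           float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Same source, different flags; that’s essentially what’s being benchmarked.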

Sadly, most of the apps you use day in and day out not only aren’t built to that level of optimization, they are often compiled largely, if not entirely, with ancient compilers that will barely tap SSE2.


#2

…most of the benchmarking that happens at the other review sites is done using generic pre-compiled binaries rather than building from source with optimizations for a given architecture.

Yeah, because we all have source code for Maya, Nuke, Houdini, After Effects and everything else we use on a daily basis. Don’t get me wrong, I love Phoronix, but for the average user those benchmarks won’t mean anything for another five years. Thanks for the link!


#3

Well, as far as choosing a CPU for existing DCC apps goes, sure, but then for a lot of people you might as well get a dual core as long as you can OC it to 5GHz, given how few of them really squeeze their money’s worth out of multicore.

But for anybody working in software this is very relevant.
The Intel compiler also does an excellent job of exploiting those instructions and can do downright black magic with autovectorization, and with every version larger chunks of Maya go through that and TBB.
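
A minimal sketch of what that combination looks like (the function is hypothetical, not anything from Maya): TBB hands slices of the range to each core, and the per-element body is simple enough for the compiler to auto-vectorize with AVX2 when you target it.

```cpp
#include <cstddef>
#include <vector>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

// Hypothetical example: scale a big float buffer.
// TBB hands each core a chunk of the range (threading), and the
// inner loop is plain enough for the compiler to vectorize (SIMD).
void scale_buffer(std::vector<float>& data, float factor) {
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, data.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                data[i] *= factor;
        });
}
```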

Plus, I was tired of the hardware forum being just a ā€œhelp me build my first machine!ā€ forum, and thought an actual insightful hardware article written by people with a clue, as opposed to the completely ignorant hacks reviewing CPUs on a lot of websites, would be a breath of fresh air :wink:


#4

it’s cool stuff

hopefully everyone gets on top of it within the next few software releases


#5

The breath of fresh air was definitely appreciated! I was just grumbling about the state of technology in the business, both software and hardware. :thumbsup:


#6

This is nice, but it’s not likely to hit much software. Most CG software barely makes it out the door running before they’re off working on the next thing to upgrade to. ā€œSUCKS LESS ON HASWELLā€ isn’t a bullet point that sells software, unfortunately. Autodesk still needs to make mental ray on Windows not run slow as hell.


#7

Intel is increasingly trying to address the needs of specific markets, but it looks like the ā€œone CPU architecture fits allā€ model isn’t enough anymore.

AVX2 is yet another extension that is currently only available to the very few people who happened to buy a machine with a Haswell CPU. So what? How many software devs will make the effort to optimize their apps for a possible 30% performance gain? Very few. Devs aren’t living in the late ’90s or early 2000s anymore.

It also amazes me that Intel is bringing extensions like AVX2 to its consumer CPUs first. :argh:

Intel should build CPUs with ā€œ32-64 low-clocked cores plus one very fast coreā€: a CPU that can handle the best of both worlds. Otherwise ARM will eat away the bottom, and GPUs will eat away the top, of Intel’s remaining cake. :wink:


#8

I couldn’t disagree more. Nobody has ever tried to make a ā€œone CPU fits allā€ product. There has always been an array of high-speed, many-core, low-power, small-size, specialist CPUs for all sorts of applications.

CPU manufacturers have continuously added extra instruction sets and units: FPUs, SIMD, SSE 1-4, 64-bit, AVX. Hardware can’t be retroactively upgraded; the best you can do is make the improvement, and then, once it has enough market penetration, software makers will start optimising for it. If you never add it because there won’t be enough hardware out there to make it worthwhile, then no improvement ever gets made.
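
And market penetration doesn’t even have to gate everything: a shipping binary can carry both code paths and pick one at runtime. A rough sketch using GCC’s target attribute and CPU-detection builtin (the function names are mine, purely for illustration):

```cpp
#include <cstddef>

// Same loop built twice in one translation unit via GCC's target attribute.
__attribute__((target("avx2")))
void scale_avx2(float* v, float f, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) v[i] *= f;  // compiled with AVX2 enabled
}

void scale_baseline(float* v, float f, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) v[i] *= f;  // default x86-64, i.e. SSE2
}

// Dispatch at runtime; __builtin_cpu_supports is a GCC builtin (4.8+).
void scale(float* v, float f, std::size_t n) {
    if (__builtin_cpu_supports("avx2"))
        scale_avx2(v, f, n);      // only taken on Haswell and newer
    else
        scale_baseline(v, f, n);  // everything older keeps working
}
```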

And a 30% speed bump? Jeebus christo, that’s a hell of a speed-up; if compiling for that gets you 30%, most programmers would bite your arm off for such an improvement. But to then say that a 64-core CPU would benefit people more is just silly. Optimising for many cores is probably the most difficult thing to do, and we’re still many years away from general consumer applications being able to scale much higher than 3-4 cores. Other than video encoding and 3D rendering, which are much easier to run in parallel across many cores, you’re not going to get any games using more cores.


#9

If you look at the efficiency of a setup like that, it doesn’t make sense to implement. The only way to keep processors reasonably efficient is to dynamically change the clock speed of the cores, so any of the slow cores can become the very fast core for a moment when needed (a lot of slow cores or one fast core, not both at the same time). That’s why modern processors have ā€œturboā€, as the marketing folks call it, instead of a dedicated ā€œfast coreā€ for single-threaded tasks.


#10

I know that Arnold has been utilizing Intel’s SSE4 for a few years now, but I don’t know about AVX. I remember that at SIGGRAPH 2012 in Los Angeles, Marcos Fajardo talked about working with Intel to optimize their renderer for Intel’s CPUs.

Interesting look at what is possible with AVX2, but like others have said, I don’t expect to see it used in most CG applications.

