Just want to give you a friendly piece of advice. I can’t help that it’s going to come out sounding like a put-down so I want to make it clear that I’m just advising you, not putting you down. Please take this as just advice from someone who has been through this.
Almost every piece of writing you come across about assembly language (tutorial, book, etc.) says pretty much the same thing: don’t get into assembly thinking you’ll make faster code.
Using assembly does not guarantee faster code. In fact, if you are new and don’t fully know what you’re doing, you’ll end up writing bad assembly that may run slower, is more buggy and error-prone, and is in every case harder to debug.
Here’s my story:
I was just playing around with some beginner algorithms for writing a ray tracer. I was able to draw a few squares and circles on the screen, cool. Then I heard about the SSE, SSE2, and 3DNow! processor extensions. Normally compilers won’t know how or when to use them properly (mine, MS VC++ 6.0, didn’t). There are some specialized compilers that will make use of those SIMD extensions, but the results aren’t always great; the trick is to organize your data so it can be streamed (in other words, fed) to the CPU and have the SIMD instructions executed on it for optimal performance.
So I tweaked my routine by replacing the hot inner loops with assembly code; in this scenario I was able to use instructions my C/C++ compiler did not know about (the SSE and SSE2 instructions). The result? My assembly code took longer than the regular “unoptimized” code the compiler generated. Why? Switching between the FPU and the ALU (in other words, between the floating-point math unit and the integer math unit) causes a brief stall. It’s nanoseconds; you wouldn’t notice it once. But in a loop that runs 1000 times, the delay adds up and becomes a performance bottleneck.
I will agree that learning assembly opens your eyes to how much goes on in the background and what your programs are actually doing. But it forces you to work at the hardware level, and you pretty much have to think like a machine. Don’t attempt it until you’ve grasped C/C++. Just take it as a learning experience; using assembly language isn’t always practical. You have more to gain in performance simply by improving your algorithm and high-level code. People spend years developing quality C/C++ compilers, and you’re going to have a very hard time competing with the optimizations a mature compiler performs.
I don’t mean to just discourage you from learning; assembly is very interesting and can be fun. But you’ll spend much more time learning assembly than any other language (spoken or programmatic). That time is better spent actually using what you’ve learned in C/C++. Or, depending on what you’re doing, multi-threaded programming is even more valuable and is easier to do in C/C++.
In case you’re wondering: why did my assembly code run slower than the compiler’s code, even when I used SIMD instructions (SSE, SSE2) that were supposed to boost performance? Here’s roughly what my code looked like:
for (int i = 0; i < numVertices; i++)
{
    // inlined SSE assembly here, operating on floating-point vertex data
}
Remember what I said: switching between floating-point math and integer math causes a speed penalty, not to mention the penalty of branching code. My FOR loop was driven by an integer (i), while the body of that loop operated on floating-point data. Moreover, it used the SIMD registers and execution units (the MMX registers in particular are aliased onto the FPU registers). The speed penalties added up and made the code slower. My compiler probably would have seen that problem coming and converted the i variable into a float to sidestep one penalty. Even when I changed my C code to make i a float, the improvement was negligible. Not worth the many hours I spent on it.