Just want to give you a friendly piece of advice. I can't help that it's going to come out sounding like a put-down, so I want to make it clear that I'm just advising you, not putting you down. Please take this as advice from someone who has been through this.
Almost every piece of writing you come across about assembly language (tutorial, book, etc.) will say pretty much the same thing: don't get into assembly thinking you'll write faster code.
Using assembly does not guarantee faster code. In fact, if you are new and don't fully know what you're doing, you'll end up writing bad assembly code that may run slower, be buggier and more error-prone, and in every case be harder to debug.
Here's my story:
I was just playing around with some beginner algorithms for writing a ray tracer. I was able to draw a few squares and circles on the screen; cool. Then I heard about the SSE, SSE2, and 3DNow! processor extensions. Normally compilers don't know how or when to use them properly (mine didn't; MS VC++ 6.0). There are some specialized compilers that will make use of those SIMD extensions, but the results aren't always great. The trick is to organize your data in such a way that it can be streamed (in other words, fed) to the CPU and the SIMD instructions executed on it for optimal performance.
So I tweaked my routine by replacing parts of it with assembly code; in this scenario I was able to use assembly instructions my C/C++ compiler did not know about (the SSE and SSE2 instructions). The result? It turns out my assembly code took longer than the regular "unoptimized" code the compiler generated. Why? Switching between the FPU and the ALU (in other words, between the floating-point math unit and the integer math unit) causes a brief stall. It's nanoseconds; you wouldn't notice it once. But in a loop that runs 1000 times, the delay adds up and becomes a performance bottleneck.
I will agree that learning assembly opens your eyes to how much goes on in the background and what your programs are really doing. But it forces you to work at the hardware level, and you pretty much have to think like a machine. Don't attempt this until you've grasped C/C++. Just take it as a learning experience; using assembly language isn't always practical. You have more to gain, in terms of program performance, simply by improving the algorithm and the high-level code. People spend years developing quality C/C++ compilers; you're going to have a very hard time competing with that, not to mention with the optimizing power a compiler can bring to bear.
I don't mean to discourage you from learning; assembly is very interesting and can be fun. But you'll spend much more time learning assembly than any other language (spoken or programmatic). That time is better spent actually using what you've learned in C/C++. Or, depending on what you're doing, multi-threaded programming is even more valuable and can be done more easily in C/C++.
In case you're wondering why my assembly code ran slower than the compiler's code, even when I used SIMD instructions (SSE, SSE2) that were supposed to boost performance, here's roughly what my code looked like:
for (int i = 0; i < numVertices; i++)
{
// I inlined assembly instructions here
}
Remember what I said: switching between floating-point math and integer math causes a speed penalty, not to mention the penalty for branching code. My for loop was driven by an integer (i), while the code in the body of that loop operated on floating-point data. Moreover, it was using the SIMD registers and execution units (some of which are just remapped FPU registers). The speed penalties added up and made the code slower. My compiler probably would have seen that problem coming and converted the i variable into a float to bypass one speed penalty. Even when I changed my C code to make i a float myself, the improvement was negligible. Not worth the many hours I spent on it.