Fun with Compilers 2: Optimizations

For my second post about compilers, I thought I’d show what happens with various optimization levels in GCC.  Note I’m still focusing on gcc because I’m too lazy to fire up VirtualBox and Windows right now.

As a reminder from the first post, here’s my wonderfully useless sample program.  All it does is run through a loop, and the result is never used.

int main(int argc, char* argv[])
{
  int i = 0;
  int loop;
 
  for (loop = 0; loop < 50; loop++)
    ++i;
  return 0;
}

Last time I purposely compiled it without optimization, as I just wanted quick and dirty assembler output.  Below we’ll examine the assembly output of just the main function at different optimization levels, as that’s the one we’re most interested in (there are actually more sections in an ELF executable, but that’s a story for another day).

The option flag -O1 turns on the first level of optimizations in gcc/g++.  Instead of copying them here, I’ll refer you to the GNU manpage for optimizations for gcc/g++.

gcc -O1 main.c -o main_c1

main():
55                   push   ebp
89 e5                 mov    ebp,esp
b8 32 00 00 00       mov    eax,0x32
83 e8 01             sub    eax,0x1
75 fb                 jne    804839c
b8 00 00 00 00       mov    eax,0x0
5d                   pop    ebp
c3                   ret  
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop

Compared to the previous post (no optimization), the assembly here is noticeably tighter.  This version does not allocate space for the local variables i and loop, set them to zero, and increment both on each pass.  Instead, it loads 50 (0x32) into the EAX register, subtracts 1, and checks whether the zero flag is set.  If not, it jumps back to the subtraction and continues until EAX reaches zero.  It then sets the return value to 0, restores the caller’s frame pointer, and returns; the nops after the ret are padding (again, I’ll go over that another day).  As we turn more optimization on, the compiler does more analysis and realizes that it doesn’t need to count at all, since we’re not doing anything with the variables.  For brevity, I’ll just note that the assembly output from g++ is identical to the main function from gcc.

gcc -O2 main.c -o main_c2

At -O2, the compiler does even more analysis and has more “smarts” turned on.  Let’s look at the assembler output below.

main():
 55                   push   ebp
 31 c0                 xor    eax,eax
 89 e5                 mov    ebp,esp
 5d                   pop    ebp
 c3                   ret  
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop

Here the compiler realized that nothing is done with the variables, and nothing is done with the output.  So, the compiler doesn’t even do the loop any more.  It still pushes the stack frame pointer.  It zeroes EAX (xor eax,eax encodes in two bytes instead of the five that mov eax,0x0 took in the -O1 listing, and is a touch faster than actually moving zero there).  It still sets up the stack frame (main is a function after all) and then restores the previous frame and returns.  Again, with optimization the g++ assembly matches the gcc output for main().  gcc/g++ -O3 generates the same output as -O2.

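You might ask yourself “Why do the compilers even set up the stack frame for the main function?”  Part of the answer is that main() HAS to exist in a C or C++ program, even if it really doesn’t do anything, so the function itself can’t be thrown away.  The frame pointer setup isn’t strictly required, though.  It’s simply what this gcc does by default, and the -fomit-frame-pointer option exists to drop it.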

So have fun, and poke around your programs to see what all gets generated by the compiler.  You’ll see that there’s a lot more in your program than you realized.  Plus, the Art of Assembly Language is now available in a Linux version if you want a decent reference to learn assembler.