Loops in Assembly

Flow control structures are among the primary components of a program. Being able to identify and understand flow control in the assembly output is key to a deeper understanding of the program when debugging at the lowest levels. With this in mind, we’ll look at loops as they exist in machine code.

High-level languages offer a plethora of looping constructs, each with its own strengths for expressing a workflow in a way the computer can easily execute. CPUs, however, know nothing about while loops, for loops, do while loops, or any other type of loop. CPUs only know about GOTO statements in the form of jump instructions. GOTOs might be considered harmful for human beings, but in the reality that a CPU knows, GOTOs are all that exist. All high-level program flow constructs such as loops ultimately boil down to condition checks coupled with jump instructions. Let’s dive right in with a few C examples to see how human-friendly loops get converted into conditional jumps executed by the CPU.

int idx = 10;
while (idx > 0)
{
    printf("I'm in loop %d", idx);
    idx--;
}

Here we execute the loop 10 times, printing the current value of idx each time through. Let’s look at how this is implemented (with no optimizations) at the processor level.

; Function prologue, set up the stack
5 004711c0 8bff mov edi,edi
5 004711c2 55 push ebp
5 004711c3 8bec mov ebp,esp
5 004711c5 51 push ecx
; Move the value 10 into the variable idx. Remember that local variables live directly below the stack base pointer (held in register ebp)
6 004711c6 c745fc0a000000 mov dword ptr [ebp-4],0Ah
; Begin the while loop. This instruction compares idx to the value 0.
7 004711cd 837dfc00 cmp dword ptr [ebp-4],0
; If the previous comparison found idx less than or equal to 0, jle will jump to the address specified, which is the first instruction
; after the end of the while loop
7 004711d1 7e1d jle explore_loops!main+0x30 (004711f0)
; If we get here, we are still executing the loop. Here we move the value in idx into the eax register and push that value on to the stack for use by printf
9 004711d3 8b45fc mov eax,dword ptr [ebp-4]
9 004711d6 50 push eax
; In the next 3 lines we execute the call to printf to print our position in the loop
9 004711d7 68bc104700 push offset explore_loops!`string' (004710bc)
9 004711dc ff1570104700 call dword ptr [explore_loops!_imp__printf (00471070)]
9 004711e2 83c408 add esp,8
; Here we copy the value of variable idx into register ecx and then we subtract 1 from ecx and put that new value back into the variable idx
10 004711e5 8b4dfc mov ecx,dword ptr [ebp-4]
10 004711e8 83e901 sub ecx,1
10 004711eb 894dfc mov dword ptr [ebp-4],ecx
; Remember that all processors really understand are goto statements. This goto (an unconditional jump) returns us to the beginning of the while loop
11 004711ee ebdd jmp explore_loops!main+0xd (004711cd)
; The first instruction right after our loop. When the loop terminates this is where we land, and in this case we are just zeroing out the eax register,
; which has the effect of setting the return from main to be 0 (success)
12 004711f0 33c0 xor eax,eax
; Function epilogue, clean up and return
12 004711f2 8be5 mov esp,ebp
12 004711f4 5d pop ebp
12 004711f5 c3 ret

In this example we saw how the compiler converted the while loop in C into the instruction stream that is executed by the processor. The beginning of the while loop is converted into a comparison followed by a conditional jump instruction. Conceptually this is “if (foo < bar) then goto line #”. Note that the compiler inverts the condition: the C source says to continue while idx > 0, so the machine code exits when idx <= 0 (jle). A while loop is not guaranteed to execute; it is conceivable that the loop condition will not be met even on the first pass, in which case the loop is simply skipped. We see this in the fact that the very beginning of the loop is a test which, when met, causes the loop to terminate.

The end of the loop is an unconditional jump back to the instruction where the comparison is made. After the final iteration the unconditional jump back still occurs, followed by a comparison that meets the exit criteria and an immediate jump past the loop to the first instruction following it.
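
To make the mapping concrete, here is a minimal sketch of the same while loop written in C with labels and gotos, so that each statement lines up with one step of the listing above. The function name while_as_gotos and the iterations counter are assumptions added for illustration; neither appears in the original program.

```c
#include <stdio.h>

/* Hypothetical helper: the while loop from the example, rewritten with
   labels and gotos to mirror the compare-and-jump shape of the listing. */
int while_as_gotos(int idx)
{
    int iterations = 0;              /* illustration only: counts passes */

loop_top:
    if (idx <= 0)                    /* cmp dword ptr [ebp-4],0 then jle */
        goto loop_end;
    printf("I'm in loop %d\n", idx);
    idx--;                           /* load idx, sub 1, store idx */
    iterations++;
    goto loop_top;                   /* jmp back to the comparison */

loop_end:
    return iterations;
}
```

Calling while_as_gotos(10) runs the body ten times, while a starting value of 0 or less takes the conditional jump immediately and never enters the body.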

“do while” loops are guaranteed at least one iteration regardless of the loop condition. In this case we will find only one jump, a conditional jump at the very end of the loop. If the condition is met the loop will continue (the jump will be taken back to the start of the loop). If the condition is not met then the jump is skipped and execution continues on to the next instruction in the stream. Let’s see an example.

int idx = -1;
do
{
    printf("I'm in loop %d", idx);
    idx--;
} while (idx > 0);

We expect this loop to execute only once. Since idx starts at -1, the continuation condition is not met after the first pass, so we fall through and continue on to the code just past the end of the loop.

Let’s look at the assembly output for this function:

; Move the value -1 into the variable idx. Remember that the two's complement value for -1 is all bits set to 1.
13 003011e0 c745fcffffffff mov dword ptr [ebp-4],0FFFFFFFFh
; Move the value in the variable idx in to the register edx and push that register value on to the stack for use by printf
16 003011e7 8b55fc mov edx,dword ptr [ebp-4]
16 003011ea 52 push edx
; Print the value
16 003011eb 68bc103000 push offset explore_loops!`string' (003010bc)
16 003011f0 ff1570103000 call dword ptr [explore_loops!_imp__printf (00301070)]
16 003011f6 83c408 add esp,8
; Copy the value in variable idx into the register eax and subtract 1 from it, place that new value into the variable idx
17 003011f9 8b45fc mov eax,dword ptr [ebp-4]
17 003011fc 83e801 sub eax,1
17 003011ff 8945fc mov dword ptr [ebp-4],eax
; Compare the value in idx to the value 0
18 00301202 837dfc00 cmp dword ptr [ebp-4],0
; If the variable idx’s value is greater than 0, jump back to the beginning of the loop (execute the loop one more time)
18 00301206 7fdf jg explore_loops!main+0x37 (003011e7)
; When we get here the loop is done executing. In our example we return from main at this point so clean up the stack and return
19 00301208 33c0 xor eax,eax
19 0030120a 8be5 mov esp,ebp
19 0030120c 5d pop ebp
19 0030120d c3 ret

As expected for “do while” there was only one jump for this loop and it is a conditional jump at the very end of the loop.
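
The same shape can be sketched in C with a single label and a single conditional goto. This is only an illustration of the structure described above: the function name do_while_as_gotos and the iterations counter are assumptions, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: the do while loop from the example, rewritten so
   the only jump is a conditional one at the bottom of the body. */
int do_while_as_gotos(int idx)
{
    int iterations = 0;              /* illustration only: counts passes */

loop_top:
    printf("I'm in loop %d\n", idx); /* body runs before any test */
    idx--;
    iterations++;
    if (idx > 0)                     /* cmp dword ptr [ebp-4],0 then jg */
        goto loop_top;

    return iterations;               /* fall through when the test fails */
}
```

With a starting value of -1 the body runs exactly once, because the test only happens after the first pass.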

Exploring Further

The best way to gain deeper insight into loop structure at the processor level is to create example loops in your favorite language and then use the debugger to look at the resultant assembly code. As an exercise, create a for loop in C / C++ and compile it with your favorite compiler (either Visual Studio or the Windows Driver Kit is a good option here). Be sure to disable optimizations. At this point you just want to see what the basic structure of a loop is at the processor level without having to worry about how the optimizer rearranges things for speed and code compactness.

After compilation, run your program to make sure it does what it is supposed to, and only then open the executable using WinDbg. Use the uf command to disassemble the function where your loop lives and explore the layout of the loop. See where the jumps are placed and what conditional statements occur to control the looping jumps.
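
If you want a starting point for the exercise, a small function like the following would do. It is only a sketch: the name count_with_for and the running total are assumptions made up for illustration. Call it from your own main, build without optimizations, and point uf at the function in your module.

```c
#include <stdio.h>

/* Hypothetical exercise function: a plain for loop to disassemble.
   The total is returned so the loop has an observable result. */
int count_with_for(int limit)
{
    int total = 0;
    int idx;

    for (idx = 0; idx < limit; idx++)   /* init, test and increment to find */
    {
        printf("I'm in loop %d\n", idx);
        total += idx;
    }
    return total;
}
```

When reading the disassembly, try to locate three things: where idx is initialized, where the comparison against limit happens, and where the increment sits relative to the jumps.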

Tips

Loops always involve a jump back in the instruction stream. When trying to determine whether a segment of disassembly is a loop, look for jumps that go backwards in the code. Remember that “while” loops start with a conditional check that jumps to the first instruction directly after the looping jump (this terminates the loop). “do while” loops have only one jump, and it is a conditional jump backwards at the end of the loop structure.

It is important to note that optimized loops are often not loops at all once they have undergone the optimization phase of compilation. Jumps can be slower than executing a continuous stream of instructions because a mispredicted jump throws away speculatively executed work, and the jump target may miss in the instruction cache. An optimizer will try to convert a loop into a straight series of instructions if it can determine how many times the loop will execute. This unrolling of the loop makes the code larger, but in many cases the code executes more quickly. In a future article we will compare unoptimized and optimized loop structures in assembly.
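
As a rough picture of what unrolling means, the two functions below compute the same result. The first keeps the loop; the second is a hand-written version of what an optimizer might produce when it knows the loop runs exactly four times. Both function names are assumptions for illustration, and a real compiler may choose a different transformation.

```c
/* Hypothetical example: sum four values with a loop.
   Unoptimized, this compiles to a compare plus jumps. */
int sum4_rolled(const int values[4])
{
    int total = 0;
    int idx;

    for (idx = 0; idx < 4; idx++)
        total += values[idx];
    return total;
}

/* The same work with the loop unrolled by hand: more instructions,
   but no comparison and no backward jump. */
int sum4_unrolled(const int values[4])
{
    int total = 0;

    total += values[0];
    total += values[1];
    total += values[2];
    total += values[3];
    return total;
}
```

In the disassembly of the unrolled form there is no backward jump to find, which is why optimized code can hide the fact that the source contained a loop.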
