Intel X86 Assembly Basics

General Purpose Registers

There are eight 32-bit general purpose registers available on x86 processors (when running in 32-bit mode). Under the common 32-bit calling conventions, EBX, ESI, EDI, EBP and ESP must be preserved across function calls, while EAX, ECX and EDX do not need to be preserved. We will cover function calling conventions and register value preservation in a later section. The registers’ typical uses are outlined below:

  1. EAX
    This is normally where return values are stored when functions return (any value that fits in 32 bits; 64-bit integers are usually returned in EDX:EAX, and larger structures are typically written to memory supplied by the caller). During function execution it can be used for general purpose 32-bit value storage.
  2. EBX
    Commonly used for indexing into arrays; it can also be used for general purpose 32-bit value storage.
  3. ECX
    Used for holding counter values (for example, the loop instruction and the rep prefix use ECX as their counter). When not used for counters it can be used for general purpose 32-bit value storage.
  4. EDX
    General data register. This is commonly used for general purpose 32 bit value storage.
  5. ESI
    Holds the source memory address for the string instructions (such as movs, cmps and lods). Note that “string” here simply means a sequence of arbitrary byte values, not necessarily ASCII or any other printable encoding. When not used by string instructions it can be used for general purpose 32-bit value storage.
  6. EDI
    Holds the destination memory address for the string instructions (such as movs, stos and scas). Can be used for general purpose 32-bit value storage when not using string related instructions.
  7. ESP
    This is the current stack pointer. The stack grows toward lower addresses: each push moves ESP downward by the size of the data pushed. ESP points at the most recently pushed entry (the top of the stack, at the lowest address in use), not at the free slot below it.
  8. EBP
    This is the stack frame’s base pointer. It is set in the function prologue and remains a fixed point of reference while the function executes. With the standard prologue, the layout relative to the address held in EBP is: the function arguments live above it (the leftmost argument at EBP+8), the return address sits directly above it at EBP+4, EBP itself points at the caller’s saved EBP value, and the function’s local variables and any values it pushes during execution live below it (starting at EBP-4).
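The push behaviour described for ESP can be modelled in a few lines of C. This is a simulation rather than real assembly: the 64-byte stack region, its starting address STACK_LOW and the helper names are all hypothetical, and each stack entry is treated as one 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-byte stack region starting at address STACK_LOW;
   real stacks are much larger and live wherever the OS places them. */
#define STACK_LOW  0x1000u
#define STACK_SIZE 64u

typedef struct {
    uint32_t words[STACK_SIZE / 4]; /* one slot per 4-byte stack entry */
    uint32_t esp;                   /* byte address, like the real ESP */
} Stack;

/* Empty stack: ESP starts just past the highest address of the region. */
static void stack_init(Stack *s) {
    s->esp = STACK_LOW + STACK_SIZE;
}

/* The 4-byte slot that a (4-byte aligned) byte address refers to. */
static uint32_t *slot(Stack *s, uint32_t addr) {
    return &s->words[(addr - STACK_LOW) / 4];
}

/* push: ESP moves DOWN by 4 first, then the value is stored at [ESP],
   so ESP always points at the most recently pushed entry. */
static void push(Stack *s, uint32_t value) {
    assert(s->esp >= STACK_LOW + 4); /* room for one more entry */
    s->esp -= 4;
    *slot(s, s->esp) = value;
}

/* pop: read the entry at [ESP], then move ESP back UP by 4. */
static uint32_t pop(Stack *s) {
    assert(s->esp < STACK_LOW + STACK_SIZE); /* stack is not empty */
    uint32_t value = *slot(s, s->esp);
    s->esp += 4;
    return value;
}
```

Pushing 0x11111111 and then 0x22222222 on a fresh stack leaves ESP 8 bytes lower, pointing at 0x22222222, and two pops return the values in reverse order.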

 

EIP
Though EIP is not a general purpose register, it merits mention here. It is the instruction pointer, and application software cannot read or write it directly with instructions such as mov. Instead, the processor advances it as instructions execute, control-transfer instructions such as jmp, call and ret change it, and an attached debugger can also modify it. The value in EIP is the address of the next instruction to execute.
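To make the “next instruction” rule concrete, here is a small C sketch relating EIP to the return address that call pushes. It assumes a near call with a 32-bit relative displacement (opcode E8, 5 bytes long); the Insn type, the helper names and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One decoded instruction: where it starts and how many bytes it occupies.
   The addresses used with it are hypothetical. */
typedef struct {
    uint32_t address;
    uint32_t length;
} Insn;

/* Conceptually, while an instruction executes EIP already holds the
   address of the following instruction: its start plus its length. */
static uint32_t next_eip(Insn current) {
    return current.address + current.length;
}

/* A near "call rel32" (opcode E8 followed by a 4-byte displacement,
   5 bytes in total) pushes next_eip() as the return address and jumps
   to next_eip() + displacement. Unsigned arithmetic wraps modulo 2^32,
   so negative displacements work too. */
static uint32_t call_rel32_target(Insn call, int32_t displacement) {
    assert(call.length == 5);
    return next_eip(call) + (uint32_t)displacement;
}
```

For a call at 0x401000, next_eip returns 0x401005, which is the return address the call instruction pushes; a displacement of 0x10 makes the call land at 0x401015.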

Section 3.4.1 of Chapter 3, “Basic Execution Environment”, in the combined-volume Intel 64 and IA-32 Architectures Software Developer’s Manual is recommended for further reading on register layout and usage.

Stack Organization

The stack is used primarily for:

  1. Holding the arguments being passed to a function
  2. Holding the return address (instruction pointer address) where execution should resume after a function is done executing
  3. Holding the caller’s saved base pointer (EBP) value while the called function runs
  4. Holding the called function’s local variables
  5. Holding the saved values of any registers that must be preserved but that the function modifies during its execution

On x86 in 32-bit mode, stack entries are 4 bytes (32 bits) wide, so ESP and EBP should always hold values that are evenly divisible by 4. Pushing a 16-bit value (for example push ax) moves ESP down by only 2 bytes and leaves the stack misaligned, which incurs a performance penalty on subsequent stack accesses. There is no 8-bit push: an 8-bit immediate operand is sign-extended to 32 bits before it is pushed.
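The alignment rule reduces to simple arithmetic on ESP. The C sketch below is a hypothetical pair of helpers, not part of any assembler or library; it only tracks how far ESP moves for each push.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In 32-bit mode each stack slot is 4 bytes, so ESP is aligned when it
   is evenly divisible by 4. */
static bool esp_aligned(uint32_t esp) {
    return esp % 4 == 0;
}

/* Applies a sequence of pushes and returns the resulting ESP. Each size
   is the operand size in bytes: 4 for a normal 32-bit push, 2 for a
   16-bit push such as "push ax". */
static uint32_t esp_after_pushes(uint32_t esp, const uint32_t *sizes, size_t count) {
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] == 2 || sizes[i] == 4); /* no 8-bit pushes */
        esp -= sizes[i];                         /* the stack grows downward */
    }
    return esp;
}
```

Starting from ESP = 0x2000, two 32-bit pushes leave ESP at 0x1FF8 (still aligned), while a 32-bit push followed by a 16-bit push leaves it at 0x1FFA, which is not divisible by 4.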

A typical function call uses the stack as follows:

  1. Push the arguments from right to left: push the rightmost argument first and continue until you have pushed the leftmost argument.
  2. Issue a call instruction to the function you are calling
    The call instruction pushes the current value of EIP on to the stack for you as it jumps to the called function’s starting address. This is the return address. Since EIP points to the instruction after the one that is executing, it points to whatever code in the caller follows the call instruction itself.
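  3. In the called function, immediately push the current EBP value (the caller’s base pointer) on to the stack, then copy ESP into EBP (mov ebp, esp) so that EBP marks the base of the new stack frame.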
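  4. After completing the standard function prologue (we’ll cover this in the first example), make room for local variables, either by pushing their values on to the stack or by subtracting the number of bytes they need from ESP (for example sub esp, 8).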
  5. If you use any registers that require preservation, their current values must first be pushed on to the stack before you modify their values.
  6. After the function body executes and it is time to return, the called function must first restore any modified registers that need preservation by popping their values off the stack back into the registers, in the reverse of the order in which they were pushed.
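  7. The epilogue then sets the stack pointer equal to the base pointer (mov esp, ebp), which discards the local variables, and pops the caller’s saved EBP back into EBP (pop ebp). ESP now points at the return address, so the ret (return) instruction can pop it into EIP and execution continues in the caller. Depending on the calling convention, the arguments are then removed either by the caller (for example add esp, 8 under cdecl) or by the callee using ret with an immediate operand (stdcall).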
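The call sequence can be traced with a small C simulation that mimics each instruction on a toy stack. It is a sketch under stated assumptions: a hypothetical function f(a, b) with 8 bytes of locals that uses EBX, the standard prologue and epilogue (push ebp / mov ebp, esp and mov esp, ebp / pop ebp), and cdecl-style argument cleanup by the caller. The MASM instruction each line stands for is given in its comment; the addresses and names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine: a hypothetical 256-byte stack region starting at
   STACK_LOW plus the registers the calling sequence touches. */
#define STACK_LOW  0x2000u
#define STACK_SIZE 256u

typedef struct {
    uint32_t mem[STACK_SIZE / 4];
    uint32_t esp, ebp, ebx;
} Cpu;

/* 4-byte slot at a stack address (must be in range and aligned). */
static uint32_t *at(Cpu *c, uint32_t addr) {
    assert(addr >= STACK_LOW && addr < STACK_LOW + STACK_SIZE && addr % 4 == 0);
    return &c->mem[(addr - STACK_LOW) / 4];
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; *at(c, c->esp) = v; }
static uint32_t pop(Cpu *c) { uint32_t v = *at(c, c->esp); c->esp += 4; return v; }

/* What the callee sees through EBP, plus the state after returning. */
typedef struct {
    uint32_t ebp_plus_8, ebp_plus_12, ebp_plus_4, ebp_plus_0;
    uint32_t returned_to, esp_before_call, esp_after_cleanup;
} Trace;

/* Walks the numbered steps for a hypothetical call f(a, b) whose return
   address is ret_addr. f has 8 bytes of locals and uses EBX. */
static Trace call_f(Cpu *c, uint32_t a, uint32_t b, uint32_t ret_addr) {
    Trace t;
    t.esp_before_call = c->esp;

    push(c, b);                 /* 1. push b  (rightmost argument first)  */
    push(c, a);                 /*    push a  (leftmost argument last)    */
    push(c, ret_addr);          /* 2. call f  (pushes the return address) */

    push(c, c->ebp);            /* 3. push ebp                            */
    c->ebp = c->esp;            /*    mov ebp, esp                        */
    c->esp -= 8;                /* 4. sub esp, 8  (room for two locals)   */
    push(c, c->ebx);            /* 5. push ebx    (EBX must be preserved) */
    c->ebx = 0xDEADBEEFu;       /*    the body is free to overwrite EBX   */

    t.ebp_plus_8  = *at(c, c->ebp + 8);   /* leftmost argument a  */
    t.ebp_plus_12 = *at(c, c->ebp + 12);  /* next argument b      */
    t.ebp_plus_4  = *at(c, c->ebp + 4);   /* return address       */
    t.ebp_plus_0  = *at(c, c->ebp);       /* caller's saved EBP   */

    c->ebx = pop(c);            /* 6. pop ebx                             */
    c->esp = c->ebp;            /* 7. mov esp, ebp (discards the locals)  */
    c->ebp = pop(c);            /*    pop ebp                             */
    t.returned_to = pop(c);     /*    ret (pops the return address to EIP) */

    c->esp += 8;                /* cdecl: caller runs add esp, 8          */
    t.esp_after_cleanup = c->esp;
    return t;
}
```

Calling call_f with a = 7 and b = 5 shows the leftmost argument at [EBP+8], the next at [EBP+12], the return address at [EBP+4] and the caller’s EBP at [EBP]; after the epilogue and cleanup, EBP, EBX and ESP are back to their values from before the call.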

Mnemonics

Remembering the actual bytes for each instruction would be unthinkable and painfully tedious, so assemblers use a mnemonic for each instruction. There are two common ways to lay out x86 instructions in an assembly file. Intel and Microsoft assemblers use the Intel syntax, which takes this form:

<instruction> <destination operand>, <source operand>

For example

mov ebx, eax

This copies the value in EAX into EBX.

GNU assemblers use what is called the AT&T syntax, which takes the more natural form (at least in my opinion):

<instruction> <source operand>, <destination operand>

movl    %eax, %ebx

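This accomplishes the same task as the Intel example above. In AT&T syntax, register names are prefixed with %, and the l suffix on movl marks a 32-bit (long) operand size, so the GNU movl mnemonic corresponds to the Microsoft Assembler mov mnemonic used with 32-bit registers.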
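The operand-order difference can be captured by a tiny, hypothetical C helper that rewrites an Intel-syntax register-to-register mov into AT&T syntax. It deliberately handles only the 32-bit register form; memory operands, immediates and other operand sizes follow additional AT&T rules that it does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The eight 32-bit general purpose registers. */
static const char *const REGS32[] = {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"
};

static bool is_reg32(const char *name) {
    for (size_t i = 0; i < sizeof REGS32 / sizeof REGS32[0]; i++) {
        if (strcmp(name, REGS32[i]) == 0) return true;
    }
    return false;
}

/* Hypothetical helper: turns the Intel-syntax operands of "mov dst, src"
   (32-bit registers only) into the equivalent AT&T-syntax instruction.
   Returns false if an operand is not a 32-bit register or out is too small. */
static bool intel_mov_to_att(const char *dst, const char *src,
                             char *out, size_t out_size) {
    if (!is_reg32(dst) || !is_reg32(src)) return false;
    /* "l" suffix = 32-bit operand size, "%" marks a register, and the
       source operand is written first. */
    int n = snprintf(out, out_size, "movl %%%s, %%%s", src, dst);
    return n >= 0 && (size_t)n < out_size;
}
```

Given dst = "ebx" and src = "eax" (the Intel example above), it produces movl %eax, %ebx.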

While the AT&T syntax feels more natural (at least to me), I will use the Intel syntax in my examples, because that is what the Microsoft Macro Assembler (MASM) and the Intel documentation both use, and all of the examples are intended to be assembled with MASM.
