The Processor’s World

The world as the processor sees it

Before we can talk about assembly instructions, we first have to talk about the resources the processor works with as it executes a stream of instructions. The processor works only with bytes, and those bytes live in one of two places: the register file or memory. On some processors all calculations are done on registers, and every value is first loaded from memory and later stored back to it. On processors like the x86, calculations can be done in registers, directly on memory, or on some mix of the two. This detail is unimportant for now, but we will cover it in more detail when talking about specific instructions.
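To make the two styles concrete, here is a minimal sketch of a toy machine in Python. Everything in it is hypothetical: the register names `r0` and `r1`, the three-cell memory, and the helper functions `load`, `store`, `add_reg`, and `add_mem` are invented for illustration and do not correspond to any real instruction set.

```python
# Toy model: registers and memory are just slots that hold values.
registers = {"r0": 0, "r1": 0}
memory = [7, 5, 0]              # three memory cells at addresses 0, 1, 2

def load(reg, addr):            # copy a value from memory into a register
    registers[reg] = memory[addr]

def store(reg, addr):           # copy a value from a register into memory
    memory[addr] = registers[reg]

def add_reg(dst, src):          # arithmetic on registers only
    registers[dst] += registers[src]

def add_mem(addr, reg):         # arithmetic directly on a memory cell
    memory[addr] += registers[reg]

# Load/store style: memory[2] = memory[0] + memory[1] takes four steps.
load("r0", 0)
load("r1", 1)
add_reg("r0", "r1")
store("r0", 2)
load_store_result = memory[2]

# Register-memory style: add r1 straight into memory[0] in one step.
add_mem(0, "r1")
mixed_result = memory[0]

print(load_store_result, mixed_result)
```

Both routes arrive at the same sum. The difference is only in how many separate steps the program has to spell out and where the intermediate values live.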

Registers

The register file is a collection of small memory locations called registers that are built into the processor itself. Each general purpose register usually holds either 32 bits or 64 bits of data, depending on the processor type (or up to 128 bits for many SIMD instruction sets, which we’ll cover later). Registers are named, and many have specific purposes as specified by the application binary interface (ABI). Registers are very fast to access, so if a processor can do most of its work in registers, it can work more quickly. Unfortunately register files are small, no more than a kilobyte or so in size. For example, x86 processors have eight 32-bit general purpose registers and their 64-bit cousins have sixteen 64-bit general purpose registers. Not only are register files small, but not all of the available registers can be used for calculations. Some registers are reserved for stack frame pointers, function arguments, and other purposes that are not calculation oriented.
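A quick back-of-the-envelope calculation shows just how small this is. The sketch below counts only the general purpose registers mentioned above; the helper name `register_file_bytes` is made up for this example.

```python
def register_file_bytes(count, bits):
    """Total storage, in bytes, of `count` registers that are `bits` wide."""
    return count * bits // 8

x86_32_bytes = register_file_bytes(8, 32)     # eight 32-bit registers
x86_64_bytes = register_file_bytes(16, 64)    # sixteen 64-bit registers

print(x86_32_bytes, x86_64_bytes)             # prints: 32 128
```

That is 32 bytes and 128 bytes respectively, which is less than the text of this paragraph.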

Memory

At some point all processors must load data out of and store results into the computer’s RAM. While we may think of RAM as fast, your processor sees RAM as quite slow. Imagine having a small desk covered in sticky notes (analogous to registers), and any time you wanted to update the value on a sticky note you had to get up, walk five blocks to the library where all of the books (memory) are kept, and then walk back. Modern processors do have local caches (imagine bringing back a backpack full of related books to store in the room at the end of the hall), but even cache access takes a while from the perspective of the processor. One of the complexities in assembly is loading and storing effectively while keeping memory traffic low and register usage as high as possible. Compiler writers obsess over efficient use of registers and memory. They count clock cycles in their sleep.

Common Currency

Every operation that the processor performs is accounted for in terms of clock cycles. Reading and writing registers usually takes only one clock cycle no matter how fast the processor is. Addition takes 1 to 2 cycles, multiplication might take 2 to 4, and division often takes considerably longer. Loading from and storing to cache memory can take around 10 to 20 clock cycles, reading and writing RAM might take a few hundred clock cycles on a modern processor, and reading or writing to disk will cost millions of clock cycles. These numbers are just points of reference; processor speed and other factors will affect them. It is the relative scale that is important. These numbers should make clear that the more you can keep your work close to the processor, the faster it can do the work for you. If you constantly have to read from disk or even memory, you’ll turn your fast new processor into a high priced thumb twiddler.
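To get a feel for the scale, here is a rough sketch that totals up cycles for the same work done two ways. The cycle counts in the `CYCLES` table are assumed round figures chosen for this example, not measurements of any particular processor, and the function name `total_cycles` is made up; swap in your own numbers to see how the ratio changes.

```python
# Assumed, illustrative costs in clock cycles (not measurements).
CYCLES = {
    "register": 1,
    "add": 1,
    "multiply": 4,
    "cache": 15,
    "ram": 200,
    "disk": 5_000_000,
}

def total_cycles(ops):
    """Sum the cost of a workload given as {operation: how many times}."""
    return sum(CYCLES[kind] * count for kind, count in ops.items())

# 1000 additions where each operand is already in a register...
in_registers = total_cycles({"register": 1000, "add": 1000})
# ...versus 1000 additions where each operand must come from RAM.
from_ram = total_cycles({"ram": 1000, "add": 1000})

slowdown = from_ram / in_registers
print(in_registers, from_ram, slowdown)
```

Under these assumptions the additions themselves cost the same in both cases. Nearly all of the extra time is spent waiting on memory.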
