REN

Ph.D. in Computer Science at Rutgers University

Basic Overview of x86 Architecture

x86 is one of the most popular architectures for desktop microprocessors. In this article, I'll give a brief overview of the x86 architecture, its registers, its addressing modes, and some related micro-architecture topics.

Overview of x86 Architecture

x86 is Intel's microprocessor architecture. It originated with the 16-bit Intel 8086 CPU in 1978, which was followed by the 80286, 80386 and 80486. The 80386 extended the architecture to 32 bits. The 80486 brought the floating-point unit (FPU) on chip, and its DX2 variant introduced the clock multiplier. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86". In the early 1990s, the Pentium series replaced the 80486. The Pentium was the first superscalar x86 processor, and pipelining and superscalar execution together made much better use of the CPU's resources. The Pentium Pro introduced PAE, which extends physical memory support from 4 GB to 64 GB, and later Pentium 4 models added 64-bit support. In the mid-2000s, the Core series put two cores on a single chip, which means more work done, faster. Today the Core i series, with features such as Hyper-Threading and QPI, holds the largest share of the desktop CPU market. The 64-bit era has arrived, and x86 has become x64, also called x86-64 or x86_64. Note that x86-64 is a 64-bit extension of IA-32, not IA-64. IA-64 is the architecture of Intel's Itanium, a separate high-end line that Intel has since abandoned.

Registers in x86

Registers are like variables implemented in hardware. They are small, fast storage locations inside the CPU, used for purposes such as computation, status recording and checking. Counting everything, an x86 processor has more than 100 registers, but only a small fraction of them are visible to the programmer. Most of the rest serve special purposes. Some are control registers, some are reserved for future instruction-set extensions, and some are used for register renaming during out-of-order execution in deeply pipelined, superscalar CPUs.

The programmer-visible registers in x86 can be roughly divided into the following types:

	General registers (16 bit)
	AX BX CX DX

	Segment registers (16 bit)
	CS DS ES FS GS SS

	Index registers (16 bit)
	SI DI

	Pointer registers (16 bit)
	IP BP SP

	Flags register (16 bit)
	FLAGS

General Registers

As the name suggests, the general registers are the ones we use most of the time, and most instructions operate on them. Each one can be broken down into narrower 32-, 16- and 8-bit registers. The 64-bit R registers exist only in 64-bit mode.

	64 bits : RAX RBX RCX RDX
	32 bits : EAX EBX ECX EDX
	16 bits : AX BX CX DX
 	 8 bits : AH AL BH BL CH CL DH DL
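To make the aliasing concrete, here is a small Python model of RAX and its sub-registers. AH holds bits 8 to 15 and AL holds bits 0 to 7. The class name Rax is hypothetical, and the sketch only covers RAX because the other three registers follow the same pattern. It assumes 64-bit mode. In that mode, writing a 32-bit register such as EAX clears the upper 32 bits of RAX, while writing AX, AH or AL leaves the other bits unchanged.

```python
class Rax:
    """Hypothetical model of RAX and the narrower registers that alias it."""

    def __init__(self, value: int = 0) -> None:
        self.rax = value & 0xFFFF_FFFF_FFFF_FFFF

    @property
    def eax(self) -> int:
        return self.rax & 0xFFFF_FFFF            # bits 0..31

    @eax.setter
    def eax(self, value: int) -> None:
        # 64-bit mode: a 32-bit write zero-extends, clearing bits 32..63.
        self.rax = value & 0xFFFF_FFFF

    @property
    def ax(self) -> int:
        return self.rax & 0xFFFF                 # bits 0..15

    @ax.setter
    def ax(self, value: int) -> None:
        # A 16-bit write leaves bits 16..63 unchanged.
        self.rax = (self.rax & ~0xFFFF) | (value & 0xFFFF)

    @property
    def ah(self) -> int:
        return (self.rax >> 8) & 0xFF            # bits 8..15, high byte of AX

    @ah.setter
    def ah(self, value: int) -> None:
        self.rax = (self.rax & ~0xFF00) | ((value & 0xFF) << 8)

    @property
    def al(self) -> int:
        return self.rax & 0xFF                   # bits 0..7, low byte of AX

    @al.setter
    def al(self, value: int) -> None:
        self.rax = (self.rax & ~0xFF) | (value & 0xFF)


reg = Rax(0x1122_3344_5566_7788)
print(hex(reg.eax), hex(reg.ax), hex(reg.ah), hex(reg.al))  # 0x55667788 0x7788 0x77 0x88
reg.al = 0xFF
print(hex(reg.rax))                                         # 0x11223344556677ff
reg.eax = 0x1
print(hex(reg.rax))                                         # 0x1
```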

The "H" and "L" suffixes on the 8-bit registers stand for high byte and low byte. With that out of the way, let's look at the main use of each:

	RAX,EAX,AX,AH,AL : Called the Accumulator register. 
               		   It is used for I/O port access, arithmetic, interrupt calls,
               		   etc...

	RBX,EBX,BX,BH,BL : Called the Base register.
               		   It is used as a base pointer for memory access.
               		   Gets some interrupt return values.

	RCX,ECX,CX,CH,CL : Called the Counter register.
               		   It is used as a loop counter, and CL holds the shift count.
               		   Gets some interrupt values.

	RDX,EDX,DX,DH,DL : Called the Data register
               		   It is used for I/O port access, arithmetic, some interrupt 
               		   calls.
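Here is a rough Python sketch of three of these roles, assuming 32-bit operands. An unsigned MUL leaves its 64-bit product split across EDX:EAX. LOOP counts down in ECX. A shift by CL uses only the low 5 bits of the count. The function names mul32, run_loop and shl_cl are hypothetical, and flags are not modeled.

```python
from typing import Callable, Tuple

MASK32 = 0xFFFF_FFFF


def mul32(eax: int, src: int) -> Tuple[int, int]:
    """Unsigned MUL r/m32: the 64-bit product goes to EDX:EAX. Returns (edx, eax)."""
    product = (eax & MASK32) * (src & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def run_loop(ecx: int, body: Callable[[int], None]) -> None:
    """Run body, then do what LOOP does: decrement ECX and repeat while ECX != 0."""
    ecx &= MASK32
    while True:
        body(ecx)
        ecx = (ecx - 1) & MASK32  # an ECX of 0 on entry would wrap to 0FFFFFFFFH
        if ecx == 0:
            break


def shl_cl(value: int, cl: int) -> int:
    """SHL r32, CL: for 32-bit operands only the low 5 bits of CL are used."""
    return (value << (cl & 0x1F)) & MASK32


edx, eax = mul32(0xFFFF_FFFF, 2)
print(hex(edx), hex(eax))        # 0x1 0xfffffffe
run_loop(3, print)               # prints 3, 2, 1 on separate lines
print(hex(shl_cl(1, 33)))        # 0x2 (33 & 31 == 1)
```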


Segment Registers

Segment registers hold the segment address of various items. They are 16 bits wide and cannot be loaded with an immediate value. They can only be set from a general register or memory, or by special instructions. Some of them are critical for the correct execution of a program, so you should only start changing them once you are ready for multi-segment programming.

	CS	 : Holds the Code segment in which your program runs.
		   Changing its value might make the computer hang.

	DS	 : Holds the Data segment that your program accesses.
		   Changing its value might give erroneous data.

	ES,FS,GS : These are extra segment registers available for
		   far pointer addressing like video memory and such.

	SS       : Holds the Stack segment your program uses.
		   Sometimes has the same value as DS.
		   Changing its value can give unpredictable results,
		   mostly data related.


Index and Pointer Registers

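Index and pointer registers hold the offset part of an address. They have various uses, but each register has a specific function. They are sometimes combined with a segment register to point to a far address (within the 1 MB real-mode address space). The registers with an "E" prefix are the 32-bit versions introduced with the 80386. They are native to 32-bit protected mode, although real-mode code running on a 386 or later can also use them through operand-size prefixes.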

	ES:EDI EDI DI : Destination index register
                	Used for string, memory array copying and setting and
                	for far pointer addressing with ES

	DS:ESI ESI SI : Source index register
                	Used for string and memory array copying

	SS:EBP EBP BP : Stack Base pointer register
                	Holds the base address of the current stack frame
                
	SS:ESP ESP SP : Stack pointer register
                	Holds the top address of the stack

	CS:EIP EIP IP : Instruction Pointer
                	Holds the offset of the next instruction
                	It cannot be accessed directly with MOV; jumps,
                	calls and returns change it.
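As an illustration of the source and destination index registers working together, here is a Python sketch of what REP MOVSB does in real mode. It copies CX bytes from DS:SI to ES:DI, moving forward when the direction flag DF is 0 and backward when DF is 1. The function rep_movsb and its parameters are hypothetical. Memory is modeled as a flat 1 MB bytearray.

```python
from typing import Tuple


def rep_movsb(memory: bytearray, ds: int, si: int, es: int, di: int,
              cx: int, df: int = 0) -> Tuple[int, int, int]:
    """Real-mode model of REP MOVSB: copy CX bytes from DS:SI to ES:DI.

    SI and DI move forward when DF = 0 and backward when DF = 1.
    Returns the final (SI, DI, CX).
    """
    step = -1 if df else 1
    while cx != 0:
        src = (ds * 16 + si) & 0xFFFFF   # 20-bit real-mode linear address
        dst = (es * 16 + di) & 0xFFFFF
        memory[dst] = memory[src]
        si = (si + step) & 0xFFFF        # 16-bit offsets wrap within the segment
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


mem = bytearray(1 << 20)                 # 1 MB real-mode address space
mem[0x51234:0x51239] = b"hello"
print(rep_movsb(mem, ds=0x5000, si=0x1234, es=0x1000, di=0x0000, cx=5))  # (4665, 5, 0), i.e. SI = 1239H
print(bytes(mem[0x10000:0x10005]))       # b'hello'
```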


The EFLAGS register

The EFLAGS register holds the state of the processor. It is modified by many instructions and is used for comparisons, conditional loops and conditional jumps. The status flags (CF, PF, AF, ZF, SF, OF) record properties of the result of the last arithmetic or logical instruction. The other bits control the processor's behavior. Here is a listing:

	Bit   Label    Description
	---------------------------
	0      CF      Carry flag
	2      PF      Parity flag
	4      AF      Auxiliary carry flag
	6      ZF      Zero flag
	7      SF      Sign flag
	8      TF      Trap flag
	9      IF      Interrupt enable flag
	10     DF      Direction flag
	11     OF      Overflow flag
	12-13  IOPL    I/O privilege level
	14     NT      Nested task flag
	16     RF      Resume flag
	17     VM      Virtual 8086 mode flag
	18     AC      Alignment check flag (486+)
	19     VIF     Virtual interrupt flag
	20     VIP     Virtual interrupt pending flag
	21     ID      ID flag
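To show how the status flags in this table are derived from a result, here is a Python sketch of an 8-bit ADD. The helper pack_status_flags places each flag at the bit position listed above. Bit 1 of EFLAGS is reserved and always reads as 1. The names add8_flags, pack_status_flags and STATUS_FLAG_BITS are hypothetical. Other instructions set these flags by their own rules, so this covers ADD only.

```python
from typing import Dict, Tuple

# Bit positions taken from the EFLAGS table (status flags only).
STATUS_FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}


def add8_flags(a: int, b: int) -> Tuple[int, Dict[str, int]]:
    """Model an 8-bit ADD and return (result, status flags)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                                # unsigned carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),            # even number of 1 bits
        "AF": int(((a ^ b ^ result) & 0x10) != 0),             # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                                     # copy of bit 7
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags


def pack_status_flags(flags: Dict[str, int]) -> int:
    """Place the flags at their EFLAGS bit positions; reserved bit 1 is set."""
    value = 0x2
    for name, bit in flags.items():
        value |= bit << STATUS_FLAG_BITS[name]
    return value


res, fl = add8_flags(0x7F, 0x01)
print(hex(res), fl)  # 0x80 {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1}
print(hex(pack_status_flags(fl)))  # 0x892
```

Adding 7FH and 01H sets OF because two positive signed bytes produce a negative result. CF stays clear because there is no unsigned carry out of bit 7.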


Intel and AT&T assembly format

In DOS and Windows, x86 assembly is usually written in Intel format, while UNIX-based systems traditionally use AT&T format. There are some significant differences between the two formats:

In Intel format, the source operand is on the right and the destination operand is on the left. Registers have no prefix symbol. A bare number in an operand position is an immediate number. A number in brackets is a memory address. Registers in brackets mean indirect addressing.

MOV EAX, 5			;immediate number
MOV EAX, [1234H]		;direct memory addressing
MOV EAX, [EBP-8] 		;register indirect: base + displacement
MOV EAX, [EBX*4 + 1234H]	;register indirect: scaled index + displacement
MOV EAX, [EDX + EBX*4 + 8]	;register indirect: base + scaled index + displacement

In AT&T format, the source operand is on the left and the destination operand is on the right. Register names start with '%'. A number starting with '$' in an operand position is an immediate number. A bare number in an operand position is a memory address. Registers in parentheses mean indirect addressing, in the general form displacement(base,index,scale). The mnemonic also carries an operand-size suffix, such as 'l' for 32 bits, 'w' for 16 bits or 'b' for 8 bits.

movl $0x05, %eax		#immediate number
movl 0x1234, %eax		#direct memory addressing
movl -8(%ebp), %eax 		#register indirect: base + displacement
movl 0x1234(,%ebx,4), %eax	#register indirect: scaled index + displacement
movl 0x8(%edx,%ebx,4), %eax	#register indirect: base + scaled index + displacement
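As a sketch of how the two memory-operand notations correspond, this hypothetical Python helper, att_mem_to_intel, rewrites an AT&T operand of the form displacement(base,index,scale) into Intel bracket form. It handles numeric displacements only. Symbols, segment prefixes and size suffixes are out of scope. Displacements are printed in hex with the MASM-style H suffix, so -8(%ebp) comes out as [EBP - 8H].

```python
import re

# AT&T memory operand: [displacement][(base[,index[,scale]])]
_MEM = re.compile(
    r"^(?P<disp>-?(?:0[xX][0-9a-fA-F]+|\d+))?"
    r"(?:\((?P<base>%\w+)?(?:,(?P<index>%\w+)(?:,(?P<scale>[1248]))?)?\))?$"
)


def _intel_hex(value: int) -> str:
    """Format a number MASM-style, e.g. 1234H; a leading letter gets a 0 (0FFH)."""
    text = f"{value:X}H"
    return "0" + text if text[0] in "ABCDEF" else text


def att_mem_to_intel(operand: str) -> str:
    """Translate an AT&T memory operand such as -8(%ebp) into Intel form."""
    m = _MEM.match(operand.replace(" ", ""))
    if m is None or not any(m.groups()):
        raise ValueError(f"unsupported operand: {operand!r}")
    disp, base, index, scale = m.group("disp", "base", "index", "scale")

    parts = []
    if base:
        parts.append(base[1:].upper())                 # drop the '%' prefix
    if index:
        name = index[1:].upper()
        parts.append(f"{name}*{scale}" if scale and scale != "1" else name)

    value = 0
    if disp:
        digits = disp.lstrip("-")
        value = int(digits, 16) if digits.lower().startswith("0x") else int(digits)
        if disp.startswith("-"):
            value = -value

    if not parts:                                      # bare number: direct address
        return f"[{_intel_hex(value)}]"
    expr = " + ".join(parts)
    if value > 0:
        expr += f" + {_intel_hex(value)}"
    elif value < 0:
        expr += f" - {_intel_hex(-value)}"
    return f"[{expr}]"


for op in ["0x1234", "-8(%ebp)", "0x1234(,%ebx,4)", "0x8(%edx,%ebx,4)"]:
    print(f"{op:<18} -> {att_mem_to_intel(op)}")
```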

The AT&T syntax has an interesting history of its own. Personally, I find the Intel format clearer and more readable than the AT&T format. However, on UNIX-based systems we still need to know the AT&T format, because GDB shows disassembly in AT&T (GAS) syntax by default. The GDB command set disassembly-flavor intel switches it to Intel syntax. Thus, most of my articles use the AT&T format, but the rest of this article uses the Intel format because it reads more clearly.


x86 Addressing Mode

Immediate Addressing

In immediate addressing, an immediate number is used as an operand in an instruction.

ADD   EAX, 3	;EAX = EAX + 3 
MOV   AH, 00	;AH = 00
PUSH  5000H	;push immediate number 5000H onto the stack

Register Addressing

In register addressing, the operand is a register that holds the value the instruction needs.

INC   BX	;BX = BX + 1
ADD   EAX, EDX	;EAX = EAX + EDX
MOV   AH, BH	;AH = BH

Direct Memory Addressing

In x86, memory addressing uses a mechanism called segmentation. A logical memory address has the form Segment:Offset, where the segment value comes from a segment register. In real mode, converting a logical address to a linear address is simple: Linear = Segment*16 + Offset. By default the processor uses DS for data accesses, and SS when the address is based on BP or SP. To use a different segment, we put a "segment override prefix" such as ES: before the memory address. There are more interesting details about x86 segmentation, which I cover in another article.
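The real-mode translation and the default-segment rule can be sketched in a few lines of Python. The function names are hypothetical. The sketch models the original 8086, where linear addresses wrap around at 1 MB. On later CPUs with the A20 line enabled, FFFF:0010 reaches 100000H instead of wrapping to 0.

```python
from typing import Optional

# Memory operands based on BP or SP default to the stack segment.
STACK_BASES = {"BP", "SP", "EBP", "ESP"}


def default_segment(base_register: Optional[str] = None) -> str:
    """Segment used when no override prefix is given."""
    if base_register is not None and base_register.upper() in STACK_BASES:
        return "SS"
    return "DS"


def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation Segment*16 + Offset, wrapped to 20 bits as on the 8086."""
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF


print(hex(linear_address(0x5000, 0x1234)))            # 0x51234
print(hex(linear_address(0x5123, 0x0004)))            # 0x51234 (same byte, different Segment:Offset)
print(hex(linear_address(0xFFFF, 0x0010)))            # 0x0 (wraps past 1 MB)
print(default_segment("BP"), default_segment("SI"))   # SS DS
```

Because many Segment:Offset pairs map to the same linear address, comparing two real-mode pointers is only meaningful after translating both.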

In direct memory addressing, the operand is a number in brackets. The number is a memory address, and the value stored at that address is what the instruction uses. Remember that ordinary two-operand instructions such as MOV and ADD cannot take both of their operands from memory. Consider the following cases, assuming DS = 5000H and ES = 1000H:

MOV   ES:[1234H], BX	;move the value in BX to memory address 1000H*16 + 1234H = 11234H
ADD   AX, [1234H]	;add the value in memory address 5000H*16 + 1234H = 51234H to AX

Base (Register Indirect) Memory Addressing

In base (register indirect) addressing, the operand is a register in brackets. The value in the register is a memory address, and the value stored at that address is what the instruction uses. BP-based addresses use SS by default, so the SS: override in the second line below is optional. Consider the following cases, assuming SI = 1234H, BP = 5678H, DS = 5000H, SS = 2000H:

MOV   BX, [SI] 		;move the value in memory address 5000H*16 + 1234H = 51234H to BX
ADD   AX, SS:[BP] 	;add the value in memory address 2000H*16 + 5678H = 25678H to AX

Base or Index Plus Displacement Addressing

In base or index plus displacement addressing, the operand is a base or index register plus a displacement, in brackets. Their sum is the memory address, and the value stored at that address is what the instruction uses.

Consider the following case, assuming SI = 1234H, DS = 5000H:

MOV   BX, [SI+4] 	;move the value in memory address 5000H*16 + 1234H + 4 = 51238H to BX

Base and Index Plus Displacement Addressing

In base plus index plus displacement addressing, the operand is a base register plus an index register plus a displacement, all in brackets. The index can be multiplied by a scale factor of 1, 2, 4 or 8, but only with 32-bit addressing (80386 and later). The sum is the memory address, and the value stored at that address is what the instruction uses.

Consider the following case, assuming EBX = 1234H, ESI = 54H, DS = 5000H. The scaled index requires 32-bit registers, since 16-bit addressing has no scale factor.

MOV   AX, [EBX+ESI*2+4] 	;move the value in memory address 5000H*16 + 1234H + 54H*2 + 4 = 512E0H to AX
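The arithmetic in these addressing examples can be checked with a short Python sketch. effective_address and real_mode_linear are hypothetical helpers. The first sums base + index*scale + displacement, and the second applies the real-mode Segment*16 + Offset rule, rejecting offsets beyond 64 KB. For the scaled example, 1234H + 54H*2 + 4 = 1234H + A8H + 4 = 12E0H, so the linear address is 512E0H. The sketch assumes 32-bit registers for that case, because 16-bit addressing has no scale factor.

```python
from typing import Dict, Optional


def effective_address(regs: Dict[str, int], base: Optional[str] = None,
                      index: Optional[str] = None, scale: int = 1,
                      disp: int = 0) -> int:
    """Offset = base + index * scale + displacement, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    offset = disp
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index] * scale
    return offset & 0xFFFF_FFFF


def real_mode_linear(segment: int, offset: int) -> int:
    """Segment*16 + Offset; real mode rejects offsets beyond the 64 KB limit."""
    if offset > 0xFFFF:
        raise ValueError("offset exceeds the 64 KB segment limit")
    return segment * 16 + offset


# The worked examples from this section, one register set per example.
print(hex(real_mode_linear(0x5000, effective_address({"SI": 0x1234}, base="SI"))))          # 0x51234
print(hex(real_mode_linear(0x2000, effective_address({"BP": 0x5678}, base="BP"))))          # 0x25678
print(hex(real_mode_linear(0x5000, effective_address({"SI": 0x1234}, base="SI", disp=4))))  # 0x51238
regs = {"EBX": 0x1234, "ESI": 0x54}
print(hex(real_mode_linear(0x5000, effective_address(regs, "EBX", "ESI", scale=2, disp=4))))  # 0x512e0
```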

Conclusion

The Instruction Set Architecture (ISA) is very important for a processor. It hides low-level, circuit-level details and provides the interface on which operating systems are developed. x86 is an excellent architecture, and for programmers it is important to know its addressing modes, which is why this article stresses them. In the following articles, I'm going to discuss the x86 architecture in more depth.