Chapter 9: CPU Architectures
Introduction
Not all CPUs are created equal. Different architectures (x86, x64, ARM, AArch64) have different instruction sets, registers, calling conventions, and characteristics. Understanding these differences is crucial when writing system software that runs on multiple platforms or when targeting specific hardware.
Why This Matters
System programmers must understand CPU architectures because bootloaders, kernels, and low-level driver code are architecture-specific. A kernel written for ARM cannot use x86 instructions, optimizing code requires knowing the target CPU, and cross-platform system software requires an understanding of several architectures.
How to Study This Chapter
- Compare architectures - Note similarities and differences
- Research your hardware - What architecture is your computer?
- Try cross-compilation - Compile for different targets
- Read specifications - Architecture manuals are comprehensive
x86 Architecture (32-bit)
The Intel x86 architecture dominated personal computing for decades.
History
- 8086 (1978): 16-bit processor, started the x86 line
- 80386 (1985): First 32-bit x86 processor
- Pentium (1993+): Continued x86 evolution
- Still supported in modern CPUs for backward compatibility
Key Characteristics
Registers (32-bit):
General Purpose: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Instruction Pointer: EIP
Flags: EFLAGS
Segment: CS, DS, SS, ES, FS, GS
Addressable Memory:
- 32-bit addresses → 4 GB virtual address space per process (PAE lets the kernel use up to 64 GB of physical RAM)
- Uses paging and segmentation
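The 4 GB figure is simply 2^32 bytes. A minimal C sketch of the arithmetic (address_space_bytes is a hypothetical helper name chosen for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: number of distinct byte addresses for an
   N-bit address (valid for 0 <= bits <= 63). */
uint64_t address_space_bytes(unsigned bits) {
    return (uint64_t)1 << bits;
}

/* address_space_bytes(32) == 4294967296        (4 GB: 32-bit x86, per process)
   address_space_bytes(36) == 68719476736       (64 GB: physical limit with PAE)
   address_space_bytes(48) == 281474976710656   (256 TB: 48-bit x64 virtual) */
```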
Instruction Set:
- CISC (Complex Instruction Set Computer)
- Variable-length instructions (1-15 bytes)
- Many specialized instructions
Calling Convention (cdecl):
- Arguments pushed on stack (right to left)
- Caller cleans up stack
- Return value in EAX
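To see where cdecl puts arguments, consider a hypothetical two-argument function. The comments sketch the stack frame a 32-bit compiler typically builds when it uses EBP as a frame pointer (optimized builds may omit the frame pointer); the C itself runs on any platform.

```c
/* Hypothetical example. A cdecl caller of add_two(2, 3) emits roughly:
 *     push 3          ; b (rightmost argument is pushed first)
 *     push 2          ; a
 *     call add_two
 *     add  esp, 8     ; caller removes the two 4-byte arguments
 * Inside add_two, after "push ebp" / "mov ebp, esp":
 *     [ebp+0]  saved EBP
 *     [ebp+4]  return address
 *     [ebp+8]  a
 *     [ebp+12] b
 * The result is returned in EAX. */
int add_two(int a, int b) {
    return a + b;
}
```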
Example Assembly (NASM, x86):
section .text
global main
main:
push ebp
mov ebp, esp
mov eax, 42 ; Return value
mov esp, ebp
pop ebp
ret
Advantages
- Extensive software compatibility
- Rich instruction set
- Good performance
Disadvantages
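- Limited to a 4 GB virtual address space per process
- Complex architecture (variable-length encoding, many legacy modes)
- Typically higher power consumption than comparable ARM designs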
x64 Architecture (64-bit)
Also called x86-64, AMD64, or Intel 64. Extends x86 to 64 bits.
Key Characteristics
Registers (64-bit):
General Purpose: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
R8, R9, R10, R11, R12, R13, R14, R15 (new!)
Instruction Pointer: RIP
Flags: RFLAGS
Addressable Memory:
- 64-bit addresses → theoretically 16 exabytes
- Current CPUs implement 48-bit virtual addresses (256 TB), or 57 bits (128 PB) with 5-level paging
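With 48-bit virtual addresses, x64 requires every address to be canonical: bits 63 through 47 must all be copies of bit 47. This splits the space into a lower half (user space on Linux) and an upper half (kernel). A small C sketch of that check, assuming 4-level paging rather than 57-bit addresses (is_canonical48 is a hypothetical name):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is canonical for 48-bit virtual
   addressing, i.e. bits 63..47 are all 0 or all 1. */
bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;  /* 0x1FFFF = seventeen 1 bits */
}
```

For example, 0x00007FFFFFFFFFFF (top of the lower half) and 0xFFFF800000000000 (bottom of the upper half) are canonical, while 0x0000800000000000 is not.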
Calling Convention (System V AMD64 ABI on Linux):
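- First 6 integer/pointer args in: RDI, RSI, RDX, RCX, R8, R9
- Additional args on stack
- Return value in RAX
- Floating-point args in XMM0-XMM7
- RBX, RBP, and R12-R15 are callee-saved; RSP must be 16-byte aligned just before each call instruction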
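A hypothetical eight-argument function shows where each argument lands under the System V AMD64 rules above. The comments describe the ABI; the C itself is portable. Compiling it with gcc -S -O2 on x64 Linux and reading the output is an easy way to confirm the mapping.

```c
/* Hypothetical example. Under the System V AMD64 ABI:
 *   a -> RDI, b -> RSI, c -> RDX, d -> RCX, e -> R8, f -> R9
 *   g and h do not fit in registers and are passed on the stack
 *   (on entry: [rsp] = return address, [rsp+8] = g, [rsp+16] = h)
 *   the long result comes back in RAX. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;
}
```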
Example Assembly (NASM, x64):
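default rel ; Use RIP-relative addressing (required for PIE executables)
extern printf ; printf is defined in the C library
section .text
global main
main:
push rbp ; Save frame pointer (also realigns RSP to 16 bytes)
mov rbp, rsp
; Function call: printf("Number: %d\n", 42)
lea rdi, [format] ; First arg: address of format string
mov esi, 42 ; Second arg: number (writing ESI zero-extends into RSI)
xor eax, eax ; AL = 0: no vector registers used by this variadic call
call printf wrt ..plt ; Call through the PLT
xor eax, eax ; Return 0
pop rbp
ret
section .data
format db "Number: %d", 10, 0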
Advantages over x86
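- Address space far beyond 4 GB
- Twice as many general-purpose registers (fewer spills to memory)
- Segmentation mostly disabled (flat memory model; FS/GS remain for thread-local storage)
- Backward compatible (can run 32-bit code)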
Disadvantages
- Slightly larger code size
- Pointers use 8 bytes instead of 4
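The pointer-size difference is visible from C. A minimal sketch (pointer_bits and data_model are hypothetical names) that reports the pointer width and the data model, meaning the sizes the compiler uses for int, long, and pointers:

```c
#include <limits.h>

/* Pointer width in bits for the target this file is compiled for. */
unsigned pointer_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Classify the common C data models:
   ILP32 (32-bit x86 and ARM), LP64 (Linux/macOS on x64 and AArch64),
   LLP64 (64-bit Windows). */
const char *data_model(void) {
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";
    return "other";
}
```

The same source compiled for x86 and for x64 Linux should report ILP32 and LP64 respectively.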
ARM Architecture (32-bit)
ARM (Advanced RISC Machine) dominates mobile devices, embedded systems, and increasingly servers.
History
- Originally designed by Acorn Computers in the mid-1980s; ARM Holdings now licenses the architecture and core designs to manufacturers
- Powers smartphones, tablets, IoT devices
- Raspberry Pi, most mobile phones
- Apple M1/M2/M3 (64-bit AArch64 only)
Key Characteristics
Registers (32-bit):
General Purpose: R0-R12
Stack Pointer: SP (R13)
Link Register: LR (R14) - stores return address
Program Counter: PC (R15)
Current Program Status Register: CPSR
Instruction Set:
- RISC (Reduced Instruction Set Computer)
- Fixed 32-bit instruction length (ARM mode)
- Also supports Thumb mode (16-bit instructions for code density; Thumb-2 mixes 16- and 32-bit encodings)
Calling Convention (AAPCS):
- Arguments in R0-R3
- Additional args on stack
- Return value in R0 (R0-R1 for 64-bit values)
- R0-R3 and R12 are caller-saved (scratch); R4-R11 are callee-saved (R9 may be reserved by the platform)
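One AAPCS detail that often surprises people is how 64-bit values are passed. In the hypothetical function below, the comments describe the register assignment for a little-endian 32-bit ARM target; the C is portable and can be compiled with arm-linux-gnueabi-gcc -S to check the mapping.

```c
#include <stdint.h>

/* Hypothetical example. Under the 32-bit AAPCS (little-endian):
 *   a (32-bit) -> R0
 *   R1 is skipped, because a 64-bit argument must start in an
 *   even-numbered register
 *   b (64-bit) -> R2 (low word) and R3 (high word)
 *   the 64-bit result comes back in R0 (low) and R1 (high) */
int64_t add_wide(int32_t a, int64_t b) {
    return (int64_t)a + b;
}
```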
Example Assembly (ARM 32-bit):
.global main
main:
push {lr} @ Save return address
mov r0, #42 @ Load 42 into r0
pop {pc} @ Return (pop into PC)
Advantages
- Low power consumption
- Simple, elegant design
- Excellent for embedded systems
- Conditional execution of instructions
Disadvantages
- 4 GB virtual address space (LPAE extends physical addressing to 40 bits)
- Some operations require more instructions than x86
ARM Unique Features
Conditional Execution:
cmp r0, #10
addgt r1, r1, #1 @ Add only if r0 > 10 (greater than)
movle r1, #0 @ Move only if r0 <= 10 (less or equal)
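In ARM (A32) state, almost every instruction can be conditionally executed, which avoids short branches and their misprediction cost. Thumb-2 code needs IT blocks to do the same, and AArch64 dropped general conditional execution in favor of conditional-select instructions such as CSEL.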
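The cmp/addgt/movle sequence above corresponds to this C, shown as a sketch (update is a hypothetical name). GT and LE are signed conditions, so the operands are plain int. Whether a compiler emits predicated or conditional-select instructions instead of a branch depends on the compiler and optimization level.

```c
/* C equivalent of:
 *     cmp   r0, #10
 *     addgt r1, r1, #1
 *     movle r1, #0
 * An A32 compiler may emit predicated instructions like these;
 * an AArch64 compiler may use CSEL/CSINC instead. */
int update(int r0, int r1) {
    return (r0 > 10) ? r1 + 1 : 0;
}
```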
AArch64 Architecture (64-bit ARM)
The 64-bit execution state of the ARM architecture, introduced with ARMv8-A (which also defines a 32-bit AArch32 state for running older code).
Key Characteristics
Registers (64-bit):
General Purpose: X0-X30 (64-bit) or W0-W30 (lower 32 bits)
Stack Pointer: SP
Link Register: X30 (LR)
Program Counter: PC
Zero Register: XZR/WZR (always reads as 0)
Example:
X0 = 64-bit register
W0 = lower 32 bits of X0
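One rule worth knowing: writing a W register clears the upper 32 bits of the corresponding X register (for example, mov w0, #-1 leaves X0 = 0x00000000FFFFFFFF). The C functions below are a hypothetical model of that behavior using integer conversions, not actual register access:

```c
#include <stdint.h>

/* Hypothetical model of the X0/W0 relationship. */

/* Reading W0 yields the low 32 bits of X0. */
uint32_t read_w(uint64_t x) {
    return (uint32_t)x;
}

/* Writing W0 zero-extends: the upper 32 bits of X0 become 0,
   whatever they held before. */
uint64_t write_w(uint32_t w) {
    return (uint64_t)w;
}
```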
Calling Convention (AAPCS64):
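- Integer/pointer arguments in X0-X7 (floating-point in V0-V7)
- Return value in X0
- X19-X28 and X29 (frame pointer) are callee-saved; X9-X15 are caller-saved temporaries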
- Stack must be 16-byte aligned
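Because SP must stay 16-byte aligned, a function that needs 24 bytes of locals reserves 32. The rounding is a standard power-of-two trick; a sketch in C (align_up is a hypothetical helper name):

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two).
   With align = 16 this gives a frame size that can be subtracted
   from SP on AArch64 while keeping SP 16-byte aligned. */
size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

For instance, align_up(24, 16) is 32 and align_up(32, 16) stays 32.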
Example Assembly (AArch64):
.global main
.align 2
main:
stp x29, x30, [sp, #-16]! // Save frame pointer and link register
mov x29, sp // Set up frame pointer
mov x0, #42 // Return value
ldp x29, x30, [sp], #16 // Restore frame pointer and link register
ret
Advantages over ARM (32-bit)
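- More registers (31 general-purpose vs. 13 plus SP, LR, and PC)
- Larger address space
- Cleaner instruction set (PC is not a general-purpose register; no general conditional execution)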
- Better performance
Key Differences from x64
- RISC vs CISC
- Fixed instruction length (32-bit) vs variable (x64)
- Different calling conventions
- Different instruction mnemonics
Comparison Table
| Feature | x86 | x64 | ARM | AArch64 |
|---|---|---|---|---|
| Bit Width | 32 | 64 | 32 | 64 |
| Type | CISC | CISC | RISC | RISC |
| Instruction Length | Variable (1-15 bytes) | Variable (1-15 bytes) | Fixed 32-bit (Thumb: 16/32-bit) | Fixed (32-bit) |
| General-Purpose Registers | 8 | 16 | 16 (incl. SP, LR, PC) | 31 (+ SP) |
| Virtual Address Space | 4 GB | 256 TB (48-bit; 57-bit with 5-level paging) | 4 GB | 256 TB (48-bit; 52-bit optional) |
| Typical Power Efficiency | Lower | Lower | Higher | Higher |
| Common Use | Legacy | Desktops/Servers | Mobile/Embedded | Mobile/Servers |
| Examples | Old PCs | Modern PCs | Raspberry Pi 1/Zero, Cortex-M MCUs | Apple M1 Macs, Raspberry Pi 4 |
Endianness
How multi-byte values are stored in memory.
Little-Endian (least significant byte first):
- x86, x64
- ARM (configurable, usually little-endian)
Big-Endian (most significant byte first):
- Some ARM configurations
- Network protocols
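Code that writes network packets or binary file formats can avoid depending on the host's byte order by building the bytes with shifts, which behave the same on little- and big-endian CPUs. A minimal sketch (store_be32 and load_be32 are hypothetical names; POSIX htonl/ntohl serve a similar purpose for network order):

```c
#include <stdint.h>

/* Write v in big-endian (network) byte order, regardless of host CPU. */
void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v >> 24);  /* most significant byte first */
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read a big-endian 32-bit value back into a host integer. */
uint32_t load_be32(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```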
Example: 0x12345678 in memory
Little-Endian:
Address: 0x00 0x01 0x02 0x03
Value: 0x78 0x56 0x34 0x12
Big-Endian:
Address: 0x00 0x01 0x02 0x03
Value: 0x12 0x34 0x56 0x78
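The layout above can be observed directly. This sketch copies the integer's bytes out with memcpy (which avoids pointer-aliasing problems) and inspects the first one; on x86, x64, and typical ARM/AArch64 systems it should report little-endian. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of 0x12345678 into bytes[0..3]. */
void bytes_of_0x12345678(uint8_t bytes[4]) {
    uint32_t value = 0x12345678u;
    memcpy(bytes, &value, sizeof value);
}

/* Little-endian hosts store the least significant byte (0x78) first. */
int is_little_endian(void) {
    uint8_t b[4];
    bytes_of_0x12345678(b);
    return b[0] == 0x78;
}
```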
Instruction Set Differences
Move Instruction
x86:
mov eax, 42
x64:
mov rax, 42
ARM:
mov r0, #42
AArch64:
mov x0, #42
Function Call
x86 (cdecl):
push 42
call func
add esp, 4 ; Clean up stack
x64 (System V):
mov rdi, 42
call func
ARM:
mov r0, #42
bl func @ Branch with link (return address saved in LR)
AArch64:
mov x0, #42
bl func
Cross-Compilation
Compiling for a different architecture.
Example: Compile for ARM on x64
# Install cross-compiler
sudo apt-get install gcc-arm-linux-gnueabi
# Compile for ARM
arm-linux-gnueabi-gcc -o program_arm program.c
# Check architecture
file program_arm
# Output (abbreviated; exact text varies by toolchain): ELF 32-bit LSB executable, ARM
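A hypothetical program.c for this experiment can report which target it was compiled for. GCC and Clang predefine __x86_64__, __i386__, __aarch64__, and __arm__ for the respective targets; the function name target_arch is illustrative.

```c
/* Report the architecture this file was compiled for, using the
   predefined macros of GCC and Clang. */
const char *target_arch(void) {
#if defined(__x86_64__)
    return "x64";
#elif defined(__i386__)
    return "x86";
#elif defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#else
    return "unknown";
#endif
}
```

A main that prints target_arch(), built once with gcc on an x64 host and once with arm-linux-gnueabi-gcc, should print "x64" natively and "ARM (32-bit)" when the ARM build runs under QEMU.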
QEMU for Testing
Run binaries for different architectures:
# Install QEMU
sudo apt-get install qemu-user
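# Run ARM binary on x64 (-L points QEMU at the ARM C library needed by
# dynamically linked binaries; alternatively, link the program with -static)
qemu-arm -L /usr/arm-linux-gnueabi ./program_arm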
Architecture-Specific Optimizations
SIMD Instructions
Modern CPUs have SIMD (Single Instruction, Multiple Data):
x86/x64: SSE, AVX
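movaps xmm0, [array] ; Load 4 floats (array must be 16-byte aligned)
addps xmm0, [array2] ; Add 4 floats in parallel (array2 must be 16-byte aligned)
ARM: NEON
vld1.32 {q0}, [r0] @ Load 4 floats from the address in r0
vld1.32 {q1}, [r1] @ Load 4 floats from the address in r1
vadd.f32 q0, q0, q1 @ Add 4 floats in parallel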
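In C, the same operations are usually written with compiler intrinsics rather than assembly. A sketch (add4f is a hypothetical name) that uses SSE on x86/x64, NEON on ARM, and a plain loop elsewhere; __SSE__ and __ARM_NEON are the feature macros GCC and Clang define when those instruction sets are enabled.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics (x86/x64) */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON intrinsics (ARM/AArch64) */
#endif

/* out[i] = a[i] + b[i] for i = 0..3, using one SIMD add when available.
   The unaligned load/store forms are used, so no 16-byte alignment
   is required. */
void add4f(const float *a, const float *b, float *out) {
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   /* addps */
#elif defined(__ARM_NEON)
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));        /* vadd.f32 (ARM) / fadd (AArch64) */
#else
    for (int i = 0; i < 4; i++)               /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```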
Key Concepts
- x86 is 32-bit CISC, limited to a 4 GB virtual address space
- x64 extends x86 to 64-bit with more registers
- ARM is 32-bit RISC, power-efficient
- AArch64 is 64-bit ARM with more registers
- RISC typically uses fixed-length, simple instructions (Thumb-2 is a notable exception)
- CISC uses variable-length, more complex instructions
- Calling conventions differ between architectures
- Endianness affects multi-byte data storage
Common Mistakes
- Assuming x86 everywhere - ARM is huge in mobile/embedded
- Ignoring calling conventions - Each architecture differs
- Hardcoding register names - Use portable C when possible
- Forgetting endianness - Matters for binary file formats
- Not testing on target - Code that works on one architecture can break on another because of differences in type sizes, alignment rules, endianness, or memory ordering
Debugging Tips
- Know your target - What architecture are you compiling for?
- Use file command - Check binary architecture
- Read ABI docs - Calling conventions are documented
- Test on hardware - Or in QEMU emulator
- Check compiler flags - Ensure correct target specified
Mini Exercises
- Determine your computer's architecture (x86, x64, ARM, AArch64)
- Write "Hello World" for your architecture
- Disassemble a simple C program and identify registers used
- Cross-compile a program for a different architecture
- Compare assembly output of same C code for x64 vs ARM
- Research calling conventions for each architecture
- Write inline assembly that works on multiple architectures
- Use QEMU to run ARM binary on x64 system
- Identify endianness of your system
- Compare instruction set manuals for x64 and ARM
Review Questions
- What's the main difference between RISC and CISC?
- How many general-purpose registers does x64 have?
- What is the maximum addressable memory in 32-bit architectures?
- What are the calling convention differences between x86 and x64?
- What is endianness and why does it matter?
Reference Checklist
By the end of this chapter, you should be able to:
- Differentiate between x86, x64, ARM, and AArch64
- Understand RISC vs CISC
- Know register sets for each architecture
- Understand calling conventions
- Explain endianness
- Cross-compile for different architectures
- Write basic assembly for multiple architectures
Next Steps
Now that you understand different CPU architectures, the next chapter dives deeper into advanced assembly programming. You'll learn to implement data structures and algorithms in assembly, and how to interface assembly with C code.
Key Takeaway: Different CPU architectures have different instruction sets, registers, and conventions. Understanding these differences is essential for writing portable system software and targeting specific hardware platforms.