Chapter 9: CPU Architectures

Introduction

Not all CPUs are created equal. Different architectures (x86, x64, ARM, AArch64) have different instruction sets, registers, calling conventions, and characteristics. Understanding these differences is crucial when writing system software that runs on multiple platforms or when targeting specific hardware.

Why This Matters

System programmers must understand CPU architectures because bootloaders and kernels contain architecture-specific code, and device drivers often depend on architecture details such as I/O access and byte order. A kernel written for ARM cannot use x86 instructions, and optimizing code requires knowing the target CPU. Cross-platform system software therefore requires understanding multiple architectures.

How to Study This Chapter

  1. Compare architectures - Note similarities and differences
  2. Research your hardware - What architecture is your computer?
  3. Try cross-compilation - Compile for different targets
  4. Read specifications - Architecture manuals are comprehensive

x86 Architecture (32-bit)

The Intel x86 architecture dominated personal computing for decades.

History

  • 8086 (1978): 16-bit processor, started the x86 line
  • 80386 (1985): First 32-bit x86 processor
  • Pentium (1993+): Continued x86 evolution
  • 32-bit mode is still supported by modern x86-64 CPUs for backward compatibility

Key Characteristics

Registers (32-bit):

General Purpose: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Instruction Pointer: EIP
Flags: EFLAGS
Segment: CS, DS, SS, ES, FS, GS

Addressable Memory:

  • 32-bit addresses → 4 GB address space per process (PAE lets the OS use up to 64 GB of physical RAM)
  • Uses paging and segmentation

Instruction Set:

  • CISC (Complex Instruction Set Computer)
  • Variable-length instructions (1-15 bytes)
  • Many specialized instructions

Calling Convention (cdecl):

  • Arguments pushed on stack (right to left)
  • Caller cleans up stack
  • Return value in EAX
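Caller cleanup is what allows cdecl to support variadic C functions such as printf. Only the caller knows how many arguments it pushed, so only the caller can remove them.

The callee side of such a function might look like the sketch below. The name sum_ints is hypothetical. The explicit count argument is needed because the callee has no other way to know how many arguments were passed.

```c
#include <stdarg.h>

/* Hypothetical variadic helper. The callee cannot tell how many arguments
   were passed, so the caller supplies an explicit count. Under cdecl the
   caller also removes these arguments from the stack after the call. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* read the next int argument */
    va_end(args);

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6.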

Example Assembly (NASM, x86):

section .text
global main

main:
    push ebp
    mov ebp, esp
    
    mov eax, 42        ; Return value
    
    mov esp, ebp
    pop ebp
    ret

Advantages

  • Extensive software compatibility
  • Rich instruction set
  • Good performance

Disadvantages

  • 4 GB address space per process
  • Complex architecture (decades of accumulated extensions)
  • Typically higher power consumption than ARM designs

x64 Architecture (64-bit)

Also called x86-64, AMD64, or Intel 64. Extends x86 to 64 bits.

Key Characteristics

Registers (64-bit):

General Purpose: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
                 R8, R9, R10, R11, R12, R13, R14, R15  (new!)
Instruction Pointer: RIP
Flags: RFLAGS

Addressable Memory:

  • 64-bit addresses → theoretically 16 exabytes
  • Current CPUs typically implement 48-bit virtual addresses (256 TB); 5-level paging extends this to 57 bits (128 PB)
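These limits come straight from 2 raised to the number of address bits. The small C sketch below reproduces the 4 GB and 256 TB figures; the function names are our own. The GB and TB here are binary units, so 4 GB means 2^32 bytes.

```c
#include <stdint.h>

/* Size in bytes of an address space with the given number of address bits.
   Valid for 0 <= bits <= 63: shifting a 64-bit value by 64 is undefined in C,
   which is why the full 2^64 (16 EB) case is not computed here. */
uint64_t address_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Convert a byte count to whole binary units (1 GiB = 2^30, 1 TiB = 2^40). */
uint64_t to_gib(uint64_t bytes) { return bytes >> 30; }
uint64_t to_tib(uint64_t bytes) { return bytes >> 40; }
```

to_gib(address_space_bytes(32)) gives 4, and to_tib(address_space_bytes(48)) gives 256.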

Calling Convention (System V AMD64 ABI on Linux):

  • First 6 integer/pointer args in: RDI, RSI, RDX, RCX, R8, R9
  • Additional args on stack
  • Return value in RAX
  • Floating-point args in XMM0-XMM7
  • RBX, RBP, R12-R15 are callee-saved
  • Stack must be 16-byte aligned at each call instruction

Example Assembly (NASM, x64):

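extern printf            ; printf comes from the C library

section .text
global main

main:
    push rbp             ; Also restores 16-byte stack alignment before the call
    mov rbp, rsp
    
    ; Function call: printf("Number: %d\n", 42)
    lea rdi, [rel format]  ; First arg: format string address (RIP-relative, PIE-safe)
    mov rsi, 42          ; Second arg: number
    xor rax, rax         ; AL = 0: no vector (XMM) registers used by this variadic call
    call printf wrt ..plt  ; Call through the PLT so the code links as PIE
    
    xor rax, rax         ; Return 0
    
    pop rbp
    ret

section .data
    format db "Number: %d", 10, 0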

Advantages over x86

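  • Much larger address space (well beyond 4 GB)
  • Twice as many general-purpose registers (fewer spills to memory)
  • Largely flat memory model (segmentation mostly disabled in 64-bit mode)
  • Backward compatible (can run 32-bit code)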

Disadvantages

  • Slightly larger code size (extra REX prefix bytes on 64-bit instructions)
  • Pointers use 8 bytes instead of 4, so pointer-heavy data structures grow

ARM Architecture (32-bit)

ARM (Advanced RISC Machine) dominates mobile devices, embedded systems, and increasingly servers.

History

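  • Developed by ARM Holdings (now Arm Ltd), which licenses the architecture and core designs to chip manufacturers
  • Powers most smartphones and tablets, plus many IoT devices
  • Used throughout the Raspberry Pi family
  • Apple M1/M2/M3 use the 64-bit AArch64 architecture described below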

Key Characteristics

Registers (32-bit):

General Purpose: R0-R12
Stack Pointer: SP (R13)
Link Register: LR (R14) - stores return address
Program Counter: PC (R15)
Current Program Status Register: CPSR

Instruction Set:

  • RISC (Reduced Instruction Set Computer)
  • Fixed 32-bit instruction length (ARM mode)
  • Also supports Thumb mode (16-bit instructions, extended with 32-bit ones in Thumb-2, for better code density)

Calling Convention (AAPCS):

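  • Arguments in R0-R3
  • Additional args on stack
  • Return value in R0 (R0-R1 for 64-bit values)
  • R0-R3 and R12 are caller-saved; R4-R11 are callee-saved
  • Stack must be 8-byte aligned at public function boundaries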

Example Assembly (ARM 32-bit):

.text
.global main

main:
    push {r4, lr}         @ Save return address (r4 keeps SP 8-byte aligned per AAPCS)
    
    mov r0, #42           @ Return value goes in r0
    
    pop {r4, pc}          @ Restore r4 and return (saved LR is popped into PC)

Advantages

  • Low power consumption
  • Simple, regular load/store instruction set
  • Excellent for embedded systems
  • Conditional execution of instructions

Disadvantages

  • 4 GB address space (32-bit)
  • Load/store design: arithmetic works only on registers, so some operations need more instructions than on x86

ARM Unique Features

Conditional Execution:

cmp r0, #10
addgt r1, r1, #1    @ Add only if r0 > 10 (greater than)
movle r1, #0        @ Move only if r0 <= 10 (less or equal)

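In ARM (A32) state, almost every instruction can be conditionally executed. This lets short if/else sequences run without branches. Thumb-2 needs an IT (If-Then) instruction for this, and AArch64 dropped general conditional execution in favor of instructions such as CSEL.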
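The three instructions above compute the same thing as the C function below; the name update_counter is hypothetical. You can compile it with optimization for 32-bit ARM (for example with arm-linux-gnueabi-gcc -O2 -S) to see whether the compiler picks conditional instructions or a branch. That choice is up to the compiler and may differ between versions.

```c
/* C equivalent of the cmp/addgt/movle sequence: if r0 > 10 (signed
   comparison, like GT/LE), increment r1; otherwise set r1 to 0.
   A compiler may implement this with conditional instructions on ARM,
   or CSEL-family instructions on AArch64, instead of a branch. */
int update_counter(int r0, int r1)
{
    return (r0 > 10) ? r1 + 1 : 0;
}
```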

AArch64 Architecture (64-bit ARM)

AArch64 is the 64-bit execution state introduced with ARMv8-A. ARMv8-A also keeps the 32-bit AArch32 state for compatibility.

Key Characteristics

Registers (64-bit):

General Purpose: X0-X30 (64-bit) or W0-W30 (lower 32 bits)
Frame Pointer: X29 (FP)
Stack Pointer: SP
Link Register: X30 (LR)
Program Counter: PC
Zero Register: XZR/WZR (always reads as 0)

Example:

X0 = 64-bit register
W0 = lower 32 bits of X0 (writing W0 zeroes the upper 32 bits of X0)
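The relationship between the X and W views can be modeled in C. The model assumes the AArch64 rule that reading W0 returns the low 32 bits of X0, while writing W0 clears the upper 32 bits. The regs_t type and helper names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical one-register model of X0 and its 32-bit view W0. */
typedef struct { uint64_t x0; } regs_t;

/* Reading W0 yields the low 32 bits of X0. */
uint32_t read_w0(const regs_t *r)
{
    return (uint32_t)r->x0;
}

/* Writing W0 zero-extends: the upper 32 bits of X0 become 0. */
void write_w0(regs_t *r, uint32_t value)
{
    r->x0 = (uint64_t)value;
}
```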

Calling Convention (AAPCS64):

  • Integer/pointer arguments in X0-X7; floating-point arguments in V0-V7
  • Return value in X0
  • X19-X29 are callee-saved
  • Stack must be 16-byte aligned

Example Assembly (AArch64):

.global main
.align 2

main:
    stp x29, x30, [sp, #-16]!  // Save frame pointer and link register
    mov x29, sp                 // Set up frame pointer
    
    mov x0, #42                 // Return value
    
    ldp x29, x30, [sp], #16     // Restore frame pointer and link register
    ret
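The stp/ldp pair above moves SP by exactly 16 bytes (two 8-byte registers). This keeps the stack 16-byte aligned, as AAPCS64 requires. A function that needs more local storage has to round its frame size up to a multiple of 16. The arithmetic is sketched below with a helper, align_up, whose name is our own.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two).
   With align = 16, a function needing 24 bytes of stack reserves 32. */
size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}
```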

Advantages over ARM (32-bit)

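  • More registers (31 general-purpose vs 16, three of which are SP, LR, and PC)
  • Larger address space
  • More regular instruction set (for example, the PC is no longer a general-purpose register)
  • Generally better performance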

Key Differences from x64

  • RISC vs CISC
  • Fixed instruction length (32-bit) vs variable (x64)
  • Different calling conventions
  • Different instruction mnemonics
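The fixed-versus-variable difference is easy to see in machine code. The sketch below stores one small function (set up a frame, return 42, tear the frame down) in both encodings. The byte values are what a typical assembler and objdump produce for these instructions, but it is worth confirming them against your own toolchain. The type and function names are hypothetical. For x64 it encodes mov eax, 42, the shorter form assemblers commonly use for a small constant (writing EAX also clears the upper half of RAX).

```c
#include <stddef.h>
#include <stdint.h>

/* One x64 instruction: its encoded bytes (at most 15) and their count. */
typedef struct {
    const char *text;
    uint8_t     bytes[15];
    size_t      len;
} x64_insn;

static const x64_insn x64_code[] = {
    { "push rbp",     { 0x55 },                         1 },
    { "mov rbp, rsp", { 0x48, 0x89, 0xE5 },             3 },
    { "mov eax, 42",  { 0xB8, 0x2A, 0x00, 0x00, 0x00 }, 5 },
    { "pop rbp",      { 0x5D },                         1 },
    { "ret",          { 0xC3 },                         1 },
};

/* The same function for AArch64: every instruction is one 32-bit word. */
static const uint32_t a64_code[] = {
    0xA9BF7BFD,  /* stp x29, x30, [sp, #-16]! */
    0x910003FD,  /* mov x29, sp               */
    0xD2800540,  /* mov x0, #42               */
    0xA8C17BFD,  /* ldp x29, x30, [sp], #16   */
    0xD65F03C0,  /* ret                       */
};

/* Total encoded size of the x64 version (sum of variable lengths). */
size_t x64_code_size(void)
{
    size_t total = 0;
    for (size_t i = 0; i < sizeof x64_code / sizeof x64_code[0]; i++)
        total += x64_code[i].len;
    return total;
}

/* Total encoded size of the AArch64 version (5 instructions x 4 bytes). */
size_t a64_code_size(void)
{
    return sizeof a64_code;
}
```

Here x64 needs 11 bytes for five instructions, while AArch64 needs 20. Fixed-length encoding simplifies decoding but can cost code density, which is the problem Thumb mode addresses on 32-bit ARM.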

Comparison Table

Feature                x86                x64                ARM                      AArch64
Bit width              32                 64                 32                       64
Type                   CISC               CISC               RISC                     RISC
Instruction length     Variable (1-15 B)  Variable (1-15 B)  Fixed 32-bit (ARM mode)  Fixed 32-bit
General registers      8                  16                 16 (incl. SP, LR, PC)    31 (plus SP, XZR)
Virtual address space  4 GB               256 TB (48-bit)    4 GB                     256 TB (48-bit)
Power efficiency       Lower              Lower              Higher                   Higher
Common use             Legacy             Desktops/servers   Mobile/embedded          Mobile/laptops/servers
Examples               Old PCs            Modern PCs         Raspberry Pi 1/Zero      Apple M1 Macs, Raspberry Pi 4

Endianness

Endianness is the order in which the bytes of a multi-byte value are stored in memory.

Little-Endian (least significant byte first):

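  • x86, x64
  • ARM and AArch64 (bi-endian, but almost always run little-endian)

Big-Endian (most significant byte first):

  • ARM/AArch64 systems configured for big-endian operation (rare)
  • Network protocols (TCP/IP headers use "network byte order")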

Example: 0x12345678 in memory

Little-Endian:

Address:  0x00  0x01  0x02  0x03
Value:    0x78  0x56  0x34  0x12

Big-Endian:

Address:  0x00  0x01  0x02  0x03
Value:    0x12  0x34  0x56  0x78
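In C, byte order can be observed by inspecting the bytes of a 32-bit value. When writing binary formats, it can be avoided entirely by building each byte with shifts. The helper names below are hypothetical; POSIX htonl/ntohl serve a similar purpose for network byte order.

```c
#include <stdint.h>

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t value = 0x12345678;
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0] == 0x78;   /* lowest address holds 0x78 on little-endian */
}

/* Write a 32-bit value in big-endian (network) order on any host. */
void store_be32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Read a big-endian 32-bit value back, again independent of the host. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

On x86, x64, and little-endian ARM, host_is_little_endian() returns 1. store_be32 produces the big-endian layout shown above on every host.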

Instruction Set Differences

Move Instruction

x86/x64:

mov eax, 42    ; x86 (32-bit register)
mov rax, 42    ; x64 (64-bit register)

ARM:

mov r0, #42

AArch64:

mov x0, #42

Function Call

x86 (cdecl):

push 42
call func
add esp, 4    ; Clean up stack

x64 (System V):

mov rdi, 42
call func

ARM:

mov r0, #42
bl func       @ Branch with link (return address goes in LR)

AArch64:

mov x0, #42
bl func
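To see these conventions side by side, you can compile the same small C function for each target (for example with gcc -S) and read the generated assembly. The function name sum5 is hypothetical. The register assignments in the comment follow the conventions listed earlier in this chapter.

```c
/* Hypothetical function with five integer arguments. Where each argument
   arrives under each convention:
              cdecl    SysV x64   AAPCS    AAPCS64
     a:       stack    RDI        R0       X0
     b:       stack    RSI        R1       X1
     c:       stack    RDX        R2       X2
     d:       stack    RCX        R3       X3
     e:       stack    R8         stack    X4
   The result comes back in EAX / RAX / R0 / X0 respectively. */
long sum5(long a, long b, long c, long d, long e)
{
    return a + b + c + d + e;
}
```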

Cross-Compilation

Cross-compiling means building, on one architecture (the host), a binary that runs on another (the target).

Example: Compile for ARM on x64

# Install cross-compiler
sudo apt-get install gcc-arm-linux-gnueabi

# Compile for ARM
arm-linux-gnueabi-gcc -o program_arm program.c

# Check architecture
file program_arm
# Output: ELF 32-bit LSB executable, ARM
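For program.c itself, a sketch that reports which target it was built for makes cross-compilation visible, both when run natively and under QEMU. It relies on the __x86_64__, __i386__, __aarch64__, and __arm__ macros that GCC and Clang predefine. The function names are hypothetical; a main() in program.c would simply print their results.

```c
#include <limits.h>

/* Architecture this translation unit was compiled for, chosen at compile
   time from the compiler's predefined target macros. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x64";
#elif defined(__i386__)
    return "x86";
#elif defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#else
    return "unknown";
#endif
}

/* Pointer width in bits: typically 32 on x86/ARM and 64 on x64/AArch64. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Built with arm-linux-gnueabi-gcc, these functions would report "ARM (32-bit)" and 32 bits.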

QEMU for Testing

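QEMU's user-mode emulation runs individual Linux binaries built for other architectures:

# Install QEMU user-mode emulation
sudo apt-get install qemu-user

# Run the ARM binary on x64. -L tells QEMU where the ARM libraries
# installed with the cross-compiler live, so a dynamically linked
# binary can find its loader and libc.
qemu-arm -L /usr/arm-linux-gnueabi ./program_arm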

Architecture-Specific Optimizations

SIMD Instructions

Modern CPUs provide SIMD (Single Instruction, Multiple Data) instructions, which apply one operation to several data elements at once:

x86/x64: SSE, AVX

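movaps xmm0, [array]    ; Load 4 floats (address must be 16-byte aligned)
addps xmm0, [array2]    ; Add 4 floats in parallel (memory operand also 16-byte aligned)

ARM: NEON

vld1.32 {q0}, [r0]     @ Load four 32-bit integers from the address in r0
vld1.32 {q1}, [r1]     @ Load four more from the address in r1
vadd.i32 q0, q0, q1    @ Add the two vectors lane by lane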
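In C, the same operation is usually written with compiler intrinsics rather than raw assembly. The sketch below assumes GCC or Clang. It uses SSE intrinsics from <xmmintrin.h> on x86/x64 and NEON intrinsics from <arm_neon.h> on ARM, with a plain loop elsewhere. The function name add4_floats is our own. It uses unaligned loads (_mm_loadu_ps), which avoid the 16-byte alignment requirement of movaps.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for four floats, using one SIMD add where available. */
void add4_floats(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             /* unaligned 4-float load */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one addps for all lanes */
#elif defined(__ARM_NEON)
    float32x4_t va = vld1q_f32(a);           /* NEON 4-float load */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));       /* one vector add */
#else
    for (int i = 0; i < 4; i++)              /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```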

Key Concepts

  • x86 is 32-bit CISC, limited to a 4 GB address space
  • x64 extends x86 to 64-bit with more registers
  • ARM is 32-bit RISC, power-efficient
  • AArch64 is 64-bit ARM with more registers
  • RISC uses fixed-length simple instructions
  • CISC uses variable-length complex instructions
  • Calling conventions differ between architectures
  • Endianness affects multi-byte data storage

Common Mistakes

  1. Assuming x86 everywhere - ARM is huge in mobile/embedded
  2. Ignoring calling conventions - Each architecture differs
  3. Hardcoding register names - Use portable C when possible
  4. Forgetting endianness - Matters for binary file formats
  5. Not testing on target - Cross-compiled code might behave differently

Debugging Tips

  • Know your target - What architecture are you compiling for?
  • Use file command - Check binary architecture
  • Read ABI docs - Calling conventions are documented
  • Test on hardware - Or in QEMU emulator
  • Check compiler flags - Ensure correct target specified

Mini Exercises

  1. Determine your computer's architecture (x86, x64, ARM, AArch64)
  2. Write "Hello World" for your architecture
  3. Disassemble a simple C program and identify registers used
  4. Cross-compile a program for a different architecture
  5. Compare assembly output of same C code for x64 vs ARM
  6. Research calling conventions for each architecture
  7. Write inline assembly that works on multiple architectures
  8. Use QEMU to run ARM binary on x64 system
  9. Identify endianness of your system
  10. Compare instruction set manuals for x64 and ARM

Review Questions

  1. What's the main difference between RISC and CISC?
  2. How many general-purpose registers does x64 have?
  3. What is the maximum addressable memory in 32-bit architectures?
  4. What are the calling convention differences between x86 and x64?
  5. What is endianness and why does it matter?

Reference Checklist

By the end of this chapter, you should be able to:

  • Differentiate between x86, x64, ARM, and AArch64
  • Understand RISC vs CISC
  • Know register sets for each architecture
  • Understand calling conventions
  • Explain endianness
  • Cross-compile for different architectures
  • Write basic assembly for multiple architectures

Next Steps

Now that you understand different CPU architectures, the next chapter dives deeper into advanced assembly programming. You'll learn to implement data structures and algorithms in assembly, and how to interface assembly with C code.


Key Takeaway: Different CPU architectures have different instruction sets, registers, and conventions. Understanding these differences is essential for writing portable system software and targeting specific hardware platforms.