Chapter 8: Introduction to Assembly Language
Introduction
Assembly language is the lowest level of programming before raw machine code. Each assembly instruction corresponds to a single machine instruction. While modern system programming is mostly done in C, understanding assembly is essential for debugging, optimization, writing bootloaders, and understanding how CPUs actually execute your code.
Why This Matters
Assembly language reveals what the CPU is truly doing. When you debug crashes, analyze performance, write device drivers, or develop operating systems, you'll encounter assembly. Understanding assembly helps you write better C code because you understand what your compiler generates. For certain system-level tasks like bootloaders and kernel initialization, assembly is unavoidable.
How to Study This Chapter
- Write code - Assembly makes sense by doing, not just reading
- Use NASM - We'll use NASM (Netwide Assembler) for examples
- Test in small steps - Start with tiny programs
- Read disassembly - See what your C compiler generates
- Be patient - Assembly is verbose but logical
What is Assembly Language?
Assembly is a human-readable representation of machine code.
Machine Code (binary):
10110000 01100001
Assembly (mnemonics):
mov al, 97
Meaning: Move the value 97 into the AL register.
Assembly vs Machine Code
Machine Code (hex): B0 61
Assembly: mov al, 97
The assembler converts assembly to machine code.
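To make the assembler's job concrete, here is a minimal C sketch of how one instruction form, mov r8, imm8, turns into the two bytes shown above. The function name and the enum are hypothetical, chosen for illustration; a real assembler handles many more forms and operand sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper enum: x86 numbers the 8-bit registers
   AL, CL, DL, BL as 0, 1, 2, 3 inside the opcode byte. */
enum reg8 { REG_AL = 0, REG_CL = 1, REG_DL = 2, REG_BL = 3 };

/* Encode "mov r8, imm8" into out[0..1] and return the byte count.
   The opcode is 0xB0 plus the register number; the immediate follows. */
size_t encode_mov_r8_imm8(enum reg8 reg, uint8_t imm, uint8_t out[2])
{
    out[0] = (uint8_t)(0xB0 + reg);
    out[1] = imm;
    return 2;
}
```

Calling encode_mov_r8_imm8(REG_AL, 97, buf) fills buf with B0 61, the same bytes as the hex listing above.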
Why Learn Assembly?
1. Understanding Your Code
C code:
int x = 5;
int y = x + 10;
Assembly equivalent (conceptually):
mov eax, 5 ; x = 5
add eax, 10 ; x + 10
mov ebx, eax ; y = result
2. Debugging
When your program crashes, debuggers show assembly:
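Segmentation fault at: mov dword [eax], 0
Understanding assembly helps you diagnose the problem: here the instruction stores to the address held in EAX, so the crash points to an invalid pointer in that register.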
3. Optimization
Compilers are good but not perfect. Sometimes you need to write critical sections in assembly for maximum performance.
4. System Programming Requirements
- Bootloaders: Start in 16-bit real mode assembly
- Kernel initialization: Switch CPU modes
- Context switching: Save/restore all registers
- Interrupt handlers: Direct hardware interaction
Choosing an Assembler: NASM
NASM (Netwide Assembler) is popular because:
- Clean, readable syntax
- Cross-platform (Linux, Windows, macOS)
- Well-documented
- Used in many bootloader/kernel tutorials
Install NASM:
# Ubuntu/Debian
sudo apt-get install nasm
# macOS
brew install nasm
# Verify
nasm -v
Alternative assemblers: GAS (GNU Assembler), MASM (Microsoft), FASM, YASM.
CPU Registers
Registers are tiny, ultra-fast storage locations inside the CPU.
x86 32-bit General Purpose Registers
EAX - Accumulator (arithmetic operations)
EBX - Base (base pointer for memory)
ECX - Counter (loop counter)
EDX - Data (I/O operations, arithmetic)
ESI - Source Index (string/memory operations)
EDI - Destination Index (string/memory operations)
EBP - Base Pointer (stack frame pointer)
ESP - Stack Pointer (points to top of stack)
Register Hierarchy (x86)
64-bit: RAX (entire register)
|
32-bit: EAX (lower 32 bits)
|
16-bit: AX (lower 16 bits)
|
8-bit: AH AL (high and low bytes of AX)
Example:
mov rax, 0x1234567890ABCDEF ; 64-bit
; Now:
; RAX = 0x1234567890ABCDEF
; EAX = 0x90ABCDEF (lower 32 bits)
; AX = 0xCDEF (lower 16 bits)
; AL = 0xEF (lower 8 bits)
; AH = 0xCD (bits 8-15)
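The sub-register views can be modeled in C with masks and shifts. This is a sketch only; the function names are made up for illustration. The two write helpers reflect 64-bit mode behavior: writing AL leaves the rest of RAX untouched, while writing EAX zero-extends into RAX.

```c
#include <stdint.h>

/* Read views of RAX */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }

/* mov al, v: only the low 8 bits of RAX change */
uint64_t write_al(uint64_t rax, uint8_t v)
{
    return (rax & ~(uint64_t)0xFF) | v;
}

/* mov eax, v: the upper 32 bits of RAX are cleared (64-bit mode) */
uint64_t write_eax(uint64_t rax, uint32_t v)
{
    (void)rax;              /* the old value does not survive */
    return (uint64_t)v;
}
```

With rax = 0x1234567890ABCDEF, the four view functions return the same values as the comments in the example above.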
Special Purpose Registers
EIP - Instruction Pointer (points to next instruction)
EFLAGS - Flags register (status flags: zero, carry, overflow, etc.)
Segment Registers (less commonly used in modern programming)
CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES, FS, GS - Extra segments
Basic Assembly Syntax (NASM)
Instruction Format
label: instruction operands ; comment
Example:
start: mov eax, 5 ; Move 5 into EAX register
Data Sizes
byte - 8 bits (1 byte)
word - 16 bits (2 bytes)
dword - 32 bits (4 bytes)
qword - 64 bits (8 bytes)
Directives
section .data ; Data segment (initialized data)
section .bss ; BSS segment (uninitialized data)
section .text ; Code segment (executable instructions)
global _start ; Make _start visible to linker
db - define byte
dw - define word
dd - define double word
dq - define quad word
resb - reserve bytes
resw - reserve words
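For readers coming from C, the data directives line up roughly with fixed-width types. The sketch below uses hypothetical variable names, with the NASM line each one would correspond to in a comment. The two arrays without initializers play the role of .bss reservations.

```c
#include <stdint.h>

uint8_t  flag_byte   = 1;           /* flag_byte   db 1        */
uint16_t port_word   = 0x3F8;       /* port_word   dw 0x3F8    */
uint32_t count_dword = 100000;      /* count_dword dd 100000   */
uint64_t big_qword   = 1ULL << 40;  /* big_qword   dq 1 << 40  */

uint8_t  buffer[64];                /* buffer  resb 64  (64 bytes in .bss) */
uint16_t samples[8];                /* samples resw 8   (16 bytes in .bss) */
```

One difference worth noting: C guarantees that globals without an initializer start at zero, whereas in assembly that is a property of how the loader sets up .bss rather than of the directive itself.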
Your First Assembly Program
Hello World (Linux x86-64)
section .data
msg db "Hello, Assembly!", 0xA ; String with newline
len equ $ - msg ; Length of string
section .text
global _start
_start:
; write(1, msg, len) system call
mov rax, 1 ; sys_write
mov rdi, 1 ; file descriptor: stdout
mov rsi, msg ; pointer to message
mov rdx, len ; message length
syscall ; invoke system call
; exit(0) system call
mov rax, 60 ; sys_exit
xor rdi, rdi ; exit code 0
syscall ; invoke system call
Compile and run:
nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello
Output:
Hello, Assembly!
Breakdown
- section .data: Defines initialized data
- msg db: Defines byte string
- len equ: Calculates length ($ = current position)
- section .text: Code section
- global _start: Entry point
- mov: Move data between registers/memory
- syscall: Make system call (Linux x86-64)
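For comparison, the same program can be sketched in C. It assumes a POSIX system, where the C library's write() wrapper loads the registers and issues the system call for us. The names say_hello and MSG_LEN are illustrative only.

```c
#include <unistd.h>   /* write() */

/* Same bytes as "msg" in the listing: no terminating NUL, newline at the end. */
static const char msg[] = {
    'H','e','l','l','o',',',' ',
    'A','s','s','e','m','b','l','y','!','\n'
};

/* Counterpart of "len equ $ - msg": a constant computed at build time. */
enum { MSG_LEN = sizeof msg };

/* Counterpart of the first system call: write(fd, msg, len).
   Returns the number of bytes written, or -1 on error. */
long say_hello(int fd)
{
    return (long)write(fd, msg, MSG_LEN);
}
```

Calling say_hello(1) writes the greeting to stdout. Unlike the assembly version, a C program linked the usual way starts at main and exits through the C runtime, so no explicit exit system call is needed.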
Common Assembly Instructions
Data Movement
mov dest, src ; Move data: dest = src
lea dest, [addr] ; Load effective address
push value ; Push onto stack (32-bit mode: ESP -= 4, [ESP] = value)
pop dest ; Pop from stack (32-bit mode: dest = [ESP], ESP += 4)
; In 64-bit mode, push and pop use RSP and move it by 8
Examples:
mov eax, 42 ; eax = 42
mov ebx, eax ; ebx = eax
mov ecx, [var] ; ecx = value at memory location var
mov [var], eax ; Store eax value to memory location var
Arithmetic
add dest, src ; dest = dest + src
sub dest, src ; dest = dest - src
mul src ; edx:eax = eax * src (unsigned, 64-bit result)
imul src ; edx:eax = eax * src (signed, 64-bit result)
div src ; eax = edx:eax / src, edx = remainder (unsigned)
inc dest ; dest = dest + 1
dec dest ; dest = dest - 1
neg dest ; dest = -dest
Examples:
mov eax, 10
add eax, 5 ; eax = 15
sub eax, 3 ; eax = 12
inc eax ; eax = 13
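The one-operand mul and div forms are easy to misread, because with 32-bit operands they work on the EDX:EAX register pair rather than on EAX alone. The following C model is a sketch of that behavior; struct regs and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical register file holding only the two registers involved. */
struct regs { uint32_t eax, edx; };

/* mul src: EDX:EAX = EAX * src (unsigned 64-bit product) */
void mul32(struct regs *r, uint32_t src)
{
    uint64_t product = (uint64_t)r->eax * src;
    r->eax = (uint32_t)product;          /* low half  */
    r->edx = (uint32_t)(product >> 32);  /* high half */
}

/* div src: divide EDX:EAX by src; quotient to EAX, remainder to EDX.
   Returns -1 in the cases where the real CPU raises a divide error
   (src == 0, or a quotient that does not fit in 32 bits). */
int div32(struct regs *r, uint32_t src)
{
    if (src == 0)
        return -1;
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient = dividend / src;
    if (quotient > UINT32_MAX)
        return -1;
    r->edx = (uint32_t)(dividend % src);
    r->eax = (uint32_t)quotient;
    return 0;
}
```

The model shows why EDX is usually cleared (xor edx, edx) before an unsigned div: any stale value left in EDX becomes the upper half of the dividend.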
Logical and Bitwise
and dest, src ; dest = dest & src
or dest, src ; dest = dest | src
xor dest, src ; dest = dest ^ src
not dest ; dest = ~dest
shl dest, count ; dest = dest << count (shift left)
shr dest, count ; dest = dest >> count (logical shift right, fills with zeros)
Examples:
mov al, 0b10101010
and al, 0b11110000 ; al = 0b10100000 (mask)
or al, 0b00001111 ; al = 0b10101111 (set bits)
xor al, 0b11111111 ; al = 0b01010000 (invert)
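The same mask, set, and invert steps map directly onto C's bitwise operators. A small sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* and: keep only the bits selected by mask */
uint8_t keep_bits(uint8_t value, uint8_t mask)   { return (uint8_t)(value & mask); }

/* or: force the bits selected by mask to 1 */
uint8_t set_bits(uint8_t value, uint8_t mask)    { return (uint8_t)(value | mask); }

/* xor: flip the bits selected by mask */
uint8_t toggle_bits(uint8_t value, uint8_t mask) { return (uint8_t)(value ^ mask); }

/* Check one bit, counting from bit 0 as the least significant bit */
bool bit_is_set(uint8_t value, unsigned bit)     { return ((value >> bit) & 1u) != 0; }
```

Chaining keep_bits(0xAA, 0xF0), set_bits(..., 0x0F), and toggle_bits(..., 0xFF) reproduces the three results in the example above: 0xA0, 0xAF, and 0x50.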
Comparison and Jumps
cmp op1, op2 ; Compare (sets flags, doesn't change operands)
test op1, op2 ; Bitwise AND, result discarded (sets flags, doesn't change operands)
jmp label ; Unconditional jump
je label ; Jump if equal (ZF = 1)
jne label ; Jump if not equal (ZF = 0)
jg label ; Jump if greater (signed)
jl label ; Jump if less (signed)
ja label ; Jump if above (unsigned)
jb label ; Jump if below (unsigned)
Example: If statement:
mov eax, 10
cmp eax, 5
jg greater ; Jump if eax > 5
; eax <= 5
mov ebx, 0
jmp done
greater:
; eax > 5
mov ebx, 1
done:
; Continue
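The choice between jg and ja matters because the same bit pattern can compare differently as a signed or an unsigned number. The C sketch below models both jumps after cmp a, b, along with the if/else example above. The function names are illustrative, and the cast to int32_t assumes the usual two's complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* jg after "cmp a, b": taken when a > b with both read as signed values */
bool jg_taken(uint32_t a, uint32_t b) { return (int32_t)a > (int32_t)b; }

/* ja after "cmp a, b": taken when a > b with both read as unsigned values */
bool ja_taken(uint32_t a, uint32_t b) { return a > b; }

/* C shape of the if/else example: ebx = (eax > 5) ? 1 : 0, signed compare */
uint32_t select_ebx(uint32_t eax)
{
    return jg_taken(eax, 5) ? 1u : 0u;
}
```

For a = 0xFFFFFFFF and b = 5, jg is not taken (the value reads as -1) while ja is taken (the value reads as 4294967295).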
Example: Loop:
mov ecx, 10 ; Counter
loop_start:
; Do something
dec ecx ; ecx--
jnz loop_start ; Jump if not zero
; Loop done
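In C terms, the dec/jnz pattern is a do-while loop: the body runs first and the counter is tested afterwards. A minimal sketch, with a hypothetical function that counts how often the body runs:

```c
#include <stdint.h>

/* Returns how many times the loop body runs for a given starting ECX. */
uint32_t count_iterations(uint32_t ecx)
{
    uint32_t iterations = 0;
    do {
        iterations++;       /* stands in for "; Do something" */
        ecx--;              /* dec ecx                        */
    } while (ecx != 0);     /* jnz loop_start                 */
    return iterations;
}
```

Because the test comes after the decrement, starting with ecx = 0 does not skip the loop: the counter wraps to 0xFFFFFFFF and the body runs 2^32 times. If zero is a possible input, a check before the loop (for example test ecx, ecx followed by jz) avoids this.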
Memory Addressing Modes
Immediate
mov eax, 42 ; eax = 42 (value is in instruction)
Register
mov eax, ebx ; eax = ebx
Direct Memory
mov eax, [var] ; eax = value at memory address 'var'
Indirect
mov eax, [ebx] ; eax = value at address stored in ebx
Indexed
mov eax, [ebx + 4] ; eax = *(ebx + 4)
mov eax, [array + ecx*4] ; eax = array[ecx] (for int array)
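All of the memory forms above are special cases of one address calculation: base + index*scale + displacement. The C sketch below computes that address and performs the load against a byte array standing in for memory. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Effective address of [base + index*scale + disp]; scale is 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

/* Load a dword from a byte-addressed memory model, like "mov eax, [addr]". */
uint32_t load_dword(const uint8_t *memory, uint32_t addr)
{
    uint32_t value;
    memcpy(&value, memory + addr, sizeof value);  /* avoids alignment issues */
    return value;
}
```

With this model, [ebx + 4] is effective_address(ebx, 0, 1, 4), and [array + ecx*4] is effective_address(array, ecx, 4, 0), which is exactly how C computes array[ecx] for 4-byte elements.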
The Stack
The stack is a region of memory for temporary storage, function calls, and local variables.
Stack Operations
push eax ; ESP -= 4, [ESP] = eax
pop ebx ; ebx = [ESP], ESP += 4
Visualization:
Before push eax (eax = 0x1234, esp = 0x2000):
0x2000 <- ESP
After push eax:
0x1FFC 0x1234 <- ESP
0x2000
Stack grows downward (toward lower addresses).
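The push and pop behavior can be sketched as a small C model of a 32-bit stack. struct stack_model and its fields are hypothetical, and the sketch omits bounds checks for brevity.

```c
#include <stdint.h>
#include <string.h>

/* A small window of memory plus ESP. Address 'base' maps to memory[0]. */
struct stack_model {
    uint8_t  memory[64];
    uint32_t base;   /* lowest address covered by memory[] */
    uint32_t esp;    /* current stack pointer              */
};

/* push: move ESP down by 4, then store the value at [ESP] */
void push32(struct stack_model *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->memory[s->esp - s->base], &value, 4);
}

/* pop: load the value at [ESP], then move ESP up by 4 */
uint32_t pop32(struct stack_model *s)
{
    uint32_t value;
    memcpy(&value, &s->memory[s->esp - s->base], 4);
    s->esp += 4;
    return value;
}
```

Starting from esp = 0x2000, one push32 of 0x1234 leaves esp at 0x1FFC, matching the visualization above, and the following pop32 returns 0x1234 and restores esp to 0x2000.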
Function Calls
call function ; Push return address, jump to function
ret ; Pop return address, jump to it
What call does:
- Push address of next instruction onto stack
- Jump to function
What ret does:
- Pop address from stack
- Jump to that address
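The two step lists above can be sketched as a toy CPU model in C. Everything here is hypothetical scaffolding: the stack is a plain array with a slot counter standing in for ESP, and the length of the call instruction is passed in by the caller.

```c
#include <stdint.h>

struct cpu_model {
    uint32_t eip;        /* address of the instruction being executed */
    uint32_t stack[16];  /* simplified stack of return addresses      */
    int      top;        /* number of used slots; stands in for ESP   */
};

/* call target: push the address of the next instruction, then jump */
void do_call(struct cpu_model *cpu, uint32_t target, uint32_t call_length)
{
    cpu->stack[cpu->top++] = cpu->eip + call_length;
    cpu->eip = target;
}

/* ret: pop the saved address and jump to it */
void do_ret(struct cpu_model *cpu)
{
    cpu->eip = cpu->stack[--cpu->top];
}
```

For a 5-byte call at address 0x1000 that targets 0x2000, do_call saves 0x1005 and sets eip to 0x2000; do_ret then resumes at 0x1005. The model also shows why an unbalanced push inside a function is fatal: ret would pop that value instead of the return address.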
Flags Register
The EFLAGS register contains status flags set by operations.
Common Flags
ZF - Zero Flag (set if result is zero)
CF - Carry Flag (set if unsigned overflow)
SF - Sign Flag (set if result is negative)
OF - Overflow Flag (set if signed overflow)
Example:
mov eax, 5
sub eax, 5 ; eax = 0, ZF = 1 (zero flag set)
mov al, 255
add al, 1 ; al = 0 (wraps), CF = 1 (carry flag set), ZF = 1 (result is zero)
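How the four flags are derived from an 8-bit addition can be sketched in C. struct flags and add8 are illustrative names, not part of any real API; the overflow rule used is the standard one for addition: both operands have a sign that differs from the sign of the result.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf, cf, sf, of; };

/* Model of "add al, b" for 8-bit operands: returns the new AL, fills in flags. */
uint8_t add8(uint8_t a, uint8_t b, struct flags *f)
{
    unsigned wide   = (unsigned)a + b;   /* keep the ninth bit */
    uint8_t  result = (uint8_t)wide;

    f->zf = (result == 0);               /* result is zero                  */
    f->cf = (wide > 0xFF);               /* carry out of bit 7              */
    f->sf = (result & 0x80) != 0;        /* top bit set: negative if signed */
    f->of = ((a ^ result) & (b ^ result) & 0x80) != 0;  /* signed overflow  */
    return result;
}
```

Two cases illustrate the difference between CF and OF: 255 + 1 gives 0 with CF and ZF set but OF clear, while 127 + 1 gives 128 with OF and SF set but CF clear.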
System Calls (Linux x86-64)
System calls request services from the kernel.
Making a System Call
mov rax, syscall_number
mov rdi, arg1
mov rsi, arg2
mov rdx, arg3
syscall
Common System Calls
rax = 0: read(fd, buf, count)
rax = 1: write(fd, buf, count)
rax = 2: open(filename, flags, mode)
rax = 3: close(fd)
rax = 60: exit(status)
Example: Read from stdin:
section .bss
buffer resb 64
section .text
global _start
_start:
; read(0, buffer, 64)
mov rax, 0 ; sys_read
mov rdi, 0 ; stdin
mov rsi, buffer ; buffer address
mov rdx, 64 ; bytes to read
syscall
; exit(0)
mov rax, 60
xor rdi, rdi
syscall
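The assembly example ignores the value the kernel leaves in RAX after read. A C sketch of the same call makes that return value visible. It assumes a POSIX system, where the C library's read() wrapper places the arguments in the registers and issues the system call; read_input is a hypothetical name.

```c
#include <unistd.h>   /* read() */

/* read(fd, buffer, capacity): fd goes in rdi, buffer in rsi, capacity in rdx.
   Returns the number of bytes actually read (the kernel's value in rax),
   0 at end of input, or -1 on error. */
long read_input(int fd, char *buffer, unsigned long capacity)
{
    return (long)read(fd, buffer, capacity);
}
```

A caller would pass fd = 0 for stdin and a 64-byte buffer to mirror the listing. The result can be smaller than the requested count, so code that processes the buffer should use the returned length rather than assume 64 bytes arrived.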
Key Concepts
- Assembly is human-readable machine code
- Registers are fast storage inside the CPU
- Instructions perform operations on registers and memory
- The stack stores temporary data and function call information
- Flags indicate results of operations
- System calls request kernel services
- Addressing modes access data in different ways
Common Mistakes
- Wrong operand order - NASM/Intel syntax is dest, src
- Forgetting stack alignment - x64 requires 16-byte alignment
- Register size mismatch - Can't mov al, ebx
- Not preserving registers - Caller/callee save conventions
- Stack imbalance - Every push needs a pop
Debugging Tips
- Use GDB - Step through assembly instructions
- Print registers - info registers in GDB
- Start simple - Get "Hello World" working first
- Read disassembly - objdump -d shows machine code
- Check flags - Understand how flags are set
Mini Exercises
- Write "Hello, World!" in assembly
- Add two numbers and print the result
- Implement a loop that counts from 1 to 10
- Write a function that returns a value
- Use the stack to save and restore registers
- Read a character from stdin
- Implement a simple if-else statement
- Create a program that exits with specific code
- Use bitwise operations to test/set bits
- Write inline assembly in a C program
Review Questions
- What is the difference between assembly and machine code?
- Name four general-purpose registers in x86.
- What does the syscall instruction do?
- How does the stack grow (up or down)?
- What flag is set when the result of an operation is zero?
Reference Checklist
By the end of this chapter, you should be able to:
- Understand what assembly language is
- Know x86 register names and purposes
- Write basic assembly programs with NASM
- Use common instructions (mov, add, sub, jmp)
- Make Linux system calls
- Understand the stack
- Use different addressing modes
- Compile and run assembly programs
Next Steps
Now that you understand basic assembly, the next chapter explores CPU architectures in detail. You'll learn about x86, x64, ARM, and AArch64 architectures, their differences, and how to write code for different processors.
Key Takeaway: Assembly language provides direct control over the CPU. While verbose, it gives you precise understanding and control over what the computer does, which is essential for system-level programming, bootloaders, and kernels.