Chapter 8: Introduction to Assembly Language

Introduction

Assembly language is the lowest level of programming above raw machine code. Each assembly instruction typically corresponds to a single machine instruction. While modern systems programming is mostly done in C, understanding assembly is essential for debugging, optimization, writing bootloaders, and seeing how CPUs actually execute your code.

Why This Matters

Assembly language reveals what the CPU is truly doing. When you debug crashes, analyze performance, write device drivers, or develop operating systems, you'll encounter assembly. Understanding assembly helps you write better C code because you understand what your compiler generates. For certain system-level tasks like bootloaders and kernel initialization, assembly is unavoidable.

How to Study This Chapter

  1. Write code - Assembly makes sense by doing, not just reading
  2. Use NASM - We'll use NASM (Netwide Assembler) for examples
  3. Test in small steps - Start with tiny programs
  4. Read disassembly - See what your C compiler generates
  5. Be patient - Assembly is verbose but logical

What is Assembly Language?

Assembly is a human-readable representation of machine code.

Machine Code (binary):

10110000 01100001

Assembly (mnemonics):

mov al, 97

Meaning: Move the value 97 (0x61 in hex) into the AL register.

Assembly vs Machine Code

Machine Code (hex):  B0 61
Assembly:            mov al, 97

The assembler converts assembly to machine code.
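
To make the translation step concrete, here is a minimal sketch in C of what an assembler does for this one instruction form. It assumes the standard x86 encoding of "mov r8, imm8" (opcode 0xB0 plus the register number, followed by the immediate byte). The names encode_mov_r8_imm8 and enum reg8 are made up for this illustration and are not part of NASM.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative names only. The x86 encoding numbers the low 8-bit
   registers AL=0, CL=1, DL=2, BL=3. */
enum reg8 { REG_AL = 0, REG_CL = 1, REG_DL = 2, REG_BL = 3 };

/* Encode "mov r8, imm8": one opcode byte (0xB0 + register number)
   followed by the immediate value. Returns the number of bytes written. */
size_t encode_mov_r8_imm8(enum reg8 reg, uint8_t imm, uint8_t out[2])
{
    out[0] = (uint8_t)(0xB0 + reg);
    out[1] = imm;
    return 2;
}
```

Calling encode_mov_r8_imm8(REG_AL, 97, buf) fills buf with the bytes B0 61, the same two bytes shown above. A real assembler handles hundreds of instruction forms, but each one follows the same idea: a mnemonic and its operands select a fixed byte pattern.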

Why Learn Assembly?

1. Understanding Your Code

C code:

int x = 5;
int y = x + 10;

Assembly equivalent (conceptually):

mov eax, 5      ; x = 5
add eax, 10     ; x + 10
mov ebx, eax    ; y = result

2. Debugging

When your program crashes, debuggers show assembly:

Segmentation fault at: mov dword [eax], 0

Reading this, you can see the program is storing 0 through the address held in EAX, so an invalid pointer in EAX is the likely cause.

3. Optimization

Compilers are good but not perfect. Sometimes a performance-critical routine is worth writing in assembly, though you should measure before and after to confirm the gain.

4. System Programming Requirements

  • Bootloaders: On BIOS-based x86 systems, start in 16-bit real mode
  • Kernel initialization: Switch CPU modes
  • Context switching: Save/restore all registers
  • Interrupt handlers: Direct hardware interaction

Choosing an Assembler: NASM

NASM (Netwide Assembler) is popular because:

  • Clean, readable syntax
  • Cross-platform (Linux, Windows, macOS)
  • Well-documented
  • Used in many bootloader/kernel tutorials

Install NASM:

# Ubuntu/Debian
sudo apt-get install nasm

# macOS
brew install nasm

# Verify
nasm -v

Alternative assemblers: GAS (GNU Assembler), MASM (Microsoft), FASM, YASM.

CPU Registers

Registers are tiny, ultra-fast storage locations inside the CPU.

x86 32-bit General Purpose Registers

EAX - Accumulator (arithmetic operations)
EBX - Base (base pointer for memory)
ECX - Counter (loop counter)
EDX - Data (I/O operations, arithmetic)
ESI - Source Index (string/memory operations)
EDI - Destination Index (string/memory operations)
EBP - Base Pointer (stack frame pointer)
ESP - Stack Pointer (points to top of stack)

Register Hierarchy (x86)

64-bit: RAX  (entire register)
         |
32-bit: EAX  (lower 32 bits)
         |
16-bit: AX   (lower 16 bits)
         |
 8-bit: AH AL (high and low bytes of AX)

Example:

mov rax, 0x1234567890ABCDEF  ; 64-bit
; Now:
; RAX = 0x1234567890ABCDEF
; EAX = 0x90ABCDEF (lower 32 bits)
; AX  = 0xCDEF (lower 16 bits)
; AL  = 0xEF (lower 8 bits)
; AH  = 0xCD (bits 8-15)
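
The sub-registers are just views onto the same 64 bits. The following C sketch reproduces the values above with masks and shifts; the type rax_views and the function split_rax are illustrative names, not anything the CPU or NASM defines.

```c
#include <stdint.h>

/* Illustrative container for the narrower views of one 64-bit value. */
typedef struct {
    uint32_t eax;  /* lower 32 bits */
    uint16_t ax;   /* lower 16 bits */
    uint8_t  ah;   /* bits 8-15 */
    uint8_t  al;   /* bits 0-7 */
} rax_views;

/* Extract the parts of a 64-bit value that EAX, AX, AH and AL would expose. */
rax_views split_rax(uint64_t rax)
{
    rax_views v;
    v.eax = (uint32_t)(rax & 0xFFFFFFFFu);
    v.ax  = (uint16_t)(rax & 0xFFFFu);
    v.ah  = (uint8_t)((rax >> 8) & 0xFFu);
    v.al  = (uint8_t)(rax & 0xFFu);
    return v;
}
```

For the input 0x1234567890ABCDEF, split_rax returns eax = 0x90ABCDEF, ax = 0xCDEF, ah = 0xCD and al = 0xEF, matching the comments in the assembly example.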

Special Purpose Registers

EIP - Instruction Pointer (points to next instruction; RIP in 64-bit mode)
EFLAGS - Flags register (status flags: zero, carry, overflow, etc.; RFLAGS in 64-bit mode)

Segment Registers (less commonly used in modern programming)

CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES, FS, GS - Extra segments

Basic Assembly Syntax (NASM)

Instruction Format

label:  instruction  operands  ; comment

Example:

start:  mov eax, 5    ; Move 5 into EAX register

Data Sizes

byte    - 8 bits  (1 byte)
word    - 16 bits (2 bytes)
dword   - 32 bits (4 bytes)
qword   - 64 bits (8 bytes)

Directives

section .data      ; Data segment (initialized data)
section .bss       ; BSS segment (uninitialized data)
section .text      ; Code segment (executable instructions)

global _start      ; Make _start visible to linker

db  - define byte
dw  - define word
dd  - define double word
dq  - define quad word

resb - reserve bytes
resw - reserve words
resd - reserve double words
resq - reserve quad words

Your First Assembly Program

Hello World (Linux x86-64)

section .data
    msg db "Hello, Assembly!", 0xA   ; String with newline
    len equ $ - msg                   ; Length of string

section .text
    global _start

_start:
    ; write(1, msg, len) system call
    mov rax, 1          ; sys_write
    mov rdi, 1          ; file descriptor: stdout
    mov rsi, msg        ; pointer to message
    mov rdx, len        ; message length
    syscall             ; invoke system call

    ; exit(0) system call
    mov rax, 60         ; sys_exit
    xor rdi, rdi        ; exit code 0
    syscall             ; invoke system call

Compile and run:

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello

Output:

Hello, Assembly!

Breakdown

  1. section .data: Defines initialized data
  2. msg db: Defines byte string
  3. len equ: Calculates length ($ = current position)
  4. section .text: Code section
  5. global _start: Exports _start so the linker can use it as the entry point
  6. mov: Move data between registers/memory
  7. syscall: Make system call (Linux x86-64)

Common Assembly Instructions

Data Movement

mov dest, src       ; Move data: dest = src
lea dest, [addr]    ; Load effective address
push value          ; Push onto stack (32-bit: ESP -= 4, [ESP] = value; 64-bit: RSP -= 8)
pop dest            ; Pop from stack (32-bit: dest = [ESP], ESP += 4; 64-bit: RSP += 8)

Examples:

mov eax, 42         ; eax = 42
mov ebx, eax        ; ebx = eax
mov ecx, [var]      ; ecx = value at memory location var
mov [var], eax      ; Store eax value to memory location var

Arithmetic

add dest, src       ; dest = dest + src
sub dest, src       ; dest = dest - src
mul src             ; edx:eax = eax * src (unsigned, 32-bit operand)
imul src            ; edx:eax = eax * src (signed, 32-bit operand)
div src             ; eax = edx:eax / src, edx = remainder (unsigned)
inc dest            ; dest = dest + 1
dec dest            ; dest = dest - 1
neg dest            ; dest = -dest

Examples:

mov eax, 10
add eax, 5          ; eax = 15
sub eax, 3          ; eax = 12
inc eax             ; eax = 13
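
The one-operand forms of mul and div are less obvious than add and sub because, for 32-bit operands, they work on the register pair EDX:EAX rather than on EAX alone. The C sketch below models that behaviour with 64-bit arithmetic. The names reg_pair, mul32 and div32 are hypothetical, and the sketch assumes the caller never passes a zero divisor or a dividend whose quotient does not fit in 32 bits (real hardware raises a divide error in both cases).

```c
#include <stdint.h>

/* Illustrative pair of 32-bit registers. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} reg_pair;

/* mul src: the 64-bit unsigned product of EAX and src lands in EDX:EAX. */
reg_pair mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * src;
    reg_pair r;
    r.eax = (uint32_t)product;          /* low 32 bits */
    r.edx = (uint32_t)(product >> 32);  /* high 32 bits */
    return r;
}

/* div src: divides the 64-bit value EDX:EAX by src.
   Quotient goes to EAX, remainder to EDX.
   Assumes src != 0 and that the quotient fits in 32 bits. */
reg_pair div32(uint32_t edx, uint32_t eax, uint32_t src)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    reg_pair r;
    r.eax = (uint32_t)(dividend / src);
    r.edx = (uint32_t)(dividend % src);
    return r;
}
```

This is why assembly code usually clears EDX (for example with xor edx, edx) before an unsigned div: whatever is left in EDX becomes the upper half of the dividend.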

Logical and Bitwise

and dest, src       ; dest = dest & src
or  dest, src       ; dest = dest | src
xor dest, src       ; dest = dest ^ src
not dest            ; dest = ~dest
shl dest, count     ; dest = dest << count (shift left)
shr dest, count     ; dest = dest >> count (logical shift right, zero-fill)

Examples:

mov al, 0b10101010
and al, 0b11110000  ; al = 0b10100000 (mask)
or  al, 0b00001111  ; al = 0b10101111 (set bits)
xor al, 0b11111111  ; al = 0b01010000 (invert)
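
The same masking idioms appear in C, which is a convenient way to check your expectations before writing the assembly. The helper names below are illustrative only; each one corresponds to a single instruction from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* and: keep only the bits selected by mask, clear the rest */
uint8_t keep_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & mask);
}

/* or: force the bits selected by mask to 1 */
uint8_t set_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value | mask);
}

/* xor: flip the bits selected by mask */
uint8_t toggle_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value ^ mask);
}

/* test: AND without storing the result; true when any selected bit
   is set, which is the case where the CPU would leave ZF = 0 */
bool any_bit_set(uint8_t value, uint8_t mask)
{
    return (value & mask) != 0;
}
```

Feeding in the values from the example (0xAA is 0b10101010) gives keep_bits(0xAA, 0xF0) = 0xA0, set_bits(0xA0, 0x0F) = 0xAF and toggle_bits(0xAF, 0xFF) = 0x50, the same results as the and, or and xor lines.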

Comparison and Jumps

cmp op1, op2        ; Compare (sets flags, doesn't change operands)
test op1, op2       ; Logical AND (sets flags, doesn't change operands)

jmp label           ; Unconditional jump
je  label           ; Jump if equal (ZF = 1)
jne label           ; Jump if not equal (ZF = 0)
jg  label           ; Jump if greater (signed)
jl  label           ; Jump if less (signed)
ja  label           ; Jump if above (unsigned)
jb  label           ; Jump if below (unsigned)

Example: If statement:

mov eax, 10
cmp eax, 5
jg  greater        ; Jump if eax > 5

; eax <= 5
mov ebx, 0
jmp done

greater:
; eax > 5
mov ebx, 1

done:
; Continue

Example: Loop:

mov ecx, 10        ; Counter

loop_start:
    ; Do something
    dec ecx        ; ecx--
    jnz loop_start ; Jump if not zero

; Loop done
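
If the jump-based control flow feels unfamiliar, it can help to write it in C with goto, which maps almost line for line onto the two examples above. This is a teaching sketch only (you would normally write if/else and a do-while loop), and the function names are hypothetical.

```c
#include <stdint.h>

/* Mirrors the if-statement example: returns what EBX would hold. */
uint32_t is_greater_than_5(int32_t eax)
{
    uint32_t ebx;
    if (eax > 5) goto greater;   /* cmp eax, 5 / jg greater */
    ebx = 0;                     /* eax <= 5 */
    goto done;                   /* jmp done */
greater:
    ebx = 1;                     /* eax > 5 */
done:
    return ebx;
}

/* Mirrors the loop example: returns how many times the body ran.
   As in the assembly, the body runs before the counter is checked,
   so a starting value of 0 would wrap around instead of skipping. */
uint32_t run_countdown(uint32_t ecx)
{
    uint32_t iterations = 0;
loop_start:
    iterations++;                    /* "Do something" */
    ecx--;                           /* dec ecx */
    if (ecx != 0) goto loop_start;   /* jnz loop_start */
    return iterations;
}
```

With these definitions, is_greater_than_5(10) returns 1 and run_countdown(10) returns 10. The note about a zero counter applies to the assembly loop too: check the counter before entering the loop if it can legitimately be 0.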

Memory Addressing Modes

Immediate

mov eax, 42        ; eax = 42 (value is in instruction)

Register

mov eax, ebx       ; eax = ebx

Direct Memory

mov eax, [var]     ; eax = value at memory address 'var'

Indirect

mov eax, [ebx]     ; eax = value at address stored in ebx

Indexed

mov eax, [ebx + 4]           ; eax = *(ebx + 4)
mov eax, [array + ecx*4]     ; eax = array[ecx] (for int array)
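
Each addressing mode computes a byte address and then loads from it. The C sketch below spells out that address arithmetic for the two indexed forms; load_base_disp and load_indexed are illustrative names, and the code assumes 4-byte int32_t elements to match the *4 scale factor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* mov eax, [ebx + disp]: load 32 bits from (base address + disp bytes) */
int32_t load_base_disp(const uint8_t *base, size_t disp)
{
    int32_t value;
    memcpy(&value, base + disp, sizeof value);  /* safe unaligned load */
    return value;
}

/* mov eax, [array + ecx*4]: scale the index by the element size,
   add it to the array's address, then load 32 bits */
int32_t load_indexed(const int32_t *array, size_t index)
{
    const uint8_t *start = (const uint8_t *)array;
    return load_base_disp(start, index * sizeof(int32_t));
}
```

For an array holding 10, 20, 30, load_indexed(array, 2) returns 30, and load_base_disp on the same array with a displacement of 4 returns 20. In C the compiler does the scaling for you when you write array[index]; in assembly you state it explicitly.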

The Stack

The stack is a region of memory for temporary storage, function calls, and local variables.

Stack Operations

push eax           ; ESP -= 4, [ESP] = eax
pop ebx            ; ebx = [ESP], ESP += 4

Visualization:

Before push eax (eax = 0x1234, esp = 0x2000):

     0x2000 <- ESP

After push eax:

     0x1FFC    0x1234 <- ESP
     0x2000

Stack grows downward (toward lower addresses).
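
A small C model can make the pointer movement easier to follow. This sketch simulates a 32-bit stack inside a byte array; toy_stack and its functions are hypothetical names, the 64-byte size is arbitrary, and there is deliberately no overflow or underflow checking, just as the CPU performs none.

```c
#include <stdint.h>
#include <string.h>

#define TOY_STACK_BYTES 64u

/* Toy model: memory is a byte array and esp is an offset into it. */
typedef struct {
    uint8_t  memory[TOY_STACK_BYTES];
    uint32_t esp;
} toy_stack;

void toy_stack_init(toy_stack *s)
{
    memset(s->memory, 0, sizeof s->memory);
    s->esp = TOY_STACK_BYTES;   /* empty stack: ESP at the highest address */
}

/* push: move ESP down by 4, then store the value at the new top */
void toy_push(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->memory[s->esp], &value, sizeof value);
}

/* pop: load the value at the top, then move ESP back up by 4 */
uint32_t toy_pop(toy_stack *s)
{
    uint32_t value;
    memcpy(&value, &s->memory[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

After toy_stack_init, pushing 0x1234 lowers esp from 64 to 60, and popping returns 0x1234 and restores esp to 64. Values come back in the reverse order they were pushed, which is why saved registers must be popped in the opposite order.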

Function Calls

call function      ; Push return address, jump to function
ret                ; Pop return address, jump to it

What call does:

  1. Push address of next instruction onto stack
  2. Jump to function

What ret does:

  1. Pop address from stack
  2. Jump to that address

Flags Register

The EFLAGS register contains status flags set by operations.

Common Flags

ZF - Zero Flag (set if result is zero)
CF - Carry Flag (set on unsigned overflow: a carry out of, or borrow into, the top bit)
SF - Sign Flag (set if result is negative)
OF - Overflow Flag (set if signed overflow)

Example:

mov eax, 5
sub eax, 5    ; eax = 0, ZF = 1 (zero flag set)

mov al, 255
add al, 1     ; al = 0 (wraps), CF = 1 (carry flag set), ZF = 1 (result is zero)
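
To see how the four flags relate to one another, here is a C sketch that computes them for an 8-bit add the way the CPU would. The struct add8_result and the function add8 are illustrative names; the overflow rule used is the standard one (both operands share a sign that the result does not).

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an 8-bit add plus the status flags it would set. */
typedef struct {
    uint8_t result;
    bool zf;  /* result is zero */
    bool cf;  /* unsigned overflow (carry out of bit 7) */
    bool sf;  /* bit 7 of the result is set */
    bool of;  /* signed overflow */
} add8_result;

add8_result add8(uint8_t a, uint8_t b)
{
    add8_result r;
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b);

    r.result = (uint8_t)wide;
    r.zf = (r.result == 0);
    r.cf = (wide > 0xFF);
    r.sf = (r.result & 0x80) != 0;
    /* Signed overflow: a and b have the same sign bit,
       but the result's sign bit differs from it. */
    r.of = ((~(a ^ b) & (a ^ r.result)) & 0x80) != 0;
    return r;
}
```

add8(255, 1) gives a result of 0 with ZF and CF set, matching the example above. add8(127, 1) gives 128 with SF and OF set but CF clear: the same bits overflow as a signed value (127 + 1 does not fit in a signed byte) without overflowing as an unsigned one.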

System Calls (Linux x86-64)

System calls request services from the kernel.

Making a System Call

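mov rax, syscall_number  ; system call number
mov rdi, arg1            ; first argument
mov rsi, arg2            ; second argument
mov rdx, arg3            ; third argument (then r10, r8, r9)
syscall                  ; return value in rax; rcx and r11 are clobbered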

Common System Calls

rax = 0:  read(fd, buf, count)
rax = 1:  write(fd, buf, count)
rax = 2:  open(filename, flags, mode)
rax = 3:  close(fd)
rax = 60: exit(status)

Example: Read from stdin:

section .bss
    buffer resb 64

section .text
    global _start

_start:
    ; read(0, buffer, 64)
    mov rax, 0       ; sys_read
    mov rdi, 0       ; stdin
    mov rsi, buffer  ; buffer address
    mov rdx, 64      ; bytes to read
    syscall

    ; exit(0)
    mov rax, 60
    xor rdi, rdi
    syscall
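
You can issue the same system calls from C without going through the usual library wrappers, which is a handy way to confirm the call numbers and argument order before writing the assembly. This sketch assumes Linux with a C library that provides syscall() and the SYS_* constants (glibc and musl both do); raw_write and raw_read are hypothetical wrapper names.

```c
#define _GNU_SOURCE          /* exposes syscall() in the C library headers */
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_write, SYS_read */
#include <unistd.h>          /* syscall() */

/* Equivalent of: mov rax, 1 / mov rdi, fd / mov rsi, buf / mov rdx, count / syscall */
long raw_write(int fd, const void *buf, size_t count)
{
    return syscall(SYS_write, fd, buf, count);
}

/* Equivalent of: mov rax, 0 / mov rdi, fd / mov rsi, buf / mov rdx, count / syscall */
long raw_read(int fd, void *buf, size_t count)
{
    return syscall(SYS_read, fd, buf, count);
}
```

For example, raw_write(1, "Hello, Assembly!\n", 17) prints the same line as the first program in this chapter. On success both functions return the number of bytes transferred. On failure the C library's syscall() returns -1 and sets errno, whereas the raw syscall instruction leaves a negative error code in rax.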

Key Concepts

  • Assembly is a human-readable representation of machine code
  • Registers are fast storage inside the CPU
  • Instructions perform operations on registers and memory
  • The stack stores temporary data and function call information
  • Flags indicate results of operations
  • System calls request kernel services
  • Addressing modes access data in different ways

Common Mistakes

  1. Wrong operand order - NASM/Intel syntax is dest, src
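  2. Forgetting stack alignment - The System V x86-64 ABI requires RSP to be 16-byte aligned at each call instruction
  3. Register size mismatch - Can't mov al, ebx
  4. Not preserving registers - Callee-saved registers (rbx, rbp, r12-r15 on System V x86-64) must be restored before returning
  5. Stack imbalance - Every push needs a matching pop (or an equivalent stack pointer adjustment) before ret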

Debugging Tips

  • Use GDB - Step through assembly instructions
  • Print registers - info registers in GDB
  • Start simple - Get "Hello World" working first
  • Read disassembly - objdump -d shows the machine code bytes alongside the disassembled instructions
  • Check flags - Understand how flags are set

Mini Exercises

  1. Write "Hello, World!" in assembly
  2. Add two numbers and print the result
  3. Implement a loop that counts from 1 to 10
  4. Write a function that returns a value
  5. Use the stack to save and restore registers
  6. Read a character from stdin
  7. Implement a simple if-else statement
  8. Create a program that exits with specific code
  9. Use bitwise operations to test/set bits
  10. Write inline assembly in a C program

Review Questions

  1. What is the difference between assembly and machine code?
  2. Name four general-purpose registers in x86.
  3. What does the syscall instruction do?
  4. How does the stack grow (up or down)?
  5. What flag is set when the result of an operation is zero?

Reference Checklist

By the end of this chapter, you should be able to:

  • Understand what assembly language is
  • Know x86 register names and purposes
  • Write basic assembly programs with NASM
  • Use common instructions (mov, add, sub, jmp)
  • Make Linux system calls
  • Understand the stack
  • Use different addressing modes
  • Compile and run assembly programs

Next Steps

Now that you understand basic assembly, the next chapter explores CPU architectures in detail. You'll learn about x86, x64, ARM, and AArch64 architectures, their differences, and how to write code for different processors.


Key Takeaway: Assembly language provides direct control over the CPU. While verbose, it gives you precise understanding and control over what the computer does, which is essential for system-level programming, bootloaders, and kernels.