Chapter 14: Kernel Fundamentals

Introduction

The kernel is the core of an operating system. It manages hardware resources, provides essential services to applications, and enforces security. Understanding kernel fundamentals is crucial for system programmers, whether you're developing your own OS, writing kernel modules, or debugging system-level issues.

Why This Matters

Nearly everything a program does beyond pure computation goes through the kernel. File access, network communication, memory allocation, and process creation all require kernel services. Understanding how kernels work gives you deep insight into computer systems, helps you write efficient code, and opens doors to systems programming careers.

How to Study This Chapter

  1. Think in layers - Understand the separation between kernel space and user space
  2. Follow the syscall path - Trace how applications interact with the kernel
  3. Study real kernels - Read Linux kernel code for examples
  4. Build incrementally - Start with a minimal kernel and add features gradually

What is a Kernel?

The kernel is privileged software that:

  • Manages hardware resources (CPU, memory, devices)
  • Provides abstraction layer for applications
  • Enforces security and isolation
  • Handles interrupts and exceptions
  • Schedules processes
  • Manages memory
  • Implements file systems
  • Provides system calls

Kernel vs User Space

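Modern systems separate kernel and user code using CPU privilege levels. The diagram uses x86 terminology (ring 0 and ring 3); other architectures have equivalents, such as EL1 and EL0 on 64-bit ARM: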

+---------------------------+
|     User Space            |  Ring 3 (unprivileged)
|   (Applications)          |
+---------------------------+
         | System Calls
         v
+---------------------------+
|     Kernel Space          |  Ring 0 (privileged)
|   (Operating System)      |
+---------------------------+
|      Hardware             |
+---------------------------+

User Space (Ring 3):

  • Applications run here
  • Limited privileges
  • Cannot directly access hardware
  • Protected from other processes
  • Crashes don't affect whole system

Kernel Space (Ring 0):

  • Full hardware access
  • Unlimited privileges
  • Bugs can crash entire system
  • Shared by all kernel code

Kernel Architectures

Monolithic Kernel

All kernel services run in kernel space (single address space).

+---------------------------------------+
|            Kernel Space               |
|  +----------+  +--------+  +--------+ |
|  |   VFS    |  | Sched  |  |  Net   | |
|  +----------+  +--------+  +--------+ |
|  | Device Drivers | Memory Mgmt     | |
|  +----------------+-----------------+ |
+---------------------------------------+
|             User Space                |
+---------------------------------------+

Examples: Linux, Unix, BSD

Advantages:

  • High performance (no IPC overhead)
  • Efficient resource sharing
  • Simple design

Disadvantages:

  • Driver bugs can crash kernel
  • Large code base
  • Security issues affect entire kernel

Microkernel

Minimal kernel, most services in user space.

+---------------------------------------+
|            User Space                 |
|  +----------+  +--------+  +--------+ |
|  | File Srv |  | Net Srv|  | Drivers| |
|  +----------+  +--------+  +--------+ |
+---------------------------------------+
      ^              ^             ^
      |  IPC (messages)           |
      v              v             v
+---------------------------------------+
|         Kernel (Microkernel)          |
|   IPC | Scheduling | Memory Mgmt      |
+---------------------------------------+

Examples: MINIX, QNX, seL4

Advantages:

  • More stable (driver crashes don't kill kernel)
  • Better security (isolation)
  • Easier to debug

Disadvantages:

  • Performance overhead (IPC)
  • More complex design

Hybrid Kernel

Combination of monolithic and microkernel.

Examples: Windows NT, macOS (XNU)

Some services in kernel space for performance, others in user space for stability.

System Calls

System calls (syscalls) are the controlled interface through which user-space code requests services from the kernel.

How System Calls Work

User Application:
    open("file.txt", O_RDONLY)
        |
        v
    Library wrapper (libc)
        |
        v
    Trigger interrupt/syscall instruction
        |
        v
    Switch to kernel mode
        |
        v
    Kernel syscall handler
        |
        v
    Execute kernel function (sys_open)
        |
        v
    Return to user mode
        |
        v
    Library returns result
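
The path above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc: it issues the same write request once through the libc wrapper and once through the generic syscall() function with an explicit syscall number. The function names are illustrative and not part of any kernel described in this chapter.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes syscall() on glibc */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Path 1: the libc wrapper loads the registers and executes the
 * syscall instruction on our behalf. */
long write_via_wrapper(int fd, const char *buf, unsigned long len) {
    return (long)write(fd, buf, len);
}

/* Path 2: the generic syscall() entry point with an explicit number. */
long write_via_syscall(int fd, const char *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}

/* Both paths end in the same kernel handler, so they report the same result. */
void syscall_paths_selfcheck(void) {
    static const char msg[] = "hello\n";
    long a = write_via_wrapper(STDOUT_FILENO, msg, sizeof msg - 1);
    long b = write_via_syscall(STDOUT_FILENO, msg, sizeof msg - 1);
    assert(a == b);
}
```

Both functions return -1 for an invalid file descriptor, because the libc layer turns the kernel's negative error code into -1 and stores the cause in errno.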

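System Call Mechanisms on x86, x64, and ARM

The examples below use Linux syscall numbers and register conventions; other kernels define their own.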

x86 (Legacy - INT 0x80):

; syscall: write(1, "Hello", 5)
mov eax, 4              ; syscall number (sys_write)
mov ebx, 1              ; fd = 1 (stdout)
mov ecx, message        ; buffer
mov edx, 5              ; length
int 0x80                ; Trigger syscall
; Return value in eax

x64 (Modern - SYSCALL):

; syscall: write(1, "Hello", 5)
mov rax, 1              ; syscall number (sys_write on x64)
mov rdi, 1              ; arg1: fd
mov rsi, message        ; arg2: buffer
mov rdx, 5              ; arg3: length
syscall                 ; Fast system call
; Return value in rax

ARM (32-bit EABI - SVC instruction):

@ syscall: write(1, "Hello", 5)
mov r7, #4              @ syscall number (sys_write)
mov r0, #1              @ fd
ldr r1, =message        @ buffer
mov r2, #5              @ length
svc #0                  @ Supervisor call
@ Return value in r0

Implementing System Call Handler

x64 System Call Table:

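// Syscall numbers chosen by this example kernel
// (they differ from Linux x64, where read is 0 and open is 2)
#define SYSCALL_WRITE 1
#define SYSCALL_READ  2
#define SYSCALL_OPEN  3
#define SYSCALL_CLOSE 4

typedef long (*syscall_fn)(long, long, long, long, long, long);

// Slots without an initializer (such as index 0) are implicitly NULL
syscall_fn syscall_table[] = {
    [SYSCALL_WRITE] = sys_write,
    [SYSCALL_READ]  = sys_read,
    [SYSCALL_OPEN]  = sys_open,
    [SYSCALL_CLOSE] = sys_close,
    // ...
};

#define NUM_SYSCALLS ((long)(sizeof(syscall_table) / sizeof(syscall_table[0])))

long syscall_handler(long syscall_num, long arg1, long arg2,
                      long arg3, long arg4, long arg5) {
    // Reject out-of-range numbers and empty table slots
    if (syscall_num < 0 || syscall_num >= NUM_SYSCALLS ||
        syscall_table[syscall_num] == NULL) {
        return -ENOSYS;  // Invalid syscall
    }

    return syscall_table[syscall_num](arg1, arg2, arg3, arg4, arg5, 0);
}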
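
The dispatch logic can be exercised on a development machine before it ever runs in ring 0. The sketch below is a host-side test under stated assumptions: stub_write, demo_table, and demo_dispatch are hypothetical stand-ins, not the kernel's real handlers. It shows the two checks a table lookup needs, a range check and an empty-slot check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long, long, long, long, long, long);

/* Hypothetical stub standing in for a real sys_write:
 * it pretends every byte was written and returns the length. */
static long stub_write(long fd, long buf, long len, long a4, long a5, long a6) {
    (void)fd; (void)buf; (void)a4; (void)a5; (void)a6;
    return len;
}

enum { DEMO_WRITE = 1, DEMO_NUM_SYSCALLS = 4 };

/* Only slot 1 is filled; slots 0, 2, and 3 stay NULL. */
static const syscall_fn demo_table[DEMO_NUM_SYSCALLS] = {
    [DEMO_WRITE] = stub_write,
};

long demo_dispatch(long num, long a1, long a2, long a3, long a4, long a5) {
    if (num < 0 || num >= DEMO_NUM_SYSCALLS || demo_table[num] == NULL) {
        return -ENOSYS;   /* out of range, or no handler registered */
    }
    return demo_table[num](a1, a2, a3, a4, a5, 0);
}

/* Usage example: valid, empty-slot, and out-of-range syscall numbers. */
void demo_dispatch_selfcheck(void) {
    assert(demo_dispatch(DEMO_WRITE, 1, 0, 5, 0, 0) == 5);
    assert(demo_dispatch(0, 0, 0, 0, 0, 0) == -ENOSYS);
    assert(demo_dispatch(99, 0, 0, 0, 0, 0) == -ENOSYS);
}
```

Without the NULL check, calling syscall number 0 would jump through a null function pointer, which in kernel mode usually means a crash of the whole system.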

Setting up SYSCALL instruction (x64):

void init_syscall(void) {
    // Enable SYSCALL instruction
    uint64_t efer = read_msr(MSR_EFER);
    write_msr(MSR_EFER, efer | EFER_SCE);

    // Set kernel entry point
    write_msr(MSR_LSTAR, (uint64_t)syscall_entry);

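    // Set segment selectors. STAR[47:32] is the kernel CS base used by SYSCALL.
    // STAR[63:48] is the base for SYSRET: in 64-bit mode the CPU loads CS from
    // base + 16 and SS from base + 8, so the GDT layout must match.
    write_msr(MSR_STAR, ((uint64_t)KERNEL_CS << 32) | ((uint64_t)USER_CS << 48));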

    // Set syscall flags mask
    write_msr(MSR_SFMASK, 0x200);  // Clear IF (interrupts)
}

Assembly syscall entry (x64):

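; syscall_entry.asm (NASM syntax)
; Single-CPU sketch: a multiprocessor kernel would use swapgs and
; per-CPU storage instead of the global user_rsp/kernel_rsp variables.
global syscall_entry
extern syscall_handler
extern user_rsp, kernel_rsp

syscall_entry:
    ; Save user stack and switch to the kernel stack
    mov [user_rsp], rsp
    mov rsp, [kernel_rsp]

    ; SYSCALL stored the return RIP in rcx and RFLAGS in r11
    push rcx                ; Return RIP
    push r11                ; Return RFLAGS

    ; Syscall ABI: rax = number, args in rdi, rsi, rdx, r10, r8
    ; C ABI for syscall_handler(num, arg1..arg5): rdi, rsi, rdx, rcx, r8, r9
    ; Shuffle from last to first so no value is overwritten before it is read
    mov r9, r8              ; arg5
    mov r8, r10             ; arg4
    mov rcx, rdx            ; arg3
    mov rdx, rsi            ; arg2
    mov rsi, rdi            ; arg1
    mov rdi, rax            ; syscall number

    call syscall_handler    ; Result comes back in rax

    ; Restore registers
    pop r11
    pop rcx

    ; Restore user stack
    mov rsp, [user_rsp]

    ; Return to user mode (NASM spelling of the 64-bit SYSRET)
    o64 sysret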

Interrupts and Exceptions

Interrupt Descriptor Table (IDT)

The IDT maps interrupt numbers to handler addresses.

x64 IDT Entry:

struct idt_entry {
    uint16_t offset_low;    // Handler address bits 0-15
    uint16_t selector;      // Code segment selector
    uint8_t  ist;           // Interrupt stack table
    uint8_t  type_attr;     // Type and attributes
    uint16_t offset_mid;    // Handler address bits 16-31
    uint32_t offset_high;   // Handler address bits 32-63
    uint32_t reserved;
} __attribute__((packed));

struct idt_ptr {
    uint16_t limit;         // Size of IDT - 1
    uint64_t base;          // Address of IDT
} __attribute__((packed));

Setting up IDT:

struct idt_entry idt[256];
struct idt_ptr idtr;

void set_idt_entry(int num, uint64_t handler, uint16_t selector, uint8_t flags) {
    idt[num].offset_low = handler & 0xFFFF;
    idt[num].selector = selector;
    idt[num].ist = 0;
    idt[num].type_attr = flags;
    idt[num].offset_mid = (handler >> 16) & 0xFFFF;
    idt[num].offset_high = (handler >> 32) & 0xFFFFFFFF;
    idt[num].reserved = 0;
}

void init_idt(void) {
    // Set up exception handlers (0-31)
    set_idt_entry(0, (uint64_t)divide_error_handler, KERNEL_CS, 0x8E);
    set_idt_entry(13, (uint64_t)general_protection_handler, KERNEL_CS, 0x8E);
    set_idt_entry(14, (uint64_t)page_fault_handler, KERNEL_CS, 0x8E);

    // Set up IRQ handlers (32-47)
    set_idt_entry(32, (uint64_t)timer_handler, KERNEL_CS, 0x8E);
    set_idt_entry(33, (uint64_t)keyboard_handler, KERNEL_CS, 0x8E);

    // Load IDT
    idtr.limit = sizeof(idt) - 1;
    idtr.base = (uint64_t)&idt;

    asm volatile("lidt %0" :: "m"(idtr));
}
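
The split of the handler address across three fields, and the meaning of the 0x8E flags byte, can be checked on a host machine. This is a minimal sketch: the struct mirrors the layout shown above, while idt_gate, make_gate, gate_handler, and the gate_* decoders are illustrative names introduced here. It relies on the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stdint.h>

struct idt_gate {
    uint16_t offset_low;    /* handler bits 0-15  */
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;    /* handler bits 16-31 */
    uint32_t offset_high;   /* handler bits 32-63 */
    uint32_t reserved;
} __attribute__((packed));

_Static_assert(sizeof(struct idt_gate) == 16, "a 64-bit IDT gate is 16 bytes");

/* Split a 64-bit handler address into the three offset fields. */
struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t type_attr) {
    struct idt_gate g = {0};
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.selector    = selector;
    g.type_attr   = type_attr;
    return g;
}

/* Reassemble the address, as the CPU does when the interrupt fires. */
uint64_t gate_handler(struct idt_gate g) {
    return (uint64_t)g.offset_low |
           ((uint64_t)g.offset_mid << 16) |
           ((uint64_t)g.offset_high << 32);
}

/* type_attr layout: bit 7 = present, bits 5-6 = DPL, bits 0-3 = gate type. */
int gate_present(uint8_t type_attr) { return (type_attr >> 7) & 1; }
int gate_dpl(uint8_t type_attr)     { return (type_attr >> 5) & 3; }
int gate_type(uint8_t type_attr)    { return type_attr & 0x0F; }

void idt_gate_selfcheck(void) {
    struct idt_gate g = make_gate(0xFFFFFFFF80123456ULL, 0x08, 0x8E);
    assert(gate_handler(g) == 0xFFFFFFFF80123456ULL);
    assert(gate_present(0x8E) == 1);   /* entry is valid            */
    assert(gate_dpl(0x8E) == 0);       /* only ring 0 may invoke it */
    assert(gate_type(0x8E) == 0xE);    /* 64-bit interrupt gate     */
}
```

An interrupt gate (type 0xE) clears the interrupt flag on entry, so the handler runs with further maskable interrupts disabled until it returns.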

Exception Handlers

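// These handlers are installed directly in the IDT, so they must end with
// iretq rather than ret. GCC and Clang emit that for functions marked
// __attribute__((interrupt)) (compile this file with -mgeneral-regs-only).

// Stack frame pushed by the CPU on interrupt entry in 64-bit mode
struct interrupt_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

// Divide by zero (#DE): no error code
__attribute__((interrupt))
void divide_error_handler(struct interrupt_frame *frame) {
    printf("Divide error at RIP 0x%lx\n", frame->rip);
    for (;;) asm volatile("hlt");  // Unrecoverable in this sketch: halt
}

// General protection fault (#GP): the CPU pushes an error code
__attribute__((interrupt))
void general_protection_handler(struct interrupt_frame *frame, uint64_t error) {
    printf("General Protection Fault!\n");
    printf("Error code: 0x%lx\n", error);
    printf("RIP: 0x%lx\n", frame->rip);
    for (;;) asm volatile("hlt");
}

// Page fault (#PF): the faulting address is in CR2
__attribute__((interrupt))
void page_fault_handler(struct interrupt_frame *frame, uint64_t error) {
    (void)frame;
    uint64_t fault_addr;
    asm volatile("mov %%cr2, %0" : "=r"(fault_addr));

    printf("Page Fault at 0x%lx\n", fault_addr);
    if (error & 0x1) printf("  Page-level protection violation\n");
    else printf("  Non-present page\n");

    if (error & 0x2) printf("  Write access\n");
    else printf("  Read access\n");

    if (error & 0x4) printf("  User mode\n");
    else printf("  Kernel mode\n");

    for (;;) asm volatile("hlt");
}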
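
The page fault error code is a bit field, and decoding it is plain C that can be tested outside the kernel. The sketch below is one way to do it: the PF_* names and describe_page_fault are illustrative, and the bit positions follow the x86 architecture (bits 3 and 4 report a reserved-bit violation and an instruction fetch).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PF_PRESENT 0x01  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x02  /* 0 = read access,      1 = write access         */
#define PF_USER    0x04  /* 0 = kernel mode,      1 = user mode            */
#define PF_RSVD    0x08  /* reserved bit set in a paging structure         */
#define PF_INSTR   0x10  /* fault was caused by an instruction fetch       */

/* Format a human-readable summary of the error code into out. */
int describe_page_fault(uint64_t error, char *out, size_t out_len) {
    return snprintf(out, out_len, "%s, %s, %s%s%s",
        (error & PF_PRESENT) ? "protection violation" : "non-present page",
        (error & PF_WRITE)   ? "write" : "read",
        (error & PF_USER)    ? "user mode" : "kernel mode",
        (error & PF_RSVD)    ? ", reserved bit set" : "",
        (error & PF_INSTR)   ? ", instruction fetch" : "");
}

void page_fault_decode_selfcheck(void) {
    char buf[96];
    /* 0x6 = write | user: a user program wrote to an unmapped page */
    describe_page_fault(0x6, buf, sizeof buf);
    assert(strcmp(buf, "non-present page, write, user mode") == 0);
}
```

A kernel that implements demand paging or copy-on-write would branch on these bits instead of printing them, for example by mapping a fresh page when the fault is a non-present access to a valid region.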

Hardware Interrupts (IRQs)

// Programmable Interrupt Controller (PIC) initialization
void init_pic(void) {
    // ICW1: Initialize
    outb(0x20, 0x11);  // Master PIC
    outb(0xA0, 0x11);  // Slave PIC

    // ICW2: Set vector offsets
    outb(0x21, 0x20);  // Master: IRQ 0-7 -> vectors 32-39
    outb(0xA1, 0x28);  // Slave: IRQ 8-15 -> vectors 40-47

    // ICW3: Tell master about slave
    outb(0x21, 0x04);  // Slave on IRQ2
    outb(0xA1, 0x02);  // Slave identity

    // ICW4: 8086 mode
    outb(0x21, 0x01);
    outb(0xA1, 0x01);

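    // Unmask only the IRQs that have handlers installed
    outb(0x21, 0xFC);  // Master: enable IRQ0 (timer) and IRQ1 (keyboard)
    outb(0xA1, 0xFF);  // Slave: keep all IRQs masked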
}

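// IRQ handlers installed directly in the IDT must return with iretq, so
// they are marked __attribute__((interrupt)) (GCC/Clang).

// Timer interrupt handler (IRQ 0, vector 32)
__attribute__((interrupt))
void timer_handler(struct interrupt_frame *frame) {
    (void)frame;
    static uint64_t tick = 0;
    tick++;

    if (tick % 100 == 0) {
        printf("Timer tick: %lu\n", tick);
    }

    // Send EOI (End of Interrupt) to the master PIC
    outb(0x20, 0x20);
}

// Keyboard interrupt handler (IRQ 1, vector 33)
__attribute__((interrupt))
void keyboard_handler(struct interrupt_frame *frame) {
    (void)frame;
    uint8_t scancode = inb(0x60);  // Reading the scancode is required
    printf("Key pressed: 0x%x\n", scancode);

    // Send EOI
    outb(0x20, 0x20);
}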
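
Both handlers above serve IRQs on the master PIC, so a single EOI to port 0x20 is enough. IRQs 8-15 arrive through the slave PIC, which needs its own EOI first. The sketch below illustrates that rule with a test double in place of real port I/O: outb_mock, port_log, and pic_send_eoi are hypothetical names, and a real kernel would call its outb routine instead.

```c
#include <assert.h>
#include <stdint.h>

#define PIC1_COMMAND 0x20   /* master PIC command port */
#define PIC2_COMMAND 0xA0   /* slave PIC command port  */
#define PIC_EOI      0x20   /* end-of-interrupt command */

/* Test double: records port writes instead of touching hardware. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write port_log[8];
int port_log_len = 0;

void outb_mock(uint16_t port, uint8_t value) {
    if (port_log_len < 8) {
        port_log[port_log_len].port = port;
        port_log[port_log_len].value = value;
        port_log_len++;
    }
}

/* Acknowledge an IRQ: the slave first (if involved), then the master. */
void pic_send_eoi(uint8_t irq) {
    if (irq >= 8) {
        outb_mock(PIC2_COMMAND, PIC_EOI);
    }
    outb_mock(PIC1_COMMAND, PIC_EOI);
}

void pic_eoi_selfcheck(void) {
    port_log_len = 0;
    pic_send_eoi(1);                       /* keyboard: master only */
    assert(port_log_len == 1);
    assert(port_log[0].port == PIC1_COMMAND);

    port_log_len = 0;
    pic_send_eoi(12);                      /* IRQ 12: slave, then master */
    assert(port_log_len == 2);
    assert(port_log[0].port == PIC2_COMMAND);
    assert(port_log[1].port == PIC1_COMMAND);
}
```

The master must be acknowledged even for slave IRQs because the slave is cascaded through the master's IRQ 2 line.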

Process Management Basics

Process Structure

enum process_state {
    PROCESS_RUNNING,
    PROCESS_READY,
    PROCESS_BLOCKED,
    PROCESS_TERMINATED
};

struct process {
    int pid;
    enum process_state state;
    uint64_t *stack_pointer;   // Saved stack pointer
    uint64_t *page_directory;  // Memory space
    struct process *next;       // Process list
};

struct process *current_process = NULL;
struct process *process_list = NULL;

Context Switching

Saving current process state and restoring another.

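; void switch_context(uint64_t **old_sp, uint64_t *new_sp)
; Pushes the callee-saved registers and RFLAGS, stores the resulting stack
; pointer in *old_sp, then switches to new_sp and pops the same set.
; A new process needs an initial stack laid out to match these pops.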

global switch_context
switch_context:
    ; Save current context
    push rbp
    push rbx
    push r12
    push r13
    push r14
    push r15
    pushf

    ; Save current stack pointer
    mov [rdi], rsp

    ; Load new stack pointer
    mov rsp, rsi

    ; Restore new context
    popf
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    pop rbp

    ret

C wrapper:

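void schedule(void) {
    if (!current_process) {
        return;  // Scheduler not started yet
    }

    struct process *old = current_process;
    // Round-robin: take the next process, wrapping to the head of the list
    struct process *next = old->next ? old->next : process_list;

    if (!next || next == old) {
        return;  // Nothing else to run
    }

    if (old->state == PROCESS_RUNNING) {
        old->state = PROCESS_READY;
    }
    next->state = PROCESS_RUNNING;
    current_process = next;

    // Switch address space
    write_cr3((uint64_t)next->page_directory);

    // Switch context
    switch_context(&old->stack_pointer, next->stack_pointer);
}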
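
The selection step of a round-robin scheduler can be separated from the context switch and tested on its own. The following sketch uses simplified, hypothetical types (demo_proc, demo_pick_next) rather than the chapter's struct process; it walks the list once from the current process, wraps at the end, and skips anything that is not ready to run.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_RUNNING, DEMO_READY, DEMO_BLOCKED, DEMO_TERMINATED };

struct demo_proc {
    int pid;
    enum demo_state state;
    struct demo_proc *next;   /* singly linked, NULL-terminated list */
};

/* Return the next READY process after cur, wrapping to head at the end
 * of the list. If no other process is ready, keep running cur.
 * Assumes cur is a member of the list that starts at head. */
struct demo_proc *demo_pick_next(struct demo_proc *head, struct demo_proc *cur) {
    if (!head || !cur) {
        return NULL;
    }
    struct demo_proc *p = cur->next ? cur->next : head;
    while (p != cur) {
        if (p->state == DEMO_READY) {
            return p;
        }
        p = p->next ? p->next : head;
    }
    return cur;
}

void demo_pick_next_selfcheck(void) {
    struct demo_proc c = { 3, DEMO_READY,   NULL };
    struct demo_proc b = { 2, DEMO_BLOCKED, &c };
    struct demo_proc a = { 1, DEMO_RUNNING, &b };

    assert(demo_pick_next(&a, &a) == &c);   /* b is blocked, so skip it */
    assert(demo_pick_next(&a, &c) == &c);   /* nobody else is ready     */
}
```

Keeping the policy in a pure function like this makes it easy to try other strategies, such as priorities, without touching the assembly that swaps stacks.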

Memory Management in Kernel

Physical Memory Allocator

#define PAGE_SIZE 4096

// Free-list node stored inside each free page. It is only valid while the
// page is free: once a page is allocated, the caller owns every byte of it.
// (Per-page reference counts would need a separate metadata array.)
struct page {
    struct page *next;
};

struct page *free_list = NULL;

// Assumes [mem_start, mem_end) is page-aligned, unused, and mapped
void init_physical_memory(uint64_t mem_start, uint64_t mem_end) {
    for (uint64_t addr = mem_start; addr + PAGE_SIZE <= mem_end; addr += PAGE_SIZE) {
        struct page *pg = (struct page *)addr;
        pg->next = free_list;
        free_list = pg;
    }
}

void *alloc_page(void) {
    if (!free_list) {
        return NULL;  // Out of memory
    }

    // Pop the first free page
    struct page *pg = free_list;
    free_list = pg->next;

    return (void *)pg;
}

void free_page(void *addr) {
    if (!addr) {
        return;
    }

    // Push the page back onto the free list
    struct page *pg = (struct page *)addr;
    pg->next = free_list;
    free_list = pg;
}
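
A free-list page allocator can be tried out in user space by pointing it at a static, page-aligned buffer instead of physical memory. This is a host-side sketch under that assumption: demo_arena and the demo_* functions are illustrative, and each free page stores nothing but a next pointer, so allocated pages may be overwritten completely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_PAGES     4

/* Stand-in for physical memory: four page-aligned pages. */
static _Alignas(DEMO_PAGE_SIZE) uint8_t demo_arena[DEMO_PAGES * DEMO_PAGE_SIZE];

struct demo_free_node { struct demo_free_node *next; };
static struct demo_free_node *demo_free_list = NULL;

void demo_init_pages(void) {
    demo_free_list = NULL;
    for (size_t i = 0; i < DEMO_PAGES; i++) {
        struct demo_free_node *n =
            (struct demo_free_node *)(demo_arena + i * DEMO_PAGE_SIZE);
        n->next = demo_free_list;
        demo_free_list = n;
    }
}

void *demo_alloc_page(void) {
    if (!demo_free_list) {
        return NULL;                     /* out of pages */
    }
    struct demo_free_node *n = demo_free_list;
    demo_free_list = n->next;
    return n;
}

void demo_free_page(void *p) {
    if (!p) {
        return;
    }
    struct demo_free_node *n = p;
    n->next = demo_free_list;            /* push back on the list */
    demo_free_list = n;
}

void demo_pages_selfcheck(void) {
    demo_init_pages();
    void *p = demo_alloc_page();
    assert(p != NULL);
    memset(p, 0xAA, DEMO_PAGE_SIZE);     /* caller may use the whole page */
    demo_free_page(p);
    assert(demo_alloc_page() == p);      /* LIFO: last freed, first reused */
}
```

Allocation and release are both constant time, but the allocator can only hand out single pages; contiguous multi-page requests need a different structure such as a bitmap or buddy allocator.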

Kernel Memory Allocator (Simple)

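#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_START 0xFFFFFFFF80000000  // Must already be mapped to writable memory
#define HEAP_SIZE  0x100000            // 1 MB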

struct heap_block {
    size_t size;
    bool free;
    struct heap_block *next;
};

struct heap_block *heap_head = NULL;

void init_heap(void) {
    heap_head = (struct heap_block *)HEAP_START;
    heap_head->size = HEAP_SIZE - sizeof(struct heap_block);
    heap_head->free = true;
    heap_head->next = NULL;
}

void *kmalloc(size_t size) {
    // Round up to a multiple of 8 so every block header stays aligned
    size = (size + 7) & ~(size_t)7;

    struct heap_block *current = heap_head;

    while (current) {
        if (current->free && current->size >= size) {
            // Split block if too large
            if (current->size > size + sizeof(struct heap_block)) {
                struct heap_block *new_block =
                    (struct heap_block *)((uint8_t *)current +
                    sizeof(struct heap_block) + size);
                new_block->size = current->size - size - sizeof(struct heap_block);
                new_block->free = true;
                new_block->next = current->next;

                current->size = size;
                current->next = new_block;
            }

            current->free = false;
            return (void *)((uint8_t *)current + sizeof(struct heap_block));
        }
        current = current->next;
    }

    return NULL;  // Out of memory
}

void kfree(void *ptr) {
    if (!ptr) return;

    struct heap_block *block =
        (struct heap_block *)((uint8_t *)ptr - sizeof(struct heap_block));
    block->free = true;

    // Coalesce with next block if free
    if (block->next && block->next->free) {
        block->size += block->next->size + sizeof(struct heap_block);
        block->next = block->next->next;
    }
}
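
Because kfree only merges a block with its successor, freeing block A and then the block after it leaves two adjacent free blocks unmerged. One possible remedy, sketched here as an assumption rather than as part of the allocator above, is an occasional full pass over the list. demo_block and demo_coalesce_all are hypothetical names; the pass assumes the list is kept in address order with no gaps, which holds when blocks are only ever created by splitting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_block {
    size_t size;               /* payload bytes, header excluded */
    bool free;
    struct demo_block *next;   /* next block in address order    */
};

/* Merge every run of adjacent free blocks. Returns the number of merges. */
int demo_coalesce_all(struct demo_block *head) {
    int merges = 0;
    struct demo_block *b = head;
    while (b && b->next) {
        if (b->free && b->next->free) {
            /* Absorb the neighbour: its header becomes payload too */
            b->size += sizeof(struct demo_block) + b->next->size;
            b->next = b->next->next;
            merges++;           /* stay on b: the new neighbour may be free */
        } else {
            b = b->next;
        }
    }
    return merges;
}

void demo_coalesce_selfcheck(void) {
    struct demo_block c = { 32, false, NULL };
    struct demo_block b = { 48, true,  &c };
    struct demo_block a = { 96, true,  &b };

    assert(demo_coalesce_all(&a) == 1);
    assert(a.size == 96 + sizeof(struct demo_block) + 48);
    assert(a.next == &c);       /* b was absorbed into a */
}
```

The test builds the list from independent variables, which is enough to check the bookkeeping; in a real heap the blocks would be laid out back to back in memory.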

Device Drivers

Driver Interface

struct device_driver {
    const char *name;
    int (*init)(void);
    int (*read)(void *buf, size_t count);
    int (*write)(const void *buf, size_t count);
    void (*cleanup)(void);
};

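#define MAX_DRIVERS 16

struct device_driver *drivers[MAX_DRIVERS];
int num_drivers = 0;

int register_driver(struct device_driver *driver) {
    if (!driver || num_drivers >= MAX_DRIVERS) {
        return -1;
    }

    // Initialize first so a driver that fails to start is never registered
    if (driver->init) {
        int err = driver->init();
        if (err != 0) {
            return err;
        }
    }

    drivers[num_drivers++] = driver;
    return 0;
}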

Simple Console Driver

#define VGA_MEMORY 0xB8000
#define VGA_WIDTH  80
#define VGA_HEIGHT 25

// volatile: writes to video memory must not be optimized away or reordered
volatile uint16_t *vga_buffer = (volatile uint16_t *)VGA_MEMORY;
int cursor_x = 0, cursor_y = 0;

void console_putchar(char c) {
    if (c == '\n') {
        cursor_x = 0;
        cursor_y++;
    } else {
        int offset = cursor_y * VGA_WIDTH + cursor_x;
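        // Cast avoids sign extension of characters above 0x7F into the attribute byte
        vga_buffer[offset] = (uint16_t)((0x0F << 8) | (uint8_t)c);  // White on black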
        cursor_x++;
    }

    if (cursor_x >= VGA_WIDTH) {
        cursor_x = 0;
        cursor_y++;
    }

    if (cursor_y >= VGA_HEIGHT) {
        // Scroll
        for (int y = 1; y < VGA_HEIGHT; y++) {
            for (int x = 0; x < VGA_WIDTH; x++) {
                vga_buffer[(y - 1) * VGA_WIDTH + x] =
                    vga_buffer[y * VGA_WIDTH + x];
            }
        }
        cursor_y = VGA_HEIGHT - 1;
        for (int x = 0; x < VGA_WIDTH; x++) {
            vga_buffer[cursor_y * VGA_WIDTH + x] = 0;
        }
    }
}

int console_write(const void *buf, size_t count) {
    const char *str = (const char *)buf;
    for (size_t i = 0; i < count; i++) {
        console_putchar(str[i]);
    }
    return count;
}
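
Each cell of the VGA text buffer is a 16-bit value: the low byte is the character and the high byte is the attribute, with the foreground color in bits 0-3 and the background in bits 4-7. The helper below is a small sketch of that encoding; vga_entry and the enum names are illustrative, while the color numbers are the standard VGA text-mode palette.

```c
#include <assert.h>
#include <stdint.h>

enum vga_color {
    VGA_BLACK      = 0,
    VGA_BLUE       = 1,
    VGA_GREEN      = 2,
    VGA_RED        = 4,
    VGA_LIGHT_GREY = 7,
    VGA_WHITE      = 15
};

/* Build one text-mode cell from a character and two colors.
 * Depending on the video mode, bit 7 of the attribute may mean "blink"
 * rather than a bright background. */
uint16_t vga_entry(char c, uint8_t fg, uint8_t bg) {
    uint8_t attr = (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
    /* Cast through uint8_t so characters above 0x7F are not sign-extended */
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

void vga_entry_selfcheck(void) {
    /* 0x0F41: white (15) on black (0), character 'A' (0x41) */
    assert(vga_entry('A', VGA_WHITE, VGA_BLACK) == 0x0F41);
}
```

Clearing the screen with vga_entry(' ', VGA_LIGHT_GREY, VGA_BLACK) instead of 0 keeps a visible attribute in every cell, which matters if a hardware cursor is later shown there.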

Kernel Entry Point

// kernel.c - Main kernel entry point

void kernel_main(void) {
    // Clear screen
    for (int i = 0; i < VGA_WIDTH * VGA_HEIGHT; i++) {
        vga_buffer[i] = 0;
    }

    console_write("Kernel starting...\n", 19);

    // Initialize subsystems
    init_idt();
    init_pic();
    init_physical_memory(0x100000, 0x400000);  // 1MB - 4MB; range must not overlap the kernel image
    init_heap();

    console_write("Interrupts enabled\n", 19);
    asm("sti");  // Enable interrupts

    console_write("Kernel initialized\n", 19);

    // Kernel idle loop
    while (1) {
        asm("hlt");  // Halt until interrupt
    }
}

Key Concepts

  • Kernel manages hardware and provides services to applications
  • User space and kernel space are separated for security
  • System calls allow user programs to request kernel services
  • Interrupts handle asynchronous events (hardware, exceptions)
  • IDT maps interrupt numbers to handlers
  • Context switching saves/restores process state
  • Device drivers abstract hardware for the kernel
  • Monolithic kernels favor performance, microkernels favor isolation, and hybrid kernels mix the two

Common Mistakes

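  1. Not disabling interrupts - Data shared with interrupt handlers needs protection during critical sections
  2. Stack overflows - The kernel stack is small and fixed in size
  3. Forgetting EOI - Until acknowledged, the PIC blocks further interrupts of the same or lower priority
  4. Wrong privilege level - Kernel code must run in ring 0
  5. Synchronization issues - Interrupts can arrive at any time
  6. Memory leaks - Kernel allocations stay in use until explicitly freed
  7. Infinite loops in handlers - With interrupts disabled, a stuck handler hangs the entire system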

Debugging Tips

  • Use serial port - Easier than video for early kernel
  • Print register dumps - On exceptions
  • Add assertions - Catch bugs early
  • Test incrementally - Add one feature at a time
  • Use QEMU logging - -d int,cpu_reset
  • Check interrupt masks - Verify IRQs are enabled
  • Verify IDT entries - Print IDT contents
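
The "add assertions" tip can be made concrete with a small macro. This is a sketch only: KASSERT and demo_panic are hypothetical names, and the panic hook here merely records the failure so the macro can be tested on a host. A real kernel would print the message to the serial port, dump registers, and halt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int demo_panic_count = 0;
static char demo_last_panic[160];

/* Hypothetical panic hook: records the failure instead of halting. */
static void demo_panic(const char *expr, const char *file, int line) {
    demo_panic_count++;
    snprintf(demo_last_panic, sizeof demo_last_panic,
             "%s:%d: assertion failed: %s", file, line, expr);
}

/* Evaluate cond once; on failure, report the expression text and location. */
#define KASSERT(cond)                                   \
    do {                                                \
        if (!(cond)) {                                  \
            demo_panic(#cond, __FILE__, __LINE__);      \
        }                                               \
    } while (0)

void kassert_selfcheck(void) {
    int before = demo_panic_count;
    int free_pages = 3;

    KASSERT(free_pages >= 0);            /* holds: no panic recorded */
    assert(demo_panic_count == before);

    KASSERT(free_pages > 10);            /* fails: panic hook is called */
    assert(demo_panic_count == before + 1);
    assert(strstr(demo_last_panic, "free_pages > 10") != NULL);
}
```

The do/while(0) wrapper lets the macro be used like a statement, including inside an if/else without braces.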

Mini Exercises

  1. Implement a minimal kernel that prints to VGA
  2. Set up IDT with exception handlers
  3. Create a timer interrupt handler
  4. Implement basic keyboard input
  5. Write a simple physical page allocator
  6. Create a kmalloc/kfree implementation
  7. Implement system call handler
  8. Write a context switch function
  9. Create a simple round-robin scheduler
  10. Implement serial port driver

Review Questions

  1. What's the difference between kernel space and user space?
  2. How do system calls work?
  3. What is the purpose of the IDT?
  4. What's the difference between exceptions and interrupts?
  5. How does context switching work?

Reference Checklist

By the end of this chapter, you should be able to:

  • Explain kernel architecture (monolithic vs microkernel)
  • Understand kernel/user space separation
  • Implement system call handler
  • Set up Interrupt Descriptor Table
  • Handle exceptions and hardware interrupts
  • Initialize PIC for IRQs
  • Implement basic process structure
  • Perform context switching
  • Allocate physical memory pages
  • Write simple device drivers

Next Steps

With kernel fundamentals understood, the next chapter focuses on x86/x64 kernel development specifically. You'll build a complete minimal kernel for x86/x64 architecture, including bootloader integration, memory management, and multitasking.


Key Takeaway: Kernels are the foundation of operating systems. Understanding interrupts, system calls, memory management, and process scheduling is essential for kernel development and system-level programming.