Chapter 14: Kernel Fundamentals
Introduction
The kernel is the core of an operating system. It manages hardware resources, provides essential services to applications, and enforces security. Understanding kernel fundamentals is crucial for system programmers, whether you're developing your own OS, writing kernel modules, or debugging system-level issues.
Why This Matters
Everything you do on a computer goes through the kernel. File access, network communication, memory allocation, process creation - all require kernel services. Understanding how kernels work gives you deep insight into computer systems, enables you to write efficient code, and opens doors to systems programming careers.
How to Study This Chapter
- Think in layers - Understand the separation between kernel space and user space
- Follow the syscall path - Trace how an application's request reaches the kernel
- Study real kernels - Read Linux kernel code for examples
- Build incrementally - Start with a minimal kernel and add features gradually
What is a Kernel?
The kernel is privileged software that:
- Manages hardware resources (CPU, memory, devices)
- Provides abstraction layer for applications
- Enforces security and isolation
- Handles interrupts and exceptions
- Schedules processes
- Manages memory
- Implements file systems
- Provides system calls
Kernel vs User Space
Modern systems separate kernel and user code:
+---------------------------+
|        User Space         |  Ring 3 (unprivileged)
|      (Applications)       |
+---------------------------+
              |  System Calls
              v
+---------------------------+
|       Kernel Space        |  Ring 0 (privileged)
|    (Operating System)     |
+---------------------------+
|         Hardware          |
+---------------------------+
User Space (Ring 3):
- Applications run here
- Limited privileges
- Cannot directly access hardware
- Protected from other processes
- Crashes don't affect whole system
Kernel Space (Ring 0):
- Full hardware access
- Unlimited privileges
- Bugs can crash entire system
- Shared by all kernel code
Kernel Architectures
Monolithic Kernel
All kernel services run in kernel space (single address space).
+---------------------------------------+
|             Kernel Space              |
|  +----------+  +--------+  +--------+ |
|  |   VFS    |  | Sched  |  |  Net   | |
|  +----------+  +--------+  +--------+ |
|  | Device Drivers | Memory Mgmt     | |
|  +----------------+-----------------+ |
+---------------------------------------+
|              User Space               |
+---------------------------------------+
Examples: Linux, Unix, BSD
Advantages:
- High performance (no IPC overhead)
- Efficient resource sharing
- Simple design
Disadvantages:
- Driver bugs can crash kernel
- Large code base
- Security issues affect entire kernel
Microkernel
Minimal kernel, most services in user space.
+---------------------------------------+
|              User Space               |
|  +----------+  +--------+  +--------+ |
|  | File Srv |  | Net Srv|  | Drivers| |
|  +----------+  +--------+  +--------+ |
+---------------------------------------+
        ^           ^           ^
        |    IPC (messages)     |
        v           v           v
+---------------------------------------+
|         Kernel (Microkernel)          |
|    IPC     | Scheduling | Memory Mgmt |
+---------------------------------------+
Examples: MINIX, QNX, seL4
Advantages:
- More stable (driver crashes don't kill kernel)
- Better security (isolation)
- Easier to debug
Disadvantages:
- Performance overhead (IPC)
- More complex design
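To make the IPC trade-off concrete, here is a small user-space sketch. It is not taken from any real microkernel. It contrasts a direct function call (the monolithic style) with the same request marshalled into a message and handed to a server routine (the microkernel style). The names `struct ipc_message`, `fs_server_handle`, `MSG_FILE_SIZE`, and the fake `file_size` lookup are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request types understood by the file server. */
enum msg_type { MSG_FILE_SIZE = 1, MSG_UNKNOWN = 99 };

/* A message carries the request and has room for the reply. */
struct ipc_message {
    enum msg_type type;
    char path[32];
    long reply;          /* filled in by the server */
};

/* The actual service: identical work in both designs. */
static long file_size(const char *path) {
    return (long)strlen(path) * 100;   /* stand-in for a real lookup */
}

/* Monolithic style: the caller invokes the service directly. */
long monolithic_file_size(const char *path) {
    return file_size(path);
}

/* Microkernel style: the server only ever sees messages. */
void fs_server_handle(struct ipc_message *msg) {
    switch (msg->type) {
    case MSG_FILE_SIZE:
        msg->reply = file_size(msg->path);
        break;
    default:
        msg->reply = -1;               /* unknown request */
        break;
    }
}

/* Client stub: marshal arguments, "send", then read the reply. */
long microkernel_file_size(const char *path) {
    struct ipc_message msg = { .type = MSG_FILE_SIZE, .reply = 0 };
    strncpy(msg.path, path, sizeof(msg.path) - 1);  /* the copy is IPC cost */
    fs_server_handle(&msg);   /* a real kernel would switch tasks here */
    return msg.reply;
}
```

Both paths return the same answer; the second pays for a copy and, in a real system, at least two context switches. In exchange, a crash inside the server would not take the kernel down with it.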
Hybrid Kernel
A hybrid kernel combines monolithic and microkernel ideas: some services run in kernel space for performance, others in user space for stability.
Examples: Windows NT, macOS (XNU)
System Calls
System calls (syscalls) are the interface between user space and kernel.
How System Calls Work
User Application:
open("file.txt", O_RDONLY)
|
v
Library wrapper (libc)
|
v
Trigger interrupt/syscall instruction
|
v
Switch to kernel mode
|
v
Kernel syscall handler
|
v
Execute kernel function (sys_open)
|
v
Return to user mode
|
v
Library returns result
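The first two steps of this path can be observed from user space. The sketch below assumes Linux with glibc, where the generic `syscall()` function lets a program pass the syscall number itself instead of calling the dedicated wrapper. The function names `pid_via_wrapper` and `pid_via_raw_syscall` are made up for this illustration.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <unistd.h>
#include <sys/syscall.h>

/* Go through the dedicated libc wrapper, the normal path for applications. */
long pid_via_wrapper(void) {
    return (long)getpid();
}

/* Skip the dedicated wrapper and pass the syscall number ourselves. */
long pid_via_raw_syscall(void) {
    return syscall(SYS_getpid);
}
```

Both functions end up in the same kernel handler, so they return the same process ID. The wrapper exists mainly for convenience and portability.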
x86/x64 System Call Mechanisms
x86 (Legacy - INT 0x80):
; syscall: write(1, "Hello", 5)
mov eax, 4 ; syscall number (sys_write)
mov ebx, 1 ; fd = 1 (stdout)
mov ecx, message ; buffer
mov edx, 5 ; length
int 0x80 ; Trigger syscall
; Return value in eax
x64 (Modern - SYSCALL):
; syscall: write(1, "Hello", 5)
mov rax, 1 ; syscall number (sys_write on x64)
mov rdi, 1 ; arg1: fd
mov rsi, message ; arg2: buffer
mov rdx, 5 ; arg3: length
syscall ; Fast system call
; Return value in rax
ARM (SVC instruction):
@ syscall: write(1, "Hello", 5)
mov r7, #4 @ syscall number
mov r0, #1 @ fd
ldr r1, =message @ buffer
mov r2, #5 @ length
svc #0 @ Supervisor call
@ Return value in r0
Implementing System Call Handler
x64 System Call Table:
#define SYSCALL_WRITE 1
#define SYSCALL_READ 2
#define SYSCALL_OPEN 3
#define SYSCALL_CLOSE 4
typedef long (*syscall_fn)(long, long, long, long, long, long);
syscall_fn syscall_table[] = {
[SYSCALL_WRITE] = sys_write,
[SYSCALL_READ] = sys_read,
[SYSCALL_OPEN] = sys_open,
[SYSCALL_CLOSE] = sys_close,
// ...
};
long syscall_handler(long syscall_num, long arg1, long arg2,
                     long arg3, long arg4, long arg5) {
    // Reject out-of-range numbers and unassigned table slots (slot 0 is NULL)
    if (syscall_num < 0 || syscall_num >= NUM_SYSCALLS ||
        !syscall_table[syscall_num]) {
        return -ENOSYS; // Invalid syscall
    }
    return syscall_table[syscall_num](arg1, arg2, arg3, arg4, arg5, 0);
}
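The dispatch logic can be exercised in user space. The sketch below is a scaled-down stand-in, not the kernel's table: `demo_add`, `demo_max`, and `demo_dispatch` are hypothetical names. It shows one detail that is easy to miss: designated initializers leave unlisted slots NULL, so a range check alone does not make every index safe to call.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(long, long, long);

/* Hypothetical handlers standing in for real kernel services. */
static long demo_add(long a, long b, long c) { (void)c; return a + b; }
static long demo_max(long a, long b, long c) { (void)c; return a > b ? a : b; }

enum { DEMO_ADD = 1, DEMO_MAX = 3, DEMO_NUM_SYSCALLS = 4 };

/* Slots 0 and 2 are deliberately left NULL (unassigned numbers). */
static const demo_syscall_fn demo_table[DEMO_NUM_SYSCALLS] = {
    [DEMO_ADD] = demo_add,
    [DEMO_MAX] = demo_max,
};

long demo_dispatch(long num, long a, long b, long c) {
    /* Reject numbers outside the table. */
    if (num < 0 || num >= DEMO_NUM_SYSCALLS)
        return -ENOSYS;
    /* Reject holes in the table before calling through the pointer. */
    if (demo_table[num] == NULL)
        return -ENOSYS;
    return demo_table[num](a, b, c);
}
```

Calling `demo_dispatch(DEMO_ADD, 2, 3, 0)` returns 5, while numbers 0, 2, -1, and 4 all return `-ENOSYS` instead of jumping through a bad pointer.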
Setting up SYSCALL instruction (x64):
void init_syscall(void) {
// Enable SYSCALL instruction
uint64_t efer = read_msr(MSR_EFER);
write_msr(MSR_EFER, efer | EFER_SCE);
// Set kernel entry point
write_msr(MSR_LSTAR, (uint64_t)syscall_entry);
// Set segment selectors
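// Bits 47:32: kernel CS for SYSCALL. Bits 63:48: base selector for SYSRET
// (64-bit user CS = base + 16, user SS = base + 8), so USER_CS here must be
// that base. Cast before shifting: a 48-bit shift overflows a plain int.
write_msr(MSR_STAR, ((uint64_t)KERNEL_CS << 32) | ((uint64_t)USER_CS << 48));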
// Set syscall flags mask
write_msr(MSR_SFMASK, 0x200); // Clear IF (interrupts)
}
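The STAR layout is easy to get wrong, so it can help to compute it in a plain function that is testable outside the kernel. This is a sketch under one assumption: a GDT ordered as kernel code (0x08), kernel data (0x10), user data (0x18), user code (0x20). That ordering is a common convention, not something this chapter prescribes, and the function names are hypothetical.

```c
#include <stdint.h>

/* Build the STAR value.
 * bits 47:32 = kernel CS for SYSCALL (kernel SS is this value + 8)
 * bits 63:48 = base selector for SYSRET (64-bit user CS is base + 16,
 *              user SS is base + 8) */
uint64_t make_star(uint16_t kernel_cs, uint16_t sysret_base) {
    return ((uint64_t)sysret_base << 48) | ((uint64_t)kernel_cs << 32);
}

/* Selectors SYSRET loads when returning to 64-bit user code.
 * The CPU forces the requested privilege level (low two bits) to 3. */
uint16_t sysret_user_cs(uint64_t star) {
    return (uint16_t)(((star >> 48) + 16) | 3);
}

uint16_t sysret_user_ss(uint64_t star) {
    return (uint16_t)(((star >> 48) + 8) | 3);
}
```

With the assumed GDT, `make_star(0x08, 0x10)` yields a user CS of 0x23 and a user SS of 0x1B, which is why user data must sit directly before user code in the table.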
Assembly syscall entry (x64):
; syscall_entry.asm
global syscall_entry
syscall_entry:
; Save user stack
mov [user_rsp], rsp
mov rsp, [kernel_rsp]
; Save registers
push rcx ; Return RIP
push r11 ; Return RFLAGS
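; On entry: rax = syscall number, args in rdi, rsi, rdx, r10, r8
; (SYSCALL overwrites rcx, so user code passes arg4 in r10).
; syscall_handler(num, arg1..arg5) takes its parameters in
; rdi, rsi, rdx, rcx, r8, r9, so shift every argument up one slot.
mov r9, r8 ; arg5
mov r8, r10 ; arg4
mov rcx, rdx ; arg3
mov rdx, rsi ; arg2
mov rsi, rdi ; arg1
mov rdi, rax ; syscall number
; Call C handler (return value comes back in rax)
call syscall_handler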
; Restore registers
pop r11
pop rcx
; Restore user stack
mov rsp, [user_rsp]
; Return to user mode
o64 sysret ; NASM spelling of the 64-bit SYSRET (GNU as calls it sysretq)
Interrupts and Exceptions
Interrupt Descriptor Table (IDT)
The IDT maps interrupt numbers to handler addresses.
x64 IDT Entry:
struct idt_entry {
uint16_t offset_low; // Handler address bits 0-15
uint16_t selector; // Code segment selector
uint8_t ist; // Interrupt stack table
uint8_t type_attr; // Type and attributes
uint16_t offset_mid; // Handler address bits 16-31
uint32_t offset_high; // Handler address bits 32-63
uint32_t reserved;
} __attribute__((packed));
struct idt_ptr {
uint16_t limit; // Size of IDT - 1
uint64_t base; // Address of IDT
} __attribute__((packed));
Setting up IDT:
struct idt_entry idt[256];
struct idt_ptr idtr;
void set_idt_entry(int num, uint64_t handler, uint16_t selector, uint8_t flags) {
idt[num].offset_low = handler & 0xFFFF;
idt[num].selector = selector;
idt[num].ist = 0;
idt[num].type_attr = flags;
idt[num].offset_mid = (handler >> 16) & 0xFFFF;
idt[num].offset_high = (handler >> 32) & 0xFFFFFFFF;
idt[num].reserved = 0;
}
void init_idt(void) {
// Set up exception handlers (0-31)
set_idt_entry(0, (uint64_t)divide_error_handler, KERNEL_CS, 0x8E);
set_idt_entry(13, (uint64_t)general_protection_handler, KERNEL_CS, 0x8E);
set_idt_entry(14, (uint64_t)page_fault_handler, KERNEL_CS, 0x8E);
// Set up IRQ handlers (32-47)
set_idt_entry(32, (uint64_t)timer_handler, KERNEL_CS, 0x8E);
set_idt_entry(33, (uint64_t)keyboard_handler, KERNEL_CS, 0x8E);
// Load IDT
idtr.limit = sizeof(idt) - 1;
idtr.base = (uint64_t)&idt;
asm volatile("lidt %0" :: "m"(idtr));
}
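The field splitting and the `0x8E` flag byte can be checked in user space before loading anything into the CPU. The sketch below mirrors the entry layout shown above under a different struct name; `gate_flags`, `make_gate`, and `gate_handler` are hypothetical helpers, not part of the chapter's kernel.

```c
#include <stdint.h>

struct idt_gate {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
} __attribute__((packed));

/* type_attr layout: P (bit 7) | DPL (bits 6:5) | 0 (bit 4) | type (bits 3:0).
 * Type 0xE is a 64-bit interrupt gate, 0xF a trap gate. */
uint8_t gate_flags(int present, int dpl, int type) {
    return (uint8_t)(((present & 1) << 7) | ((dpl & 3) << 5) | (type & 0xF));
}

struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t flags) {
    struct idt_gate g = {
        .offset_low  = (uint16_t)(handler & 0xFFFF),
        .selector    = selector,
        .ist         = 0,
        .type_attr   = flags,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Reassemble the 64-bit handler address the CPU would jump to. */
uint64_t gate_handler(const struct idt_gate *g) {
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}
```

`gate_flags(1, 0, 0xE)` gives `0x8E`: present, ring 0 only, interrupt gate. A gate that user code may trigger with `int` would use DPL 3, giving `0xEE`.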
Exception Handlers
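// These handlers are installed directly in the IDT, so they need an
// interrupt-safe prologue and an iretq return. GCC's interrupt attribute
// generates both (compile this file with -mgeneral-regs-only).
// Divide by zero (#DE)
__attribute__((interrupt))
void divide_error_handler(struct interrupt_frame *frame) {
    printf("Divide error!\n");
    printf("RIP: 0x%lx\n", frame->rip);
    while (1); // Halt
}
// General protection fault (#GP)
__attribute__((interrupt))
void general_protection_handler(struct interrupt_frame *frame, uint64_t error) {
    printf("General Protection Fault!\n");
    printf("Error code: 0x%lx\n", error);
    printf("RIP: 0x%lx\n", frame->rip);
    while (1);
}
// Page fault (#PF)
__attribute__((interrupt))
void page_fault_handler(struct interrupt_frame *frame, uint64_t error) {
    uint64_t fault_addr;
    // CR2 holds the faulting linear address
    asm volatile("mov %%cr2, %0" : "=r"(fault_addr));
    printf("Page Fault at 0x%lx (RIP 0x%lx)\n", fault_addr, frame->rip);
    if (error & 0x1) printf(" Page-level protection violation\n");
    else printf(" Non-present page\n");
    if (error & 0x2) printf(" Write access\n");
    else printf(" Read access\n");
    if (error & 0x4) printf(" User mode\n");
    else printf(" Kernel mode\n");
    while (1);
}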
Hardware Interrupts (IRQs)
// Programmable Interrupt Controller (PIC) initialization
void init_pic(void) {
// ICW1: Initialize
outb(0x20, 0x11); // Master PIC
outb(0xA0, 0x11); // Slave PIC
// ICW2: Set vector offsets
outb(0x21, 0x20); // Master: IRQ 0-7 -> vectors 32-39
outb(0xA1, 0x28); // Slave: IRQ 8-15 -> vectors 40-47
// ICW3: Tell master about slave
outb(0x21, 0x04); // Slave on IRQ2
outb(0xA1, 0x02); // Slave identity
// ICW4: 8086 mode
outb(0x21, 0x01);
outb(0xA1, 0x01);
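// Unmask only the IRQs that have handlers (a 1 bit masks the line).
// Unmasking everything would deliver IRQs to empty IDT entries.
outb(0x21, 0xFC); // Master: enable IRQ0 (timer) and IRQ1 (keyboard)
outb(0xA1, 0xFF); // Slave: all masked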
}
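Two small calculations sit behind this initialization: which IDT vector an IRQ arrives on after remapping, and which mask bytes enable a given set of IRQs. The sketch below assumes the offsets used above (0x20 and 0x28). The helper names `irq_to_vector` and `pic_mask_for` are hypothetical.

```c
#include <stdint.h>

#define PIC_MASTER_OFFSET 0x20   /* vector for IRQ 0, as in init_pic() */
#define PIC_SLAVE_OFFSET  0x28   /* vector for IRQ 8 */

/* Map a legacy IRQ line (0-15) to its IDT vector, or -1 if out of range. */
int irq_to_vector(int irq) {
    if (irq < 0 || irq > 15) return -1;
    return irq < 8 ? PIC_MASTER_OFFSET + irq
                   : PIC_SLAVE_OFFSET + (irq - 8);
}

/* Compute the 16-bit mask for a set of enabled IRQs (bit n = IRQ n).
 * In the PIC's mask register a 1 bit DISABLES the line.
 * Low byte goes to port 0x21 (master), high byte to 0xA1 (slave). */
uint16_t pic_mask_for(uint16_t enabled_irqs) {
    /* Any slave IRQ also needs the cascade line (IRQ 2) open. */
    if (enabled_irqs & 0xFF00)
        enabled_irqs = (uint16_t)(enabled_irqs | (1u << 2));
    return (uint16_t)~enabled_irqs;
}
```

Enabling only the timer and keyboard gives `0xFFFC`, so `0xFC` goes to the master's data port and `0xFF` to the slave's.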
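// IRQ handlers return to the interrupted code, so they must end with
// iretq rather than ret. GCC's interrupt attribute takes care of that
// (compile with -mgeneral-regs-only).
// Timer interrupt handler (IRQ 0)
__attribute__((interrupt))
void timer_handler(struct interrupt_frame *frame) {
    (void)frame;
    static uint64_t tick = 0;
    tick++;
    if (tick % 100 == 0) {
        printf("Timer tick: %lu\n", tick);
    }
    // Send EOI (End of Interrupt)
    outb(0x20, 0x20);
}
// Keyboard interrupt handler (IRQ 1)
__attribute__((interrupt))
void keyboard_handler(struct interrupt_frame *frame) {
    (void)frame;
    // Reading port 0x60 also lets the controller deliver the next scancode
    uint8_t scancode = inb(0x60);
    printf("Scancode: 0x%x\n", scancode);
    // Send EOI
    outb(0x20, 0x20);
}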
Process Management Basics
Process Structure
enum process_state {
PROCESS_RUNNING,
PROCESS_READY,
PROCESS_BLOCKED,
PROCESS_TERMINATED
};
struct process {
int pid;
enum process_state state;
uint64_t *stack_pointer; // Saved stack pointer
uint64_t *page_directory; // Memory space
struct process *next; // Process list
};
struct process *current_process = NULL;
struct process *process_list = NULL;
Context Switching
A context switch saves the state of the current process and restores the state of another.
; switch_context(old_sp, new_sp)
; Saves current registers to old_sp, loads from new_sp
global switch_context
switch_context:
; Save current context
push rbp
push rbx
push r12
push r13
push r14
push r15
pushf
; Save current stack pointer
mov [rdi], rsp
; Load new stack pointer
mov rsp, rsi
; Restore new context
popf
pop r15
pop r14
pop r13
pop r12
pop rbx
pop rbp
ret
C wrapper:
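void schedule(void) {
    if (!current_process) {
        return; // Nothing to schedule
    }
    struct process *old = current_process;
    // Round-robin: wrap to the head of the list after the last process
    struct process *next = old->next ? old->next : process_list;
    if (!next || next == old) {
        return; // No other process to run
    }
    if (old->state == PROCESS_RUNNING) {
        old->state = PROCESS_READY;
    }
    next->state = PROCESS_RUNNING;
    current_process = next;
    // Switch address space
    write_cr3((uint64_t)next->page_directory);
    // Switch context
    switch_context(&old->stack_pointer, next->stack_pointer);
}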
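The selection step of a round-robin scheduler can be tested separately from the context switch itself. The sketch below is a user-space model with its own hypothetical `struct task` and `pick_next`. It also skips tasks that are not ready, which the minimal `schedule()` above does not attempt.

```c
#include <stddef.h>

enum task_state { TASK_RUNNING, TASK_READY, TASK_BLOCKED };

struct task {
    int pid;
    enum task_state state;
    struct task *next;      /* singly linked, NULL-terminated list */
};

/* Pick the next READY task after `current`, wrapping to `head` at the
 * end of the list. `current` must be a member of the list.
 * Returns `current` if no other task is ready. */
struct task *pick_next(struct task *head, struct task *current) {
    if (!head || !current) return current;
    struct task *t = current->next ? current->next : head;
    while (t != current) {
        if (t->state == TASK_READY)
            return t;
        t = t->next ? t->next : head;
    }
    return current;
}
```

Because the walk starts just after `current` and wraps around, every ready task gets a turn before any task runs twice, which is the defining property of round-robin.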
Memory Management in Kernel
Physical Memory Allocator
#define PAGE_SIZE 4096
struct page {
int ref_count;
struct page *next;
};
struct page *free_list = NULL;
void init_physical_memory(uint64_t mem_start, uint64_t mem_end) {
for (uint64_t addr = mem_start; addr < mem_end; addr += PAGE_SIZE) {
struct page *pg = (struct page *)addr;
pg->ref_count = 0;
pg->next = free_list;
free_list = pg;
}
}
void *alloc_page(void) {
if (!free_list) {
return NULL; // Out of memory
}
struct page *pg = free_list;
free_list = pg->next;
pg->ref_count = 1;
return (void *)pg;
}
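void free_page(void *addr) {
    // The struct page header lives inside the page itself, so the owner
    // has overwritten it while the page was in use. Rebuild the header
    // instead of trusting a stale ref_count.
    struct page *pg = (struct page *)addr;
    pg->ref_count = 0;
    pg->next = free_list;
    free_list = pg;
}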
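The free-list idea can be tried in user space by letting a static array stand in for physical memory. This is a sketch with hypothetical `demo_` names. It keeps only the `next` link inside each free page, since anything stored in the page is overwritten once the page is handed out.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_NUM_PAGES 4

/* A page is either a free-list node or raw payload, never both. */
union demo_page {
    union demo_page *next;             /* valid only while the page is free */
    uint8_t bytes[DEMO_PAGE_SIZE];     /* payload while the page is in use */
};

/* A static, page-aligned arena stands in for physical memory. */
static _Alignas(DEMO_PAGE_SIZE) union demo_page demo_arena[DEMO_NUM_PAGES];
static union demo_page *demo_free_list = NULL;

/* Thread every page of the arena onto the free list. */
void demo_init_pages(void) {
    demo_free_list = NULL;
    for (size_t i = 0; i < DEMO_NUM_PAGES; i++) {
        demo_arena[i].next = demo_free_list;
        demo_free_list = &demo_arena[i];
    }
}

/* Pop one page off the free list, or return NULL when exhausted. */
void *demo_alloc_page(void) {
    union demo_page *pg = demo_free_list;
    if (!pg) return NULL;
    demo_free_list = pg->next;
    return pg;
}

/* Push a page back; its old contents are simply overwritten by the link. */
void demo_free_page(void *addr) {
    union demo_page *pg = addr;
    pg->next = demo_free_list;
    demo_free_list = pg;
}
```

Allocation and release are both constant time. The list behaves as a stack, so the page freed most recently is the next one handed out.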
Kernel Memory Allocator (Simple)
#define HEAP_START 0xFFFFFFFF80000000
#define HEAP_SIZE 0x100000 // 1 MB
struct heap_block {
size_t size;
bool free;
struct heap_block *next;
};
struct heap_block *heap_head = NULL;
void init_heap(void) {
heap_head = (struct heap_block *)HEAP_START;
heap_head->size = HEAP_SIZE - sizeof(struct heap_block);
heap_head->free = true;
heap_head->next = NULL;
}
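void *kmalloc(size_t size) {
// Round up to a multiple of 8 so that headers created by a split,
// and the payloads after them, stay 8-byte aligned
size = (size + 7) & ~(size_t)7;
struct heap_block *current = heap_head;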
while (current) {
if (current->free && current->size >= size) {
// Split block if too large
if (current->size > size + sizeof(struct heap_block)) {
struct heap_block *new_block =
(struct heap_block *)((uint8_t *)current +
sizeof(struct heap_block) + size);
new_block->size = current->size - size - sizeof(struct heap_block);
new_block->free = true;
new_block->next = current->next;
current->size = size;
current->next = new_block;
}
current->free = false;
return (void *)((uint8_t *)current + sizeof(struct heap_block));
}
current = current->next;
}
return NULL; // Out of memory
}
void kfree(void *ptr) {
if (!ptr) return;
struct heap_block *block =
(struct heap_block *)((uint8_t *)ptr - sizeof(struct heap_block));
block->free = true;
// Coalesce with next block if free
if (block->next && block->next->free) {
block->size += block->next->size + sizeof(struct heap_block);
block->next = block->next->next;
}
}
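Allocators of this kind usually round each request up to an alignment boundary, so that a block carved out by a split starts at an aligned address. A minimal sketch of that rounding step follows. It assumes the alignment is a power of two, and `align_up` is a hypothetical helper name.

```c
#include <stddef.h>

/* Round n up to the next multiple of align.
 * align must be a power of two; the caller should reject requests
 * close to SIZE_MAX, where the addition would wrap around. */
size_t align_up(size_t n, size_t align) {
    return (n + (align - 1)) & ~(align - 1);
}
```

For example, `align_up(13, 8)` is 16 and `align_up(16, 8)` stays 16, so a request never shrinks and already-aligned sizes are left alone.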
Device Drivers
Driver Interface
struct device_driver {
const char *name;
int (*init)(void);
int (*read)(void *buf, size_t count);
int (*write)(const void *buf, size_t count);
void (*cleanup)(void);
};
struct device_driver *drivers[MAX_DRIVERS];
int num_drivers = 0;
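int register_driver(struct device_driver *driver) {
    if (!driver || num_drivers >= MAX_DRIVERS) {
        return -1;
    }
    // Initialize first so a driver that fails to start is never registered
    if (driver->init) {
        int rc = driver->init();
        if (rc != 0) {
            return rc;
        }
    }
    drivers[num_drivers++] = driver;
    return 0;
}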
Simple Console Driver
#define VGA_MEMORY 0xB8000
#define VGA_WIDTH 80
#define VGA_HEIGHT 25
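// volatile: this is memory-mapped hardware, so writes must not be optimized away
volatile uint16_t *vga_buffer = (volatile uint16_t *)VGA_MEMORY;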
int cursor_x = 0, cursor_y = 0;
void console_putchar(char c) {
if (c == '\n') {
cursor_x = 0;
cursor_y++;
} else {
int offset = cursor_y * VGA_WIDTH + cursor_x;
// Cast to uint8_t: a signed char >= 0x80 would sign-extend into the attribute byte
vga_buffer[offset] = (0x0F << 8) | (uint8_t)c; // White on black
cursor_x++;
}
if (cursor_x >= VGA_WIDTH) {
cursor_x = 0;
cursor_y++;
}
if (cursor_y >= VGA_HEIGHT) {
// Scroll
for (int y = 1; y < VGA_HEIGHT; y++) {
for (int x = 0; x < VGA_WIDTH; x++) {
vga_buffer[(y - 1) * VGA_WIDTH + x] =
vga_buffer[y * VGA_WIDTH + x];
}
}
cursor_y = VGA_HEIGHT - 1;
for (int x = 0; x < VGA_WIDTH; x++) {
vga_buffer[cursor_y * VGA_WIDTH + x] = 0;
}
}
}
int console_write(const void *buf, size_t count) {
const char *str = (const char *)buf;
for (size_t i = 0; i < count; i++) {
console_putchar(str[i]);
}
return count;
}
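Each cell of the VGA text buffer is a 16-bit value: the character in the low byte and a color attribute in the high byte. The helpers below are a sketch of that encoding with hypothetical names (`vga_attr`, `vga_entry`, `vga_offset`). The color numbers follow the standard 16-color text-mode palette.

```c
#include <stdint.h>

/* A few of the 16 VGA text-mode colors. */
enum vga_color { VGA_BLACK = 0, VGA_BLUE = 1, VGA_LIGHT_GREY = 7, VGA_WHITE = 15 };

/* Attribute byte: background in the high nibble, foreground in the low. */
uint8_t vga_attr(enum vga_color fg, enum vga_color bg) {
    return (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
}

/* One screen cell: attribute in the high byte, character in the low.
 * The uint8_t cast keeps characters >= 0x80 from sign-extending. */
uint16_t vga_entry(char c, uint8_t attr) {
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Linear index of column x, row y in an 80-column buffer. */
int vga_offset(int x, int y) {
    return y * 80 + x;
}
```

White on black is attribute `0x0F`, so the letter 'A' becomes the cell value `0x0F41`, matching the constant used in `console_putchar`.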
Kernel Entry Point
// kernel.c - Main kernel entry point
void kernel_main(void) {
// Clear screen
for (int i = 0; i < VGA_WIDTH * VGA_HEIGHT; i++) {
vga_buffer[i] = 0;
}
console_write("Kernel starting...\n", 19);
// Initialize subsystems
init_idt();
init_pic();
init_physical_memory(0x100000, 0x400000); // 1MB - 4MB (must not overlap the kernel image)
init_heap();
asm volatile("sti"); // Enable interrupts
console_write("Interrupts enabled\n", 19);
console_write("Kernel initialized\n", 19);
// Kernel idle loop
while (1) {
asm("hlt"); // Halt until interrupt
}
}
Key Concepts
- Kernel manages hardware and provides services to applications
- User space and kernel space are separated for security
- System calls allow user programs to request kernel services
- Interrupts handle asynchronous events (hardware, exceptions)
- IDT maps interrupt numbers to handlers
- Context switching saves/restores process state
- Device drivers abstract hardware for kernel
- Monolithic vs microkernel architectures
Common Mistakes
- Not disabling interrupts - During critical sections
- Stack overflows - Kernel stack is limited
- Forgetting EOI - PIC requires acknowledgment
- Wrong privilege level - Kernel must run in ring 0
- Synchronization issues - Interrupts can happen anytime
- Memory leaks - Kernel memory is precious
- Infinite loops in handlers - Halts entire system
Debugging Tips
- Use serial port - Easier than video for early kernel
- Print register dumps - On exceptions
- Add assertions - Catch bugs early
- Test incrementally - Add one feature at a time
- Use QEMU logging - -d int,cpu_reset
- Check interrupt masks - Verify IRQs are enabled
- Verify IDT entries - Print IDT contents
Mini Exercises
- Implement a minimal kernel that prints to VGA
- Set up IDT with exception handlers
- Create a timer interrupt handler
- Implement basic keyboard input
- Write a simple physical page allocator
- Create a kmalloc/kfree implementation
- Implement system call handler
- Write a context switch function
- Create a simple round-robin scheduler
- Implement serial port driver
Review Questions
- What's the difference between kernel space and user space?
- How do system calls work?
- What is the purpose of the IDT?
- What's the difference between exceptions and interrupts?
- How does context switching work?
Reference Checklist
By the end of this chapter, you should be able to:
- Explain kernel architecture (monolithic vs microkernel)
- Understand kernel/user space separation
- Implement system call handler
- Set up Interrupt Descriptor Table
- Handle exceptions and hardware interrupts
- Initialize PIC for IRQs
- Implement basic process structure
- Perform context switching
- Allocate physical memory pages
- Write simple device drivers
Next Steps
With kernel fundamentals understood, the next chapter focuses on x86/x64 kernel development specifically. You'll build a complete minimal kernel for x86/x64 architecture, including bootloader integration, memory management, and multitasking.
Key Takeaway: Kernels are the foundation of operating systems. Understanding interrupts, system calls, memory management, and process scheduling is essential for kernel development and system-level programming.