Chapter 6: Compilers, Linkers, and Libraries
Introduction
When you write C code and run gcc hello.c -o hello, a complex multi-stage process transforms your source code into an executable binary. Understanding this toolchain - preprocessor, compiler, assembler, and linker - is essential for system programming. This chapter demystifies how code becomes a program.
Why This Matters
System programmers need to understand the compilation process because:
- You'll debug linking errors and symbol resolution
- You'll create and use libraries
- You'll optimize compilation for specific architectures
- You'll understand binary formats and how programs are loaded
- You'll write makefiles and build systems
How to Study This Chapter
- Experiment - Compile code with different flags
- Inspect output - Look at assembly, object files, executables
- Break things - See what errors look like
- Use tools - nm, objdump, ldd show what's inside binaries
The GCC Compilation Pipeline
GCC (GNU Compiler Collection) performs compilation in stages:
Source Code (.c)
↓
[Preprocessor]
↓
Preprocessed Code (.i)
↓
[Compiler]
↓
Assembly Code (.s)
↓
[Assembler]
↓
Object Code (.o)
↓
[Linker]
↓
Executable
Stage 1: Preprocessing
The preprocessor handles directives starting with #.
What It Does
#include <stdio.h> // Include header file
#define MAX 100 // Define macro
#ifdef DEBUG // Conditional compilation
printf("Debug mode\n");
#endif
Actions:
- #include - Paste entire header file contents
- #define - Text substitution of macros
- #ifdef/#ifndef - Conditional code inclusion
- #pragma - Compiler-specific directives
Running Just the Preprocessor
gcc -E hello.c -o hello.i
Example:
Input (hello.c):
#include <stdio.h>
#define NUM 42
int main(void) {
    printf("Number: %d\n", NUM);
    return 0;
}
Output (hello.i):
// ... thousands of lines from stdio.h ...
int main(void) {
    printf("Number: %d\n", 42); // NUM replaced with 42
    return 0;
}
Stage 2: Compilation
The compiler converts C code to assembly language.
What It Does
- Parses C syntax
- Checks types
- Optimizes code
- Generates assembly for target architecture
Running Just the Compiler
gcc -S hello.c -o hello.s
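Example Assembly Output (hello.s for x86-64, generated from a hello.c that prints "Hello, World!"; the exact output varies with GCC version and flags):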
.file "hello.c"
.section .rodata
.LC0:
.string "Hello, World!"
.text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
ret
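You can read this! Each line is either an assembler directive or one of the actual CPU instructions your code becomes. The call to puts suggests the source used printf with a constant string ending in a newline, which GCC commonly rewrites to the cheaper puts.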
Stage 3: Assembly
The assembler converts assembly to machine code (object file).
What It Does
- Translates assembly mnemonics to binary opcodes
- Creates object file (.o or .obj)
- Includes symbol table (function/variable names and addresses)
- Not yet executable (needs linking)
Running Just the Assembler
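gcc -c hello.s -o hello.o # assemble only (input is already assembly)
gcc -c hello.c -o hello.o # preprocess, compile, and assemble; stops before linking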
Object files contain:
- Machine code for your functions
- Data sections for variables
- Symbol table (exported/imported symbols)
- Relocation information
Inspecting Object Files
# List symbols
nm hello.o
# Disassemble
objdump -d hello.o
# View sections
objdump -h hello.o
Stage 4: Linking
The linker combines object files and libraries into an executable.
What It Does
- Resolves symbol references (function calls, variables)
- Combines code and data sections
- Determines final memory addresses
- Produces executable file
Example: Multi-File Program
main.c:
extern void greet(void);
int main(void) {
    greet();
    return 0;
}
greet.c:
#include <stdio.h>
void greet(void) {
    printf("Hello!\n");
}
Compilation:
gcc -c main.c -o main.o
gcc -c greet.c -o greet.o
gcc main.o greet.o -o program
The linker:
- Sees main.o calls function greet (undefined symbol)
- Finds greet defined in greet.o
- Resolves the address
- Combines into single executable
Libraries
Libraries are collections of reusable code.
Static Libraries (.a on Linux, .lib on Windows)
Characteristics:
- Linked directly into your executable
- Code copied into your binary
- Larger executable size
- No external dependencies at runtime
Creating a Static Library:
# Compile source files
gcc -c lib1.c -o lib1.o
gcc -c lib2.c -o lib2.o
# Create archive (static library)
ar rcs libmylib.a lib1.o lib2.o
# Link with program
gcc main.c -L. -lmylib -o program
Explanation:
- ar - archive tool
- -L. - look for libraries in current directory
- -lmylib - link with libmylib.a (lib prefix and .a suffix automatic)
Dynamic/Shared Libraries (.so on Linux, .dll on Windows, .dylib on macOS)
Characteristics:
- Not copied into executable
- Loaded at runtime
- Smaller executable
- Multiple programs can share one copy in memory
- Can update library without recompiling program (as long as the library's interface stays compatible)
Creating a Shared Library:
# Compile with position-independent code
gcc -fPIC -c lib1.c -o lib1.o
gcc -fPIC -c lib2.c -o lib2.o
# Create shared library
gcc -shared -o libmylib.so lib1.o lib2.o
# Link with program
gcc main.c -L. -lmylib -o program
# Run (need to set library path)
LD_LIBRARY_PATH=. ./program
PIC (Position Independent Code):
- Code can execute at any memory address
- Required for shared libraries
- Slight performance overhead
Static vs Dynamic Comparison
| Aspect | Static Library | Dynamic Library |
|---|---|---|
| Link Time | Copied into executable | Reference stored |
| Executable Size | Larger | Smaller |
| Load Time | Faster | Slower (must load library) |
| Memory | Duplicated per program | Shared among programs |
| Updates | Requires relinking the program | Can update library independently |
| Dependencies | Self-contained | Needs .so/.dll present |
Binary Executable Formats
ELF (Executable and Linkable Format)
Used on Linux and many Unix systems.
ELF Sections:
- .text - Executable code
- .data - Initialized global variables
- .bss - Uninitialized global variables (zeroed)
- .rodata - Read-only data (constants, string literals)
- .symtab - Symbol table
- .strtab - String table (symbol names)
Viewing ELF Structure:
# View sections
readelf -S program
# View symbols
readelf -s program
# View program headers
readelf -l program
Example:
$ readelf -S hello
Section Headers:
[Nr] Name Type Address Off Size
[ 0] NULL 0000000000000000 000000 000000
[ 1] .text PROGBITS 0000000000401000 001000 000185
[ 2] .rodata PROGBITS 0000000000402000 002000 000013
[ 3] .data PROGBITS 0000000000404000 003000 000010
[ 4] .bss NOBITS 0000000000404010 003010 000008
...
PE (Portable Executable)
Used on Windows (.exe, .dll).
Similar in concept to ELF (headers, sections, symbol and relocation information), but with a different on-disk layout.
The Linker's Job in Detail
Symbol Resolution
When you call a function, the compiler generates a reference:
#include <stdio.h>
int main(void) {
    printf("Hello"); // Compiler: "call function printf"
    return 0;
}
Object file contains:
UNDEFINED SYMBOL: printf
Linker searches:
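- Other object files you specified
- Libraries named with -l, in command-line order (for each one, GNU ld prefers the shared .so over the static .a unless -static is given)
- System libraries linked by default, such as the C library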
If found: Resolves to actual address
If not found: Linker error: "undefined reference to printf"
Relocation
Object files contain placeholder addresses:
call 0x0 ; Placeholder, don't know printf's address yet
Linker:
- Decides final memory layout
- Assigns actual addresses
- Patches all references
Result:
call 0x401234 ; printf is at this address
Common Linking Errors
Undefined Reference
undefined reference to `someFunction'
Cause: The function is declared but never defined, or the object file or library that defines it was not passed to the linker.
Fix:
- Define the function
- Link the library: gcc main.c -lm (for math library)
Multiple Definition
multiple definition of `globalVar'
Cause: Same symbol defined in multiple object files.
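Fix: Keep a single definition in one .c file and declare the symbol with extern in a shared header. If each file is meant to have its own private copy, mark it static instead.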
Library Not Found
cannot find -lsomelib
Cause: Library file not in library search path.
Fix: Use -L/path/to/lib to add search directory.
GCC Compilation Flags
Essential Flags
# Output file name
gcc -o program main.c
# Compile without linking
gcc -c main.c
# Enable all warnings
gcc -Wall -Wextra main.c
# Debug symbols
gcc -g main.c
# Optimization
gcc -O2 main.c # -O0 (none), -O1, -O2, -O3
# Specify C standard
gcc -std=c11 main.c
# Link library
gcc main.c -lm # Link libm.so (math library)
# Add library search path
gcc main.c -L/usr/local/lib -lmylib
# Add include search path
gcc -I/usr/local/include main.c
# Define macro
gcc -DDEBUG -DMAX=100 main.c
Debugging and Analysis Flags
# Generate assembly
gcc -S main.c
# Preprocess only
gcc -E main.c
# Verbose output (see what gcc is doing)
gcc -v main.c
# Save temporary files
gcc -save-temps main.c
# Position-independent code (for shared libraries)
gcc -fPIC main.c
Build Systems
For large projects with many files, typing gcc commands manually is tedious. Build systems automate compilation.
Makefiles (Make)
Example Makefile:
CC = gcc
CFLAGS = -Wall -Wextra -O2
program: main.o utils.o
	$(CC) $(CFLAGS) -o program main.o utils.o
main.o: main.c utils.h
	$(CC) $(CFLAGS) -c main.c
utils.o: utils.c utils.h
	$(CC) $(CFLAGS) -c utils.c
clean:
	rm -f *.o program
Usage:
make # Build program
make clean # Remove generated files
How it works:
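- Builds a dependency graph from the rules (each target lists the files it depends on)
- Rebuilds a target only when it is missing or older than one of its dependencies
- Saves time in large projects, because unchanged files are not recompiled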
Key Concepts
- Compilation has four stages: preprocessing, compiling, assembling, linking
- Object files contain machine code but aren't executable yet
- Linker resolves symbols and combines object files
- Static libraries are copied into executable
- Shared libraries are loaded at runtime
- ELF is the binary format on Linux
- Build systems automate compilation
Common Mistakes
- Forgetting -l flag - Undefined reference errors at link time
- Wrong library order - Order matters: put each library after the objects and libraries that use it
- Missing -fPIC - Linking the shared library can fail with relocation errors
- Not using -Wall - Miss important warnings
- Mixing debug/release - Inconsistent behavior
Debugging Tips
- Use nm - See what symbols are in object files/libraries
- Use ldd - See what libraries executable depends on
- Use objdump - Disassemble and inspect binaries
- Check library paths - Use LD_LIBRARY_PATH for testing
- Read linker errors carefully - Usually tell you exactly what's wrong
Mini Exercises
- Compile a C file in stages (stop at each stage and inspect output)
- Create a multi-file program and compile it
- Create a static library and link against it
- Create a shared library and use it
- Use nm to inspect symbols in an object file
- Disassemble an executable with objdump -d
- Write a simple Makefile
- Use readelf to examine ELF structure
- Intentionally cause an "undefined reference" error and fix it
- Compare executable sizes with static vs dynamic linking
Review Questions
- What are the four stages of compilation?
- What does the linker do?
- What's the difference between static and shared libraries?
- What is ELF and what sections does it contain?
- Why do we need position-independent code for shared libraries?
Reference Checklist
By the end of this chapter, you should be able to:
- Understand the GCC compilation pipeline
- Compile code in stages
- Create and use static libraries
- Create and use shared libraries
- Understand ELF format basics
- Use GCC flags effectively
- Debug linking errors
- Write basic Makefiles
Next Steps
Now that you understand how C code becomes executable machine code, the next chapter explores data structures implemented in C. You'll learn how arrays, linked lists, stacks, and queues are actually laid out in memory and how to implement them efficiently at a low level.
Key Takeaway: The journey from source code to executable involves preprocessing, compiling to assembly, assembling to object code, and linking. Understanding this process helps you debug errors, optimize builds, and create libraries for system programming.