Compilers, Linkers, and Libraries

Chapter 6: Compilers, Linkers, and Libraries

Introduction

When you write C code and run gcc hello.c -o hello, a multi-stage process transforms your source code into an executable binary. Understanding this toolchain (preprocessor, compiler, assembler, and linker) is essential for system programming. This chapter demystifies how code becomes a program.

Why This Matters

System programmers need to understand the compilation process because:

  • You'll debug linking errors and symbol resolution
  • You'll create and use libraries
  • You'll optimize compilation for specific architectures
  • You'll understand binary formats and how programs are loaded
  • You'll write makefiles and build systems

How to Study This Chapter

  1. Experiment - Compile code with different flags
  2. Inspect output - Look at assembly, object files, executables
  3. Break things - See what errors look like
  4. Use tools - nm, objdump, ldd show what's inside binaries

The GCC Compilation Pipeline

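GCC (GNU Compiler Collection) performs compilation in stages. The gcc command is a driver that runs each stage in turn, calling the assembler (as) and the linker (ld) as separate programs: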

Source Code (.c)
      ↓
[Preprocessor]
      ↓
Preprocessed Code (.i)
      ↓
[Compiler]
      ↓
Assembly Code (.s)
      ↓
[Assembler]
      ↓
Object Code (.o)
      ↓
[Linker]
      ↓
Executable

Stage 1: Preprocessing

The preprocessor handles directives starting with #.

What It Does

#include <stdio.h>    // Include header file
#define MAX 100       // Define macro
#ifdef DEBUG          // Conditional compilation
    printf("Debug mode\n");
#endif

Actions:

  • #include - Paste entire header file contents
  • #define - Text substitution of macros
  • #ifdef/#ifndef - Conditional code inclusion
  • #pragma - Compiler-specific directives

Running Just the Preprocessor

gcc -E hello.c -o hello.i

Example:

Input (hello.c):

#include <stdio.h>
#define NUM 42

int main() {
    printf("Number: %d\n", NUM);
    return 0;
}

Output (hello.i):

// ... thousands of lines from stdio.h ...

int main() {
    printf("Number: %d\n", 42);  // NUM replaced with 42
    return 0;
}

Stage 2: Compilation

The compiler converts C code to assembly language.

What It Does

  • Parses C syntax
  • Checks types
  • Optimizes code
  • Generates assembly for target architecture

Running Just the Compiler

gcc -S hello.c -o hello.s

Example Assembly Output (x86-64, for a hello.c that calls printf("Hello, World!\n"); the exact output varies with GCC version and flags):

    .file   "hello.c"
    .section    .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    popq    %rbp
    ret

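You can read this: each line is the text form of an instruction the CPU will execute. Notice that GCC replaced the printf call with puts, because the string has no format specifiers and ends in a newline.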

Stage 3: Assembly

The assembler converts assembly to machine code (object file).

What It Does

  • Translates assembly mnemonics to binary opcodes
  • Creates object file (.o or .obj)
  • Includes a symbol table (function and variable names, with their offsets within sections)
  • Not yet executable (needs linking)

Running Just the Assembler

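gcc -c hello.s -o hello.o    # or call the assembler directly: as hello.s -o hello.o

Running gcc -c hello.c -o hello.o instead performs preprocessing, compilation, and assembly in one step.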

Object files contain:

  • Machine code for your functions
  • Data sections for variables
  • Symbol table (exported/imported symbols)
  • Relocation information

Inspecting Object Files

# List symbols
nm hello.o

# Disassemble
objdump -d hello.o

# View sections
objdump -h hello.o

Stage 4: Linking

The linker combines object files and libraries into an executable.

What It Does

  • Resolves symbol references (function calls, variables)
  • Combines code and data sections
  • Determines final memory addresses
  • Produces executable file

Example: Multi-File Program

main.c:

extern void greet(void);   // declared here, defined in greet.c

int main(void) {
    greet();
    return 0;
}

greet.c:

#include <stdio.h>

void greet(void) {
    printf("Hello!\n");
}

Compilation:

gcc -c main.c -o main.o
gcc -c greet.c -o greet.o
gcc main.o greet.o -o program

The linker:

  1. Sees that main.o references greet, an undefined symbol
  2. Finds greet defined in greet.o
  3. Resolves the reference to greet's final address
  4. Combines everything into a single executable

Libraries

Libraries are collections of reusable code.

Static Libraries (.a on Linux, .lib on Windows)

Characteristics:

  • Linked into your executable at build time
  • Only the archive members (object files) that resolve undefined symbols are copied into your binary
  • Larger executable size
  • No runtime dependency on the library

Creating a Static Library:

# Compile source files
gcc -c lib1.c -o lib1.o
gcc -c lib2.c -o lib2.o

# Create archive (static library)
ar rcs libmylib.a lib1.o lib2.o

# Link with program
gcc main.c -L. -lmylib -o program

Explanation:

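  • ar - archive tool; rcs means replace or insert members (r), create the archive if needed (c), and write a symbol index (s)
  • -L. - look for libraries in the current directory
  • -lmylib - link with libmylib (the lib prefix and the .a or .so suffix are added automatically; if both libmylib.so and libmylib.a are in the same directory, the GNU linker prefers the shared library unless you pass -static)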

Dynamic/Shared Libraries (.so on Linux, .dll on Windows, .dylib on macOS)

Characteristics:

  • Not copied into executable
  • Loaded at runtime
  • Smaller executable
  • Multiple programs can share one copy in memory
  • Can update the library without recompiling the program (as long as its ABI stays compatible)

Creating a Shared Library:

# Compile with position-independent code
gcc -fPIC -c lib1.c -o lib1.o
gcc -fPIC -c lib2.c -o lib2.o

# Create shared library
gcc -shared -o libmylib.so lib1.o lib2.o

# Link with program
gcc main.c -L. -lmylib -o program

# Run (the dynamic loader does not search the current directory by default)
LD_LIBRARY_PATH=. ./program

PIC (Position Independent Code):

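  • Machine code that runs correctly wherever it is loaded in memory (it reaches data relative to the current instruction or through the Global Offset Table, not through fixed addresses)
  • Required for shared libraries, whose load address is not known until run time
  • Usually a small performance cost, smaller on x86-64 (which has instruction-pointer-relative addressing) than on 32-bit x86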

Static vs Dynamic Comparison

Aspect           Static Library                   Dynamic Library
Link time        Needed code copied into binary   Only a reference recorded
Executable size  Larger                           Smaller
Load time        Faster                           Slower (library must be loaded and symbols resolved)
Memory           Duplicated in each program       Shared among programs
Updates          Requires relinking the program   Library can be updated independently
Dependencies     Self-contained                   Needs the .so/.dll present at run time

Binary Executable Formats

ELF (Executable and Linkable Format)

Used on Linux and many Unix systems.

ELF Sections:

  • .text - Executable code
  • .data - Initialized global and static variables
  • .bss - Uninitialized (or zero-initialized) global and static variables; zero-filled at load time and takes no space in the file
  • .rodata - Read-only data (constants, string literals)
  • .symtab - Symbol table
  • .strtab - String table (symbol names)

Viewing ELF Structure:

# View sections
readelf -S program

# View symbols
readelf -s program

# View program headers
readelf -l program

Example (abridged; a real executable has many more sections, and readelf -S -W prints additional columns):

$ readelf -S hello

Section Headers:
  [Nr] Name              Type            Address          Off    Size
  [ 0]                   NULL            0000000000000000 000000 000000
  [ 1] .text             PROGBITS        0000000000401000 001000 000185
  [ 2] .rodata           PROGBITS        0000000000402000 002000 000013
  [ 3] .data             PROGBITS        0000000000404000 003000 000010
  [ 4] .bss              NOBITS          0000000000404010 003010 000008
  ...

PE (Portable Executable)

Used on Windows (.exe, .dll).

Similar concept to ELF but different format.

The Linker's Job in Detail

Symbol Resolution

When you call a function, the compiler generates a reference:

int main() {
    printf("Hello");  // Compiler: "call function printf"
    return 0;
}

The object file records printf as an undefined symbol, which nm displays as:

                 U printf

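Linker searches:

  1. The object files and libraries you specified, in command-line order
  2. For each -lname, the -L directories first, then the default system directories
  3. Libraries the gcc driver adds automatically, such as the C library

If found: the reference is resolved to the symbol's address.
If not found: linker error "undefined reference to `printf'".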

Relocation

Object files contain placeholder addresses:

call 0x0  ; Placeholder, don't know printf's address yet

Linker:

  1. Decides final memory layout
  2. Assigns actual addresses
  3. Patches all references

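Result (simplified):

call 0x401234  ; printf is at this address

When printf lives in a shared C library, the call actually targets a small stub (printf@plt), and the dynamic loader fills in printf's real address at run time.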

Common Linking Errors

Undefined Reference

undefined reference to `someFunction'

Cause: Function declared but never defined, or forgot to link library.

Fix:

  • Define the function
  • Link the library: gcc main.c -lm (for math library)

Multiple Definition

multiple definition of `globalVar'

Cause: Same symbol defined in multiple object files.

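Fix: Define the variable in exactly one .c file and declare it extern everywhere else (usually in a shared header); if each file should have its own private copy, declare it static instead. Since GCC 10 defaults to -fno-common, this error also appears when two files each contain an uninitialized definition like int globalVar (older GCC versions silently merged such definitions).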

Library Not Found

cannot find -lsomelib

Cause: Library file not in library search path.

Fix: Use -L/path/to/lib to add search directory.

GCC Compilation Flags

Essential Flags

# Output file name
gcc -o program main.c

# Compile without linking
gcc -c main.c

# Enable common warnings (-Wall) plus extra ones (-Wextra); despite its name, -Wall does not enable every warning
gcc -Wall -Wextra main.c

# Debug symbols
gcc -g main.c

# Optimization
gcc -O2 main.c    # -O0 (none, the default), -O1, -O2, -O3; -Os optimizes for size

# Specify C standard
gcc -std=c11 main.c

# Link library
gcc main.c -lm    # Link the math library (libm)

# Add library search path
gcc main.c -L/usr/local/lib -lmylib

# Add include search path
gcc -I/usr/local/include main.c

# Define macro
gcc -DDEBUG -DMAX=100 main.c

Debugging and Analysis Flags

# Generate assembly
gcc -S main.c

# Preprocess only
gcc -E main.c

# Verbose output (see what gcc is doing)
gcc -v main.c

# Keep intermediate files (main.i, main.s, main.o)
gcc -save-temps main.c

# Position-independent code (for shared libraries)
gcc -fPIC main.c

Build Systems

For large projects with many files, typing gcc commands manually is tedious. Build systems automate compilation.

Makefiles (Make)

Example Makefile:

CC = gcc
CFLAGS = -Wall -Wextra -O2

program: main.o utils.o
	$(CC) $(CFLAGS) -o program main.o utils.o

main.o: main.c utils.h
	$(CC) $(CFLAGS) -c main.c

utils.o: utils.c utils.h
	$(CC) $(CFLAGS) -c utils.c

.PHONY: clean
clean:
	rm -f *.o program

Usage:

make           # Build program
make clean     # Remove generated files

How it works:

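  • Make reads the rules as a dependency graph (target: prerequisites)
  • It rebuilds a target only if the target is missing or older than one of its prerequisites
  • Unchanged files are not recompiled, which saves time in large projects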

Key Concepts

  • Compilation has four stages: preprocessing, compiling, assembling, linking
  • Object files contain machine code but aren't executable yet
  • Linker resolves symbols and combines object files
  • Needed code from static libraries is copied into the executable
  • Shared libraries are loaded at run time
  • ELF is the standard format for object files, executables, and shared libraries on Linux
  • Build systems automate compilation

Common Mistakes

  1. Forgetting -l flag - Can't find library
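  2. Wrong library order - The linker reads inputs left to right, so list each library after the object files and libraries that use it
  3. Missing -fPIC - Linking the shared library can fail with a relocation error (on x86-64, ld suggests recompiling with -fPIC)
  4. Not using -Wall - Miss important warnings
  5. Mixing debug/release builds - Objects compiled with different flags (for example, with and without -DNDEBUG) can behave inconsistently; rebuild everything after changing flags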

Debugging Tips

  • Use nm - See what symbols are in object files/libraries
  • Use ldd - See what libraries executable depends on
  • Use objdump - Disassemble and inspect binaries
  • Check library paths - Use LD_LIBRARY_PATH for testing
  • Read linker errors carefully - Usually tell you exactly what's wrong

Mini Exercises

  1. Compile a C file in stages (stop at each stage and inspect output)
  2. Create a multi-file program and compile it
  3. Create a static library and link against it
  4. Create a shared library and use it
  5. Use nm to inspect symbols in an object file
  6. Disassemble an executable with objdump -d
  7. Write a simple Makefile
  8. Use readelf to examine ELF structure
  9. Intentionally cause an "undefined reference" error and fix it
  10. Compare executable sizes with static vs dynamic linking

Review Questions

  1. What are the four stages of compilation?
  2. What does the linker do?
  3. What's the difference between static and shared libraries?
  4. What is ELF and what sections does it contain?
  5. Why do we need position-independent code for shared libraries?

Reference Checklist

By the end of this chapter, you should be able to:

  • Understand the GCC compilation pipeline
  • Compile code in stages
  • Create and use static libraries
  • Create and use shared libraries
  • Understand ELF format basics
  • Use GCC flags effectively
  • Debug linking errors
  • Write basic Makefiles

Next Steps

Now that you understand how C code becomes executable machine code, the next chapter explores data structures implemented in C. You'll learn how arrays, linked lists, stacks, and queues are actually laid out in memory and how to implement them efficiently at a low level.


Key Takeaway: The journey from source code to executable involves preprocessing, compiling to assembly, assembling to object code, and linking. Understanding this process helps you debug errors, optimize builds, and create libraries for system programming.