Chapter 7: Data Structures in C

Introduction

Data structures organize and store data efficiently. In high-level languages, data structures are often built-in or provided by libraries. In C, you implement them yourself, giving you complete control over memory layout and performance. Understanding data structures at this low level is crucial for system programming.

Why This Matters

System software needs efficient data structures. The Linux kernel uses linked lists, hash tables, red-black trees, and more. Device drivers use queues for buffering. Memory allocators use complex tree structures. Understanding how data structures work at the memory level helps you choose the right one and implement it efficiently.

How to Study This Chapter

Draw memory diagrams - Visualize how data is laid out
Implement from scratch - Don't just read, code it
Analyze complexity - Think about time and space trade-offs
Test thoroughly - Edge cases matter in system code

Arrays

Arrays are the simplest data structure - contiguous memory locations.

Fixed-Size Arrays

int numbers[5] = {10, 20, 30, 40, 50};

Memory Layout:

Address    Value
0x1000     10
0x1004     20
0x1008     30
0x100C     40
0x1010     50

Characteristics:

O(1) access - Direct indexing: numbers[2] = *(numbers + 2)
Fixed size - Can't grow or shrink
Contiguous memory - All elements together
Cache-friendly - Sequential access is fast

Dynamic Arrays

int *arr = malloc(5 * sizeof(int));
arr[0] = 10;
arr[1] = 20;
// ...
free(arr);

Resizing (manual reallocation):

int capacity = 5;
int *arr = malloc(capacity * sizeof(int));

// Need more space
capacity *= 2;
arr = realloc(arr, capacity * sizeof(int));

free(arr);

2D Arrays

// Stack allocation
int matrix[3][4];  // 3 rows, 4 columns

// Heap allocation
int **matrix = malloc(3 * sizeof(int*));
for (int i = 0; i < 3; i++) {
    matrix[i] = malloc(4 * sizeof(int));
}

// Don't forget to free
for (int i = 0; i < 3; i++) {
    free(matrix[i]);
}
free(matrix);

Linked Lists

A sequence of nodes where each node contains data and a pointer to the next node.

Singly Linked List

Structure Definition:

typedef struct Node {
    int data;
    struct Node *next;
} Node;

Memory Layout:

[10 | next] -> [20 | next] -> [30 | NULL]

Create Node:

Node* createNode(int data) {
    Node *newNode = malloc(sizeof(Node));
    if (newNode == NULL) {
        return NULL;
    }
    newNode->data = data;
    newNode->next = NULL;
    return newNode;
}

Insert at Beginning:

Node* insertAtBeginning(Node *head, int data) {
    Node *newNode = createNode(data);
    newNode->next = head;
    return newNode;  // New head
}

Insert at End:

void insertAtEnd(Node **head, int data) {
    Node *newNode = createNode(data);

    if (*head == NULL) {
        *head = newNode;
        return;
    }

    Node *temp = *head;
    while (temp->next != NULL) {
        temp = temp->next;
    }
    temp->next = newNode;
}

Delete Node:

Node* deleteNode(Node *head, int data) {
    if (head == NULL) return NULL;

    // Deleting head
    if (head->data == data) {
        Node *temp = head->next;
        free(head);
        return temp;
    }

    // Find node before the one to delete
    Node *current = head;
    while (current->next != NULL && current->next->data != data) {
        current = current->next;
    }

    if (current->next != NULL) {
        Node *temp = current->next;
        current->next = temp->next;
        free(temp);
    }

    return head;
}

Free List:

void freeList(Node *head) {
    while (head != NULL) {
        Node *temp = head;
        head = head->next;
        free(temp);
    }
}

Doubly Linked List

Each node has pointers to both next and previous nodes.

typedef struct DNode {
    int data;
    struct DNode *next;
    struct DNode *prev;
} DNode;

Advantages:

Can traverse backward
Easier to delete node (don't need previous pointer)

Disadvantages:

More memory (extra pointer)
More complex insertion/deletion

Circular Linked List

Last node points back to first node.

// Same Node structure, but:
// tail->next = head (instead of NULL)

Stacks

Last-In-First-Out (LIFO) data structure.

Array Implementation

#define MAX_SIZE 100

typedef struct {
    int items[MAX_SIZE];
    int top;
} Stack;

void initStack(Stack *s) {
    s->top = -1;
}

int isEmpty(Stack *s) {
    return s->top == -1;
}

int isFull(Stack *s) {
    return s->top == MAX_SIZE - 1;
}

void push(Stack *s, int value) {
    if (isFull(s)) {
        printf("Stack overflow\n");
        return;
    }
    s->items[++s->top] = value;
}

int pop(Stack *s) {
    if (isEmpty(s)) {
        printf("Stack underflow\n");
        return -1;
    }
    return s->items[s->top--];
}

int peek(Stack *s) {
    if (isEmpty(s)) {
        return -1;
    }
    return s->items[s->top];
}

Linked List Implementation

typedef struct StackNode {
    int data;
    struct StackNode *next;
} StackNode;

typedef struct {
    StackNode *top;
} Stack;

void push(Stack *s, int value) {
    StackNode *newNode = malloc(sizeof(StackNode));
    newNode->data = value;
    newNode->next = s->top;
    s->top = newNode;
}

int pop(Stack *s) {
    if (s->top == NULL) return -1;

    int value = s->top->data;
    StackNode *temp = s->top;
    s->top = s->top->next;
    free(temp);
    return value;
}

Stack Applications:

Function call stack (system level!)
Expression evaluation
Undo mechanisms
Backtracking algorithms

Queues

First-In-First-Out (FIFO) data structure.

Array Implementation (Circular Queue)

#define MAX_SIZE 100

typedef struct {
    int items[MAX_SIZE];
    int front;
    int rear;
    int count;
} Queue;

void initQueue(Queue *q) {
    q->front = 0;
    q->rear = -1;
    q->count = 0;
}

int isEmpty(Queue *q) {
    return q->count == 0;
}

int isFull(Queue *q) {
    return q->count == MAX_SIZE;
}

void enqueue(Queue *q, int value) {
    if (isFull(q)) {
        printf("Queue overflow\n");
        return;
    }
    q->rear = (q->rear + 1) % MAX_SIZE;
    q->items[q->rear] = value;
    q->count++;
}

int dequeue(Queue *q) {
    if (isEmpty(q)) {
        printf("Queue underflow\n");
        return -1;
    }
    int value = q->items[q->front];
    q->front = (q->front + 1) % MAX_SIZE;
    q->count--;
    return value;
}

Linked List Implementation

typedef struct QueueNode {
    int data;
    struct QueueNode *next;
} QueueNode;

typedef struct {
    QueueNode *front;
    QueueNode *rear;
} Queue;

void enqueue(Queue *q, int value) {
    QueueNode *newNode = malloc(sizeof(QueueNode));
    newNode->data = value;
    newNode->next = NULL;

    if (q->rear == NULL) {
        q->front = q->rear = newNode;
        return;
    }

    q->rear->next = newNode;
    q->rear = newNode;
}

int dequeue(Queue *q) {
    if (q->front == NULL) return -1;

    QueueNode *temp = q->front;
    int value = temp->data;
    q->front = q->front->next;

    if (q->front == NULL) {
        q->rear = NULL;
    }

    free(temp);
    return value;
}

Queue Applications:

Job scheduling
Buffering (I/O, network)
Breadth-first search
Printer spooling

Hash Tables

Maps keys to values with O(1) average access time.

Simple Hash Table with Chaining

#define TABLE_SIZE 100

typedef struct HashNode {
    char *key;
    int value;
    struct HashNode *next;
} HashNode;

typedef struct {
    HashNode *buckets[TABLE_SIZE];
} HashTable;

// Hash function
unsigned int hash(const char *key) {
    unsigned int hash = 0;
    while (*key) {
        hash = (hash << 5) + *key++;
    }
    return hash % TABLE_SIZE;
}

void initHashTable(HashTable *ht) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        ht->buckets[i] = NULL;
    }
}

void insert(HashTable *ht, const char *key, int value) {
    unsigned int index = hash(key);

    HashNode *newNode = malloc(sizeof(HashNode));
    newNode->key = strdup(key);  // Copy key
    newNode->value = value;
    newNode->next = ht->buckets[index];
    ht->buckets[index] = newNode;
}

int* get(HashTable *ht, const char *key) {
    unsigned int index = hash(key);
    HashNode *node = ht->buckets[index];

    while (node != NULL) {
        if (strcmp(node->key, key) == 0) {
            return &node->value;
        }
        node = node->next;
    }

    return NULL;  // Not found
}

Hash Table Applications:

Symbol tables in compilers
Database indexing
Caching
Implementing sets and maps

Binary Trees

Hierarchical data structure with nodes that have at most two children.

Binary Search Tree (BST)

typedef struct TreeNode {
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

TreeNode* createTreeNode(int data) {
    TreeNode *newNode = malloc(sizeof(TreeNode));
    newNode->data = data;
    newNode->left = NULL;
    newNode->right = NULL;
    return newNode;
}

TreeNode* insert(TreeNode *root, int data) {
    if (root == NULL) {
        return createTreeNode(data);
    }

    if (data < root->data) {
        root->left = insert(root->left, data);
    } else if (data > root->data) {
        root->right = insert(root->right, data);
    }

    return root;
}

TreeNode* search(TreeNode *root, int data) {
    if (root == NULL || root->data == data) {
        return root;
    }

    if (data < root->data) {
        return search(root->left, data);
    } else {
        return search(root->right, data);
    }
}

void inorderTraversal(TreeNode *root) {
    if (root != NULL) {
        inorderTraversal(root->left);
        printf("%d ", root->data);
        inorderTraversal(root->right);
    }
}

Tree Traversals:

Inorder: Left, Root, Right (gives sorted order for BST)
Preorder: Root, Left, Right
Postorder: Left, Right, Root

Algorithm Complexity

Understanding time and space complexity is critical.

Big-O Notation

Operation	Array	Linked List	Stack	Queue	Hash Table	BST (balanced)
Access	O(1)	O(n)	O(n)	O(n)	-	O(log n)
Search	O(n)	O(n)	O(n)	O(n)	O(1)*	O(log n)
Insert	O(n)	O(1)**	O(1)	O(1)	O(1)*	O(log n)
Delete	O(n)	O(1)**	O(1)	O(1)	O(1)*	O(log n)

* Average case ** At beginning/end

Space Complexity

Arrays: O(n) - just the elements
Linked Lists: O(n) - elements + pointers
Hash Tables: O(n) - depends on load factor
Trees: O(n) - elements + pointers

Memory Layout Considerations

Cache Performance

Arrays: Excellent cache locality (contiguous memory) Linked Lists: Poor cache locality (scattered in memory)

Implication: For sequential access, arrays are much faster even if Big-O is same.

Memory Overhead

// Array: 10 integers
int arr[10];  // 40 bytes (10 × 4)

// Linked list: 10 integers
typedef struct Node {
    int data;          // 4 bytes
    struct Node *next; // 8 bytes (on 64-bit system)
} Node;
// Total: 10 × 12 = 120 bytes (3x overhead!)

Alignment and Padding

struct Example {
    char a;    // 1 byte
    // 3 bytes padding
    int b;     // 4 bytes
    char c;    // 1 byte
    // 7 bytes padding
    double d;  // 8 bytes
};
// Total: 24 bytes (not 14!)

Compiler adds padding for alignment (performance).

Key Concepts

Arrays provide O(1) access but fixed size
Linked lists allow dynamic size but O(n) access
Stacks are LIFO, queues are FIFO
Hash tables provide O(1) average lookups
Trees provide O(log n) operations when balanced
Memory layout affects performance significantly
Choose the right data structure based on access patterns

Common Mistakes

Memory leaks - Forgetting to free linked list nodes
Off-by-one errors - Array indexing mistakes
Not checking malloc - Assuming allocation succeeds
Stack vs heap confusion - Large arrays on stack overflow
Ignoring cache locality - Using linked lists when arrays work better

Debugging Tips

Draw diagrams - Visualize pointers and memory
Use valgrind - Detect memory leaks and errors
Print structure - Debug by printing entire data structure
Check bounds - Verify array indices
Test edge cases - Empty, single element, large size

Mini Exercises

Implement a linked list with insert, delete, and search
Create a stack and use it to reverse a string
Implement a circular queue
Write a function to detect cycles in a linked list
Implement a simple hash table
Create a binary search tree with insert and search
Write iterative and recursive tree traversals
Implement a dynamic array that auto-resizes
Measure cache performance: array vs linked list iteration
Implement a doubly linked list

Review Questions

What are the advantages and disadvantages of arrays vs linked lists?
When would you choose a stack over a queue?
How does a hash table achieve O(1) average lookups?
What is cache locality and why does it matter?
What is the time complexity of searching in a balanced BST?

Reference Checklist

By the end of this chapter, you should be able to:

Implement arrays and dynamic arrays
Create and manipulate linked lists
Implement stacks and queues
Build a hash table with collision handling
Construct binary search trees
Analyze time and space complexity
Understand memory layout of data structures
Choose appropriate data structures for problems

Next Steps

Now that you understand data structures in C, the next chapter introduces assembly language. You'll learn to write code at the lowest level - directly programming the CPU with assembly instructions.

Key Takeaway: Data structures in C require manual memory management and careful pointer manipulation. Understanding how they're laid out in memory and their performance characteristics is essential for writing efficient system software.