Data Structures in C

Chapter 7: Data Structures in C

Introduction

Data structures organize and store data efficiently. In high-level languages, data structures are often built-in or provided by libraries. In C, you implement them yourself, giving you complete control over memory layout and performance. Understanding data structures at this low level is crucial for system programming.

Why This Matters

System software needs efficient data structures. The Linux kernel uses linked lists, hash tables, red-black trees, and more. Device drivers use queues for buffering. Memory allocators use complex tree structures. Understanding how data structures work at the memory level helps you choose the right one and implement it efficiently.

How to Study This Chapter

  1. Draw memory diagrams - Visualize how data is laid out
  2. Implement from scratch - Don't just read, code it
  3. Analyze complexity - Think about time and space trade-offs
  4. Test thoroughly - Edge cases matter in system code

Arrays

Arrays are the simplest data structure - contiguous memory locations.

Fixed-Size Arrays

int numbers[5] = {10, 20, 30, 40, 50};

Memory Layout:

Address    Value
0x1000     10
0x1004     20
0x1008     30
0x100C     40
0x1010     50

Characteristics:

  • O(1) access - Direct indexing: numbers[2] = *(numbers + 2)
  • Fixed size - Can't grow or shrink
  • Contiguous memory - All elements together
  • Cache-friendly - Sequential access is fast

Dynamic Arrays

int *arr = malloc(5 * sizeof(int));
arr[0] = 10;
arr[1] = 20;
// ...
free(arr);

Resizing (manual reallocation):

int capacity = 5;
int *arr = malloc(capacity * sizeof(int));

// Need more space
capacity *= 2;
arr = realloc(arr, capacity * sizeof(int));

free(arr);

2D Arrays

// Stack allocation
int matrix[3][4];  // 3 rows, 4 columns

// Heap allocation
int **matrix = malloc(3 * sizeof(int*));
for (int i = 0; i < 3; i++) {
    matrix[i] = malloc(4 * sizeof(int));
}

// Don't forget to free
for (int i = 0; i < 3; i++) {
    free(matrix[i]);
}
free(matrix);

Linked Lists

A sequence of nodes where each node contains data and a pointer to the next node.

Singly Linked List

Structure Definition:

typedef struct Node {
    int data;
    struct Node *next;
} Node;

Memory Layout:

[10 | next] -> [20 | next] -> [30 | NULL]

Create Node:

Node* createNode(int data) {
    Node *newNode = malloc(sizeof(Node));
    if (newNode == NULL) {
        return NULL;
    }
    newNode->data = data;
    newNode->next = NULL;
    return newNode;
}

Insert at Beginning:

Node* insertAtBeginning(Node *head, int data) {
    Node *newNode = createNode(data);
    newNode->next = head;
    return newNode;  // New head
}

Insert at End:

void insertAtEnd(Node **head, int data) {
    Node *newNode = createNode(data);

    if (*head == NULL) {
        *head = newNode;
        return;
    }

    Node *temp = *head;
    while (temp->next != NULL) {
        temp = temp->next;
    }
    temp->next = newNode;
}

Delete Node:

Node* deleteNode(Node *head, int data) {
    if (head == NULL) return NULL;

    // Deleting head
    if (head->data == data) {
        Node *temp = head->next;
        free(head);
        return temp;
    }

    // Find node before the one to delete
    Node *current = head;
    while (current->next != NULL && current->next->data != data) {
        current = current->next;
    }

    if (current->next != NULL) {
        Node *temp = current->next;
        current->next = temp->next;
        free(temp);
    }

    return head;
}

Free List:

void freeList(Node *head) {
    while (head != NULL) {
        Node *temp = head;
        head = head->next;
        free(temp);
    }
}

Doubly Linked List

Each node has pointers to both next and previous nodes.

typedef struct DNode {
    int data;
    struct DNode *next;
    struct DNode *prev;
} DNode;

Advantages:

  • Can traverse backward
  • Easier to delete node (don't need previous pointer)

Disadvantages:

  • More memory (extra pointer)
  • More complex insertion/deletion

Circular Linked List

Last node points back to first node.

// Same Node structure, but:
// tail->next = head (instead of NULL)

Stacks

Last-In-First-Out (LIFO) data structure.

Array Implementation

#define MAX_SIZE 100

typedef struct {
    int items[MAX_SIZE];
    int top;
} Stack;

void initStack(Stack *s) {
    s->top = -1;
}

int isEmpty(Stack *s) {
    return s->top == -1;
}

int isFull(Stack *s) {
    return s->top == MAX_SIZE - 1;
}

void push(Stack *s, int value) {
    if (isFull(s)) {
        printf("Stack overflow\n");
        return;
    }
    s->items[++s->top] = value;
}

int pop(Stack *s) {
    if (isEmpty(s)) {
        printf("Stack underflow\n");
        return -1;
    }
    return s->items[s->top--];
}

int peek(Stack *s) {
    if (isEmpty(s)) {
        return -1;
    }
    return s->items[s->top];
}

Linked List Implementation

typedef struct StackNode {
    int data;
    struct StackNode *next;
} StackNode;

typedef struct {
    StackNode *top;
} Stack;

void push(Stack *s, int value) {
    StackNode *newNode = malloc(sizeof(StackNode));
    newNode->data = value;
    newNode->next = s->top;
    s->top = newNode;
}

int pop(Stack *s) {
    if (s->top == NULL) return -1;

    int value = s->top->data;
    StackNode *temp = s->top;
    s->top = s->top->next;
    free(temp);
    return value;
}

Stack Applications:

  • Function call stack (system level!)
  • Expression evaluation
  • Undo mechanisms
  • Backtracking algorithms

Queues

First-In-First-Out (FIFO) data structure.

Array Implementation (Circular Queue)

#define MAX_SIZE 100

typedef struct {
    int items[MAX_SIZE];
    int front;
    int rear;
    int count;
} Queue;

void initQueue(Queue *q) {
    q->front = 0;
    q->rear = -1;
    q->count = 0;
}

int isEmpty(Queue *q) {
    return q->count == 0;
}

int isFull(Queue *q) {
    return q->count == MAX_SIZE;
}

void enqueue(Queue *q, int value) {
    if (isFull(q)) {
        printf("Queue overflow\n");
        return;
    }
    q->rear = (q->rear + 1) % MAX_SIZE;
    q->items[q->rear] = value;
    q->count++;
}

int dequeue(Queue *q) {
    if (isEmpty(q)) {
        printf("Queue underflow\n");
        return -1;
    }
    int value = q->items[q->front];
    q->front = (q->front + 1) % MAX_SIZE;
    q->count--;
    return value;
}

Linked List Implementation

typedef struct QueueNode {
    int data;
    struct QueueNode *next;
} QueueNode;

typedef struct {
    QueueNode *front;
    QueueNode *rear;
} Queue;

void enqueue(Queue *q, int value) {
    QueueNode *newNode = malloc(sizeof(QueueNode));
    newNode->data = value;
    newNode->next = NULL;

    if (q->rear == NULL) {
        q->front = q->rear = newNode;
        return;
    }

    q->rear->next = newNode;
    q->rear = newNode;
}

int dequeue(Queue *q) {
    if (q->front == NULL) return -1;

    QueueNode *temp = q->front;
    int value = temp->data;
    q->front = q->front->next;

    if (q->front == NULL) {
        q->rear = NULL;
    }

    free(temp);
    return value;
}

Queue Applications:

  • Job scheduling
  • Buffering (I/O, network)
  • Breadth-first search
  • Printer spooling

Hash Tables

Maps keys to values with O(1) average access time.

Simple Hash Table with Chaining

#define TABLE_SIZE 100

typedef struct HashNode {
    char *key;
    int value;
    struct HashNode *next;
} HashNode;

typedef struct {
    HashNode *buckets[TABLE_SIZE];
} HashTable;

// Hash function
unsigned int hash(const char *key) {
    unsigned int hash = 0;
    while (*key) {
        hash = (hash << 5) + *key++;
    }
    return hash % TABLE_SIZE;
}

void initHashTable(HashTable *ht) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        ht->buckets[i] = NULL;
    }
}

void insert(HashTable *ht, const char *key, int value) {
    unsigned int index = hash(key);

    HashNode *newNode = malloc(sizeof(HashNode));
    newNode->key = strdup(key);  // Copy key
    newNode->value = value;
    newNode->next = ht->buckets[index];
    ht->buckets[index] = newNode;
}

int* get(HashTable *ht, const char *key) {
    unsigned int index = hash(key);
    HashNode *node = ht->buckets[index];

    while (node != NULL) {
        if (strcmp(node->key, key) == 0) {
            return &node->value;
        }
        node = node->next;
    }

    return NULL;  // Not found
}

Hash Table Applications:

  • Symbol tables in compilers
  • Database indexing
  • Caching
  • Implementing sets and maps

Binary Trees

Hierarchical data structure with nodes that have at most two children.

Binary Search Tree (BST)

typedef struct TreeNode {
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

TreeNode* createTreeNode(int data) {
    TreeNode *newNode = malloc(sizeof(TreeNode));
    newNode->data = data;
    newNode->left = NULL;
    newNode->right = NULL;
    return newNode;
}

TreeNode* insert(TreeNode *root, int data) {
    if (root == NULL) {
        return createTreeNode(data);
    }

    if (data < root->data) {
        root->left = insert(root->left, data);
    } else if (data > root->data) {
        root->right = insert(root->right, data);
    }

    return root;
}

TreeNode* search(TreeNode *root, int data) {
    if (root == NULL || root->data == data) {
        return root;
    }

    if (data < root->data) {
        return search(root->left, data);
    } else {
        return search(root->right, data);
    }
}

void inorderTraversal(TreeNode *root) {
    if (root != NULL) {
        inorderTraversal(root->left);
        printf("%d ", root->data);
        inorderTraversal(root->right);
    }
}

Tree Traversals:

  • Inorder: Left, Root, Right (gives sorted order for BST)
  • Preorder: Root, Left, Right
  • Postorder: Left, Right, Root

Algorithm Complexity

Understanding time and space complexity is critical.

Big-O Notation

OperationArrayLinked ListStackQueueHash TableBST (balanced)
AccessO(1)O(n)O(n)O(n)-O(log n)
SearchO(n)O(n)O(n)O(n)O(1)*O(log n)
InsertO(n)O(1)**O(1)O(1)O(1)*O(log n)
DeleteO(n)O(1)**O(1)O(1)O(1)*O(log n)

* Average case ** At beginning/end

Space Complexity

  • Arrays: O(n) - just the elements
  • Linked Lists: O(n) - elements + pointers
  • Hash Tables: O(n) - depends on load factor
  • Trees: O(n) - elements + pointers

Memory Layout Considerations

Cache Performance

Arrays: Excellent cache locality (contiguous memory) Linked Lists: Poor cache locality (scattered in memory)

Implication: For sequential access, arrays are much faster even if Big-O is same.

Memory Overhead

// Array: 10 integers
int arr[10];  // 40 bytes (10 × 4)

// Linked list: 10 integers
typedef struct Node {
    int data;          // 4 bytes
    struct Node *next; // 8 bytes (on 64-bit system)
} Node;
// Total: 10 × 12 = 120 bytes (3x overhead!)

Alignment and Padding

struct Example {
    char a;    // 1 byte
    // 3 bytes padding
    int b;     // 4 bytes
    char c;    // 1 byte
    // 7 bytes padding
    double d;  // 8 bytes
};
// Total: 24 bytes (not 14!)

Compiler adds padding for alignment (performance).

Key Concepts

  • Arrays provide O(1) access but fixed size
  • Linked lists allow dynamic size but O(n) access
  • Stacks are LIFO, queues are FIFO
  • Hash tables provide O(1) average lookups
  • Trees provide O(log n) operations when balanced
  • Memory layout affects performance significantly
  • Choose the right data structure based on access patterns

Common Mistakes

  1. Memory leaks - Forgetting to free linked list nodes
  2. Off-by-one errors - Array indexing mistakes
  3. Not checking malloc - Assuming allocation succeeds
  4. Stack vs heap confusion - Large arrays on stack overflow
  5. Ignoring cache locality - Using linked lists when arrays work better

Debugging Tips

  • Draw diagrams - Visualize pointers and memory
  • Use valgrind - Detect memory leaks and errors
  • Print structure - Debug by printing entire data structure
  • Check bounds - Verify array indices
  • Test edge cases - Empty, single element, large size

Mini Exercises

  1. Implement a linked list with insert, delete, and search
  2. Create a stack and use it to reverse a string
  3. Implement a circular queue
  4. Write a function to detect cycles in a linked list
  5. Implement a simple hash table
  6. Create a binary search tree with insert and search
  7. Write iterative and recursive tree traversals
  8. Implement a dynamic array that auto-resizes
  9. Measure cache performance: array vs linked list iteration
  10. Implement a doubly linked list

Review Questions

  1. What are the advantages and disadvantages of arrays vs linked lists?
  2. When would you choose a stack over a queue?
  3. How does a hash table achieve O(1) average lookups?
  4. What is cache locality and why does it matter?
  5. What is the time complexity of searching in a balanced BST?

Reference Checklist

By the end of this chapter, you should be able to:

  • Implement arrays and dynamic arrays
  • Create and manipulate linked lists
  • Implement stacks and queues
  • Build a hash table with collision handling
  • Construct binary search trees
  • Analyze time and space complexity
  • Understand memory layout of data structures
  • Choose appropriate data structures for problems

Next Steps

Now that you understand data structures in C, the next chapter introduces assembly language. You'll learn to write code at the lowest level - directly programming the CPU with assembly instructions.


Key Takeaway: Data structures in C require manual memory management and careful pointer manipulation. Understanding how they're laid out in memory and their performance characteristics is essential for writing efficient system software.