Chapter 7: Data Structures in C
Introduction
Data structures organize and store data efficiently. In high-level languages, data structures are often built-in or provided by libraries. In C, you implement them yourself, giving you complete control over memory layout and performance. Understanding data structures at this low level is crucial for system programming.
Why This Matters
System software needs efficient data structures. The Linux kernel uses linked lists, hash tables, red-black trees, and more. Device drivers use queues for buffering. Memory allocators use complex tree structures. Understanding how data structures work at the memory level helps you choose the right one and implement it efficiently.
How to Study This Chapter
- Draw memory diagrams - Visualize how data is laid out
- Implement from scratch - Don't just read, code it
- Analyze complexity - Think about time and space trade-offs
- Test thoroughly - Edge cases matter in system code
Arrays
Arrays are the simplest data structure - contiguous memory locations.
Fixed-Size Arrays
int numbers[5] = {10, 20, 30, 40, 50};
Memory Layout:
Address Value
0x1000 10
0x1004 20
0x1008 30
0x100C 40
0x1010 50
Characteristics:
- O(1) access - Direct indexing:
numbers[2]= *(numbers + 2) - Fixed size - Can't grow or shrink
- Contiguous memory - All elements together
- Cache-friendly - Sequential access is fast
Dynamic Arrays
int *arr = malloc(5 * sizeof(int));
arr[0] = 10;
arr[1] = 20;
// ...
free(arr);
Resizing (manual reallocation):
int capacity = 5;
int *arr = malloc(capacity * sizeof(int));
// Need more space
capacity *= 2;
arr = realloc(arr, capacity * sizeof(int));
free(arr);
2D Arrays
// Stack allocation
int matrix[3][4]; // 3 rows, 4 columns
// Heap allocation
int **matrix = malloc(3 * sizeof(int*));
for (int i = 0; i < 3; i++) {
matrix[i] = malloc(4 * sizeof(int));
}
// Don't forget to free
for (int i = 0; i < 3; i++) {
free(matrix[i]);
}
free(matrix);
Linked Lists
A sequence of nodes where each node contains data and a pointer to the next node.
Singly Linked List
Structure Definition:
typedef struct Node {
int data;
struct Node *next;
} Node;
Memory Layout:
[10 | next] -> [20 | next] -> [30 | NULL]
Create Node:
Node* createNode(int data) {
Node *newNode = malloc(sizeof(Node));
if (newNode == NULL) {
return NULL;
}
newNode->data = data;
newNode->next = NULL;
return newNode;
}
Insert at Beginning:
Node* insertAtBeginning(Node *head, int data) {
Node *newNode = createNode(data);
newNode->next = head;
return newNode; // New head
}
Insert at End:
void insertAtEnd(Node **head, int data) {
Node *newNode = createNode(data);
if (*head == NULL) {
*head = newNode;
return;
}
Node *temp = *head;
while (temp->next != NULL) {
temp = temp->next;
}
temp->next = newNode;
}
Delete Node:
Node* deleteNode(Node *head, int data) {
if (head == NULL) return NULL;
// Deleting head
if (head->data == data) {
Node *temp = head->next;
free(head);
return temp;
}
// Find node before the one to delete
Node *current = head;
while (current->next != NULL && current->next->data != data) {
current = current->next;
}
if (current->next != NULL) {
Node *temp = current->next;
current->next = temp->next;
free(temp);
}
return head;
}
Free List:
void freeList(Node *head) {
while (head != NULL) {
Node *temp = head;
head = head->next;
free(temp);
}
}
Doubly Linked List
Each node has pointers to both next and previous nodes.
typedef struct DNode {
int data;
struct DNode *next;
struct DNode *prev;
} DNode;
Advantages:
- Can traverse backward
- Easier to delete node (don't need previous pointer)
Disadvantages:
- More memory (extra pointer)
- More complex insertion/deletion
Circular Linked List
Last node points back to first node.
// Same Node structure, but:
// tail->next = head (instead of NULL)
Stacks
Last-In-First-Out (LIFO) data structure.
Array Implementation
#define MAX_SIZE 100
typedef struct {
int items[MAX_SIZE];
int top;
} Stack;
void initStack(Stack *s) {
s->top = -1;
}
int isEmpty(Stack *s) {
return s->top == -1;
}
int isFull(Stack *s) {
return s->top == MAX_SIZE - 1;
}
void push(Stack *s, int value) {
if (isFull(s)) {
printf("Stack overflow\n");
return;
}
s->items[++s->top] = value;
}
int pop(Stack *s) {
if (isEmpty(s)) {
printf("Stack underflow\n");
return -1;
}
return s->items[s->top--];
}
int peek(Stack *s) {
if (isEmpty(s)) {
return -1;
}
return s->items[s->top];
}
Linked List Implementation
typedef struct StackNode {
int data;
struct StackNode *next;
} StackNode;
typedef struct {
StackNode *top;
} Stack;
void push(Stack *s, int value) {
StackNode *newNode = malloc(sizeof(StackNode));
newNode->data = value;
newNode->next = s->top;
s->top = newNode;
}
int pop(Stack *s) {
if (s->top == NULL) return -1;
int value = s->top->data;
StackNode *temp = s->top;
s->top = s->top->next;
free(temp);
return value;
}
Stack Applications:
- Function call stack (system level!)
- Expression evaluation
- Undo mechanisms
- Backtracking algorithms
Queues
First-In-First-Out (FIFO) data structure.
Array Implementation (Circular Queue)
#define MAX_SIZE 100
typedef struct {
int items[MAX_SIZE];
int front;
int rear;
int count;
} Queue;
void initQueue(Queue *q) {
q->front = 0;
q->rear = -1;
q->count = 0;
}
int isEmpty(Queue *q) {
return q->count == 0;
}
int isFull(Queue *q) {
return q->count == MAX_SIZE;
}
void enqueue(Queue *q, int value) {
if (isFull(q)) {
printf("Queue overflow\n");
return;
}
q->rear = (q->rear + 1) % MAX_SIZE;
q->items[q->rear] = value;
q->count++;
}
int dequeue(Queue *q) {
if (isEmpty(q)) {
printf("Queue underflow\n");
return -1;
}
int value = q->items[q->front];
q->front = (q->front + 1) % MAX_SIZE;
q->count--;
return value;
}
Linked List Implementation
typedef struct QueueNode {
int data;
struct QueueNode *next;
} QueueNode;
typedef struct {
QueueNode *front;
QueueNode *rear;
} Queue;
void enqueue(Queue *q, int value) {
QueueNode *newNode = malloc(sizeof(QueueNode));
newNode->data = value;
newNode->next = NULL;
if (q->rear == NULL) {
q->front = q->rear = newNode;
return;
}
q->rear->next = newNode;
q->rear = newNode;
}
int dequeue(Queue *q) {
if (q->front == NULL) return -1;
QueueNode *temp = q->front;
int value = temp->data;
q->front = q->front->next;
if (q->front == NULL) {
q->rear = NULL;
}
free(temp);
return value;
}
Queue Applications:
- Job scheduling
- Buffering (I/O, network)
- Breadth-first search
- Printer spooling
Hash Tables
Maps keys to values with O(1) average access time.
Simple Hash Table with Chaining
#define TABLE_SIZE 100
typedef struct HashNode {
char *key;
int value;
struct HashNode *next;
} HashNode;
typedef struct {
HashNode *buckets[TABLE_SIZE];
} HashTable;
// Hash function
unsigned int hash(const char *key) {
unsigned int hash = 0;
while (*key) {
hash = (hash << 5) + *key++;
}
return hash % TABLE_SIZE;
}
void initHashTable(HashTable *ht) {
for (int i = 0; i < TABLE_SIZE; i++) {
ht->buckets[i] = NULL;
}
}
void insert(HashTable *ht, const char *key, int value) {
unsigned int index = hash(key);
HashNode *newNode = malloc(sizeof(HashNode));
newNode->key = strdup(key); // Copy key
newNode->value = value;
newNode->next = ht->buckets[index];
ht->buckets[index] = newNode;
}
int* get(HashTable *ht, const char *key) {
unsigned int index = hash(key);
HashNode *node = ht->buckets[index];
while (node != NULL) {
if (strcmp(node->key, key) == 0) {
return &node->value;
}
node = node->next;
}
return NULL; // Not found
}
Hash Table Applications:
- Symbol tables in compilers
- Database indexing
- Caching
- Implementing sets and maps
Binary Trees
Hierarchical data structure with nodes that have at most two children.
Binary Search Tree (BST)
typedef struct TreeNode {
int data;
struct TreeNode *left;
struct TreeNode *right;
} TreeNode;
TreeNode* createTreeNode(int data) {
TreeNode *newNode = malloc(sizeof(TreeNode));
newNode->data = data;
newNode->left = NULL;
newNode->right = NULL;
return newNode;
}
TreeNode* insert(TreeNode *root, int data) {
if (root == NULL) {
return createTreeNode(data);
}
if (data < root->data) {
root->left = insert(root->left, data);
} else if (data > root->data) {
root->right = insert(root->right, data);
}
return root;
}
TreeNode* search(TreeNode *root, int data) {
if (root == NULL || root->data == data) {
return root;
}
if (data < root->data) {
return search(root->left, data);
} else {
return search(root->right, data);
}
}
void inorderTraversal(TreeNode *root) {
if (root != NULL) {
inorderTraversal(root->left);
printf("%d ", root->data);
inorderTraversal(root->right);
}
}
Tree Traversals:
- Inorder: Left, Root, Right (gives sorted order for BST)
- Preorder: Root, Left, Right
- Postorder: Left, Right, Root
Algorithm Complexity
Understanding time and space complexity is critical.
Big-O Notation
| Operation | Array | Linked List | Stack | Queue | Hash Table | BST (balanced) |
|---|---|---|---|---|---|---|
| Access | O(1) | O(n) | O(n) | O(n) | - | O(log n) |
| Search | O(n) | O(n) | O(n) | O(n) | O(1)* | O(log n) |
| Insert | O(n) | O(1)** | O(1) | O(1) | O(1)* | O(log n) |
| Delete | O(n) | O(1)** | O(1) | O(1) | O(1)* | O(log n) |
* Average case ** At beginning/end
Space Complexity
- Arrays: O(n) - just the elements
- Linked Lists: O(n) - elements + pointers
- Hash Tables: O(n) - depends on load factor
- Trees: O(n) - elements + pointers
Memory Layout Considerations
Cache Performance
Arrays: Excellent cache locality (contiguous memory) Linked Lists: Poor cache locality (scattered in memory)
Implication: For sequential access, arrays are much faster even if Big-O is same.
Memory Overhead
// Array: 10 integers
int arr[10]; // 40 bytes (10 × 4)
// Linked list: 10 integers
typedef struct Node {
int data; // 4 bytes
struct Node *next; // 8 bytes (on 64-bit system)
} Node;
// Total: 10 × 12 = 120 bytes (3x overhead!)
Alignment and Padding
struct Example {
char a; // 1 byte
// 3 bytes padding
int b; // 4 bytes
char c; // 1 byte
// 7 bytes padding
double d; // 8 bytes
};
// Total: 24 bytes (not 14!)
Compiler adds padding for alignment (performance).
Key Concepts
- Arrays provide O(1) access but fixed size
- Linked lists allow dynamic size but O(n) access
- Stacks are LIFO, queues are FIFO
- Hash tables provide O(1) average lookups
- Trees provide O(log n) operations when balanced
- Memory layout affects performance significantly
- Choose the right data structure based on access patterns
Common Mistakes
- Memory leaks - Forgetting to free linked list nodes
- Off-by-one errors - Array indexing mistakes
- Not checking malloc - Assuming allocation succeeds
- Stack vs heap confusion - Large arrays on stack overflow
- Ignoring cache locality - Using linked lists when arrays work better
Debugging Tips
- Draw diagrams - Visualize pointers and memory
- Use valgrind - Detect memory leaks and errors
- Print structure - Debug by printing entire data structure
- Check bounds - Verify array indices
- Test edge cases - Empty, single element, large size
Mini Exercises
- Implement a linked list with insert, delete, and search
- Create a stack and use it to reverse a string
- Implement a circular queue
- Write a function to detect cycles in a linked list
- Implement a simple hash table
- Create a binary search tree with insert and search
- Write iterative and recursive tree traversals
- Implement a dynamic array that auto-resizes
- Measure cache performance: array vs linked list iteration
- Implement a doubly linked list
Review Questions
- What are the advantages and disadvantages of arrays vs linked lists?
- When would you choose a stack over a queue?
- How does a hash table achieve O(1) average lookups?
- What is cache locality and why does it matter?
- What is the time complexity of searching in a balanced BST?
Reference Checklist
By the end of this chapter, you should be able to:
- Implement arrays and dynamic arrays
- Create and manipulate linked lists
- Implement stacks and queues
- Build a hash table with collision handling
- Construct binary search trees
- Analyze time and space complexity
- Understand memory layout of data structures
- Choose appropriate data structures for problems
Next Steps
Now that you understand data structures in C, the next chapter introduces assembly language. You'll learn to write code at the lowest level - directly programming the CPU with assembly instructions.
Key Takeaway: Data structures in C require manual memory management and careful pointer manipulation. Understanding how they're laid out in memory and their performance characteristics is essential for writing efficient system software.