Data Manipulation (DML)

Chapter 2: Data Manipulation (DML)

Data Manipulation Language (DML) encompasses the operations used to mutate the state of the database: INSERT, UPDATE, and DELETE. In a highly concurrent system, DML operations must strictly adhere to the ACID properties, utilizing the Write-Ahead Log (WAL) to ensure durability and Locks/MVCC to ensure isolation.

I. Inserting Data (INSERT)

The INSERT statement adds new rows to a table. When a row is inserted, the engine allocates space in a data page, updates the primary clustered index, and simultaneously updates any secondary B+ tree indexes.

1. Single and Multi-Row Inserts

Batching inserts reduces the overhead of network round-trips and transaction commit synchronization (fsync).

-- Single row insert
INSERT INTO employees (employee_id, first_name, last_name, email, department_id)
VALUES (101, 'Alice', 'Smith', 'alice@corp.local', 10);

-- Multi-row bulk insert (Highly recommended for throughput)
INSERT INTO employees (employee_id, first_name, last_name, email, department_id)
VALUES 
    (102, 'Bob', 'Jones', 'bob@corp.local', 10),
    (103, 'Carol', 'Davis', 'carol@corp.local', 20);

2. INSERT INTO ... SELECT

This pattern moves data directly within the database kernel without bringing it back to the application server, making it optimal for ELT (Extract, Load, Transform) pipelines.

-- Archiving old records to a history table
INSERT INTO employees_archive (emp_id, name, archived_at)
SELECT employee_id, first_name, CURRENT_TIMESTAMP
FROM employees
WHERE status = 'TERMINATED';

II. Modifying State (UPDATE & DELETE)

UPDATE and DELETE operations are complex because they require a "Search" phase to locate the targeted rows before the "Mutation" phase can occur.

1. The UPDATE Statement

In MVCC (Multi-Version Concurrency Control) engines like PostgreSQL, an UPDATE is not an in-place mutation. It is physically executed as a DELETE of the old tuple (marking it as invisible) followed by an INSERT of the new tuple.

-- Targeting a specific row via the Primary Key (O(log N) cost)
UPDATE employees
SET salary = salary * 1.05,
    status = 'ACTIVE'
WHERE employee_id = 102;

2. The DELETE Statement

Deletes remove the visibility of rows. The actual disk space is reclaimed later by background garbage collection processes (e.g., VACUUM).

-- Deleting a subset of rows
DELETE FROM employees
WHERE status = 'TERMINATED' AND department_id = 50;

DML RequestUPDATE users...Buffer Pool (RAM)1. Fetch Page from Disk2. Modify in Memory3. Mark as "Dirty"WAL (Disk)Synchronous Flush(fsync commit)Ensures Durability


III. Production Anti-Patterns

  • The Unfiltered Mutation: Executing an UPDATE or DELETE without a WHERE clause. This will target the entire table, locking all rows and filling the transaction log until the disk is full.
  • Read-Modify-Write in App Code: Selecting a value into the application (e.g., balance = 100), modifying it in Python/Node (balance -= 10), and updating it (SET balance = 90). In concurrent environments, this causes Lost Updates. Always perform relative updates directly in SQL: UPDATE accounts SET balance = balance - 10 WHERE id = 1.
  • DML in Loops: Emitting 10,000 individual INSERT statements from an application loop. Each statement requires a network round trip and an individual disk commit. Use bulk inserts or COPY commands.

IV. Performance Bottlenecks

  • Write Amplification from Indexes: Every INSERT or DELETE requires an update to the primary clustered index and every non-clustered index on the table. A table with 15 indexes will incur a massive write penalty, completely bottlenecking high-throughput ingestion pipelines.
  • Lock Escalation: When a session acquires thousands of row-level locks (e.g., during a massive UPDATE), the database engine may "escalate" these into a single Table-Level Lock to save memory. This instantly blocks all other writes to the table.
  • Hot-Spot Page Contention: Multiple concurrent sessions attempting to INSERT rows into a table clustered by a monotonically increasing sequence (like an auto-increment ID). All threads compete for a lock on the "right-most" leaf page of the B+ tree.