# Chapter 1: SQL Foundations & DDL (Data Definition Language)
Data Definition Language (DDL) is the subset of SQL responsible for defining the structural blueprints of the database. DDL operations dictate how data is physically and logically organized, establishing the strict schema and constraints required to guarantee data integrity before a single row is inserted.
## I. Schema Design & Table Creation (CREATE)
The CREATE TABLE statement initializes a new relation. At the storage engine layer, this typically allocates a new segment or file on disk and initializes the metadata in the system catalog.
### 1. Data Types & Storage Efficiency
Choosing the correct data type is a paramount engineering decision. It affects disk space, RAM usage in the Buffer Pool, and CPU cache line efficiency during vectorization.
- **Exact Numerics:** `INT`, `BIGINT`, `DECIMAL(p, s)`. Use `DECIMAL` for financial data to prevent IEEE 754 floating-point rounding errors.
- **Strings:** `VARCHAR(n)` (variable length) vs. `CHAR(n)` (fixed length). `CHAR` pads with spaces, which can waste bytes but prevents row fragmentation during updates.
- **Dates/Times:** `TIMESTAMP` (without time zone) vs. `TIMESTAMP WITH TIME ZONE`. Always store UTC in production systems.
```sql
CREATE TABLE employees (
    employee_id   BIGINT PRIMARY KEY,
    first_name    VARCHAR(50) NOT NULL,
    last_name     VARCHAR(50) NOT NULL,
    email         VARCHAR(255) UNIQUE NOT NULL,
    salary        DECIMAL(12, 2) CHECK (salary > 0),
    department_id INT,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
### 2. Integrity Constraints
Constraints are business rules enforced by the database kernel.
- **PRIMARY KEY:** Enforces uniqueness and `NOT NULL`. Physically, it almost always creates a clustered B+ tree index.
- **FOREIGN KEY:** Enforces referential integrity. An insert/update in the child table requires an `O(log N)` lookup in the parent table's index.
- **CHECK:** Evaluates a boolean expression on insertion or update.
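The `employees` table above leaves `department_id` unconstrained. A minimal sketch of wiring up referential integrity, assuming a hypothetical `departments` table (standard SQL syntax; the names are illustrative):

```sql
-- Hypothetical parent table for the department_id column.
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL
);

-- Each insert/update of employees.department_id now triggers an
-- O(log N) lookup in departments' primary-key index.
ALTER TABLE employees
    ADD CONSTRAINT fk_employees_department
    FOREIGN KEY (department_id) REFERENCES departments (department_id);
```

With the constraint in place, inserting an employee with an unknown `department_id`, or deleting a department that is still referenced, fails unless an explicit `ON DELETE` action is declared.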
## II. Schema Evolution (ALTER & DROP)
In continuous deployment pipelines, schemas must evolve without requiring extensive downtime.
### 1. ALTER TABLE
Modifies an existing structure. Depending on the database engine, some ALTER operations are metadata-only (instant), while others require an expensive table rewrite (blocking).
```sql
-- Adding a column (typically metadata-only if no default value is enforced)
ALTER TABLE employees ADD COLUMN status VARCHAR(20);

-- Modifying a data type (may require an O(N) table rewrite)
ALTER TABLE employees ALTER COLUMN status SET DATA TYPE VARCHAR(50);

-- Adding a constraint
ALTER TABLE employees ADD CONSTRAINT chk_status
    CHECK (status IN ('ACTIVE', 'TERMINATED'));
```
### 2. DROP vs. TRUNCATE

- **DROP TABLE:** Completely removes the table structure, its data, and its associated indexes from disk and from the system catalog.
- **TRUNCATE TABLE:** Removes all data quickly. Unlike `DELETE`, it does not scan the table or log individual row deletions in the WAL; it simply deallocates the data pages, making it an effectively `O(1)` metadata operation. It typically resets `IDENTITY` sequences (some engines require an explicit option) and cannot be used if a foreign key references the table.
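The contrast is easiest to see side by side. The statements below use standard syntax, though the exact locking and logging behavior varies by engine:

```sql
-- Removes every row but keeps the table definition, indexes, and grants.
-- Deallocates data pages rather than logging per-row deletions.
TRUNCATE TABLE employees;

-- Removes the table itself: data, indexes, and its catalog entries.
-- Any subsequent query against employees now fails at parse/plan time.
DROP TABLE employees;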
## III. Production Anti-Patterns

- **Using wide/unbounded types by default:** Using `VARCHAR(MAX)` or `TEXT` for small strings like email addresses. This prevents the query optimizer from accurately estimating the memory required for sorting and hashing.
- **Dropping tables in production:** Executing `DROP TABLE` abruptly. Always perform a soft-drop (rename the table to `table_obsolete`), observe for a week, and only then reclaim the disk space.
- **Missing foreign keys for "performance":** Removing FKs to speed up bulk inserts. This transfers the burden of data integrity to the application code, which almost always results in orphaned records and corrupted analytic reports over time.
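The soft-drop pattern can be sketched as follows (PostgreSQL-style `RENAME TO` syntax; `legacy_reports` is a hypothetical table name):

```sql
-- Step 1: soft-drop. The data stays on disk; anything still reading the
-- table fails fast and loudly instead of silently losing data.
ALTER TABLE legacy_reports RENAME TO legacy_reports_obsolete;

-- Step 2, roughly a week later, if nothing broke: reclaim the space.
DROP TABLE legacy_reports_obsolete;
```

The rename is a metadata-only catalog change, so it is instant and trivially reversible, unlike the drop it postpones.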
## IV. Performance Bottlenecks

- **Blocking DDL operations:** Executing an `ALTER TABLE` that requires a full table rewrite (e.g., adding a column with a non-constant default value in older database versions). This acquires an Access Exclusive lock, blocking all `SELECT`, `INSERT`, and `UPDATE` queries until the rewrite completes.
- **Catalog bloat:** Creating and dropping thousands of temporary tables in a single session. This bloats the system catalog tables, slowing down subsequent query planning phases. Use CTEs or memory-optimized table variables where possible.
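As a sketch of the temp-table alternative: a CTE scopes its intermediate result to a single statement, so nothing is ever written to the system catalog (the table and columns reuse the `employees` schema from earlier in the chapter):

```sql
-- No CREATE TEMPORARY TABLE / DROP TABLE pair, hence no catalog churn.
WITH active_employees AS (
    SELECT employee_id, department_id, salary
    FROM employees
    WHERE status = 'ACTIVE'
)
SELECT department_id, AVG(salary) AS avg_salary
FROM active_employees
GROUP BY department_id;
```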