Data Querying (SELECT) & Logical Execution

Chapter 3: Data Querying (SELECT) & Logical Execution

The SELECT statement is the core of the Data Query Language (DQL). To write high-performance queries, engineers must understand not just the syntax, but the Logical Order of Execution—the exact sequence in which the database kernel parses, filters, and transforms the data.

I. The Logical Order of Execution

While an SQL query is written in a specific syntax order, the engine evaluates the clauses in a completely different sequence.

  1. FROM: Identifies the source tables and computes Cartesian products if joins are present.
  2. WHERE: Filters rows from the source based on boolean predicates.
  3. GROUP BY: Aggregates the remaining rows into distinct groups.
  4. HAVING: Filters the grouped aggregates.
  5. SELECT: Computes the projection (the columns to return), applying aliases and functions.
  6. ORDER BY: Sorts the final result set.
  7. LIMIT / OFFSET: Restricts the number of returned rows.

1. FROM2. WHERE3. GROUP4. HAVING5. SELECT6. ORDERUnderstanding this pipeline explains why you cannot use a SELECT alias in the WHERE clause.


II. Filtering & SARGability

The WHERE clause evaluates predicates using standard comparison (=, <>, <, >) and logical operators (AND, OR, NOT). The performance of a query is entirely dependent on whether its predicates are SARGable (Search ARgumentable).

A predicate is SARGable if the database engine can use a B+ tree index to satisfy it. If a predicate requires a function or computation on the column side, the engine is forced to evaluate that function for every row in the table (a Full Table Scan).

-- BAD: Non-SARGable. Forces a full table scan because the index cannot be traversed.
SELECT employee_id, last_name
FROM employees
WHERE SUBSTRING(last_name, 1, 1) = 'S';

-- GOOD: SARGable. The B+ tree index on 'last_name' can be traversed via a range seek.
SELECT employee_id, last_name
FROM employees
WHERE last_name LIKE 'S%';

III. Sorting & Pagination

The ORDER BY clause sorts the final result set. Without an ORDER BY, relational databases make zero guarantees about the order of returned rows.

-- Complex Sort: Descending by department, then ascending by salary
SELECT first_name, department_id, salary
FROM employees
WHERE status = 'ACTIVE'
ORDER BY department_id DESC, salary ASC;

Pagination is commonly implemented using OFFSET and FETCH NEXT (or LIMIT).

-- ANSI Standard Pagination (Skip first 100, return next 10)
SELECT employee_id, last_name
FROM employees
ORDER BY employee_id
OFFSET 100 ROWS FETCH NEXT 10 ROWS ONLY;

IV. Production Anti-Patterns

  • Alias Usage in WHERE: Attempting to use an alias defined in the SELECT clause within the WHERE clause. Because WHERE executes before SELECT, the alias does not yet exist in the execution pipeline.
  • The SELECT * Trap: Projecting all columns blindly. This forces the engine to read unneeded data pages from disk, prevents Index-Only Scans, and wastes network bandwidth during serialization.
  • Offset Pagination at Scale: Using OFFSET 1000000 requires the database engine to sort and scan one million rows just to throw them away and return the next 10. For deep pagination, always use Keyset Pagination (e.g., WHERE id > last_seen_id ORDER BY id LIMIT 10).

V. Performance Bottlenecks

  • Implicit Type Conversions: Filtering a string column against an integer value (e.g., WHERE varchar_zipcode = 90210). The database must implicitly cast the string column to an integer for every row, disabling the index and spiking CPU usage.
  • In-Memory Sort Exhaustion: If an ORDER BY clause cannot be satisfied by traversing a pre-sorted index, the engine must perform a "Sort" operation. If the dataset exceeds the allocated sort buffer (e.g., work_mem), the engine spills the intermediate sort data to temp files on disk, destroying throughput.