SQL Subqueries — Scalar, Correlated, and Derived Table Patterns

What is a subquery?

A subquery is a SELECT statement nested inside another SQL statement. The outer query treats the subquery's result as a value, a list, or a temporary table. Subqueries let you express multi-step logic in a single statement — find the highest order value, then fetch the order; find all customers who bought product X, then count their other purchases.

Scalar subqueries — a single value

A scalar subquery returns exactly one row and one column. It can appear anywhere an expression is valid: SELECT, WHERE, or HAVING.

-- Show each order with the company-wide average for comparison
SELECT
    id,
    customer_id,
    total_amount,
    (SELECT AVG(total_amount) FROM orders) AS company_avg,
    total_amount - (SELECT AVG(total_amount) FROM orders) AS vs_avg
FROM orders
WHERE created_at >= '2026-01-01';

If a scalar subquery returns more than one row, the database raises an error. Use aggregate functions (MAX, AVG, COUNT) or LIMIT 1 to guarantee a single result.

Subqueries in WHERE — filtering by a derived value

-- Orders larger than the customer's own average order value
SELECT id, customer_id, total_amount
FROM orders o
WHERE total_amount > (
    SELECT AVG(total_amount)
    FROM   orders
    WHERE  customer_id = o.customer_id  -- correlated: references outer query
);

-- Most recent order for each customer (alternative to window function)
SELECT * FROM orders
WHERE id IN (
    SELECT MAX(id)
    FROM   orders
    GROUP  BY customer_id
);

Correlated subqueries — referencing the outer row

A correlated subquery refers to a column from the outer query. It is re-executed for each row of the outer query — making it logically equivalent to a nested loop. This is powerful but can be slow on large tables; use an index on the correlated column or rewrite with a JOIN.

-- Customers whose most recent order was a refund
SELECT c.id, c.email
FROM customers c
WHERE (
    SELECT status
    FROM   orders
    WHERE  customer_id = c.id
    ORDER  BY created_at DESC
    LIMIT  1
) = 'refunded';

EXISTS and NOT EXISTS — the NULL-safe existence test

EXISTS returns TRUE as soon as the subquery finds one matching row — it short-circuits and is efficient with an index on the join column.

-- Customers who have placed at least one order over £200
SELECT c.id, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM   orders o
    WHERE  o.customer_id = c.id
      AND  o.total_amount > 200
);

-- Products never added to any wishlist (anti-join via NOT EXISTS)
SELECT p.id, p.product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM wishlist_items w WHERE w.product_id = p.id
);

NOT EXISTS handles NULLs correctly. NOT IN with a subquery fails silently if the subquery returns any NULL — prefer NOT EXISTS for anti-join patterns.

Derived tables — subqueries in FROM

A subquery in FROM creates a derived table (also called an inline view). The outer query treats it like a real table and can filter, join, and aggregate it.

-- Top 5 customers by lifetime value, with their most recent order date
SELECT ranked.customer_id, ranked.lifetime_value, ranked.last_order
FROM (
    SELECT
        customer_id,
        SUM(total_amount) AS lifetime_value,
        MAX(created_at)   AS last_order,
        RANK() OVER (ORDER BY SUM(total_amount) DESC) AS rnk
    FROM orders
    GROUP BY customer_id
) AS ranked
WHERE ranked.rnk <= 5;

Derived tables must be given an alias. Every column in the subquery that the outer query references must have a name (use AS if it is a computed column).

Subquery vs JOIN — when to use which

Situation	Prefer
Need columns from the related table	JOIN
Just checking existence / absence	EXISTS / NOT EXISTS
Need aggregate from the related table per outer row	Correlated subquery or window function
Pre-aggregating before joining	Derived table or CTE
Filtering by a single derived value	Scalar subquery

Correlated subqueries that run once per outer row are fine for small tables but can be expensive at scale. For large datasets, rewrite as a JOIN with pre-aggregation or use a window function.

Recap

Subqueries let you nest queries to express multi-step logic. Scalar subqueries return one value and plug into any expression context. Correlated subqueries reference the outer row — powerful but potentially slow; ensure the correlated column is indexed. EXISTS / NOT EXISTS are the safest and most efficient way to test membership and exclusion. Derived tables in FROM act as named intermediate result sets and are the stepping stone to CTEs.

More ways to practice