What is a subquery?
A subquery is a SELECT statement nested inside another SQL statement. The
outer query treats the subquery's result as a value, a list, or a temporary
table. Subqueries let you express multi-step logic in a single statement — find
the highest order value, then fetch the order; find all customers who bought
product X, then count their other purchases.
Scalar subqueries — a single value
A scalar subquery returns exactly one row and one column. It can appear
anywhere an expression is valid: SELECT, WHERE, or HAVING.
-- Show each order with the company-wide average for comparison
SELECT
id,
customer_id,
total_amount,
(SELECT AVG(total_amount) FROM orders) AS company_avg,
total_amount - (SELECT AVG(total_amount) FROM orders) AS vs_avg
FROM orders
WHERE created_at >= '2026-01-01';
If a scalar subquery returns more than one row, the database raises an error.
Use aggregate functions (MAX, AVG, COUNT) or LIMIT 1 to guarantee a
single result.
Subqueries in WHERE — filtering by a derived value
-- Orders larger than the customer's own average order value
SELECT id, customer_id, total_amount
FROM orders o
WHERE total_amount > (
SELECT AVG(total_amount)
FROM orders
WHERE customer_id = o.customer_id -- correlated: references outer query
);
-- Most recent order for each customer (alternative to window function)
SELECT * FROM orders
WHERE id IN (
SELECT MAX(id)
FROM orders
GROUP BY customer_id
);
Correlated subqueries — referencing the outer row
A correlated subquery refers to a column from the outer query. It is re-executed for each row of the outer query — making it logically equivalent to a nested loop. This is powerful but can be slow on large tables; use an index on the correlated column or rewrite with a JOIN.
-- Customers whose most recent order was a refund
SELECT c.id, c.email
FROM customers c
WHERE (
SELECT status
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 1
) = 'refunded';
EXISTS and NOT EXISTS — the NULL-safe existence test
EXISTS returns TRUE as soon as the subquery finds one matching row — it
short-circuits and is efficient with an index on the join column.
-- Customers who have placed at least one order over £200
SELECT c.id, c.email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
AND o.total_amount > 200
);
-- Products never added to any wishlist (anti-join via NOT EXISTS)
SELECT p.id, p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM wishlist_items w WHERE w.product_id = p.id
);
NOT EXISTS handles NULLs correctly. NOT IN with a subquery fails silently
if the subquery returns any NULL — prefer NOT EXISTS for anti-join patterns.
Derived tables — subqueries in FROM
A subquery in FROM creates a derived table (also called an inline view).
The outer query treats it like a real table and can filter, join, and aggregate
it.
-- Top 5 customers by lifetime value, with their most recent order date
SELECT ranked.customer_id, ranked.lifetime_value, ranked.last_order
FROM (
SELECT
customer_id,
SUM(total_amount) AS lifetime_value,
MAX(created_at) AS last_order,
RANK() OVER (ORDER BY SUM(total_amount) DESC) AS rnk
FROM orders
GROUP BY customer_id
) AS ranked
WHERE ranked.rnk <= 5;
Derived tables must be given an alias. Every column in the subquery that the
outer query references must have a name (use AS if it is a computed column).
Subquery vs JOIN — when to use which
| Situation | Prefer |
|---|---|
| Need columns from the related table | JOIN |
| Just checking existence / absence | EXISTS / NOT EXISTS |
| Need aggregate from the related table per outer row | Correlated subquery or window function |
| Pre-aggregating before joining | Derived table or CTE |
| Filtering by a single derived value | Scalar subquery |
Correlated subqueries that run once per outer row are fine for small tables but can be expensive at scale. For large datasets, rewrite as a JOIN with pre-aggregation or use a window function.
Recap
Subqueries let you nest queries to express multi-step logic. Scalar
subqueries return one value and plug into any expression context. Correlated
subqueries reference the outer row — powerful but potentially slow; ensure the
correlated column is indexed. EXISTS / NOT EXISTS are the safest and
most efficient way to test membership and exclusion. Derived tables in FROM
act as named intermediate result sets and are the stepping stone to CTEs.