SQL JOINs vs Subqueries vs CTEs — Performance and When to Use Each

Three ways to combine data, one right answer per situation

Every intermediate SQL writer reaches a point where they can express the same query as a JOIN, a subquery, or a CTE — and aren't sure which to pick. The choice matters: a correlated subquery can turn a millisecond query into a minutes-long scan; a CTE with the wrong database can silently perform worse than a join. This guide explains how the query planner treats each form, when performance diverges, and the judgment that makes the right choice obvious.

Quick-reference comparison

	`JOIN`	Subquery	CTE (`WITH`)
What it does	Combines rows from two tables	Returns a result set used by the outer query	Names a temporary result set, referenced above the main query
Execution	Planner optimises join order, indexes	Inline subqueries often merged with outer; correlated ones run per row	Usually materialised once, or inlined by planner (database-dependent)
Performance	Generally fastest — planners are tuned for joins	Inline: similar to join. Correlated: O(n) per outer row	Depends on dialect — Postgres often materialises, MySQL/SQL Server may inline
Readability	Concise for simple cases; nests poorly	Can be hard to read inside `WHERE` or `FROM`	Best readability for multi-step logic
Reusable?	No	No	Yes — reference the same CTE multiple times
Recursive?	No	No	Yes (`WITH RECURSIVE`)
Best for	Combining tables, filtering by relation	Scalar lookups, `EXISTS` checks	Multi-step transforms, recursive queries, readability

JOINs — the planner's home territory

A JOIN is the primary way relational databases combine data. The query planner has decades of optimisation for join operations: it picks hash joins, merge joins, or nested loop joins based on table size and index availability, and it can reorder joins freely to minimise intermediate row counts.

-- Simple inner join: orders with their customer names
SELECT o.id, o.total, c.name
FROM   orders  o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.total > 100;
-- Planner can use index on customer_id and o.total simultaneously

Joins shine for combining tables on a key and filtering by relationship. They're the right tool when both sides of the relationship are needed in the result.

-- LEFT JOIN: all customers, even those with no orders
SELECT c.name, COUNT(o.id) AS order_count
FROM   customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

Rule of thumb: default to a JOIN for combining tables. If you find yourself writing a subquery where a join would work just as well, the join is almost always faster and clearer.

Deep dive: Joins interview questions

Subqueries — scalar lookups and existence checks

A subquery is a SELECT nested inside another statement. There are two fundamentally different kinds, and their performance characteristics are opposite.

Non-correlated subqueries (fast)

A non-correlated subquery runs once and hands its result to the outer query. The planner usually optimises it identically to a join.

-- Non-correlated: inner query runs ONCE, result reused
SELECT name
FROM   products
WHERE  price > (SELECT AVG(price) FROM products);
--              ↑ computed once, compared against every row

Correlated subqueries (can be slow)

A correlated subquery references a column from the outer query, forcing it to re-execute for every row the outer query processes. On a million-row table, that's a million subquery executions.

-- Correlated: re-runs the subquery for EVERY employee row — O(n²) risk
SELECT name, salary
FROM   employees e
WHERE  salary > (
    SELECT AVG(salary)
    FROM   employees
    WHERE  department = e.department  -- ← references outer row
);

The fix is almost always to push the subquery into a JOIN or CTE so it runs once:

-- Better: aggregate once, then join
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_sal
    FROM   employees
    GROUP BY department
)
SELECT e.name, e.salary
FROM   employees e
JOIN   dept_avg d ON d.department = e.department
WHERE  e.salary > d.avg_sal;

`EXISTS` vs `IN` for membership checks

For checking whether a related row exists, EXISTS is usually faster than IN on large sets — it short-circuits as soon as it finds one match, while IN builds the full set.

-- EXISTS: stops scanning as soon as one match is found
SELECT name FROM customers c
WHERE  EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- IN: materialises the full set of customer_ids first
SELECT name FROM customers
WHERE  id IN (SELECT customer_id FROM orders);

Rule of thumb: use a subquery for scalar lookups (single value comparisons) and EXISTS checks. Never use a correlated subquery where a join or CTE with a GROUP BY would work — you'll turn a fast query into a table scan.

Deep dive: Subqueries interview questions

CTEs (`WITH`) — readability and multi-step logic

A Common Table Expression names a temporary result set at the top of a query. The main query can reference it like a table, and the same CTE can be referenced multiple times. CTEs don't change what is possible — anything a CTE expresses could also be a subquery — but they dramatically improve readability for multi-step logic.

-- Without CTE: deeply nested, hard to read
SELECT * FROM (
    SELECT department, AVG(salary) AS avg_sal
    FROM (
        SELECT * FROM employees WHERE active = true
    ) active_emps
    GROUP BY department
) dept_avg
WHERE avg_sal > 80000;

-- With CTE: reads top to bottom like a story
WITH active_employees AS (
    SELECT * FROM employees WHERE active = true
),
dept_avg AS (
    SELECT department, AVG(salary) AS avg_sal
    FROM   active_employees
    GROUP BY department
)
SELECT * FROM dept_avg WHERE avg_sal > 80000;

CTEs are also the only way to write recursive queries — traversing trees, graphs, or hierarchies — without procedural code:

-- Recursive CTE: walk an org chart
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 0 AS depth
    FROM   employees WHERE manager_id IS NULL   -- root
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM   employees e
    JOIN   org o ON o.id = e.manager_id          -- recurse
)
SELECT * FROM org ORDER BY depth;

CTE materialisation — the performance caveat

Whether a CTE is a performance help or hindrance depends on the database:

PostgreSQL (pre-14): CTEs are always materialised (computed once, result stored). This can be faster (when referenced many times) or slower (when the optimiser could push a filter inside).
PostgreSQL 14+ / SQL Server / MySQL 8+: the planner may inline a CTE, treating it like a subquery and optimising freely.
Oracle: similar to Postgres 14+ — may inline or materialise based on cost.

When you need Postgres to not materialise a CTE, use WITH ... AS NOT MATERIALIZED (14+).

Rule of thumb: use a CTE whenever a query has more than two logical steps or when you need to reference the same intermediate result more than once. Prefer a join over a CTE when performance is critical and the planner version is uncertain.

Deep dive: CTEs interview questions

The decision flowchart

Do I need to combine columns from two tables?
└── Yes → JOIN  (default choice; planner is optimised for it)

Do I need a single value from another table?
└── Yes → scalar subquery  (SELECT MAX(...) etc.)

Do I just need to check whether a related row exists?
└── Yes → EXISTS subquery  (short-circuits, usually faster than IN)

Does my logic have multiple steps, or do I reference the same
intermediate result more than once?
└── Yes → CTE  (readability + reuse; watch materialisation in old Postgres)

Does the query need to walk a tree or graph?
└── Yes → recursive CTE  (no other clean SQL option)

Interview tip — explain the trade-offs

Interviewers asking "use a join or a subquery?" are testing whether you understand the performance implications. The safe answer:

"I'd default to a join because planners have the most optimisation for that form. I'd use a non-correlated subquery for scalar comparisons, EXISTS for membership checks, and a CTE to break complex multi-step logic into readable pieces. I'd avoid correlated subqueries except in rare cases where the database rewrites them to a join anyway."

Recap

JOINs are the planner's home territory — use them to combine tables and let the optimiser choose the best execution strategy. Non-correlated subqueries run once and are often rewritten to joins internally; correlated subqueries re-run per outer row and can silently cause O(n²) performance — replace them with a JOIN + GROUP BY or a CTE. CTEs trade raw performance for readability and reusability, and are the only way to write recursive queries; their materialisation behaviour varies by database version. Default to joins, escalate to CTEs for clarity, and avoid correlated subqueries on large tables.

More ways to practice