SQL joins, explained
Relational databases split data across normalized tables to avoid duplication. Joins
are how you stitch that data back together at query time. Knowing the join types is
table stakes; knowing how NULLs behave, why rows multiply, and where the ON vs
WHERE distinction bites is what separates correct queries from subtly wrong ones. This
guide covers all of it with runnable examples (users, orders).
What a join is
A join combines rows from two or more tables by matching a related column — typically a
foreign key referencing a primary key. The ON clause is the join condition; the
join type decides what to do with rows that have no match.
SELECT users.name, orders.total
FROM users
JOIN orders ON orders.user_id = users.id;
INNER vs OUTER
INNER JOINreturns only rows with a match in both tables — the intersection.OUTER JOIN(LEFT/RIGHT/FULL) keeps unmatched rows from one or both sides, filling missing columns withNULL.
-- only users who have ordered
SELECT u.name, o.total FROM users u INNER JOIN orders o ON o.user_id = u.id;
-- every user, NULL for those who never ordered
SELECT u.name, o.total FROM users u LEFT JOIN orders o ON o.user_id = u.id;
JOIN alone means INNER JOIN; LEFT JOIN means LEFT OUTER JOIN. LEFT keeps all
rows from the left table; RIGHT keeps all from the right — they're mirror images, so
teams usually standardize on LEFT for readability. FULL OUTER keeps everything from
both sides (great for reconciliation), though MySQL lacks it and needs a UNION of a
LEFT and RIGHT join to emulate it.
ON vs WHERE — the trap that turns LEFT into INNER
ON defines how rows match (during the join); WHERE filters the result (after the
join). For INNER joins they're often interchangeable. For OUTER joins they are not: a
WHERE condition on the optional table's columns evaluates to unknown for the
NULL-filled rows and removes them, silently converting your LEFT JOIN into an INNER
JOIN.
-- unmatched users have o.status = NULL -> WHERE drops them
SELECT u.name, o.status FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped';
-- predicate on the optional side belongs in ON
SELECT u.name, o.status FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';
Rule: put conditions on the joined (right) table in ON to preserve outer rows; put
conditions on the final result in WHERE.
Self joins and multiple joins
A self join joins a table to itself using aliases — ideal for hierarchies stored in one table.
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
To join three or more tables, chain JOIN clauses; each ON connects the new table to
one already in the query.
SELECT u.name, o.id, p.name
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN order_items i ON i.order_id = o.id
JOIN products p ON p.id = i.product_id;
For outer-join chains, every downstream join on the optional path must also be an outer join, or an inner join will re-filter the preserved rows back out.
Why joins multiply rows (and the COUNT trap)
A join produces a row for every matching pair. If one side matches multiple rows on
the other (one-to-many), the single-side values repeat. Joining a parent to two
one-to-many children multiplies their rows together — the fan trap — inflating
aggregates like SUM.
-- a user with 3 orders appears on 3 rows
SELECT u.name, o.id FROM users u JOIN orders o ON o.user_id = u.id;
Two fixes: aggregate with GROUP BY, or pre-aggregate each child in its own subquery
before joining so each contributes one row. And watch the COUNT trap with LEFT JOIN:
COUNT(*) counts the NULL-filled row (so zero-match rows wrongly count as 1), while
COUNT(column) ignores NULLs.
SELECT u.name,
COUNT(*) AS wrong, -- counts the NULL row
COUNT(o.id) AS correct -- ignores NULLs -> 0 for users with no orders
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
Always COUNT a non-nullable column from the joined table.
Semi-joins and anti-joins
Sometimes you want to filter by existence, not return the other table's columns.
- Semi-join — rows that have at least one match (each row once, no duplicates).
- Anti-join — rows that have no match.
-- semi-join: users with at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- anti-join: products never ordered
SELECT p.* FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items i WHERE i.product_id = p.id);
EXISTS stops at the first match and is NULL-safe, unlike NOT IN — which returns
no rows if the value list contains any NULL. The anti-join can also be written as a
LEFT JOIN ... WHERE right_key IS NULL. (Remember: in SQL, anything = NULL is
unknown, never true — always use IS NULL.)
CROSS JOIN, non-equi joins, and performance
A CROSS JOIN produces the Cartesian product (every row paired with every row) — useful
for generating combinations, dangerous by accident. Non-equi joins use <, >, or
BETWEEN instead of equality — handy for matching values to ranges (price tiers,
time windows):
SELECT e.id, c.name
FROM events e
JOIN campaigns c ON e.occurred_at BETWEEN c.start_at AND c.end_at;
Performance hinges on indexes: the database matches on the ON columns, so an index
on the join key (usually the foreign key, which often isn't indexed automatically) turns
a full scan into a fast lookup. The optimizer then picks a physical strategy — nested
loop (small/indexed), hash join (large unindexed equi-joins), or merge join
(sorted inputs). Use EXPLAIN to see which.
Recap
Joins combine tables on a related column; the type decides how unmatched rows are
handled — INNER keeps only matches, OUTER keeps the leftovers with NULLs. The
biggest gotchas are putting optional-table predicates in WHERE (which silently makes a
LEFT JOIN behave like INNER), row multiplication from one-to-many joins (fix with
aggregation or pre-aggregation), and the LEFT JOIN COUNT trap. Use EXISTS/NOT EXISTS for NULL-safe existence checks, and index your join keys. Get those right and
your joins return exactly the rows you intend.