SQL Joins Explained — INNER, OUTER, SELF & Anti-Joins with Examples

SQL joins, explained

Relational databases split data across normalized tables to avoid duplication. Joins are how you stitch that data back together at query time. Knowing the join types is table stakes; knowing how NULLs behave, why rows multiply, and where the ON vs WHERE distinction bites is what separates correct queries from subtly wrong ones. This guide covers all of it with runnable examples (users, orders).

What a join is

A join combines rows from two or more tables by matching a related column — typically a foreign key referencing a primary key. The ON clause is the join condition; the join type decides what to do with rows that have no match.

SELECT users.name, orders.total
FROM users
JOIN orders ON orders.user_id = users.id;

INNER vs OUTER

INNER JOIN returns only rows with a match in both tables — the intersection.
OUTER JOIN (LEFT/RIGHT/FULL) keeps unmatched rows from one or both sides, filling missing columns with NULL.

-- only users who have ordered
SELECT u.name, o.total FROM users u INNER JOIN orders o ON o.user_id = u.id;

-- every user, NULL for those who never ordered
SELECT u.name, o.total FROM users u LEFT JOIN orders o ON o.user_id = u.id;

JOIN alone means INNER JOIN; LEFT JOIN means LEFT OUTER JOIN. LEFT keeps all rows from the left table; RIGHT keeps all from the right — they're mirror images, so teams usually standardize on LEFT for readability. FULL OUTER keeps everything from both sides (great for reconciliation), though MySQL lacks it and needs a UNION of a LEFT and RIGHT join to emulate it.

ON vs WHERE — the trap that turns LEFT into INNER

ON defines how rows match (during the join); WHERE filters the result (after the join). For INNER joins they're often interchangeable. For OUTER joins they are not: a WHERE condition on the optional table's columns evaluates to unknown for the NULL-filled rows and removes them, silently converting your LEFT JOIN into an INNER JOIN.

-- unmatched users have o.status = NULL -> WHERE drops them
SELECT u.name, o.status FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped';

-- predicate on the optional side belongs in ON
SELECT u.name, o.status FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';

Rule: put conditions on the joined (right) table in ON to preserve outer rows; put conditions on the final result in WHERE.

Self joins and multiple joins

A self join joins a table to itself using aliases — ideal for hierarchies stored in one table.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

To join three or more tables, chain JOIN clauses; each ON connects the new table to one already in the query.

SELECT u.name, o.id, p.name
FROM users u
JOIN orders o      ON o.user_id = u.id
JOIN order_items i ON i.order_id = o.id
JOIN products p    ON p.id = i.product_id;

For outer-join chains, every downstream join on the optional path must also be an outer join, or an inner join will re-filter the preserved rows back out.

Why joins multiply rows (and the COUNT trap)

A join produces a row for every matching pair. If one side matches multiple rows on the other (one-to-many), the single-side values repeat. Joining a parent to two one-to-many children multiplies their rows together — the fan trap — inflating aggregates like SUM.

-- a user with 3 orders appears on 3 rows
SELECT u.name, o.id FROM users u JOIN orders o ON o.user_id = u.id;

Two fixes: aggregate with GROUP BY, or pre-aggregate each child in its own subquery before joining so each contributes one row. And watch the COUNT trap with LEFT JOIN: COUNT(*) counts the NULL-filled row (so zero-match rows wrongly count as 1), while COUNT(column) ignores NULLs.

SELECT u.name,
       COUNT(*)    AS wrong,   -- counts the NULL row
       COUNT(o.id) AS correct  -- ignores NULLs -> 0 for users with no orders
FROM users u LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

Always COUNT a non-nullable column from the joined table.

Semi-joins and anti-joins

Sometimes you want to filter by existence, not return the other table's columns.

Semi-join — rows that have at least one match (each row once, no duplicates).
Anti-join — rows that have no match.

-- semi-join: users with at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- anti-join: products never ordered
SELECT p.* FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items i WHERE i.product_id = p.id);

EXISTS stops at the first match and is NULL-safe, unlike NOT IN — which returns no rows if the value list contains any NULL. The anti-join can also be written as a LEFT JOIN ... WHERE right_key IS NULL. (Remember: in SQL, anything = NULL is unknown, never true — always use IS NULL.)

CROSS JOIN, non-equi joins, and performance

A CROSS JOIN produces the Cartesian product (every row paired with every row) — useful for generating combinations, dangerous by accident. Non-equi joins use <, >, or BETWEEN instead of equality — handy for matching values to ranges (price tiers, time windows):

SELECT e.id, c.name
FROM events e
JOIN campaigns c ON e.occurred_at BETWEEN c.start_at AND c.end_at;

Performance hinges on indexes: the database matches on the ON columns, so an index on the join key (usually the foreign key, which often isn't indexed automatically) turns a full scan into a fast lookup. The optimizer then picks a physical strategy — nested loop (small/indexed), hash join (large unindexed equi-joins), or merge join (sorted inputs). Use EXPLAIN to see which.

Recap

Joins combine tables on a related column; the type decides how unmatched rows are handled — INNER keeps only matches, OUTER keeps the leftovers with NULLs. The biggest gotchas are putting optional-table predicates in WHERE (which silently makes a LEFT JOIN behave like INNER), row multiplication from one-to-many joins (fix with aggregation or pre-aggregation), and the LEFT JOIN COUNT trap. Use EXISTS/NOT EXISTS for NULL-safe existence checks, and index your join keys. Get those right and your joins return exactly the rows you intend.

Practice tests are coming soon