Joins Interview Questions & Answers
33 questions Updated 2026-06-17
SQL join interview questions — inner vs outer joins, left vs right, self joins and how NULLs behave, with examples.
Read the in-depth guideSQL Joins Explained — INNER, OUTER, SELF & Anti-Joins with ExamplesA JOIN combines rows from two or more tables into one result set, matching them on a related column (typically a foreign key referencing a primary key). Relational databases store data in normalized tables to avoid duplication; joins are how you stitch that data back together at query time.
-- users(id, name) and orders(id, user_id, total)
SELECT users.name, orders.total
FROM users
JOIN orders ON orders.user_id = users.id;
The ON clause is the join condition — it decides which rows from the left
table pair with which rows from the right. The type of join (INNER, LEFT,
etc.) then decides what to do with rows that have no match.
The difference is what happens to unmatched rows:
INNER JOINreturns only rows that have a match in both tables. Rows with no counterpart are dropped from the result.OUTER JOIN(LEFT / RIGHT / FULL) keeps unmatched rows from one or both sides, filling the missing columns withNULL.
-- only users who have placed at least one order
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;
-- every user, with NULLs for those who never ordered
SELECT u.name, o.total
FROM users u
LEFT OUTER JOIN orders o ON o.user_id = u.id;
Think of INNER as the intersection and OUTER as "intersection plus the
leftovers from the chosen side(s)." (INNER and OUTER are optional keywords —
JOIN alone means INNER JOIN, and LEFT JOIN means LEFT OUTER JOIN.)
Both are outer joins; they differ only in which side is preserved:
LEFT JOINkeeps all rows from the left (first) table, plus matching rows from the right — unmatched right columns becomeNULL.RIGHT JOINkeeps all rows from the right (second) table, plus matches from the left.
They're mirror images: any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order, which is why teams often standardize on LEFT JOIN for readability.
-- these two return the same rows
SELECT u.name, o.id FROM users u LEFT JOIN orders o ON o.user_id = u.id;
SELECT u.name, o.id FROM orders o RIGHT JOIN users u ON o.user_id = u.id;
A self join is a table joined to itself, using table aliases to treat the one physical table as two logical ones. It's the standard way to relate rows within the same table — most classically, hierarchies where a row points to another row in the same table.
-- employees(id, name, manager_id) where manager_id -> employees.id
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Here e is the "employee" view of the table and m is the "manager" view.
Using LEFT JOIN keeps top-level employees (whose manager_id is NULL) in
the result with a NULL manager. Aliases are mandatory — without them the
column references would be ambiguous.
A CROSS JOIN returns the Cartesian product: every row of the first table
paired with every row of the second, with no ON condition. If the
tables have M and N rows, you get M × N rows back — so it grows fast.
-- generate every size/color combination
SELECT s.label, c.name
FROM sizes s
CROSS JOIN colors c; -- 4 sizes × 6 colors = 24 rows
Use it deliberately — for generating combinations, building calendars, or
creating test data. An accidental cross join (forgetting the join
condition, the dreaded "comma join" FROM a, b) is a common cause of
runaway result sets and slow queries.
Use the anti-join pattern: LEFT JOIN the second table, then filter where
its key IS NULL. Because unmatched rows get NULL in the right-hand
columns, "right key is NULL" precisely selects the rows that had no match.
-- users who have never placed an order
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
Two things to get right: filter on a column that's non-nullable in the source
table (like the right table's primary key o.id) so a NULL there truly
means "no row matched," and remember you must use IS NULL, never
= NULL — in SQL, anything = NULL evaluates to unknown, never true.
NOT EXISTS is an equivalent and often clearer alternative.
ON defines how rows are matched (it runs during the join); WHERE filters
the result after the join. For INNER joins they're often
interchangeable, but for OUTER joins they behave very differently.
-- keeps all users; only joins orders with amount > 100
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.amount > 100;
-- effectively INNER: WHERE drops users whose joined row is NULL
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.amount > 100;
Put conditions on the joined (right) table in ON to preserve outer rows; put
them in WHERE to filter the final result.
An outer join fills unmatched right-table columns with NULL. A WHERE condition
on those columns (other than IS NULL) evaluates to unknown for the unmatched
rows and removes them — silently converting your LEFT JOIN into an INNER JOIN.
-- unmatched users have o.status = NULL -> WHERE drops them
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped';
-- move the predicate into ON to keep all users
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';
This is one of the most common SQL bugs. Rule: predicates on the optional side
belong in ON.
Chain JOIN clauses; each ON connects the new table to one already in the query.
The joins are applied left to right, building a progressively wider result.
SELECT u.name, o.id AS order_id, p.name AS product
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN order_items i ON i.order_id = o.id
JOIN products p ON p.id = i.product_id;
Each join multiplies/filters rows based on its matches, so a user with many orders and items appears on many rows. Mind the cardinality — joining several one-to-many tables can explode row counts (the "fan trap").
USING (col) is shorthand for an equi-join when the join columns have the same
name in both tables: USING (user_id) ≡ ON a.user_id = b.user_id. It also
merges the shared column into one in the output.
SELECT name, amount
FROM users
JOIN orders USING (user_id); -- one user_id column in the result
Cleaner than repeating the column, but only works when names match exactly, and
the coalesced column can't be qualified with a table alias. ON is more explicit
and flexible.
NATURAL JOIN automatically joins on all columns with the same name in both
tables — no ON needed. It's dangerous because the join condition is implicit:
adding a same-named column later (like created_at) silently changes the join.
SELECT * FROM users NATURAL JOIN orders;
-- joins on EVERY shared column name — fragile and surprising
A new updated_at column on both tables would suddenly become part of the join
condition, breaking results with no error. Most teams avoid NATURAL JOIN in
favor of explicit ON/USING.
Join the tables, then GROUP BY the dimension you want to aggregate per, applying
aggregate functions to the joined rows. With outer joins, choose COUNT(column)
vs COUNT(*) carefully (next question).
SELECT u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount), 0) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
The LEFT JOIN keeps users with zero orders (counted as 0). Every non-aggregated
selected column must appear in GROUP BY (in standard SQL).
A join produces a row for every matching pair. If one side matches multiple rows on the other (a one-to-many relationship), the single-side values repeat across those matches — not true duplicates, but multiplied rows.
-- a user with 3 orders appears on 3 rows
SELECT u.name, o.id
FROM users u
JOIN orders o ON o.user_id = u.id;
To collapse them, aggregate (GROUP BY + COUNT/SUM), use DISTINCT, or
pre-aggregate the many-side in a subquery before joining. Joining two
one-to-many tables to the same parent multiplies rows (the fan trap).
COUNT(*) counts rows, including the NULL-filled row produced for an unmatched
LEFT JOIN — so users with no orders wrongly count as 1. COUNT(column) ignores
NULLs, giving the correct 0.
SELECT u.name,
COUNT(*) AS wrong, -- counts the NULL row -> 1 for zero-order users
COUNT(o.id) AS correct -- ignores NULLs -> 0
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
Always COUNT a non-nullable column from the joined table (like its primary
key) when counting matches in an outer join.
Use a join when you need columns from both tables in the output. Use a
subquery (especially EXISTS/IN) when you only need to filter by the
existence of related rows, not return their columns.
-- join: need order data in the result
SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;
-- subquery: only filter users who have any order
SELECT name FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
A join can produce duplicate rows when filtering by existence; EXISTS won't.
Modern optimizers often plan them similarly, so favor whichever is clearer.
Both test membership, but: EXISTS stops at the first match (often faster for
large/correlated subqueries) and handles NULLs safely. IN compares against a
value list and has a NULL trap — NOT IN with any NULL in the list returns
no rows.
-- if any user_id is NULL, NOT IN returns NOTHING
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- NOT EXISTS is NULL-safe
SELECT * FROM users u WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
Prefer EXISTS/NOT EXISTS for correlated existence checks, especially when
NULLs are possible.
An anti-join returns rows from the first table that have no match in the
second. Two idioms: NOT EXISTS, or LEFT JOIN ... WHERE right.key IS NULL.
-- products never ordered (NOT EXISTS)
SELECT p.* FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM order_items i WHERE i.product_id = p.id
);
-- equivalent LEFT JOIN form
SELECT p.* FROM products p
LEFT JOIN order_items i ON i.product_id = p.id
WHERE i.product_id IS NULL;
NOT EXISTS is usually clearest and NULL-safe; the LEFT JOIN form can be faster
with the right indexes. Avoid NOT IN here due to its NULL trap.
A semi-join returns rows from the first table that have at least one match in
the second — but without duplicating them per match and without returning the
second table's columns. EXISTS/IN express it.
-- users who have placed at least one order (each user once)
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
Contrast with an inner join, which would repeat a user once per order. Semi-join = "filter by existence." Anti-join = "filter by non-existence." Databases have dedicated semi/anti-join operators for these.
A FULL OUTER JOIN keeps all rows from both tables, matching where possible
and filling the non-matching side with NULLs. It's the union of LEFT and RIGHT
outer joins.
SELECT COALESCE(a.id, b.id) AS id, a.val AS left_val, b.val AS right_val
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id;
Useful for reconciliation ("what's in either source"), like comparing two datasets and finding rows present in one but not the other. Supported in PostgreSQL/SQL Server/Oracle, but not in MySQL (which needs an emulation).
MySQL lacks FULL OUTER JOIN, so you UNION a LEFT JOIN with a RIGHT JOIN (or a
LEFT JOIN where the left key is NULL), using UNION to dedupe the overlapping
matched rows.
SELECT a.id, a.val, b.val FROM a LEFT JOIN b ON a.id = b.id
UNION
SELECT b.id, a.val, b.val FROM a RIGHT JOIN b ON a.id = b.id;
UNION (not UNION ALL) removes the duplicate matched rows that appear in both
halves. The first half gives all left rows + matches; the second adds the
unmatched right rows.
An equi-join matches with equality (a.x = b.x) — by far the most common. A
non-equi join uses other operators (<, >, BETWEEN, !=), useful for
ranges, bands, and comparisons.
-- non-equi: match each sale to its price tier by range
SELECT s.amount, t.label
FROM sales s
JOIN tiers t ON s.amount BETWEEN t.min_amount AND t.max_amount;
Non-equi joins can match many rows and are costlier (no simple hash join), but they're powerful for bucketing, gaps-and-islands, and "find rows within a range" problems.
Combine the conditions in the ON clause with AND — all must match. This is
common for composite keys or matching on several attributes.
SELECT *
FROM order_items i
JOIN inventory v
ON v.product_id = i.product_id
AND v.warehouse_id = i.warehouse_id;
Every AND condition tightens the match. If the columns share names across both
tables, USING (product_id, warehouse_id) is a shorthand. Make sure composite-key
columns are indexed together for performance.
Join a table to itself on a key offset by one (e.g. matching id = id + 1, or
using a date difference) to put each row next to its neighbor — useful for
computing differences between sequential records.
SELECT a.day, b.sales - a.sales AS daily_change
FROM daily a
JOIN daily b ON b.day = a.day + INTERVAL '1 day';
Modern SQL often replaces this with window functions (LAG/LEAD), which are
cleaner and faster, but the self-join technique is the classic approach and still
appears in interviews.
The fan trap occurs when you join one parent to two different one-to-many
child tables. Their rows multiply against each other (a partial Cartesian
product), inflating aggregates like SUM.
-- orders × shipments multiply; SUM(amount) is overcounted
SELECT o.id, SUM(p.amount), SUM(s.weight)
FROM orders o
JOIN payments p ON p.order_id = o.id
JOIN shipments s ON s.order_id = o.id
GROUP BY o.id;
Fix by pre-aggregating each child in its own subquery before joining, so each contributes a single row per parent. Always sanity-check counts when joining multiple one-to-many tables.
Pre-aggregating collapses a one-to-many child to one row per key in a subquery, so the subsequent join doesn't multiply rows or distort aggregates (avoiding the fan trap and double counting).
SELECT u.name, o.order_count, o.total
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders GROUP BY user_id
) o ON o.user_id = u.id;
Now each user joins to exactly one aggregated order row. This pattern also lets you combine multiple independent aggregates without them multiplying together.
Joins match rows on the ON columns, so an index on the join key (typically
the foreign key) lets the database find matches quickly instead of scanning the
whole table. Without it, joins degrade to slow full scans.
CREATE INDEX idx_orders_user_id ON orders(user_id); -- speeds up the join
SELECT * FROM users u JOIN orders o ON o.user_id = u.id;
Primary keys are indexed automatically, but foreign keys often aren't — a frequent cause of slow joins. Index both sides of frequent join conditions, and composite indexes for multi-column joins.
The optimizer picks among three physical strategies based on data size and indexes:
- Nested loop join — for each row in one table, look up matches in the other. Great with an index on the inner table; small inputs.
- Hash join — build a hash table on one input, probe with the other. Best for large, unindexed equi-joins.
- Merge join — sort both inputs on the key, then merge. Efficient when inputs are already sorted/indexed.
EXPLAIN SELECT * FROM users u JOIN orders o ON o.user_id = u.id;
-- shows which join algorithm the planner chose
You don't pick directly, but understanding them (and reading EXPLAIN) explains
why a query is slow.
Outer joins produce NULLs for unmatched rows. Use COALESCE to substitute a
default (0, '', 'N/A') so results are clean and aggregates behave.
SELECT u.name,
COALESCE(o.amount, 0) AS amount,
COALESCE(o.status, 'none') AS status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
COALESCE returns the first non-NULL argument. It's essential after outer joins
and aggregates (COALESCE(SUM(x), 0)) so missing data shows as a sensible value
rather than NULL.
A join that matches one-to-many repeats the single-side rows. Options to dedupe:
DISTINCT, aggregation with GROUP BY, or restructuring to a semi-join
(EXISTS) when you don't need the joined columns.
-- DISTINCT removes exact duplicate rows
SELECT DISTINCT u.id, u.name
FROM users u JOIN orders o ON o.user_id = u.id;
-- better when you only want "users with orders":
SELECT u.id, u.name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
Prefer EXISTS over DISTINCT when you're really filtering by existence —
DISTINCT does extra sorting/work to remove the duplicates the join created.
A LATERAL join (Postgres) / CROSS APPLY (SQL Server) lets a subquery in the
FROM clause reference columns from preceding tables in the same FROM — so
it runs per outer row. Ideal for "top-N per group."
-- each user's 3 most recent orders
SELECT u.name, o.id, o.created_at
FROM users u
CROSS JOIN LATERAL (
SELECT id, created_at FROM orders
WHERE user_id = u.id -- references the outer u
ORDER BY created_at DESC LIMIT 3
) o;
Regular subqueries can't see outer FROM columns; LATERAL can, making
per-row/per-group derived tables possible.
Join order and type matter. A LEFT JOIN followed by an INNER JOIN on the
optional table can drop the preserved rows, because the inner join requires a
match the NULL-filled rows don't have.
-- the INNER JOIN re-filters out users with no orders
SELECT u.name, oi.qty
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id; -- inner -> drops NULL o.id
-- keep it LEFT all the way down
LEFT JOIN order_items oi ON oi.order_id = o.id;
To preserve outer rows through a chain, every downstream join on the optional path must also be an outer join.
A JOIN combines tables horizontally — adding columns by matching rows. A UNION combines result sets vertically — stacking rows from queries that have the same columns.
-- JOIN: wider rows (user + their order)
SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;
-- UNION: more rows (current + archived orders)
SELECT id, amount FROM orders
UNION ALL
SELECT id, amount FROM archived_orders;
UNION dedupes; UNION ALL keeps duplicates (and is faster). Use JOIN to relate
tables, UNION to append similar datasets.
Use a non-equi join with BETWEEN or comparison operators in ON — matching each
row to all rows of the other table that fall within a range. Common for
time-windows, price tiers, and IP-range lookups.
-- attribute each event to the active campaign window it falls in
SELECT e.id, c.name
FROM events e
JOIN campaigns c
ON e.occurred_at BETWEEN c.start_at AND c.end_at;
Range joins can match multiple rows and don't use simple hash joins, so they're heavier — index the range columns, and ensure ranges don't unintentionally overlap (which multiplies rows).
Practice tests are coming soon
Get notified when interactive mock interviews and quizzes launch.