Question 1

What is a JOIN?

Accepted Answer

A JOIN combines rows from **two or more tables** into one result set, matching
them on a **related column** (typically a foreign key referencing a primary
key). Relational databases store data in *normalized* tables to avoid
duplication; joins are how you stitch that data back together at query time.

```sql
-- users(id, name)  and  orders(id, user_id, total)
SELECT users.name, orders.total
FROM users
JOIN orders ON orders.user_id = users.id;
```

The `ON` clause is the **join condition** — it decides which rows from the left
table pair with which rows from the right. The *type* of join (INNER, LEFT,
etc.) then decides what to do with rows that have **no match**.

Question 2

What is the difference between INNER JOIN and OUTER JOIN?

Accepted Answer

The difference is what happens to **unmatched rows**:

- **`INNER JOIN`** returns **only** rows that have a match in *both* tables.
  Rows with no counterpart are dropped from the result.
- **`OUTER JOIN`** (LEFT / RIGHT / FULL) **keeps unmatched rows** from one or
  both sides, filling the missing columns with **`NULL`**.

```sql
-- only users who have placed at least one order
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- every user, with NULLs for those who never ordered
SELECT u.name, o.total
FROM users u
LEFT OUTER JOIN orders o ON o.user_id = u.id;
```

Think of INNER as the **intersection** and OUTER as "intersection **plus** the
leftovers from the chosen side(s)." (`INNER` and `OUTER` are optional keywords —
`JOIN` alone means `INNER JOIN`, and `LEFT JOIN` means `LEFT OUTER JOIN`.)

Question 3

What is the difference between LEFT and RIGHT JOIN?

Accepted Answer

Both are outer joins; they differ only in **which side is preserved**:

- **`LEFT JOIN`** keeps **all rows from the left** (first) table, plus matching
  rows from the right — unmatched right columns become `NULL`.
- **`RIGHT JOIN`** keeps **all rows from the right** (second) table, plus
  matches from the left.

They're **mirror images**: any RIGHT JOIN can be rewritten as a LEFT JOIN by
swapping the table order, which is why teams often standardize on LEFT JOIN for
readability.

```sql
-- these two return the same rows
SELECT u.name, o.id FROM users u LEFT  JOIN orders o ON o.user_id = u.id;
SELECT u.name, o.id FROM orders o RIGHT JOIN users  u ON o.user_id = u.id;
```

Question 4

What is a self join?

Accepted Answer

A self join is a table **joined to itself**, using **table aliases** to treat
the one physical table as two logical ones. It's the standard way to relate
rows *within* the same table — most classically, **hierarchies** where a row
points to another row in the same table.

```sql
-- employees(id, name, manager_id) where manager_id -> employees.id
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
```

Here `e` is the "employee" view of the table and `m` is the "manager" view.
Using `LEFT JOIN` keeps top-level employees (whose `manager_id` is `NULL`) in
the result with a `NULL` manager. Aliases are **mandatory** — without them the
column references would be ambiguous.

Question 5

What does a CROSS JOIN produce?

Accepted Answer

A `CROSS JOIN` returns the **Cartesian product**: every row of the first table
paired with **every** row of the second, with **no `ON` condition**. If the
tables have *M* and *N* rows, you get *M × N* rows back — so it grows fast.

```sql
-- generate every size/color combination
SELECT s.label, c.name
FROM sizes s
CROSS JOIN colors c;   -- 4 sizes × 6 colors = 24 rows
```

Use it deliberately — for generating combinations, building calendars, or
creating test data. An **accidental** cross join (forgetting the join
condition, the dreaded "comma join" `FROM a, b`) is a common cause of
runaway result sets and slow queries.

Question 6

How do you find rows with no match using a join?

Accepted Answer

Use the **anti-join** pattern: `LEFT JOIN` the second table, then filter where
its key **`IS NULL`**. Because unmatched rows get `NULL` in the right-hand
columns, "right key is NULL" precisely selects the rows that had **no match**.

```sql
-- users who have never placed an order
SELECT u.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
```

Two things to get right: filter on a column that's **non-nullable in the source
table** (like the right table's primary key `o.id`) so a `NULL` there truly
means "no row matched," and remember you must use **`IS NULL`**, never
`= NULL` — in SQL, `anything = NULL` evaluates to `unknown`, never `true`.
`NOT EXISTS` is an equivalent and often clearer alternative.

Question 7

What is the difference between the ON and WHERE clauses in a join?

Accepted Answer

`ON` defines **how rows are matched** (it runs *during* the join); `WHERE` filters
the **result** *after* the join. For **INNER** joins they're often
interchangeable, but for **OUTER** joins they behave very differently.

```sql
-- keeps all users; only joins orders with amount > 100
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.amount > 100;

-- effectively INNER: WHERE drops users whose joined row is NULL
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.amount > 100;
```

Put conditions on the **joined (right) table** in `ON` to preserve outer rows; put
them in `WHERE` to filter the final result.

Question 8

Why does filtering the right table in WHERE turn a LEFT JOIN into an INNER JOIN?

Accepted Answer

An outer join fills unmatched right-table columns with `NULL`. A `WHERE` condition
on those columns (other than `IS NULL`) evaluates to `unknown` for the unmatched
rows and **removes them** — silently converting your LEFT JOIN into an INNER JOIN.

```sql
-- unmatched users have o.status = NULL -> WHERE drops them
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped';

-- move the predicate into ON to keep all users
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';
```

This is one of the most common SQL bugs. Rule: predicates on the optional side
belong in `ON`.

Question 9

How do you join three or more tables?

Accepted Answer

Chain `JOIN` clauses; each `ON` connects the new table to one already in the query.
The joins are applied left to right, building a progressively wider result.

```sql
SELECT u.name, o.id AS order_id, p.name AS product
FROM users u
JOIN orders o      ON o.user_id = u.id
JOIN order_items i ON i.order_id = o.id
JOIN products p    ON p.id = i.product_id;
```

Each join multiplies/filters rows based on its matches, so a user with many orders
and items appears on many rows. Mind the cardinality — joining several
one-to-many tables can explode row counts (the "fan trap").

Question 10

What does the USING clause do?

Accepted Answer

`USING (col)` is shorthand for an equi-join when the join columns have the **same
name** in both tables: `USING (user_id)` ≡ `ON a.user_id = b.user_id`. It also
**merges** the shared column into one in the output.

```sql
SELECT name, amount
FROM users
JOIN orders USING (user_id);   -- one user_id column in the result
```

Cleaner than repeating the column, but only works when names match exactly, and
the coalesced column can't be qualified with a table alias. `ON` is more explicit
and flexible.

Question 11

What is a NATURAL JOIN and why is it risky?

Accepted Answer

`NATURAL JOIN` automatically joins on **all columns with the same name** in both
tables — no `ON` needed. It's dangerous because the join condition is **implicit**:
adding a same-named column later (like `created_at`) silently changes the join.

```sql
SELECT * FROM users NATURAL JOIN orders;
-- joins on EVERY shared column name — fragile and surprising
```

A new `updated_at` column on both tables would suddenly become part of the join
condition, breaking results with no error. Most teams **avoid** `NATURAL JOIN` in
favor of explicit `ON`/`USING`.

Question 12

How do you combine a join with aggregation?

Accepted Answer

Join the tables, then `GROUP BY` the dimension you want to aggregate per, applying
aggregate functions to the joined rows. With outer joins, choose `COUNT(column)`
vs `COUNT(*)` carefully (next question).

```sql
SELECT u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount), 0) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```

The `LEFT JOIN` keeps users with zero orders (counted as 0). Every non-aggregated
selected column must appear in `GROUP BY` (in standard SQL).

Question 13

Why do joins sometimes produce duplicate rows?

Accepted Answer

A join produces a row for **every matching pair**. If one side matches **multiple**
rows on the other (a one-to-many relationship), the single-side values repeat
across those matches — not true duplicates, but multiplied rows.

```sql
-- a user with 3 orders appears on 3 rows
SELECT u.name, o.id
FROM users u
JOIN orders o ON o.user_id = u.id;
```

To collapse them, aggregate (`GROUP BY` + `COUNT`/`SUM`), use `DISTINCT`, or
pre-aggregate the many-side in a subquery before joining. Joining two
one-to-many tables to the same parent multiplies rows (the fan trap).

Question 14

What is the COUNT trap with LEFT JOIN?

Accepted Answer

`COUNT(*)` counts **rows**, including the NULL-filled row produced for an unmatched
LEFT JOIN — so users with no orders wrongly count as 1. `COUNT(column)` ignores
`NULL`s, giving the correct 0.

```sql
SELECT u.name,
       COUNT(*)    AS wrong,   -- counts the NULL row -> 1 for zero-order users
       COUNT(o.id) AS correct  -- ignores NULLs -> 0
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```

Always `COUNT` a **non-nullable column from the joined table** (like its primary
key) when counting matches in an outer join.

Question 15

When should you use a join vs a subquery?

Accepted Answer

Use a **join** when you need **columns from both tables** in the output. Use a
**subquery** (especially `EXISTS`/`IN`) when you only need to **filter** by the
existence of related rows, not return their columns.

```sql
-- join: need order data in the result
SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;

-- subquery: only filter users who have any order
SELECT name FROM users u WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```

A join can produce duplicate rows when filtering by existence; `EXISTS` won't.
Modern optimizers often plan them similarly, so favor whichever is clearer.

Question 16

What is the difference between EXISTS and IN?

Accepted Answer

Both test membership, but: `EXISTS` stops at the **first match** (often faster for
large/correlated subqueries) and handles `NULL`s safely. `IN` compares against a
value list and has a **NULL trap** — `NOT IN` with any `NULL` in the list returns
no rows.

```sql
-- if any user_id is NULL, NOT IN returns NOTHING
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);

-- NOT EXISTS is NULL-safe
SELECT * FROM users u WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```

Prefer `EXISTS`/`NOT EXISTS` for correlated existence checks, especially when
`NULL`s are possible.

Question 17

How do you write an anti-join?

Accepted Answer

An anti-join returns rows from the first table that have **no match** in the
second. Two idioms: `NOT EXISTS`, or `LEFT JOIN ... WHERE right.key IS NULL`.

```sql
-- products never ordered (NOT EXISTS)
SELECT p.* FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items i WHERE i.product_id = p.id
);

-- equivalent LEFT JOIN form
SELECT p.* FROM products p
LEFT JOIN order_items i ON i.product_id = p.id
WHERE i.product_id IS NULL;
```

`NOT EXISTS` is usually clearest and NULL-safe; the LEFT JOIN form can be faster
with the right indexes. Avoid `NOT IN` here due to its NULL trap.

Question 18

What is a semi-join?

Accepted Answer

A semi-join returns rows from the first table that have **at least one match** in
the second — but **without** duplicating them per match and without returning the
second table's columns. `EXISTS`/`IN` express it.

```sql
-- users who have placed at least one order (each user once)
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```

Contrast with an inner join, which would repeat a user once per order. Semi-join =
"filter by existence." Anti-join = "filter by non-existence." Databases have
dedicated semi/anti-join operators for these.

Question 19

What does a FULL OUTER JOIN do?

Accepted Answer

A `FULL OUTER JOIN` keeps **all rows from both tables**, matching where possible
and filling the non-matching side with `NULL`s. It's the union of LEFT and RIGHT
outer joins.

```sql
SELECT COALESCE(a.id, b.id) AS id, a.val AS left_val, b.val AS right_val
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id;
```

Useful for reconciliation ("what's in either source"), like comparing two datasets
and finding rows present in one but not the other. Supported in PostgreSQL/SQL
Server/Oracle, but **not** in MySQL (which needs an emulation).

Question 20

How do you emulate a FULL OUTER JOIN in MySQL?

Accepted Answer

MySQL lacks `FULL OUTER JOIN`, so you **UNION a LEFT JOIN with a RIGHT JOIN** (or a
LEFT JOIN where the left key is NULL), using `UNION` to dedupe the overlapping
matched rows.

```sql
SELECT a.id, a.val, b.val FROM a LEFT JOIN b ON a.id = b.id
UNION
SELECT b.id, a.val, b.val FROM a RIGHT JOIN b ON a.id = b.id;
```

`UNION` (not `UNION ALL`) removes the duplicate matched rows that appear in both
halves. The first half gives all left rows + matches; the second adds the
unmatched right rows.

Question 21

What is the difference between an equi-join and a non-equi join?

Accepted Answer

An **equi-join** matches with equality (`a.x = b.x`) — by far the most common. A
**non-equi join** uses other operators (`<`, `>`, `BETWEEN`, `!=`), useful for
ranges, bands, and comparisons.

```sql
-- non-equi: match each sale to its price tier by range
SELECT s.amount, t.label
FROM sales s
JOIN tiers t ON s.amount BETWEEN t.min_amount AND t.max_amount;
```

Non-equi joins can match many rows and are costlier (no simple hash join), but
they're powerful for bucketing, gaps-and-islands, and "find rows within a range"
problems.

Question 22

How do you join on multiple columns?

Accepted Answer

Combine the conditions in the `ON` clause with `AND` — all must match. This is
common for **composite keys** or matching on several attributes.

```sql
SELECT *
FROM order_items i
JOIN inventory v
  ON v.product_id = i.product_id
 AND v.warehouse_id = i.warehouse_id;
```

Every `AND` condition tightens the match. If the columns share names across both
tables, `USING (product_id, warehouse_id)` is a shorthand. Make sure composite-key
columns are indexed together for performance.

Question 23

How do you use a self join to compare consecutive rows?

Accepted Answer

Join a table to itself on a key offset by one (e.g. matching `id = id + 1`, or
using a date difference) to put each row next to its neighbor — useful for
computing differences between sequential records.

```sql
SELECT a.day, b.sales - a.sales AS daily_change
FROM daily a
JOIN daily b ON b.day = a.day + INTERVAL '1 day';
```

Modern SQL often replaces this with **window functions** (`LAG`/`LEAD`), which are
cleaner and faster, but the self-join technique is the classic approach and still
appears in interviews.

Question 24

What is the fan trap (row multiplication) in joins?

Accepted Answer

The fan trap occurs when you join one parent to **two different one-to-many**
child tables. Their rows multiply against each other (a partial Cartesian
product), inflating aggregates like `SUM`.

```sql
-- orders × shipments multiply; SUM(amount) is overcounted
SELECT o.id, SUM(p.amount), SUM(s.weight)
FROM orders o
JOIN payments p  ON p.order_id = o.id
JOIN shipments s ON s.order_id = o.id
GROUP BY o.id;
```

Fix by **pre-aggregating** each child in its own subquery before joining, so each
contributes a single row per parent. Always sanity-check counts when joining
multiple one-to-many tables.

Question 25

Why and how do you pre-aggregate before joining?

Accepted Answer

Pre-aggregating collapses a one-to-many child to **one row per key** in a subquery,
so the subsequent join doesn't multiply rows or distort aggregates (avoiding the
fan trap and double counting).

```sql
SELECT u.name, o.order_count, o.total
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count, SUM(amount) AS total
  FROM orders GROUP BY user_id
) o ON o.user_id = u.id;
```

Now each user joins to exactly one aggregated order row. This pattern also lets you
combine **multiple** independent aggregates without them multiplying together.

Question 26

How do indexes affect join performance?

Accepted Answer

Joins match rows on the `ON` columns, so an **index on the join key** (typically
the foreign key) lets the database find matches quickly instead of scanning the
whole table. Without it, joins degrade to slow full scans.

```sql
CREATE INDEX idx_orders_user_id ON orders(user_id); -- speeds up the join
SELECT * FROM users u JOIN orders o ON o.user_id = u.id;
```

Primary keys are indexed automatically, but **foreign keys often aren't** — a
frequent cause of slow joins. Index both sides of frequent join conditions, and
composite indexes for multi-column joins.

Question 27

What join algorithms do databases use?

Accepted Answer

The optimizer picks among three physical strategies based on data size and
indexes:

- **Nested loop join** — for each row in one table, look up matches in the other.
  Great with an index on the inner table; small inputs.
- **Hash join** — build a hash table on one input, probe with the other. Best for
  large, unindexed equi-joins.
- **Merge join** — sort both inputs on the key, then merge. Efficient when inputs
  are already sorted/indexed.

```sql
EXPLAIN SELECT * FROM users u JOIN orders o ON o.user_id = u.id;
-- shows which join algorithm the planner chose
```

You don't pick directly, but understanding them (and reading `EXPLAIN`) explains
why a query is slow.

Question 28

How do you handle NULLs from outer joins in the output?

Accepted Answer

Outer joins produce `NULL`s for unmatched rows. Use **`COALESCE`** to substitute a
default (0, '', 'N/A') so results are clean and aggregates behave.

```sql
SELECT u.name,
       COALESCE(o.amount, 0)       AS amount,
       COALESCE(o.status, 'none')  AS status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
```

`COALESCE` returns the first non-NULL argument. It's essential after outer joins
and aggregates (`COALESCE(SUM(x), 0)`) so missing data shows as a sensible value
rather than `NULL`.

Question 29

How do you remove duplicate rows from a join result?

Accepted Answer

A join that matches one-to-many repeats the single-side rows. Options to dedupe:
`DISTINCT`, aggregation with `GROUP BY`, or restructuring to a semi-join
(`EXISTS`) when you don't need the joined columns.

```sql
-- DISTINCT removes exact duplicate rows
SELECT DISTINCT u.id, u.name
FROM users u JOIN orders o ON o.user_id = u.id;

-- better when you only want "users with orders":
SELECT u.id, u.name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```

Prefer `EXISTS` over `DISTINCT` when you're really filtering by existence —
`DISTINCT` does extra sorting/work to remove the duplicates the join created.

Question 30

What is a LATERAL join (or CROSS APPLY)?

Accepted Answer

A `LATERAL` join (Postgres) / `CROSS APPLY` (SQL Server) lets a subquery in the
`FROM` clause **reference columns from preceding tables** in the same `FROM` — so
it runs per outer row. Ideal for "top-N per group."

```sql
-- each user's 3 most recent orders
SELECT u.name, o.id, o.created_at
FROM users u
CROSS JOIN LATERAL (
  SELECT id, created_at FROM orders
  WHERE user_id = u.id            -- references the outer u
  ORDER BY created_at DESC LIMIT 3
) o;
```

Regular subqueries can't see outer `FROM` columns; `LATERAL` can, making
per-row/per-group derived tables possible.

Question 31

What happens when you chain outer joins across three tables?

Accepted Answer

Join order and type matter. A `LEFT JOIN` followed by an `INNER JOIN` on the
optional table can **drop** the preserved rows, because the inner join requires a
match the NULL-filled rows don't have.

```sql
-- the INNER JOIN re-filters out users with no orders
SELECT u.name, oi.qty
FROM users u
LEFT JOIN orders o      ON o.user_id = u.id
JOIN order_items oi     ON oi.order_id = o.id;  -- inner -> drops NULL o.id

-- keep it LEFT all the way down
LEFT JOIN order_items oi ON oi.order_id = o.id;
```

To preserve outer rows through a chain, every downstream join on the optional path
must also be an outer join.

Question 32

What is the difference between a JOIN and a UNION?

Accepted Answer

A **JOIN** combines tables **horizontally** — adding columns by matching rows. A
**UNION** combines result sets **vertically** — stacking rows from queries that
have the **same columns**.

```sql
-- JOIN: wider rows (user + their order)
SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;

-- UNION: more rows (current + archived orders)
SELECT id, amount FROM orders
UNION ALL
SELECT id, amount FROM archived_orders;
```

`UNION` dedupes; `UNION ALL` keeps duplicates (and is faster). Use JOIN to relate
tables, UNION to append similar datasets.

Question 33

How do you join rows within a date or value range?

Accepted Answer

Use a non-equi join with `BETWEEN` or comparison operators in `ON` — matching each
row to all rows of the other table that fall within a range. Common for
time-windows, price tiers, and IP-range lookups.

```sql
-- attribute each event to the active campaign window it falls in
SELECT e.id, c.name
FROM events e
JOIN campaigns c
  ON e.occurred_at BETWEEN c.start_at AND c.end_at;
```

Range joins can match multiple rows and don't use simple hash joins, so they're
heavier — index the range columns, and ensure ranges don't unintentionally overlap
(which multiplies rows).

Joins Interview Questions & Answers