Set operations combine the results of two or more SELECT queries vertically —
stacking or comparing rows rather than joining columns. The three are UNION
(combine), INTERSECT (common rows), and EXCEPT/MINUS (difference).
SELECT id FROM current_users
UNION
SELECT id FROM archived_users; -- all ids from either table
Both queries must produce compatible columns (same count, compatible types). Set operations work on whole rows, like mathematical set algebra.
Rule of thumb: joins combine tables side-by-side; set operations stack them top-to-bottom.
UNION combines the rows of both queries and removes duplicates; UNION ALL
keeps every row, including duplicates. Because dedup requires a sort/hash,
UNION is slower.
-- deduped: a user in both lists appears once
SELECT email FROM newsletter UNION SELECT email FROM customers;
-- all rows kept; faster, but duplicates remain
SELECT email FROM newsletter UNION ALL SELECT email FROM customers;
Use UNION ALL when you know the inputs are disjoint or duplicates are fine — it
skips the expensive dedup step.
Rule of thumb: default to UNION ALL unless you specifically need duplicates
removed.
Each query must have the same number of columns, in the same order, with compatible data types positionally. Column names needn't match — the result takes its names from the first query.
SELECT id, name FROM employees
UNION ALL
SELECT id, full_name FROM contractors; -- result columns: id, name
Type mismatches either error or force an implicit cast. If a query has fewer
columns, add literals/NULLs to line them up.
Rule of thumb: align column count, order, and types; names come from the first query.
INTERSECT returns only the rows that appear in both result sets, deduplicated.
It's the set intersection of the two queries.
-- users who are both newsletter subscribers AND customers
SELECT email FROM newsletter
INTERSECT
SELECT email FROM customers;
Like UNION, it dedupes by default; INTERSECT ALL (where supported) keeps the
minimum count of duplicates. MySQL gained INTERSECT in 8.0.31; older versions
emulate it with IN/EXISTS.
Rule of thumb: INTERSECT = "rows present in both queries."
EXCEPT (Postgres/SQL Server/standard) and MINUS (Oracle) return rows from the
first query that are not in the second — set difference. They're the same
operation under two names.
-- users who signed up but never ordered
SELECT id FROM users
EXCEPT
SELECT user_id FROM orders;
Order matters: A EXCEPT B ≠ B EXCEPT A. It dedupes by default; EXCEPT ALL
keeps duplicate multiplicity. It's a clean way to express an anti-join when you only
need the keys.
Rule of thumb: EXCEPT/MINUS = "rows in the first query but not the second."
For deduplication, set operations treat two NULLs as equal (unlike =, where
NULL = NULL is unknown). So UNION/INTERSECT/EXCEPT collapse duplicate
NULL-containing rows.
-- the two NULL rows are treated as duplicates and collapse to one
SELECT NULL AS x UNION SELECT NULL; -- returns a single NULL row
This "NULL-as-equal" rule is why EXCEPT works as an anti-join even with NULLs,
whereas NOT IN famously breaks on them.
Rule of thumb: set operations consider NULLs equal for dedup; row comparisons (=)
don't.
Apply ORDER BY / LIMIT to the whole combined result, once, at the very end —
not to the individual SELECTs. The sort refers to the result columns (by name or
position).
SELECT name, created_at FROM users
UNION ALL
SELECT name, created_at FROM archived_users
ORDER BY created_at DESC
LIMIT 20;
To order within a branch (e.g. limit each side), wrap each SELECT in a subquery
or CTE first.
Rule of thumb: one trailing ORDER BY/LIMIT applies to the entire set result.
Use UNION to append rows from sources with the same shape (current +
archived orders, multiple regions' tables). Use a JOIN to add columns by
relating rows across tables on a key.
-- UNION: more rows, same columns
SELECT id, total FROM orders_2025
UNION ALL
SELECT id, total FROM orders_2026;
A telltale sign you want UNION is "combine these similarly-structured datasets into
one list."
Rule of thumb: same columns, more rows → UNION; related tables, more columns →
JOIN.
UNION must deduplicate the combined result, which means sorting or hashing all
rows — extra CPU and memory. UNION ALL simply concatenates the inputs and streams
them out, with no dedup step.
-- no dedup work; fastest when you know rows can't overlap
SELECT id FROM a UNION ALL SELECT id FROM b;
On large datasets the difference is significant. Only pay for UNION when duplicates
are actually possible and unwanted.
Rule of thumb: prefer UNION ALL and only upgrade to UNION when dedup is truly
needed.
Where INTERSECT/EXCEPT aren't available (older MySQL), use IN/EXISTS for
intersection and NOT EXISTS/LEFT JOIN ... IS NULL for difference.
-- INTERSECT emulation
SELECT DISTINCT email FROM newsletter n
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.email = n.email);
-- EXCEPT emulation
SELECT DISTINCT id FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
Prefer EXISTS/NOT EXISTS over IN/NOT IN here to stay NULL-safe.
Rule of thumb: emulate INTERSECT with EXISTS, EXCEPT with NOT EXISTS.
The SQL standard gives INTERSECT higher precedence than UNION and EXCEPT,
so INTERSECT binds first. When mixing them, use parentheses to make the
intended grouping explicit.
-- ambiguous to readers — parenthesize instead
(SELECT id FROM a UNION SELECT id FROM b)
EXCEPT
SELECT id FROM c;
Different databases have followed the rule inconsistently over the years, so never rely on implicit precedence in a chain.
Rule of thumb: parenthesize any query that mixes UNION/INTERSECT/EXCEPT.
No — UNION and UNION DISTINCT are identical; DISTINCT is just the explicit
default. Some teams write UNION DISTINCT to make the dedup intent obvious next to
UNION ALL.
SELECT id FROM a UNION DISTINCT SELECT id FROM b; -- same as plain UNION
Likewise INTERSECT/EXCEPT default to DISTINCT, with optional ALL variants.
Rule of thumb: UNION already means UNION DISTINCT; write ALL when you want
duplicates.
Add a literal column in each branch to label the source. After combining, that column tells you where each row originated.
SELECT id, total, 'online' AS channel FROM online_orders
UNION ALL
SELECT id, total, 'instore' AS channel FROM instore_orders;
This is handy for merging similar feeds while keeping provenance, and lets you later
filter or group by channel.
Rule of thumb: add a constant label column per branch to track each row's source.
They overlap but aren't the same. UNION dedups across two queries; for a
single query, SELECT DISTINCT dedups its rows. SELECT ... UNION SELECT ...
with one branch is just a slow DISTINCT.
-- these return the same rows; the second is clearer
SELECT city FROM users UNION SELECT city FROM users;
SELECT DISTINCT city FROM users;
Use DISTINCT for single-query dedup and reserve UNION for combining separate
result sets.
Rule of thumb: dedup one query with DISTINCT; combine queries with UNION.
The database tries to find a common type and implicitly casts both sides to it
(e.g. int and numeric → numeric). If no safe common type exists (text vs date),
it raises an error.
-- int + numeric unify to numeric; fine
SELECT 1 AS n UNION ALL SELECT 2.5;
-- text + date: cast explicitly to avoid surprises/errors
SELECT created_at::text FROM a UNION ALL SELECT label FROM b;
Implicit coercion can also change precision/formatting unexpectedly, so cast explicitly when the types aren't obviously compatible.
Rule of thumb: cast mismatched columns explicitly so the unified type is intentional.
More Query Basics interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.