Skip to content

Set Operations Interview Questions & Answers

15 questions Updated 2026-06-20 Share:

SQL set operation interview questions — UNION vs UNION ALL, INTERSECT, EXCEPT, column compatibility rules, deduplication cost and dialect differences.

Read the in-depth guideSQL Set Operations — UNION, INTERSECT, and EXCEPT Explained(opens in new tab)
15 of 15

Set operations combine the results of two or more SELECT queries vertically — stacking or comparing rows rather than joining columns. The three are UNION (combine), INTERSECT (common rows), and EXCEPT/MINUS (difference).

SELECT id FROM current_users
UNION
SELECT id FROM archived_users;   -- all ids from either table

Both queries must produce compatible columns (same count, compatible types). Set operations work on whole rows, like mathematical set algebra.

Rule of thumb: joins combine tables side-by-side; set operations stack them top-to-bottom.

UNION combines the rows of both queries and removes duplicates; UNION ALL keeps every row, including duplicates. Because dedup requires a sort/hash, UNION is slower.

-- deduped: a user in both lists appears once
SELECT email FROM newsletter UNION SELECT email FROM customers;

-- all rows kept; faster, but duplicates remain
SELECT email FROM newsletter UNION ALL SELECT email FROM customers;

Use UNION ALL when you know the inputs are disjoint or duplicates are fine — it skips the expensive dedup step.

Rule of thumb: default to UNION ALL unless you specifically need duplicates removed.

Each query must have the same number of columns, in the same order, with compatible data types positionally. Column names needn't match — the result takes its names from the first query.

SELECT id, name        FROM employees
UNION ALL
SELECT id, full_name   FROM contractors;   -- result columns: id, name

Type mismatches either error or force an implicit cast. If a query has fewer columns, add literals/NULLs to line them up.

Rule of thumb: align column count, order, and types; names come from the first query.

INTERSECT returns only the rows that appear in both result sets, deduplicated. It's the set intersection of the two queries.

-- users who are both newsletter subscribers AND customers
SELECT email FROM newsletter
INTERSECT
SELECT email FROM customers;

Like UNION, it dedupes by default; INTERSECT ALL (where supported) keeps the minimum count of duplicates. MySQL gained INTERSECT in 8.0.31; older versions emulate it with IN/EXISTS.

Rule of thumb: INTERSECT = "rows present in both queries."

EXCEPT (Postgres/SQL Server/standard) and MINUS (Oracle) return rows from the first query that are not in the second — set difference. They're the same operation under two names.

-- users who signed up but never ordered
SELECT id FROM users
EXCEPT
SELECT user_id FROM orders;

Order matters: A EXCEPT BB EXCEPT A. It dedupes by default; EXCEPT ALL keeps duplicate multiplicity. It's a clean way to express an anti-join when you only need the keys.

Rule of thumb: EXCEPT/MINUS = "rows in the first query but not the second."

For deduplication, set operations treat two NULLs as equal (unlike =, where NULL = NULL is unknown). So UNION/INTERSECT/EXCEPT collapse duplicate NULL-containing rows.

-- the two NULL rows are treated as duplicates and collapse to one
SELECT NULL AS x UNION SELECT NULL;   -- returns a single NULL row

This "NULL-as-equal" rule is why EXCEPT works as an anti-join even with NULLs, whereas NOT IN famously breaks on them.

Rule of thumb: set operations consider NULLs equal for dedup; row comparisons (=) don't.

Apply ORDER BY / LIMIT to the whole combined result, once, at the very end — not to the individual SELECTs. The sort refers to the result columns (by name or position).

SELECT name, created_at FROM users
UNION ALL
SELECT name, created_at FROM archived_users
ORDER BY created_at DESC
LIMIT 20;

To order within a branch (e.g. limit each side), wrap each SELECT in a subquery or CTE first.

Rule of thumb: one trailing ORDER BY/LIMIT applies to the entire set result.

Use UNION to append rows from sources with the same shape (current + archived orders, multiple regions' tables). Use a JOIN to add columns by relating rows across tables on a key.

-- UNION: more rows, same columns
SELECT id, total FROM orders_2025
UNION ALL
SELECT id, total FROM orders_2026;

A telltale sign you want UNION is "combine these similarly-structured datasets into one list."

Rule of thumb: same columns, more rows → UNION; related tables, more columns → JOIN.

UNION must deduplicate the combined result, which means sorting or hashing all rows — extra CPU and memory. UNION ALL simply concatenates the inputs and streams them out, with no dedup step.

-- no dedup work; fastest when you know rows can't overlap
SELECT id FROM a UNION ALL SELECT id FROM b;

On large datasets the difference is significant. Only pay for UNION when duplicates are actually possible and unwanted.

Rule of thumb: prefer UNION ALL and only upgrade to UNION when dedup is truly needed.

Where INTERSECT/EXCEPT aren't available (older MySQL), use IN/EXISTS for intersection and NOT EXISTS/LEFT JOIN ... IS NULL for difference.

-- INTERSECT emulation
SELECT DISTINCT email FROM newsletter n
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.email = n.email);

-- EXCEPT emulation
SELECT DISTINCT id FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

Prefer EXISTS/NOT EXISTS over IN/NOT IN here to stay NULL-safe.

Rule of thumb: emulate INTERSECT with EXISTS, EXCEPT with NOT EXISTS.

The SQL standard gives INTERSECT higher precedence than UNION and EXCEPT, so INTERSECT binds first. When mixing them, use parentheses to make the intended grouping explicit.

-- ambiguous to readers — parenthesize instead
(SELECT id FROM a UNION SELECT id FROM b)
EXCEPT
SELECT id FROM c;

Different databases have followed the rule inconsistently over the years, so never rely on implicit precedence in a chain.

Rule of thumb: parenthesize any query that mixes UNION/INTERSECT/EXCEPT.

No — UNION and UNION DISTINCT are identical; DISTINCT is just the explicit default. Some teams write UNION DISTINCT to make the dedup intent obvious next to UNION ALL.

SELECT id FROM a UNION DISTINCT SELECT id FROM b;  -- same as plain UNION

Likewise INTERSECT/EXCEPT default to DISTINCT, with optional ALL variants.

Rule of thumb: UNION already means UNION DISTINCT; write ALL when you want duplicates.

Add a literal column in each branch to label the source. After combining, that column tells you where each row originated.

SELECT id, total, 'online'  AS channel FROM online_orders
UNION ALL
SELECT id, total, 'instore' AS channel FROM instore_orders;

This is handy for merging similar feeds while keeping provenance, and lets you later filter or group by channel.

Rule of thumb: add a constant label column per branch to track each row's source.

They overlap but aren't the same. UNION dedups across two queries; for a single query, SELECT DISTINCT dedups its rows. SELECT ... UNION SELECT ... with one branch is just a slow DISTINCT.

-- these return the same rows; the second is clearer
SELECT city FROM users UNION SELECT city FROM users;
SELECT DISTINCT city FROM users;

Use DISTINCT for single-query dedup and reserve UNION for combining separate result sets.

Rule of thumb: dedup one query with DISTINCT; combine queries with UNION.

The database tries to find a common type and implicitly casts both sides to it (e.g. int and numericnumeric). If no safe common type exists (text vs date), it raises an error.

-- int + numeric unify to numeric; fine
SELECT 1 AS n UNION ALL SELECT 2.5;

-- text + date: cast explicitly to avoid surprises/errors
SELECT created_at::text FROM a UNION ALL SELECT label FROM b;

Implicit coercion can also change precision/formatting unexpectedly, so cast explicitly when the types aren't obviously compatible.

Rule of thumb: cast mismatched columns explicitly so the unified type is intentional.

More ways to practice

The self-quiz is live. Get notified when mock interviews and new question packs drop.

or
Join our WhatsApp Channel