SELECT reads rows from one or more tables and returns a result set. It has
two core jobs: projection (which columns to return, listed after SELECT)
and source (which table, named after FROM). Everything else — filtering,
grouping, ordering — refines that result.
-- project two columns from every row of users
SELECT id, name
FROM users;
SELECT is read-only: it never changes the data. SELECT * returns every
column, convenient for exploring but discouraged in production code because the
result silently changes when the table's columns do.
Rule of thumb: list the columns you actually need instead of SELECT *.
WHERE filters rows before they're returned, keeping only those for which the
condition evaluates to true. Rows that evaluate to false or unknown
(NULL comparisons) are dropped.
SELECT name, age
FROM users
WHERE age >= 18; -- only adults
WHERE runs after FROM (the rows exist) but **before GROUP BY,
SELECT, and ORDER BY. Because it runs before SELECT, you generally can't
reference a column alias defined in the SELECT list inside WHERE.
Rule of thumb: WHERE filters individual rows; use HAVING to filter groups.
The standard set: =, <> (or !=) for not-equal, <, >, <=, >=. They
work on numbers, strings (lexicographic order), and dates. Combine them with the
logical operators AND, OR, and NOT.
SELECT *
FROM orders
WHERE total > 100 AND status <> 'cancelled';
Comparisons against NULL are special — total > 100 is unknown when total
is NULL, so that row is excluded. Use IS NULL / IS NOT NULL for NULL tests.
Rule of thumb: <> is the SQL-standard not-equal; != works in most dialects but
isn't standard.
AND binds tighter than OR (like * vs + in arithmetic), so OR groups
are evaluated last. Mixing them without parentheses is a classic bug that quietly
returns the wrong rows.
-- WRONG: parsed as status='active' OR (status='trial' AND age > 18)
SELECT * FROM users WHERE status = 'active' OR status = 'trial' AND age > 18;
-- RIGHT: parenthesize the OR group
SELECT * FROM users WHERE (status = 'active' OR status = 'trial') AND age > 18;
Rule of thumb: when a WHERE clause mixes AND and OR, always add explicit
parentheses — don't rely on precedence.
DISTINCT removes duplicate rows from the result, based on all selected
columns together — not just the first one. Two rows are duplicates only if every
projected column matches.
-- unique cities
SELECT DISTINCT city FROM users;
-- unique (city, country) PAIRS, not unique cities
SELECT DISTINCT city, country FROM users;
DISTINCT requires sorting or hashing the result to find duplicates, so it has a
cost on large sets. If you're really testing for existence, EXISTS is often
cheaper than SELECT DISTINCT.
Rule of thumb: DISTINCT applies to the whole row, not to one column.
LIKE does pattern matching on strings with two wildcards: % matches any
sequence of characters (including none), and _ matches exactly one
character.
SELECT * FROM users
WHERE email LIKE '%@gmail.com' -- ends with @gmail.com
AND name LIKE 'A%' -- starts with A
AND code LIKE 'A_C'; -- A, any one char, C
Matching is case-sensitive in some databases (Postgres) and insensitive in others
(MySQL default); Postgres offers ILIKE for case-insensitive matching. A leading
% ('%term') usually can't use an index, making it slow on big tables.
Rule of thumb: avoid a leading wildcard if you need the query to use an index.
IN tests whether a value matches any item in a list (or subquery result). It's
shorthand for a chain of OR equality checks.
-- these are equivalent
SELECT * FROM orders WHERE status IN ('shipped', 'delivered');
SELECT * FROM orders WHERE status = 'shipped' OR status = 'delivered';
IN also accepts a subquery: WHERE user_id IN (SELECT id FROM vips). Beware the
NOT IN NULL trap — if the list/subquery contains a NULL, NOT IN returns
no rows at all. Prefer NOT EXISTS when NULLs are possible.
Rule of thumb: IN for a known list; NOT EXISTS instead of NOT IN when NULLs
might appear.
BETWEEN a AND b tests whether a value falls in a range, and it is inclusive of
both endpoints — equivalent to >= a AND <= b.
SELECT * FROM orders
WHERE total BETWEEN 100 AND 200; -- includes exactly 100 and 200
The inclusive upper bound is a trap with dates/timestamps: BETWEEN '2026-01-01' AND '2026-01-31' misses times on Jan 31 after midnight. For timestamps, prefer a
half-open range: >= '2026-01-01' AND < '2026-02-01'.
Rule of thumb: use BETWEEN for inclusive integer ranges; use >= ... < ... for
date/time ranges.
Use IS NULL and IS NOT NULL — never = NULL. In SQL, NULL means
"unknown," and any comparison with NULL (including NULL = NULL) evaluates to
unknown, which WHERE treats as not-true.
SELECT * FROM users WHERE deleted_at IS NULL; -- active users
SELECT * FROM users WHERE phone = NULL; -- BUG: returns nothing
This three-valued logic (true / false / unknown) is why NULL rows slip
through filters unexpectedly. Functions like COALESCE(col, default) let you treat
NULLs as a concrete value when comparing.
Rule of thumb: NULL tests always use IS NULL / IS NOT NULL.
An alias renames a column or table for the duration of the query, using AS
(optional for columns, conventional to omit for tables). Column aliases clean up
output and name computed expressions; table aliases shorten qualified references.
SELECT u.name AS customer,
o.total * 1.1 AS total_with_tax
FROM users u
JOIN orders o ON o.user_id = u.id;
Because aliases are assigned in the SELECT step (which runs after WHERE), you
usually can't use a column alias in WHERE — but you can in ORDER BY,
which runs last.
Rule of thumb: alias every computed column so the result has meaningful names.
Though you write SELECT first, the database evaluates clauses in this logical
order: FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT →
ORDER BY → LIMIT.
SELECT user_id, COUNT(*) AS n -- 5: projection + alias
FROM orders -- 1: source rows
WHERE total > 0 -- 2: filter rows
GROUP BY user_id -- 3: group
HAVING COUNT(*) > 5 -- 4: filter groups
ORDER BY n DESC -- 6: alias visible here
LIMIT 10; -- 7
This order explains the rules: WHERE can't see SELECT aliases (it runs first),
but ORDER BY can (it runs last), and HAVING can reference aggregates.
Rule of thumb: remember FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT.
Yes — the SELECT list can contain arithmetic, function calls, and concatenation,
not just bare columns. The expression is computed per row.
SELECT name,
price * quantity AS line_total,
ROUND(price * 0.2, 2) AS tax
FROM order_items;
Watch integer division in some databases: 5 / 2 returns 2, not 2.5,
unless an operand is a decimal/float (5.0 / 2). Cast when you need fractional
results.
Rule of thumb: cast to a decimal type before dividing integers if you want a fractional answer.
NOT negates a condition, but under three-valued logic NOT unknown is still
unknown — so negating a comparison that involves NULL doesn't "flip in" the
NULL rows you might expect.
-- rows where status IS NULL are excluded by BOTH of these
SELECT * FROM tasks WHERE status = 'done';
SELECT * FROM tasks WHERE NOT (status = 'done');
To include the NULLs, handle them explicitly: WHERE status <> 'done' OR status IS NULL. This surprises people expecting a query and its NOT to partition the table.
Rule of thumb: when negating a condition, decide explicitly what should happen to NULL rows.
Use a CASE expression — SQL's if/else inside a query. It evaluates WHEN
conditions top to bottom and returns the first match's value, or the ELSE (or
NULL if no ELSE).
SELECT name,
CASE
WHEN age < 13 THEN 'child'
WHEN age < 20 THEN 'teen'
ELSE 'adult'
END AS age_group
FROM users;
CASE works anywhere an expression is allowed — SELECT, WHERE, ORDER BY, even
inside aggregates like SUM(CASE WHEN ... THEN 1 ELSE 0 END) for conditional counts.
Rule of thumb: reach for CASE whenever you need branching logic inside a query.
Either normalize both sides with LOWER()/UPPER(), or use a dialect feature.
Forcing the case on both sides works everywhere but can defeat a plain index.
-- portable: compare in a single case
SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');
-- Postgres shortcut for patterns
SELECT * FROM users WHERE email ILIKE 'alice@%';
For performance on large tables, store a normalized column or build a functional
index on LOWER(email) so the comparison stays index-friendly.
Rule of thumb: normalize with LOWER() for portability; add a functional index if
it's a hot path.
Limit the row count: LIMIT n in Postgres/MySQL/SQLite, FETCH FIRST n ROWS ONLY
in the SQL standard / Oracle, and SELECT TOP n in SQL Server.
SELECT * FROM users LIMIT 10; -- Postgres/MySQL
SELECT * FROM users FETCH FIRST 10 ROWS ONLY; -- standard
SELECT TOP 10 * FROM users; -- SQL Server
Without an ORDER BY, the rows returned are arbitrary — the database may return
any 10. Add ORDER BY for a deterministic preview.
Rule of thumb: LIMIT without ORDER BY gives unpredictable rows.
WHERE filters individual rows before grouping; HAVING filters groups after
aggregation. Because WHERE runs first, it can't reference aggregate functions
like COUNT(); HAVING can.
SELECT user_id, COUNT(*) AS orders
FROM orders
WHERE total > 0 -- drop junk rows first (per-row)
GROUP BY user_id
HAVING COUNT(*) > 5; -- keep only busy users (per-group)
Filtering early in WHERE is also cheaper — fewer rows reach the grouping step.
Rule of thumb: filter raw rows in WHERE, filter aggregates in HAVING.
More Query Basics interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.