An aggregate function collapses many rows into a single value —
COUNT, SUM, AVG, MIN, MAX. Used alone, they summarize the whole table;
with GROUP BY, they summarize each group.
SELECT COUNT(*) AS total_orders,
SUM(total) AS revenue,
AVG(total) AS avg_order
FROM orders;
Aggregates run after WHERE (on the filtered rows) and before HAVING.
Without GROUP BY, an aggregate over the whole result returns exactly one row.
Rule of thumb: aggregates turn a set of rows into one summary value.
GROUP BY partitions rows into groups that share the same values in the grouping
columns, then computes one aggregate result per group. The result has one row per
distinct group.
SELECT user_id, COUNT(*) AS orders, SUM(total) AS spent
FROM orders
GROUP BY user_id; -- one row per user
You can group by multiple columns (GROUP BY country, city) — groups are then the
distinct combinations. Grouping happens after WHERE filters the rows.
Rule of thumb: GROUP BY defines what "one row of the result" means; aggregates
summarize each.
In standard SQL, every column in the SELECT list must either be inside an
aggregate or listed in GROUP BY. Otherwise the column has many values per
group and the database can't pick one.
-- ERROR in standard SQL: name isn't grouped or aggregated
SELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;
-- fix: group by it too, or aggregate it
SELECT user_id, MAX(name) AS name, COUNT(*) FROM orders GROUP BY user_id;
MySQL historically allowed this (returning an arbitrary value) but now rejects it by
default under ONLY_FULL_GROUP_BY. Postgres allows ungrouped columns only if they're
functionally dependent on the primary key.
Rule of thumb: if it's in SELECT and not aggregated, it must be in GROUP BY.
COUNT(*) counts rows; COUNT(column) counts rows where that column is not
NULL. The difference shows up whenever the column has NULLs.
SELECT COUNT(*) AS rows, -- every row
COUNT(phone) AS with_phone, -- rows where phone IS NOT NULL
COUNT(DISTINCT country) AS countries
FROM users;
COUNT(DISTINCT col) counts distinct non-NULL values. This matters after outer
joins, where COUNT(*) would count the NULL-filled placeholder row.
Rule of thumb: COUNT(*) for rows, COUNT(col) for non-NULL values, COUNT(DISTINCT col) for unique values.
All aggregates ignore NULLs (except COUNT(*)). SUM, AVG, MIN, MAX, and
COUNT(col) skip rows where the value is NULL — they don't treat NULL as zero.
-- AVG divides by the count of NON-NULL scores, not all rows
SELECT AVG(score) FROM tests; -- NULL scores excluded entirely
This is usually right, but watch AVG: if you want NULLs counted as 0, convert them
first with AVG(COALESCE(score, 0)). SUM of all-NULL (or no) rows returns NULL,
not 0.
Rule of thumb: aggregates skip NULLs; use COALESCE first if NULLs should count as
zero.
HAVING filters groups after aggregation, the way WHERE filters rows before
it. It's the only place you can filter on an aggregate's result.
SELECT user_id, COUNT(*) AS orders
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 10; -- only users with 10+ orders
HAVING can reference aggregates and grouping columns. Put per-row conditions in
WHERE (cheaper, runs first) and reserve HAVING for conditions on the aggregates.
Rule of thumb: filter rows in WHERE, filter aggregated groups in HAVING.
WHERE runs before grouping and aggregation, so the aggregate values don't
exist yet. Referencing COUNT()/SUM() in WHERE is an error; those belong in
HAVING, which runs after.
-- ERROR: aggregate not allowed in WHERE
SELECT user_id FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;
-- correct
SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;
The logical order WHERE → GROUP BY → HAVING is the whole reason for the split.
Rule of thumb: aggregate condition → HAVING; raw-column condition → WHERE.
Wrap a CASE inside the aggregate so it only counts/sums rows meeting a condition.
This produces a pivot — multiple conditional columns in one pass.
SELECT user_id,
COUNT(*) FILTER (WHERE status = 'paid') AS paid, -- Postgres
SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded
FROM orders
GROUP BY user_id;
The portable form is SUM(CASE WHEN ... THEN 1 ELSE 0 END); Postgres/SQLite offer
the cleaner FILTER (WHERE ...) clause. One scan yields several conditional totals.
Rule of thumb: CASE inside an aggregate (or FILTER) turns rows into pivoted
columns.
If the column is an integer type, some databases compute AVG (or the underlying
SUM/count division) using integer arithmetic and truncate the fraction. You get
3 instead of 3.5.
-- cast to numeric to keep the fraction
SELECT AVG(rating) AS maybe_truncated,
AVG(rating::numeric) AS exact -- Postgres cast
FROM reviews;
Postgres's AVG actually returns numeric for integer input, but manual SUM(x)/ COUNT(x) will truncate. MySQL/SQL Server can truncate depending on types.
Rule of thumb: cast integer columns to decimal before averaging or manually dividing.
List several columns in GROUP BY; groups become the distinct combinations of
those columns. The result has one row per unique combination.
SELECT country, city, COUNT(*) AS users
FROM users
GROUP BY country, city
ORDER BY country, users DESC;
Adding a column to GROUP BY makes the groups finer (more, smaller groups).
Every grouped column may appear bare in SELECT.
Rule of thumb: grouping by more columns = more, smaller groups.
Use COUNT(DISTINCT col), which counts unique non-NULL values. You can combine
it with grouping to count distinct values per group.
-- distinct products bought by each user
SELECT user_id, COUNT(DISTINCT product_id) AS unique_products
FROM order_items
GROUP BY user_id;
COUNT(DISTINCT ...) is more expensive than COUNT(*) because it must deduplicate.
For huge tables, approximate counts (APPROX_COUNT_DISTINCT, Postgres HLL) trade
accuracy for speed.
Rule of thumb: COUNT(DISTINCT col) for unique counts; consider approximate
variants at scale.
MIN/MAX return the smallest/largest value by the type's natural ordering: numeric
order for numbers, chronological for dates, and lexicographic for strings. They
ignore NULLs.
SELECT MIN(created_at) AS first_signup,
MAX(total) AS biggest_order,
MIN(name) AS alphabetically_first
FROM orders;
A common pattern is MAX(created_at) per group to find the latest event time — but
that only gives the time, not the whole row (use a window function or
DISTINCT ON for that).
Rule of thumb: MIN/MAX give the extreme value, not the row it came from.
SUM over zero rows (or all-NULL values) returns NULL, not 0, because
there's nothing to add. This bites after filters or outer joins that leave a group
empty.
-- returns NULL if the user has no matching orders
SELECT COALESCE(SUM(total), 0) AS spent
FROM orders
WHERE user_id = 42 AND status = 'paid';
Wrap with COALESCE(SUM(x), 0) whenever a missing/empty group should read as zero.
Rule of thumb: COALESCE(SUM(x), 0) to turn "no rows" into a 0 instead of NULL.
Yes — GROUP BY accepts expressions, not just bare columns. This is how you bucket
continuous values (by month, by range, by derived category).
-- orders per month
SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS n
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
Repeat the same expression in GROUP BY and SELECT. Some dialects let you group by
the alias or position number, but repeating the expression is the portable form.
Rule of thumb: group by the same expression you select to bucket continuous data.
They compute multiple grouping levels in one query, adding subtotal/grand-total
rows. ROLLUP makes hierarchical subtotals; CUBE makes every combination;
GROUPING SETS lists exactly the groupings you want.
-- subtotals per (country, city), per country, and a grand total
SELECT country, city, SUM(total)
FROM sales
GROUP BY ROLLUP (country, city);
The subtotal rows have NULL in the rolled-up columns; GROUPING() distinguishes a
"real NULL" from a subtotal marker.
Rule of thumb: ROLLUP/CUBE add subtotal rows without multiple UNIONed queries.
FILTER (WHERE condition) restricts an aggregate to rows matching the condition —
the standard, readable alternative to SUM(CASE WHEN ...). Supported in
Postgres and SQLite.
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
AVG(total) FILTER (WHERE total > 0) AS avg_nonzero
FROM orders;
Each aggregate can have its own FILTER, so one query produces several differently
filtered metrics. Where unsupported, fall back to CASE inside the aggregate.
Rule of thumb: prefer FILTER (WHERE ...) over CASE-inside-aggregate where your
database supports it.
Joining a parent to a one-to-many child multiplies the parent's rows, so
SUM/COUNT over the joined result double-counts. Pre-aggregate the child in a
subquery first, then join.
SELECT u.name, o.order_count, o.revenue
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders GROUP BY user_id
) o ON o.user_id = u.id;
Joining two different one-to-many children to the same parent is the classic "fan trap" that inflates sums — pre-aggregate each child separately.
Rule of thumb: pre-aggregate one-to-many children before joining to avoid inflated totals.
More Query Basics interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.