Skip to content

Aggregation & GROUP BY Interview Questions & Answers

17 questions Updated 2026-06-20 Share:

SQL aggregation interview questions — COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, NULL handling, COUNT(*) vs COUNT(col) and conditional aggregation.

Read the in-depth guideSQL Aggregation — GROUP BY, HAVING, and Aggregate Functions(opens in new tab)
17 of 17

An aggregate function collapses many rows into a single valueCOUNT, SUM, AVG, MIN, MAX. Used alone, they summarize the whole table; with GROUP BY, they summarize each group.

SELECT COUNT(*)   AS total_orders,
       SUM(total) AS revenue,
       AVG(total) AS avg_order
FROM orders;

Aggregates run after WHERE (on the filtered rows) and before HAVING. Without GROUP BY, an aggregate over the whole result returns exactly one row.

Rule of thumb: aggregates turn a set of rows into one summary value.

GROUP BY partitions rows into groups that share the same values in the grouping columns, then computes one aggregate result per group. The result has one row per distinct group.

SELECT user_id, COUNT(*) AS orders, SUM(total) AS spent
FROM orders
GROUP BY user_id;          -- one row per user

You can group by multiple columns (GROUP BY country, city) — groups are then the distinct combinations. Grouping happens after WHERE filters the rows.

Rule of thumb: GROUP BY defines what "one row of the result" means; aggregates summarize each.

In standard SQL, every column in the SELECT list must either be inside an aggregate or listed in GROUP BY. Otherwise the column has many values per group and the database can't pick one.

-- ERROR in standard SQL: name isn't grouped or aggregated
SELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;

-- fix: group by it too, or aggregate it
SELECT user_id, MAX(name) AS name, COUNT(*) FROM orders GROUP BY user_id;

MySQL historically allowed this (returning an arbitrary value) but now rejects it by default under ONLY_FULL_GROUP_BY. Postgres allows ungrouped columns only if they're functionally dependent on the primary key.

Rule of thumb: if it's in SELECT and not aggregated, it must be in GROUP BY.

COUNT(*) counts rows; COUNT(column) counts rows where that column is not NULL. The difference shows up whenever the column has NULLs.

SELECT COUNT(*)        AS rows,        -- every row
       COUNT(phone)    AS with_phone,  -- rows where phone IS NOT NULL
       COUNT(DISTINCT country) AS countries
FROM users;

COUNT(DISTINCT col) counts distinct non-NULL values. This matters after outer joins, where COUNT(*) would count the NULL-filled placeholder row.

Rule of thumb: COUNT(*) for rows, COUNT(col) for non-NULL values, COUNT(DISTINCT col) for unique values.

All aggregates ignore NULLs (except COUNT(*)). SUM, AVG, MIN, MAX, and COUNT(col) skip rows where the value is NULL — they don't treat NULL as zero.

-- AVG divides by the count of NON-NULL scores, not all rows
SELECT AVG(score) FROM tests;       -- NULL scores excluded entirely

This is usually right, but watch AVG: if you want NULLs counted as 0, convert them first with AVG(COALESCE(score, 0)). SUM of all-NULL (or no) rows returns NULL, not 0.

Rule of thumb: aggregates skip NULLs; use COALESCE first if NULLs should count as zero.

HAVING filters groups after aggregation, the way WHERE filters rows before it. It's the only place you can filter on an aggregate's result.

SELECT user_id, COUNT(*) AS orders
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 10;     -- only users with 10+ orders

HAVING can reference aggregates and grouping columns. Put per-row conditions in WHERE (cheaper, runs first) and reserve HAVING for conditions on the aggregates.

Rule of thumb: filter rows in WHERE, filter aggregated groups in HAVING.

WHERE runs before grouping and aggregation, so the aggregate values don't exist yet. Referencing COUNT()/SUM() in WHERE is an error; those belong in HAVING, which runs after.

-- ERROR: aggregate not allowed in WHERE
SELECT user_id FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;

-- correct
SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;

The logical order WHERE → GROUP BY → HAVING is the whole reason for the split.

Rule of thumb: aggregate condition → HAVING; raw-column condition → WHERE.

Wrap a CASE inside the aggregate so it only counts/sums rows meeting a condition. This produces a pivot — multiple conditional columns in one pass.

SELECT user_id,
       COUNT(*) FILTER (WHERE status = 'paid')      AS paid,     -- Postgres
       SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded
FROM orders
GROUP BY user_id;

The portable form is SUM(CASE WHEN ... THEN 1 ELSE 0 END); Postgres/SQLite offer the cleaner FILTER (WHERE ...) clause. One scan yields several conditional totals.

Rule of thumb: CASE inside an aggregate (or FILTER) turns rows into pivoted columns.

If the column is an integer type, some databases compute AVG (or the underlying SUM/count division) using integer arithmetic and truncate the fraction. You get 3 instead of 3.5.

-- cast to numeric to keep the fraction
SELECT AVG(rating)          AS maybe_truncated,
       AVG(rating::numeric) AS exact            -- Postgres cast
FROM reviews;

Postgres's AVG actually returns numeric for integer input, but manual SUM(x)/ COUNT(x) will truncate. MySQL/SQL Server can truncate depending on types.

Rule of thumb: cast integer columns to decimal before averaging or manually dividing.

List several columns in GROUP BY; groups become the distinct combinations of those columns. The result has one row per unique combination.

SELECT country, city, COUNT(*) AS users
FROM users
GROUP BY country, city
ORDER BY country, users DESC;

Adding a column to GROUP BY makes the groups finer (more, smaller groups). Every grouped column may appear bare in SELECT.

Rule of thumb: grouping by more columns = more, smaller groups.

Use COUNT(DISTINCT col), which counts unique non-NULL values. You can combine it with grouping to count distinct values per group.

-- distinct products bought by each user
SELECT user_id, COUNT(DISTINCT product_id) AS unique_products
FROM order_items
GROUP BY user_id;

COUNT(DISTINCT ...) is more expensive than COUNT(*) because it must deduplicate. For huge tables, approximate counts (APPROX_COUNT_DISTINCT, Postgres HLL) trade accuracy for speed.

Rule of thumb: COUNT(DISTINCT col) for unique counts; consider approximate variants at scale.

MIN/MAX return the smallest/largest value by the type's natural ordering: numeric order for numbers, chronological for dates, and lexicographic for strings. They ignore NULLs.

SELECT MIN(created_at) AS first_signup,
       MAX(total)      AS biggest_order,
       MIN(name)       AS alphabetically_first
FROM orders;

A common pattern is MAX(created_at) per group to find the latest event time — but that only gives the time, not the whole row (use a window function or DISTINCT ON for that).

Rule of thumb: MIN/MAX give the extreme value, not the row it came from.

SUM over zero rows (or all-NULL values) returns NULL, not 0, because there's nothing to add. This bites after filters or outer joins that leave a group empty.

-- returns NULL if the user has no matching orders
SELECT COALESCE(SUM(total), 0) AS spent
FROM orders
WHERE user_id = 42 AND status = 'paid';

Wrap with COALESCE(SUM(x), 0) whenever a missing/empty group should read as zero.

Rule of thumb: COALESCE(SUM(x), 0) to turn "no rows" into a 0 instead of NULL.

Yes — GROUP BY accepts expressions, not just bare columns. This is how you bucket continuous values (by month, by range, by derived category).

-- orders per month
SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS n
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Repeat the same expression in GROUP BY and SELECT. Some dialects let you group by the alias or position number, but repeating the expression is the portable form.

Rule of thumb: group by the same expression you select to bucket continuous data.

They compute multiple grouping levels in one query, adding subtotal/grand-total rows. ROLLUP makes hierarchical subtotals; CUBE makes every combination; GROUPING SETS lists exactly the groupings you want.

-- subtotals per (country, city), per country, and a grand total
SELECT country, city, SUM(total)
FROM sales
GROUP BY ROLLUP (country, city);

The subtotal rows have NULL in the rolled-up columns; GROUPING() distinguishes a "real NULL" from a subtotal marker.

Rule of thumb: ROLLUP/CUBE add subtotal rows without multiple UNIONed queries.

FILTER (WHERE condition) restricts an aggregate to rows matching the condition — the standard, readable alternative to SUM(CASE WHEN ...). Supported in Postgres and SQLite.

SELECT
  COUNT(*)                               AS total,
  COUNT(*) FILTER (WHERE status = 'paid') AS paid,
  AVG(total) FILTER (WHERE total > 0)     AS avg_nonzero
FROM orders;

Each aggregate can have its own FILTER, so one query produces several differently filtered metrics. Where unsupported, fall back to CASE inside the aggregate.

Rule of thumb: prefer FILTER (WHERE ...) over CASE-inside-aggregate where your database supports it.

Joining a parent to a one-to-many child multiplies the parent's rows, so SUM/COUNT over the joined result double-counts. Pre-aggregate the child in a subquery first, then join.

SELECT u.name, o.order_count, o.revenue
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
  FROM orders GROUP BY user_id
) o ON o.user_id = u.id;

Joining two different one-to-many children to the same parent is the classic "fan trap" that inflates sums — pre-aggregate each child separately.

Rule of thumb: pre-aggregate one-to-many children before joining to avoid inflated totals.

More ways to practice

The self-quiz is live. Get notified when mock interviews and new question packs drop.

or
Join our WhatsApp Channel