Question 1

What is an aggregate function?

Accepted Answer

An aggregate function collapses many rows into a single value: COUNT, SUM, AVG, MIN, MAX. Used alone, they summarize the whole table; with GROUP BY, they summarize each group. Aggregates run after WHERE (on the filtered rows) and before HAVING. Without GROUP BY, an aggregate over the whole result returns exactly one row. Rule of thumb: aggregates turn a set of rows into one summary value.
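A minimal sketch of the one-row behavior, assuming a hypothetical orders table built inline with a VALUES CTE (this form runs on Postgres and SQLite; the table and column names are illustrative only):

```sql
-- Hypothetical sample data: orders(id, customer, total).
WITH orders(id, customer, total) AS (
  VALUES (1, 'ann', 10), (2, 'ann', 30), (3, 'bob', 20)
)
SELECT COUNT(*)   AS order_count,  -- 3
       SUM(total) AS revenue,      -- 60
       MIN(total) AS smallest,     -- 10
       MAX(total) AS largest       -- 30
FROM orders;                       -- no GROUP BY, so exactly one row comes back
```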

Question 2

What does GROUP BY do?

Accepted Answer

GROUP BY partitions rows into groups that share the same values in the grouping columns, then computes one aggregate result per group. The result has one row per distinct group. You can group by multiple columns; groups are then the distinct combinations. Grouping happens after WHERE filters the rows. Rule of thumb: GROUP BY defines what "one row of the result" means; aggregates summarize each.
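To make the ordering concrete, here is a small sketch with a hypothetical orders table (names are assumptions, not from the original): the row filter runs first, then the surviving rows are grouped.

```sql
WITH orders(id, customer, total) AS (
  VALUES (1, 'ann', 10), (2, 'ann', 30), (3, 'bob', 20)
)
SELECT customer, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders
WHERE total >= 20          -- filters rows first: order 1 is dropped
GROUP BY customer          -- then one result row per customer
ORDER BY customer;
-- ann | 1 | 30
-- bob | 1 | 20
```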

Question 3

Why must non-aggregated SELECT columns appear in GROUP BY?

Accepted Answer

In standard SQL, every column in the SELECT list must either be inside an aggregate or listed in GROUP BY. Otherwise the column has many values per group and the database can't pick one. MySQL historically allowed this (returning an arbitrary value) but now rejects it by default under ONLY_FULL_GROUP_BY. Postgres allows ungrouped columns only when the GROUP BY includes the table's primary key, so the other columns are functionally dependent on it. Rule of thumb: if it's in SELECT and not aggregated, it must be in GROUP BY.
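A sketch of the problem and one possible fix, using a hypothetical orders table with a city column (all names are assumptions). The rejected query is left as a comment; note that SQLite accepts it and returns an arbitrary city, while Postgres raises an error.

```sql
WITH orders(id, customer, city, total) AS (
  VALUES (1, 'ann', 'Oslo', 10), (2, 'ann', 'Rome', 30), (3, 'bob', 'Oslo', 20)
)
-- Not valid standard SQL: city has two values for ann, so which one?
--   SELECT customer, city, SUM(total) FROM orders GROUP BY customer;
-- Fix: aggregate the column (or add it to GROUP BY).
SELECT customer, MIN(city) AS first_city, SUM(total) AS revenue
FROM orders
GROUP BY customer
ORDER BY customer;
-- ann | Oslo | 40
-- bob | Oslo | 20
```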

Question 4

What is the difference between COUNT(*) and COUNT(column)?

Accepted Answer

COUNT(*) counts rows; COUNT(column) counts rows where that column is not NULL. The difference shows up whenever the column has NULLs. COUNT(DISTINCT column) counts distinct non-NULL values. This matters after outer joins, where COUNT(*) would count the NULL-filled placeholder row. Rule of thumb: COUNT(*) for rows, COUNT(column) for non-NULL values, COUNT(DISTINCT column) for unique values.
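The outer-join case can be sketched as follows, assuming hypothetical users and orders tables where bob has no orders:

```sql
WITH users(id, name) AS (
  VALUES (1, 'ann'), (2, 'bob')
),
orders(id, user_id) AS (
  VALUES (10, 1), (11, 1)
)
SELECT u.name,
       COUNT(*)    AS joined_rows,   -- counts bob's NULL-filled placeholder row
       COUNT(o.id) AS order_count    -- skips it, because o.id is NULL there
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.name
ORDER BY u.name;
-- ann | 2 | 2
-- bob | 1 | 0
```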

Question 5

How do aggregate functions handle NULLs?

Accepted Answer

The standard aggregates ignore NULLs (except COUNT(*)). SUM, AVG, MIN, MAX, and COUNT(column) skip rows where the value is NULL; they don't treat NULL as zero. This is usually right, but watch AVG: if you want NULLs counted as 0, convert them first with COALESCE(column, 0). SUM of all-NULL (or no) rows returns NULL, not 0. Rule of thumb: aggregates skip NULLs; use COALESCE first if NULLs should count as zero.
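A small sketch of how the NULL affects each aggregate, assuming a hypothetical ratings table with one missing score (the average may be displayed with decimal places depending on the database):

```sql
WITH ratings(id, score) AS (
  VALUES (1, 4), (2, NULL), (3, 2)
)
SELECT COUNT(*)                AS all_rows,        -- 3
       COUNT(score)            AS scored_rows,     -- 2
       SUM(score)              AS total,           -- 6
       AVG(score)              AS avg_skip_nulls,  -- 3 (6 over 2 rows)
       AVG(COALESCE(score, 0)) AS avg_nulls_as_0   -- 2 (6 over 3 rows)
FROM ratings;
```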

Question 6

What does the HAVING clause do?

Accepted Answer

HAVING filters groups after aggregation, the way WHERE filters rows before it. Within a single query block, it's the only clause that can filter on an aggregate's result. HAVING can reference aggregates and grouping columns. Put per-row conditions in WHERE (cheaper, runs first) and reserve HAVING for conditions on the aggregates. Rule of thumb: filter rows in WHERE, filter aggregated groups in HAVING.
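Both filters can appear in one query. A sketch, assuming a hypothetical orders table with a status column:

```sql
WITH orders(id, customer, status, total) AS (
  VALUES (1, 'ann', 'paid', 10), (2, 'ann', 'paid', 30),
         (3, 'bob', 'paid', 20), (4, 'bob', 'void', 90)
)
SELECT customer, SUM(total) AS revenue
FROM orders
WHERE status = 'paid'        -- per-row condition: drops order 4
GROUP BY customer
HAVING SUM(total) > 25;      -- per-group condition: drops bob (20)
-- ann | 40
```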

Question 7

Why can't you use an aggregate in WHERE?

Accepted Answer

WHERE runs before grouping and aggregation, so the aggregate values don't exist yet. Referencing SUM or COUNT in WHERE is an error; those belong in HAVING, which runs after. The logical order WHERE → GROUP BY → HAVING is the whole reason for the split. Rule of thumb: aggregate condition → HAVING; raw-column condition → WHERE.
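For illustration, the invalid form is shown as a comment next to a working equivalent (hypothetical orders table; the exact error message varies by database):

```sql
WITH orders(id, customer, total) AS (
  VALUES (1, 'ann', 10), (2, 'ann', 30), (3, 'bob', 20)
)
-- Invalid: WHERE is evaluated per row, before any COUNT exists.
--   SELECT customer FROM orders WHERE COUNT(*) > 1 GROUP BY customer;
SELECT customer, COUNT(*) AS order_count
FROM orders
GROUP BY customer
HAVING COUNT(*) > 1;         -- evaluated per group, after counting
-- ann | 2
```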

Question 8

How do you aggregate conditionally (pivot)?

Accepted Answer

Wrap a CASE inside the aggregate so it only counts/sums rows meeting a condition. This produces a pivot: multiple conditional columns in one pass. The portable form is SUM(CASE WHEN condition THEN 1 ELSE 0 END); Postgres/SQLite offer the cleaner FILTER clause. One scan yields several conditional totals. Rule of thumb: CASE inside an aggregate (or FILTER) turns rows into pivoted columns.
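A sketch of the portable CASE form, pivoting a hypothetical status column into one column per status:

```sql
WITH orders(id, customer, status, total) AS (
  VALUES (1, 'ann', 'paid', 10), (2, 'ann', 'refunded', 30),
         (3, 'bob', 'paid', 20), (4, 'bob', 'paid', 5)
)
SELECT customer,
       SUM(CASE WHEN status = 'paid'     THEN 1 ELSE 0 END) AS paid_orders,
       SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded_orders,
       SUM(CASE WHEN status = 'paid' THEN total ELSE 0 END) AS paid_revenue
FROM orders
GROUP BY customer
ORDER BY customer;
-- ann | 1 | 1 | 10
-- bob | 2 | 0 | 25
```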

Question 9

Why might AVG return a truncated value?

Accepted Answer

If the column is an integer type, some databases compute AVG (or the underlying SUM/COUNT division) using integer arithmetic and truncate the fraction, so you get a whole number instead of the fractional average. Postgres's AVG actually returns numeric for integer input, but a manual SUM(x) / COUNT(*) will truncate. SQL Server's AVG over an int column returns a truncated int; MySQL's AVG and / operator return a decimal, and only DIV does integer division. Rule of thumb: cast integer columns to decimal before averaging or manually dividing.
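The manual-division trap can be sketched like this on Postgres or SQLite, assuming a hypothetical scores table. Multiplying by 1.0 is used here as a simple way to force non-integer arithmetic; an explicit CAST to a decimal type is the other common option.

```sql
WITH scores(id, points) AS (
  VALUES (1, 2), (2, 3)
)
SELECT SUM(points) / COUNT(*)       AS int_division,     -- 2 (fraction dropped)
       SUM(points) * 1.0 / COUNT(*) AS decimal_division  -- 2.5
FROM scores;
```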

Question 10

How do you group by more than one column?

Accepted Answer

List several columns in GROUP BY; groups become the distinct combinations of those columns. The result has one row per unique combination. Adding a column to GROUP BY makes the groups finer (more, smaller groups). Every grouped column may appear bare in SELECT. Rule of thumb: grouping by more columns = more, smaller groups.
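A sketch with two grouping columns, assuming a hypothetical orders table carrying country and city:

```sql
WITH orders(id, country, city, total) AS (
  VALUES (1, 'NO', 'Oslo', 10), (2, 'NO', 'Oslo', 30),
         (3, 'NO', 'Bergen', 20), (4, 'IT', 'Rome', 5)
)
SELECT country, city, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders
GROUP BY country, city       -- one row per (country, city) combination
ORDER BY country, city;
-- IT | Rome   | 1 |  5
-- NO | Bergen | 1 | 20
-- NO | Oslo   | 2 | 40
```

Grouping by country alone would give two rows; adding city splits NO into two finer groups.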

Question 11

How do you count distinct values?

Accepted Answer

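Use COUNT(DISTINCT column), which counts unique non-NULL values. You can combine it with grouping to count distinct values per group. COUNT(DISTINCT ...) is more expensive than COUNT(*) because it must deduplicate. For huge tables, approximate distinct counts (for example APPROX_COUNT_DISTINCT in databases that provide it, or a HyperLogLog extension in Postgres) trade accuracy for speed. Rule of thumb: COUNT(DISTINCT ...) for unique counts; consider approximate variants at scale.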
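A per-group sketch, assuming a hypothetical visits table that contains a duplicate and a NULL page:

```sql
WITH visits(user_id, page) AS (
  VALUES (1, 'home'), (1, 'home'), (1, 'docs'), (2, 'home'), (2, NULL)
)
SELECT user_id,
       COUNT(*)             AS visit_rows,      -- every row, NULLs included
       COUNT(DISTINCT page) AS distinct_pages   -- duplicates and NULLs dropped
FROM visits
GROUP BY user_id
ORDER BY user_id;
-- 1 | 3 | 2
-- 2 | 2 | 1
```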

Question 12

What do MIN and MAX return on different types?

Accepted Answer

MIN/MAX return the smallest/largest value by the type's natural ordering: numeric order for numbers, chronological for dates, and lexicographic for strings (according to the column's collation). They ignore NULLs. A common pattern is MAX over a timestamp column per group to find the latest event time, but that only gives the time, not the whole row (use a window function or Postgres's DISTINCT ON for that). Rule of thumb: MIN/MAX give the extreme value, not the row it came from.
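Getting the whole latest row per group can be sketched with ROW_NUMBER, one of several possible window-function approaches. The events table and its columns are hypothetical, and dates are stored as ISO-8601 strings so that they sort chronologically.

```sql
WITH events(user_id, kind, created_at) AS (
  VALUES (1, 'login', '2024-01-01'), (1, 'purchase', '2024-03-05'),
         (2, 'login', '2024-02-10')
)
SELECT user_id, kind, created_at
FROM (
  SELECT user_id, kind, created_at,
         -- rank each user's events, newest first
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM events
) ranked
WHERE rn = 1                 -- keep only the newest row per user
ORDER BY user_id;
-- 1 | purchase | 2024-03-05
-- 2 | login    | 2024-02-10
```

A plain MAX(created_at) with GROUP BY user_id would return the same dates but could not tell you the kind of event.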

Question 13

Why does SUM sometimes return NULL instead of 0?

Accepted Answer

SUM over zero rows (or all-NULL values) returns NULL, not 0, because there's nothing to add. This bites after filters or outer joins that leave a group empty. Wrap with COALESCE(SUM(x), 0) whenever a missing/empty group should read as zero. Rule of thumb: COALESCE(SUM(x), 0) to turn "no rows" into a 0 instead of NULL.
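A sketch of the empty-input case, assuming a hypothetical orders table and a filter that matches nothing:

```sql
WITH orders(id, customer, total) AS (
  VALUES (1, 'ann', 10), (2, 'ann', 30)
)
SELECT SUM(total)              AS raw_sum,   -- NULL: no rows match
       COALESCE(SUM(total), 0) AS safe_sum,  -- 0
       COUNT(*)                AS row_count  -- 0: COUNT never returns NULL
FROM orders
WHERE customer = 'bob';
```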

Question 14

Can you group by a computed expression?

Accepted Answer

Yes. GROUP BY accepts expressions, not just bare columns. This is how you bucket continuous values (by month, by range, by derived category). Repeat the same expression in SELECT and GROUP BY. Some dialects let you group by the alias or position number, but repeating the expression is the portable form. Rule of thumb: group by the same expression you select to bucket continuous data.
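A sketch of bucketing by range, with the expression repeated in both clauses. The table, thresholds, and bucket names are hypothetical.

```sql
WITH orders(id, total) AS (
  VALUES (1, 5), (2, 12), (3, 18), (4, 40)
)
SELECT CASE WHEN total < 10 THEN 'small'
            WHEN total < 20 THEN 'medium'
            ELSE 'large' END AS size_bucket,
       COUNT(*) AS order_count
FROM orders
GROUP BY CASE WHEN total < 10 THEN 'small'     -- same expression as in SELECT
              WHEN total < 20 THEN 'medium'
              ELSE 'large' END
ORDER BY size_bucket;
-- large  | 1
-- medium | 2
-- small  | 1
```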

Question 15

What do GROUPING SETS, ROLLUP and CUBE do?

Accepted Answer

They compute multiple grouping levels in one query, adding subtotal/grand-total rows. ROLLUP makes hierarchical subtotals; CUBE makes every combination; GROUPING SETS lists exactly the groupings you want. The subtotal rows have NULL in the rolled-up columns; GROUPING() distinguishes a "real NULL" from a subtotal marker. Rule of thumb: ROLLUP/CUBE add subtotal rows without multiple UNIONed queries.
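A ROLLUP sketch in Postgres syntax (SQLite does not support ROLLUP), assuming a hypothetical orders table. The output shown relies on Postgres sorting NULLs last in ascending order.

```sql
WITH orders(id, country, city, total) AS (
  VALUES (1, 'NO', 'Oslo', 10), (2, 'NO', 'Bergen', 20), (3, 'IT', 'Rome', 5)
)
SELECT country, city, SUM(total) AS revenue,
       GROUPING(city) AS city_rolled_up   -- 1 when city was rolled up
FROM orders
GROUP BY ROLLUP (country, city)
ORDER BY country, city;
-- IT   | Rome   |  5 | 0
-- IT   | NULL   |  5 | 1   (subtotal for IT)
-- NO   | Bergen | 20 | 0
-- NO   | Oslo   | 10 | 0
-- NO   | NULL   | 30 | 1   (subtotal for NO)
-- NULL | NULL   | 35 | 1   (grand total)
```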

Question 16

What is the FILTER clause on an aggregate?

Accepted Answer

FILTER (WHERE condition) restricts an aggregate to rows matching the condition: the standard, readable alternative to CASE. Supported in Postgres and SQLite. Each aggregate can have its own FILTER, so one query produces several differently filtered metrics. Where unsupported, fall back to CASE inside the aggregate. Rule of thumb: prefer FILTER over CASE-inside-aggregate where your database supports it.
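A sketch with a separate FILTER on each aggregate, assuming a hypothetical orders table with a status column:

```sql
WITH orders(id, customer, status, total) AS (
  VALUES (1, 'ann', 'paid', 10), (2, 'ann', 'refunded', 30),
         (3, 'bob', 'paid', 20)
)
SELECT customer,
       COUNT(*)   FILTER (WHERE status = 'paid')     AS paid_orders,
       COUNT(*)   FILTER (WHERE status = 'refunded') AS refunded_orders,
       SUM(total) FILTER (WHERE status = 'paid')     AS paid_revenue
FROM orders
GROUP BY customer
ORDER BY customer;
-- ann | 1 | 1 | 10
-- bob | 1 | 0 | 20
```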

Question 17

How do you avoid double-counting when aggregating over joins?

Accepted Answer

Joining a parent to a one-to-many child multiplies the parent's rows, so SUM/COUNT over the joined result double-counts. Pre-aggregate the child in a subquery first, then join.

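-- Aggregate orders to one row per user first, so the join cannot multiply rows.
SELECT u.name,
       COALESCE(o.order_count, 0) AS order_count,  -- users with no orders read as 0
       COALESCE(o.revenue, 0)     AS revenue
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
  FROM orders
  GROUP BY user_id
) o ON o.user_id = u.id;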

Joining two different one-to-many children to the same parent is the classic "fan trap" that inflates sums — pre-aggregate each child separately.

Rule of thumb: pre-aggregate one-to-many children before joining to avoid inflated totals.
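The fan trap with two children can be sketched as follows. The tickets table is a hypothetical second child added for illustration; the inline data gives one user two orders and three tickets.

```sql
WITH users(id, name) AS (VALUES (1, 'ann')),
orders(user_id, total) AS (VALUES (1, 10), (1, 30)),
tickets(user_id, subject) AS (VALUES (1, 'late'), (1, 'broken'), (1, 'refund'))
SELECT u.name,
       COALESCE(o.revenue, 0)      AS revenue,
       COALESCE(t.ticket_count, 0) AS ticket_count
FROM users u
LEFT JOIN (SELECT user_id, SUM(total) AS revenue
           FROM orders GROUP BY user_id) o ON o.user_id = u.id
LEFT JOIN (SELECT user_id, COUNT(*) AS ticket_count
           FROM tickets GROUP BY user_id) t ON t.user_id = u.id;
-- ann | 40 | 3
```

Joining users to orders and tickets directly would produce 2 × 3 = 6 rows for ann, so SUM(total) would read 120 and COUNT(*) would read 6.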

Aggregation & GROUP BY Interview Questions & Answers

