What aggregation is for
Raw tables store one fact per row. Aggregation collapses many rows into summary statistics — total revenue per month, average order value per customer, number of open support tickets per team. Every report, dashboard, and analytics query you write will use it.
SQL's aggregate functions are: COUNT, SUM, AVG, MIN, MAX, and
STRING_AGG / GROUP_CONCAT. They appear in SELECT and HAVING, and they
operate on groups defined by GROUP BY.
COUNT — the four forms
-- COUNT(*) counts every row including those with NULLs
SELECT COUNT(*) AS total_orders FROM orders;
-- COUNT(column) counts non-NULL values only
SELECT COUNT(coupon_code) AS orders_with_coupon FROM orders;
-- COUNT(DISTINCT column) counts unique non-NULL values
SELECT COUNT(DISTINCT customer_id) AS unique_buyers FROM orders;
-- Conditional count (orders placed this month)
SELECT COUNT(*) FILTER (WHERE created_at >= DATE_TRUNC('month', NOW()))
AS orders_this_month
FROM orders;
-- MySQL equivalent: COUNT(CASE WHEN created_at >= ... THEN 1 END)
A common bug: using COUNT(*) with a LEFT JOIN and expecting zero for
unmatched rows. It returns 1, because the NULL-padded row still counts. Use
COUNT(right_table.id) instead.
SUM, AVG, MIN, MAX
-- Revenue summary for last 30 days
SELECT
SUM(total_amount) AS gross_revenue,
AVG(total_amount) AS avg_order_value,
MIN(total_amount) AS smallest_order,
MAX(total_amount) AS largest_order,
COUNT(*) AS order_count
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days';
All aggregate functions except COUNT(*) ignore NULLs. This means
AVG(discount) divides by the count of non-NULL rows — if half your orders
have no discount (NULL), the average overstates the real per-order discount.
Use AVG(COALESCE(discount, 0)) when you want NULLs counted as zero.
GROUP BY — summaries per group
GROUP BY splits rows into buckets by one or more columns, then applies the
aggregate function within each bucket.
-- Revenue and order count per customer
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS lifetime_value,
MAX(created_at) AS last_order_date
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC;
Every column in SELECT that is not inside an aggregate function must
appear in GROUP BY. The database raises an error otherwise (except in MySQL
with ONLY_FULL_GROUP_BY disabled, where the behaviour is undefined).
Multi-column GROUP BY
-- Monthly revenue by product category
SELECT
DATE_TRUNC('month', o.created_at) AS month,
p.category,
SUM(i.unit_price * i.quantity) AS revenue,
COUNT(DISTINCT o.id) AS orders
FROM orders o
JOIN order_items i ON i.order_id = o.id
JOIN products p ON p.id = i.product_id
GROUP BY DATE_TRUNC('month', o.created_at), p.category
ORDER BY month DESC, revenue DESC;
HAVING — filtering on aggregate values
WHERE runs before aggregation and cannot reference aggregate results. HAVING
runs after aggregation and filters the groups.
-- Customers who have placed more than 5 orders AND spent over £500
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5
AND SUM(total_amount) > 500
ORDER BY total_spent DESC;
-- Product categories with average price above £50
SELECT category, AVG(unit_price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(unit_price) > 50;
Think of WHERE as "which rows go into the calculation" and HAVING as
"which calculation results do I keep".
Conditional aggregation — pivoting without a join
You can combine CASE with an aggregate to compute multiple summaries in a
single pass:
-- Order counts broken down by status in one query
SELECT
COUNT(*) AS total,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered,
SUM(CASE WHEN status = 'pending' THEN total_amount ELSE 0 END) AS pending_revenue
FROM orders
WHERE created_at >= '2026-01-01';
GROUP BY with ROLLUP
ROLLUP adds subtotal rows automatically — useful for reporting hierarchies:
-- Revenue by year and month, with yearly subtotals
SELECT
EXTRACT(YEAR FROM created_at) AS yr,
EXTRACT(MONTH FROM created_at) AS mo,
SUM(total_amount) AS revenue
FROM orders
GROUP BY ROLLUP (
EXTRACT(YEAR FROM created_at),
EXTRACT(MONTH FROM created_at)
)
ORDER BY yr, mo;
-- Rows where mo IS NULL are year-level subtotals
-- The row where both yr and mo are NULL is the grand total
Recap
Aggregate functions collapse many rows into one value per group. GROUP BY
defines the groups; every non-aggregated SELECT column must be in GROUP BY.
Filter raw rows with WHERE before aggregation; filter groups with HAVING
after. Use conditional aggregation (CASE inside COUNT/SUM) to produce
multi-column summaries in a single pass instead of running multiple queries.