SQL Aggregation — GROUP BY, HAVING, and Aggregate Functions

What aggregation is for

Raw tables store one fact per row. Aggregation collapses many rows into summary statistics — total revenue per month, average order value per customer, number of open support tickets per team. Every report, dashboard, and analytics query you write will use it.

SQL's aggregate functions are: COUNT, SUM, AVG, MIN, MAX, and STRING_AGG / GROUP_CONCAT. They appear in SELECT and HAVING, and they operate on groups defined by GROUP BY.

COUNT — the four forms

-- COUNT(*) counts every row including those with NULLs
SELECT COUNT(*) AS total_orders FROM orders;

-- COUNT(column) counts non-NULL values only
SELECT COUNT(coupon_code) AS orders_with_coupon FROM orders;

-- COUNT(DISTINCT column) counts unique non-NULL values
SELECT COUNT(DISTINCT customer_id) AS unique_buyers FROM orders;

-- Conditional count (orders placed this month)
SELECT COUNT(*) FILTER (WHERE created_at >= DATE_TRUNC('month', NOW()))
AS orders_this_month
FROM orders;
-- MySQL equivalent: COUNT(CASE WHEN created_at >= ... THEN 1 END)

A common bug: using COUNT(*) with a LEFT JOIN and expecting zero for unmatched rows. It returns 1, because the NULL-padded row still counts. Use COUNT(right_table.id) instead.

SUM, AVG, MIN, MAX

-- Revenue summary for last 30 days
SELECT
    SUM(total_amount)            AS gross_revenue,
    AVG(total_amount)            AS avg_order_value,
    MIN(total_amount)            AS smallest_order,
    MAX(total_amount)            AS largest_order,
    COUNT(*)                     AS order_count
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days';

All aggregate functions except COUNT(*) ignore NULLs. This means AVG(discount) divides by the count of non-NULL rows — if half your orders have no discount (NULL), the average overstates the real per-order discount. Use AVG(COALESCE(discount, 0)) when you want NULLs counted as zero.

GROUP BY — summaries per group

GROUP BY splits rows into buckets by one or more columns, then applies the aggregate function within each bucket.

-- Revenue and order count per customer
SELECT
    customer_id,
    COUNT(*)                   AS order_count,
    SUM(total_amount)          AS lifetime_value,
    MAX(created_at)            AS last_order_date
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC;

Every column in SELECT that is not inside an aggregate function must appear in GROUP BY. The database raises an error otherwise (except in MySQL with ONLY_FULL_GROUP_BY disabled, where the behaviour is undefined).

Multi-column GROUP BY

-- Monthly revenue by product category
SELECT
    DATE_TRUNC('month', o.created_at) AS month,
    p.category,
    SUM(i.unit_price * i.quantity)    AS revenue,
    COUNT(DISTINCT o.id)              AS orders
FROM orders o
JOIN order_items i ON i.order_id = o.id
JOIN products    p ON p.id = i.product_id
GROUP BY DATE_TRUNC('month', o.created_at), p.category
ORDER BY month DESC, revenue DESC;

HAVING — filtering on aggregate values

WHERE runs before aggregation and cannot reference aggregate results. HAVING runs after aggregation and filters the groups.

-- Customers who have placed more than 5 orders AND spent over £500
SELECT
    customer_id,
    COUNT(*)          AS order_count,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5
   AND SUM(total_amount) > 500
ORDER BY total_spent DESC;

-- Product categories with average price above £50
SELECT category, AVG(unit_price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(unit_price) > 50;

Think of WHERE as "which rows go into the calculation" and HAVING as "which calculation results do I keep".

Conditional aggregation — pivoting without a join

You can combine CASE with an aggregate to compute multiple summaries in a single pass:

-- Order counts broken down by status in one query
SELECT
    COUNT(*)                                              AS total,
    COUNT(CASE WHEN status = 'pending'   THEN 1 END)     AS pending,
    COUNT(CASE WHEN status = 'shipped'   THEN 1 END)     AS shipped,
    COUNT(CASE WHEN status = 'delivered' THEN 1 END)     AS delivered,
    SUM(CASE WHEN status = 'pending' THEN total_amount ELSE 0 END) AS pending_revenue
FROM orders
WHERE created_at >= '2026-01-01';

GROUP BY with ROLLUP

ROLLUP adds subtotal rows automatically — useful for reporting hierarchies:

-- Revenue by year and month, with yearly subtotals
SELECT
    EXTRACT(YEAR  FROM created_at) AS yr,
    EXTRACT(MONTH FROM created_at) AS mo,
    SUM(total_amount)              AS revenue
FROM orders
GROUP BY ROLLUP (
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
)
ORDER BY yr, mo;
-- Rows where mo IS NULL are year-level subtotals
-- The row where both yr and mo are NULL is the grand total

Recap

Aggregate functions collapse many rows into one value per group. GROUP BY defines the groups; every non-aggregated SELECT column must be in GROUP BY. Filter raw rows with WHERE before aggregation; filter groups with HAVING after. Use conditional aggregation (CASE inside COUNT/SUM) to produce multi-column summaries in a single pass instead of running multiple queries.

More ways to practice