[{"data":1,"prerenderedAt":123},["ShallowReactive",2],{"qa-\u002Fsql\u002Fbasics\u002Faggregation":3},{"page":4,"siblings":103,"blog":120},{"id":5,"title":6,"body":7,"description":11,"difficulty":14,"extension":15,"framework":16,"frameworkSlug":17,"meta":18,"navigation":19,"order":20,"path":21,"questions":22,"questionsCount":93,"related":94,"seo":95,"seoDescription":96,"stem":97,"subtopic":98,"topic":99,"topicSlug":100,"updated":101,"__hash__":102},"qa\u002Fsql\u002Fbasics\u002Faggregation.md","Aggregation",{"type":8,"value":9,"toc":10},"minimark",[],{"title":11,"searchDepth":12,"depth":12,"links":13},"",2,[],"medium","md","SQL","sql",{},true,4,"\u002Fsql\u002Fbasics\u002Faggregation",[23,28,32,36,40,44,48,52,57,61,65,69,73,77,81,85,89],{"id":24,"difficulty":25,"q":26,"a":27},"what-is-aggregation","easy","What is an aggregate function?","An **aggregate function** collapses **many rows into a single value** —\n`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`. Used alone, they summarize the whole table;\nwith `GROUP BY`, they summarize each group.\n\n```sql\nSELECT COUNT(*)   AS total_orders,\n       SUM(total) AS revenue,\n       AVG(total) AS avg_order\nFROM orders;\n```\n\nAggregates run **after `WHERE`** (on the filtered rows) and **before `HAVING`**.\nWithout `GROUP BY`, an aggregate over the whole result returns exactly one row.\n\nRule of thumb: aggregates turn a set of rows into one summary value.\n",{"id":29,"difficulty":25,"q":30,"a":31},"group-by","What does GROUP BY do?","`GROUP BY` **partitions rows into groups** that share the same values in the grouping\ncolumns, then computes one aggregate result **per group**. The result has one row per\ndistinct group.\n\n```sql\nSELECT user_id, COUNT(*) AS orders, SUM(total) AS spent\nFROM orders\nGROUP BY user_id;          -- one row per user\n```\n\nYou can group by multiple columns (`GROUP BY country, city`) — groups are then the\ndistinct **combinations**. 
Grouping happens after `WHERE` filters the rows.\n\nRule of thumb: `GROUP BY` defines what \"one row of the result\" means; aggregates\nsummarize each.\n",{"id":33,"difficulty":14,"q":34,"a":35},"group-by-rule","Why must non-aggregated SELECT columns appear in GROUP BY?","In standard SQL, every column in the `SELECT` list must either be **inside an\naggregate** or **listed in `GROUP BY`**. Otherwise the column has many values per\ngroup and the database can't pick one.\n\n```sql\n-- ERROR in standard SQL: name isn't grouped or aggregated\nSELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;\n\n-- fix: group by it too, or aggregate it\nSELECT user_id, MAX(name) AS name, COUNT(*) FROM orders GROUP BY user_id;\n```\n\nMySQL historically allowed this (returning an arbitrary value) but now rejects it by\ndefault under `ONLY_FULL_GROUP_BY`. Postgres allows an ungrouped column only when it's\nfunctionally dependent on the grouped columns, for example when the table's primary\nkey is in `GROUP BY`.\n\nRule of thumb: if it's in `SELECT` and not aggregated, it must be in `GROUP BY`.\n",{"id":37,"difficulty":14,"q":38,"a":39},"count-star-vs-col","What is the difference between COUNT(*) and COUNT(column)?","`COUNT(*)` counts **rows**; `COUNT(column)` counts rows where that column is **not\nNULL**. The difference shows up whenever the column has NULLs.\n\n```sql\nSELECT COUNT(*)        AS total_rows,  -- every row\n       COUNT(phone)    AS with_phone,  -- rows where phone IS NOT NULL\n       COUNT(DISTINCT country) AS countries\nFROM users;\n```\n\n`COUNT(DISTINCT col)` counts distinct non-NULL values. This matters after outer\njoins, where `COUNT(*)` would count the NULL-filled placeholder row.\n\nRule of thumb: `COUNT(*)` for rows, `COUNT(col)` for non-NULL values, `COUNT(DISTINCT\ncol)` for unique values.\n",{"id":41,"difficulty":14,"q":42,"a":43},"aggregates-ignore-null","How do aggregate functions handle NULLs?","All aggregates **ignore NULLs** (except `COUNT(*)`). 
`SUM`, `AVG`, `MIN`, `MAX`, and\n`COUNT(col)` skip rows where the value is NULL — they don't treat NULL as zero.\n\n```sql\n-- AVG divides by the count of NON-NULL scores, not all rows\nSELECT AVG(score) FROM tests;       -- NULL scores excluded entirely\n```\n\nThis is usually right, but watch `AVG`: if you want NULLs counted as 0, convert them\nfirst with `AVG(COALESCE(score, 0))`. `SUM` of all-NULL (or no) rows returns `NULL`,\nnot 0.\n\nRule of thumb: aggregates skip NULLs; use `COALESCE` first if NULLs should count as\nzero.\n",{"id":45,"difficulty":14,"q":46,"a":47},"having-clause","What does the HAVING clause do?","`HAVING` filters **groups after aggregation**, the way `WHERE` filters rows before\nit. It's the only place you can filter on an aggregate's result.\n\n```sql\nSELECT user_id, COUNT(*) AS orders\nFROM orders\nGROUP BY user_id\nHAVING COUNT(*) >= 10;     -- only users with 10+ orders\n```\n\n`HAVING` can reference aggregates and grouping columns. Put per-row conditions in\n`WHERE` (cheaper, runs first) and reserve `HAVING` for conditions on the aggregates.\n\nRule of thumb: filter rows in `WHERE`, filter aggregated groups in `HAVING`.\n",{"id":49,"difficulty":14,"q":50,"a":51},"where-vs-having-agg","Why can't you use an aggregate in WHERE?","`WHERE` runs **before** grouping and aggregation, so the aggregate values don't\nexist yet. 
Referencing `COUNT()`\u002F`SUM()` in `WHERE` is an error; those belong in\n`HAVING`, which runs after.\n\n```sql\n-- ERROR: aggregate not allowed in WHERE\nSELECT user_id FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;\n\n-- correct\nSELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;\n```\n\nThe logical order `WHERE → GROUP BY → HAVING` is the whole reason for the split.\n\nRule of thumb: aggregate condition → `HAVING`; raw-column condition → `WHERE`.\n",{"id":53,"difficulty":54,"q":55,"a":56},"conditional-aggregation-basics","hard","How do you aggregate conditionally (pivot)?","Wrap a `CASE` inside the aggregate so it only counts\u002Fsums rows meeting a condition.\nThis produces a **pivot** — multiple conditional columns in one pass.\n\n```sql\nSELECT user_id,\n       COUNT(*) FILTER (WHERE status = 'paid')      AS paid,     -- Postgres\n       SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded\nFROM orders\nGROUP BY user_id;\n```\n\nThe portable form is `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`; Postgres\u002FSQLite offer\nthe cleaner `FILTER (WHERE ...)` clause. One scan yields several conditional totals.\n\nRule of thumb: `CASE` inside an aggregate (or `FILTER`) turns rows into pivoted\ncolumns.\n",{"id":58,"difficulty":14,"q":59,"a":60},"avg-integer-trap","Why might AVG return a truncated value?","If the column is an **integer type**, some databases compute `AVG` (or the underlying\n`SUM`\u002Fcount division) using integer arithmetic and truncate the fraction. You get\n`3` instead of `3.5`.\n\n```sql\n-- cast to numeric to keep the fraction\nSELECT AVG(rating)          AS maybe_truncated,\n       AVG(rating::numeric) AS exact            -- Postgres cast\nFROM reviews;\n```\n\nPostgres's `AVG` actually returns numeric for integer input, but manual `SUM(x)\u002F\nCOUNT(x)` will truncate. 
SQL Server's `AVG` returns an integer for integer input, so it truncates;\nMySQL's `AVG` returns a decimal.\n\nRule of thumb: cast integer columns to decimal before averaging or manually\ndividing.\n",{"id":62,"difficulty":25,"q":63,"a":64},"grouping-multiple-columns","How do you group by more than one column?","List several columns in `GROUP BY`; groups become the distinct **combinations** of\nthose columns. The result has one row per unique combination.\n\n```sql\nSELECT country, city, COUNT(*) AS users\nFROM users\nGROUP BY country, city\nORDER BY country, users DESC;\n```\n\nAdding a column to `GROUP BY` makes the groups **finer** (more, smaller groups).\nEvery grouped column may appear bare in `SELECT`.\n\nRule of thumb: grouping by more columns = more, smaller groups.\n",{"id":66,"difficulty":14,"q":67,"a":68},"count-distinct","How do you count distinct values?","Use `COUNT(DISTINCT col)`, which counts unique **non-NULL** values. You can combine\nit with grouping to count distinct values per group.\n\n```sql\n-- distinct products bought by each user\nSELECT user_id, COUNT(DISTINCT product_id) AS unique_products\nFROM order_items\nGROUP BY user_id;\n```\n\n`COUNT(DISTINCT ...)` is more expensive than `COUNT(*)` because it must deduplicate.\nFor huge tables, approximate counts (`APPROX_COUNT_DISTINCT`, Postgres `HLL`) trade\naccuracy for speed.\n\nRule of thumb: `COUNT(DISTINCT col)` for unique counts; consider approximate\nvariants at scale.\n",{"id":70,"difficulty":25,"q":71,"a":72},"min-max","What do MIN and MAX return on different types?","`MIN`\u002F`MAX` return the smallest\u002Flargest value by the type's natural ordering: numeric\norder for numbers, chronological for dates, and lexicographic for strings. 
They\nignore NULLs.\n\n```sql\nSELECT MIN(created_at) AS first_signup,\n       MAX(total)      AS biggest_order,\n       MIN(name)       AS alphabetically_first\nFROM orders;\n```\n\nA common pattern is `MAX(created_at)` per group to find the latest event time — but\nthat only gives the *time*, not the *whole row* (use a window function or\n`DISTINCT ON` for that).\n\nRule of thumb: `MIN`\u002F`MAX` give the extreme **value**, not the row it came from.\n",{"id":74,"difficulty":14,"q":75,"a":76},"sum-null-result","Why does SUM sometimes return NULL instead of 0?","`SUM` over **zero rows** (or all-NULL values) returns `NULL`, not `0`, because\nthere's nothing to add. This bites after filters or outer joins that leave a group\nempty.\n\n```sql\n-- returns NULL if the user has no matching orders\nSELECT COALESCE(SUM(total), 0) AS spent\nFROM orders\nWHERE user_id = 42 AND status = 'paid';\n```\n\nWrap with `COALESCE(SUM(x), 0)` whenever a missing\u002Fempty group should read as zero.\n\nRule of thumb: `COALESCE(SUM(x), 0)` to turn \"no rows\" into a 0 instead of NULL.\n",{"id":78,"difficulty":14,"q":79,"a":80},"group-by-expression","Can you group by a computed expression?","Yes — `GROUP BY` accepts expressions, not just bare columns. This is how you bucket\ncontinuous values (by month, by range, by derived category).\n\n```sql\n-- orders per month\nSELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS n\nFROM orders\nGROUP BY DATE_TRUNC('month', created_at)\nORDER BY month;\n```\n\nRepeat the same expression in `GROUP BY` and `SELECT`. Some dialects let you group by\nthe alias or position number, but repeating the expression is the portable form.\n\nRule of thumb: group by the same expression you select to bucket continuous data.\n",{"id":82,"difficulty":54,"q":83,"a":84},"rollup-cube","What do GROUPING SETS, ROLLUP and CUBE do?","They compute **multiple grouping levels in one query**, adding subtotal\u002Fgrand-total\nrows. 
`ROLLUP` makes hierarchical subtotals; `CUBE` makes every combination;\n`GROUPING SETS` lists exactly the groupings you want.\n\n```sql\n-- subtotals per (country, city), per country, and a grand total\nSELECT country, city, SUM(total)\nFROM sales\nGROUP BY ROLLUP (country, city);\n```\n\nThe subtotal rows have `NULL` in the rolled-up columns; `GROUPING()` distinguishes a\n\"real NULL\" from a subtotal marker.\n\nRule of thumb: `ROLLUP`\u002F`CUBE` add subtotal rows without multiple `UNION`ed queries.\n",{"id":86,"difficulty":14,"q":87,"a":88},"filter-clause","What is the FILTER clause on an aggregate?","`FILTER (WHERE condition)` restricts an aggregate to rows matching the condition —\nthe standard, readable alternative to `SUM(CASE WHEN ...)`. Supported in\nPostgres and SQLite.\n\n```sql\nSELECT\n  COUNT(*)                               AS total,\n  COUNT(*) FILTER (WHERE status = 'paid') AS paid,\n  AVG(total) FILTER (WHERE total > 0)     AS avg_nonzero\nFROM orders;\n```\n\nEach aggregate can have its own `FILTER`, so one query produces several differently\nfiltered metrics. Where unsupported, fall back to `CASE` inside the aggregate.\n\nRule of thumb: prefer `FILTER (WHERE ...)` over `CASE`-inside-aggregate where your\ndatabase supports it.\n",{"id":90,"difficulty":54,"q":91,"a":92},"aggregate-over-join","How do you avoid double-counting when aggregating over joins?","Joining a parent to a one-to-many child multiplies the parent's rows, so\n`SUM`\u002F`COUNT` over the joined result **double-counts**. 
Pre-aggregate the child in a\nsubquery first, then join.\n\n```sql\nSELECT u.name, o.order_count, o.revenue\nFROM users u\nLEFT JOIN (\n  SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue\n  FROM orders GROUP BY user_id\n) o ON o.user_id = u.id;\n```\n\nJoining two different one-to-many children to the same parent is the classic \"fan\ntrap\" that inflates sums — pre-aggregate each child separately.\n\nRule of thumb: pre-aggregate one-to-many children before joining to avoid inflated\ntotals.\n",17,null,{"description":11},"SQL aggregation interview questions — COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, NULL handling, COUNT(*) vs COUNT(col) and conditional aggregation.","sql\u002Fbasics\u002Faggregation","Aggregation & GROUP BY","Query Basics","basics","2026-06-20","yqRLOYzJKFw3JP7IDpqvPVRPdqSrjTrYXCTLbGafvl4",[104,108,111,115,116],{"subtopic":105,"path":106,"order":107},"Joins","\u002Fsql\u002Fbasics\u002Fjoins",1,{"subtopic":109,"path":110,"order":12},"SELECT & WHERE","\u002Fsql\u002Fbasics\u002Fselect-where",{"subtopic":112,"path":113,"order":114},"Sorting & Limiting","\u002Fsql\u002Fbasics\u002Fsorting-limiting",3,{"subtopic":98,"path":21,"order":20},{"subtopic":117,"path":118,"order":119},"Set Operations","\u002Fsql\u002Fbasics\u002Fset-operations",5,{"path":121,"title":122},"\u002Fblog\u002Fsql-aggregation-group-by-having","SQL Aggregation — GROUP BY, HAVING, and Aggregate Functions",1782244106875]