Why ranking functions matter
Ranking functions assign a position number to each row within a window partition. They are the practical solution for three everyday problems:
- Top-N per group — the best-selling product per category, the latest order per customer, the highest-scoring ticket per agent.
- Deduplication — keeping one row per entity when a table has duplicates.
- Bucketing — dividing a result set into percentiles or quartiles.
ROW_NUMBER — unique sequential number
ROW_NUMBER() gives each row a unique integer starting from 1 within its
partition. Ties get different numbers (arbitrarily assigned by the database).
-- Assign a row number within each customer's orders (newest first)
SELECT
id,
customer_id,
total_amount,
created_at,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;
Pattern: get the most recent row per group (the classic N=1 use case):
-- Latest order per customer
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders
)
SELECT * FROM ranked WHERE rn = 1;
This works even when MAX(created_at) is ambiguous (multiple orders at the same
timestamp) — ROW_NUMBER always assigns a unique 1 to exactly one row.
RANK vs DENSE_RANK — handling ties
Both assign the same number to tied rows, but differ in what comes next.
RANK()— skips position numbers after a tie: 1, 2, 2, 4 (gap)DENSE_RANK()— no gaps: 1, 2, 2, 3
-- Sales leaderboard for the current month
SELECT
employee_id,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS rank,
DENSE_RANK() OVER (ORDER BY total_sales DESC) AS dense_rank
FROM monthly_sales
WHERE month = '2026-06-01';
-- rank: 1, 2, 2, 4, 5
-- dense_rank: 1, 2, 2, 3, 4
Use DENSE_RANK when "rank 3" should mean "the third distinct performance
level". Use RANK when gaps are acceptable or meaningful (e.g., a sports
league where tied teams share a position and the next is truly skipped).
Top-N per group with RANK
-- Top 3 products by revenue in each category
WITH product_revenue AS (
SELECT
p.id,
p.product_name,
p.category,
SUM(i.unit_price * i.quantity) AS revenue
FROM order_items i
JOIN products p ON p.id = i.product_id
GROUP BY p.id, p.product_name, p.category
),
ranked AS (
SELECT *,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
FROM product_revenue
)
SELECT category, product_name, revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY category, rnk;
NTILE — dividing into buckets
NTILE(n) divides rows into n roughly equal buckets and labels each row with
its bucket number (1 to n). Useful for quartiles, deciles, and percentile bands.
-- Divide customers into 4 revenue quartiles
WITH customer_revenue AS (
SELECT customer_id, SUM(total_amount) AS lifetime_value
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
lifetime_value,
NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile
FROM customer_revenue;
-- quartile 1 = top 25% spenders (highest LTV)
-- quartile 4 = bottom 25% spenders
PERCENT_RANK and CUME_DIST
PERCENT_RANK() returns a value between 0 and 1 representing relative rank.
CUME_DIST() returns the fraction of rows with a value ≤ the current row.
-- What percentile is each order in, by total amount?
SELECT
id,
total_amount,
ROUND(PERCENT_RANK() OVER (ORDER BY total_amount) * 100, 1) AS percentile,
ROUND(CUME_DIST() OVER (ORDER BY total_amount) * 100, 1) AS cumulative_pct
FROM orders;
Deduplication with ROW_NUMBER
Ranking functions are the cleanest way to remove duplicates in a DELETE or
identify them in a report:
-- Find duplicate customer rows (same email, keep the oldest id)
WITH dupes AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS rn
FROM customers
)
DELETE FROM customers WHERE id IN (
SELECT id FROM dupes WHERE rn > 1
);
Recap
ROW_NUMBER gives every row a unique number — ideal for top-1-per-group and
deduplication. RANK and DENSE_RANK handle ties: RANK leaves gaps,
DENSE_RANK does not. NTILE(n) buckets rows into n equal groups. All four
require an ORDER BY in the OVER clause to be meaningful. For top-N per
group, wrap the ranking in a CTE and filter WHERE rn <= N in the outer query.