The optimization mindset
The database query optimizer generates an execution plan for every query. It
estimates the cost of different approaches — which indexes to use, in what order
to join tables, whether to hash or sort — and picks the cheapest. Sometimes it
guesses wrong. EXPLAIN shows you the plan it chose; EXPLAIN ANALYZE runs the
query and shows actual timings. That is where optimization starts.
Reading EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.email, o.total_amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
AND o.created_at >= '2026-06-01'
ORDER BY o.created_at DESC
LIMIT 50;
Sample output annotations:
Limit (cost=... rows=50) (actual time=4.2..4.3 ms rows=50)
-> Sort (actual time=4.1..4.2 ms rows=50)
Sort Key: o.created_at DESC
Sort Method: top-N heapsort Memory: 30kB
-> Hash Join (actual time=1.2..3.8 ms rows=412)
Hash Cond: (o.customer_id = c.id)
-> Index Scan using idx_orders_status_created on orders o
(actual time=0.05..2.1 ms rows=412)
Index Cond: (status = 'pending')
Filter: (created_at >= '2026-06-01')
-> Hash (actual time=0.8..0.8 ms rows=9541)
-> Seq Scan on customers c
Key things to read:
- Seq Scan on a large table — look for a missing index.
- Rows estimate vs actual rows — a big gap means stale statistics; run
ANALYZE. - Buffers: hit vs read — "read" means disk I/O; "hit" means cache.
- Sort Method: external merge — sort spilled to disk; increase
work_mem.
The most common slow-query patterns
1. Sequential scan on a large table
-- Slow: full scan of 10M-row orders table
SELECT * FROM orders WHERE customer_id = 1001;
-- EXPLAIN shows: Seq Scan on orders (rows=10000000)
-- Fix: add an index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
2. Function on an indexed column
-- Slow: index on email is not used
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- Fix: functional index
CREATE INDEX idx_users_email_ci ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com'; -- now uses index
3. N+1 query — the hidden loop
The N+1 problem is not visible in EXPLAIN because each query is individually
fast. The problem is issuing N queries in a loop instead of one set-based query.
# N+1 — 1 query for orders, then 1 per order to fetch customer (BAD)
orders = db.query("SELECT * FROM orders WHERE status = 'pending'")
for order in orders:
customer = db.query("SELECT * FROM customers WHERE id = ?", order.customer_id)
# → 1 + N queries total
# Fix: join everything in one query
orders = db.query("""
SELECT o.*, c.email, c.display_name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
""")
4. SELECT * pulling unused columns
-- Slow: fetches all columns including large TEXT and JSONB blobs
SELECT * FROM products WHERE category = 'Electronics';
-- Fast: fetch only what the application needs
SELECT id, product_name, unit_price, stock_qty
FROM products
WHERE category = 'Electronics';
5. Missing join index
-- Without index on order_items.order_id, joining is a full scan each time
EXPLAIN SELECT o.id, COUNT(oi.id) AS item_count
FROM orders o JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id;
-- Hash Join with Seq Scan on order_items — bad at scale
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
-- Now uses Index Scan — much faster
Stale statistics — when the planner guesses wrong
The optimizer uses table statistics (row counts, value distributions) to estimate
the cost of each plan. If a table grew significantly since the last ANALYZE,
estimates diverge from reality and the planner picks bad plans.
-- Update statistics manually
ANALYZE orders;
ANALYZE VERBOSE customers; -- shows what it collected
-- Postgres autovacuum runs ANALYZE automatically, but you can force it
-- after a large bulk load:
INSERT INTO orders SELECT ... FROM staging; -- bulk insert
ANALYZE orders;
work_mem — sort and hash join memory
Sorts and hash joins spill to disk when they exceed work_mem. Setting it too
low forces disk I/O; setting it globally too high wastes RAM. Set it per session
for expensive queries:
-- For this session only, allow more memory for sorts and hash joins
SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total_amount)
FROM orders
GROUP BY customer_id
ORDER BY SUM(total_amount) DESC;
-- Sort Method changes from "external merge Disk" to "quicksort Memory"
Using pg_stat_statements to find the slowest queries
-- Top 10 queries by total execution time (Postgres extension)
SELECT
query,
calls,
ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_ms,
ROUND(stddev_exec_time::NUMERIC, 2) AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Focus on total execution time — a query called 100,000 times for 1 ms each (100 s total) is a better optimization target than one called once for 5 s.
Recap
Start optimization with EXPLAIN (ANALYZE, BUFFERS) — look for sequential scans
on large tables, row estimate errors, and disk sorts. Fix sequential scans with
indexes; fix N+1 with joins; fix stale statistics with ANALYZE; fix spilling
sorts by raising work_mem per session. Use pg_stat_statements to find which
queries consume the most cumulative time across all calls — that is where
optimization pays off most.