SQL Query Optimization — Reading EXPLAIN, Fixing Slow Queries

The optimization mindset

The database query optimizer generates an execution plan for every query. It estimates the cost of different approaches — which indexes to use, in what order to join tables, whether to hash or sort — and picks the cheapest. Sometimes it guesses wrong. EXPLAIN shows you the plan it chose; EXPLAIN ANALYZE runs the query and shows actual timings. That is where optimization starts.

Reading EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.email, o.total_amount
FROM   orders    o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.status = 'pending'
  AND  o.created_at >= '2026-06-01'
ORDER  BY o.created_at DESC
LIMIT  50;

Sample output annotations:

Limit  (cost=... rows=50) (actual time=4.2..4.3 ms rows=50)
  -> Sort  (actual time=4.1..4.2 ms rows=50)
       Sort Key: o.created_at DESC
       Sort Method: top-N heapsort  Memory: 30kB
       -> Hash Join  (actual time=1.2..3.8 ms rows=412)
            Hash Cond: (o.customer_id = c.id)
            -> Index Scan using idx_orders_status_created on orders o
                 (actual time=0.05..2.1 ms rows=412)
                 Index Cond: (status = 'pending')
                 Filter: (created_at >= '2026-06-01')
            -> Hash  (actual time=0.8..0.8 ms rows=9541)
                 -> Seq Scan on customers c

Key things to read:

Seq Scan on a large table — look for a missing index.
Rows estimate vs actual rows — a big gap means stale statistics; run ANALYZE.
Buffers: hit vs read — "read" means disk I/O; "hit" means cache.
Sort Method: external merge — sort spilled to disk; increase work_mem.

The most common slow-query patterns

1. Sequential scan on a large table

-- Slow: full scan of 10M-row orders table
SELECT * FROM orders WHERE customer_id = 1001;

-- EXPLAIN shows: Seq Scan on orders (rows=10000000)
-- Fix: add an index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

2. Function on an indexed column

-- Slow: index on email is not used
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Fix: functional index
CREATE INDEX idx_users_email_ci ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';  -- now uses index

3. N+1 query — the hidden loop

The N+1 problem is not visible in EXPLAIN because each query is individually fast. The problem is issuing N queries in a loop instead of one set-based query.

# N+1 — 1 query for orders, then 1 per order to fetch customer (BAD)
orders = db.query("SELECT * FROM orders WHERE status = 'pending'")
for order in orders:
    customer = db.query("SELECT * FROM customers WHERE id = ?", order.customer_id)
    # → 1 + N queries total

# Fix: join everything in one query
orders = db.query("""
    SELECT o.*, c.email, c.display_name
    FROM   orders o
    JOIN   customers c ON c.id = o.customer_id
    WHERE  o.status = 'pending'
""")

4. SELECT * pulling unused columns

-- Slow: fetches all columns including large TEXT and JSONB blobs
SELECT * FROM products WHERE category = 'Electronics';

-- Fast: fetch only what the application needs
SELECT id, product_name, unit_price, stock_qty
FROM   products
WHERE  category = 'Electronics';

5. Missing join index

-- Without index on order_items.order_id, joining is a full scan each time
EXPLAIN SELECT o.id, COUNT(oi.id) AS item_count
FROM orders o JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id;
-- Hash Join with Seq Scan on order_items — bad at scale

CREATE INDEX idx_order_items_order_id ON order_items (order_id);
-- Now uses Index Scan — much faster

Stale statistics — when the planner guesses wrong

The optimizer uses table statistics (row counts, value distributions) to estimate the cost of each plan. If a table grew significantly since the last ANALYZE, estimates diverge from reality and the planner picks bad plans.

-- Update statistics manually
ANALYZE orders;
ANALYZE VERBOSE customers;  -- shows what it collected

-- Postgres autovacuum runs ANALYZE automatically, but you can force it
-- after a large bulk load:
INSERT INTO orders SELECT ... FROM staging;  -- bulk insert
ANALYZE orders;

work_mem — sort and hash join memory

Sorts and hash joins spill to disk when they exceed work_mem. Setting it too low forces disk I/O; setting it globally too high wastes RAM. Set it per session for expensive queries:

-- For this session only, allow more memory for sorts and hash joins
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total_amount)
FROM orders
GROUP BY customer_id
ORDER BY SUM(total_amount) DESC;
-- Sort Method changes from "external merge Disk" to "quicksort Memory"

Using pg_stat_statements to find the slowest queries

-- Top 10 queries by total execution time (Postgres extension)
SELECT
    query,
    calls,
    ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
    ROUND(mean_exec_time::NUMERIC, 2)  AS avg_ms,
    ROUND(stddev_exec_time::NUMERIC, 2) AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Focus on total execution time — a query called 100,000 times for 1 ms each (100 s total) is a better optimization target than one called once for 5 s.

Recap

Start optimization with EXPLAIN (ANALYZE, BUFFERS) — look for sequential scans on large tables, row estimate errors, and disk sorts. Fix sequential scans with indexes; fix N+1 with joins; fix stale statistics with ANALYZE; fix spilling sorts by raising work_mem per session. Use pg_stat_statements to find which queries consume the most cumulative time across all calls — that is where optimization pays off most.

More ways to practice