EXPLAIN shows the query execution plan the database chose — which
indexes are used, what join strategies are applied, and the estimated cost
and row counts at each step. EXPLAIN ANALYZE actually runs the query and
adds real timings and row counts alongside the estimates.
-- Postgres: estimated plan only (does not run the query)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Postgres: run the query, show actual vs estimated
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42;
-- MySQL
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42; -- MySQL 8.0.18+
-- SQL Server
SET STATISTICS IO ON;
SET SHOWPLAN_TEXT ON;
SELECT * FROM orders WHERE customer_id = 42;
Key fields to read in Postgres output:
Seq Scan/Index Scan/Index Only Scan— access methodrows=N— estimated rows; compare toactual rows=Nin ANALYZEcost=start..total— planner's cost units (not wall-clock ms)Buffers: shared hit=N read=N— cache hits vs disk reads
Rule of thumb: always use EXPLAIN ANALYZE (not just EXPLAIN) on
slow queries — large discrepancies between estimated and actual rows reveal
stale statistics, which is the root cause of most bad query plans.
A sequential scan on a large table is the most common source of slow queries. Work through this checklist:
- Is there an index on the WHERE column? If not, create one.
- Is the index being used? Check
EXPLAIN— if not, the planner may think a seq scan is cheaper (see next steps). - Are statistics fresh? Run
ANALYZE table_nameand re-check the plan. - Is selectivity high? An index on a low-cardinality column (e.g. a boolean) won't help if 80 % of rows match.
- Is there a function in the WHERE clause?
WHERE lower(email) = '...'won't use an index onemail— create a functional index. - Is
random_page_costtuned for SSDs? Default is 4.0 (HDD); set to 1.1 for SSD storage to make index scans more attractive.
-- Run ANALYZE to refresh statistics
ANALYZE orders;
-- Tune cost parameters for SSD (session level)
SET random_page_cost = 1.1;
SET effective_cache_size = '4GB';
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Rule of thumb: stale statistics cause the planner to choose bad plans
more often than missing indexes. Always ANALYZE before concluding that an
index isn't working.
The N+1 problem occurs when code fetches a list of N parent records and then issues one additional query per record to load its children — totalling N+1 round-trips instead of 1.
-- N+1: 1 query for customers + N queries for orders (one per customer)
SELECT id, name FROM customers WHERE active = TRUE; -- → N rows
-- then in a loop:
SELECT * FROM orders WHERE customer_id = ?; -- N times
-- Fix: JOIN or subquery — 1 round-trip total
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.active = TRUE;
-- Or use a lateral join to get the latest order per customer
SELECT c.id, c.name, latest.total
FROM customers c
LEFT JOIN LATERAL (
SELECT total FROM orders WHERE customer_id = c.id
ORDER BY created_at DESC LIMIT 1
) latest ON TRUE
WHERE c.active = TRUE;
Rule of thumb: if application logs show many nearly-identical queries
differing only by a primary key value, you have an N+1 problem. Fix it with
a JOIN, an IN (...) batch fetch, or a lateral join.
The query planner chooses from three physical join algorithms:
- Nested Loop Join — for each row in the outer table, scan the inner table (optionally using an index). O(N × M) worst case. Best when the outer set is very small or an index on the inner table makes the inner scan cheap.
- Hash Join — build a hash table from the smaller side, then probe it with each row from the larger side. O(N + M). Best for large unsorted inputs with no useful index.
- Merge Join — sort both sides by the join key, then merge in O(N + M). Best when both sides are already sorted (e.g., both have B-tree index scans in join-key order).
-- Force a specific strategy (Postgres — for testing only)
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
-- Now forced to Nested Loop
Rule of thumb: trust the optimizer to pick the right join strategy. Add an index on the join column of the larger table to make nested-loop joins efficient, and ensure statistics are current so the planner estimates set sizes correctly.
Statistics are metadata the query planner uses to estimate how many rows a filter will return — column histograms, most common values, null fractions, and row counts. Stale statistics lead to bad execution plans.
-- Postgres: update statistics for a table (fast, non-blocking)
ANALYZE orders;
-- Update all tables in the database
ANALYZE;
-- Check when statistics were last collected
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Increase statistics target for a column with non-uniform distribution
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
-- Default is 100 (100 histogram buckets); raise for skewed data.
ANALYZE orders;
Postgres runs autovacuum which calls ANALYZE automatically, but it may
lag behind on high-churn tables.
Rule of thumb: if a query plan suddenly gets worse after a large data
load or bulk delete, run ANALYZE table_name immediately. For columns with
very skewed distributions (e.g., status with 99 % of rows as active),
raise the statistics target.
LIMIT N OFFSET M forces the database to scan and discard the first M rows
before returning N. As M grows, the query gets slower — page 1 000 of 50
results means scanning 50 000 rows.
-- Slow: O(offset) scan for every page
SELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 10000;
-- Fast: keyset (cursor) pagination — O(log n) via index
-- Page 1:
SELECT id, title, created_at FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 50;
-- → last row: created_at = '2026-06-01 12:00:00', id = 9834
-- Next page: use the last row's values as the cursor
SELECT id, title, created_at FROM posts
WHERE (created_at, id) < ('2026-06-01 12:00:00', 9834)
ORDER BY created_at DESC, id DESC
LIMIT 50;
-- Index on (created_at DESC, id DESC) makes this O(log n)
Rule of thumb: use keyset (cursor) pagination for any list that can
grow large. Reserve OFFSET only for small datasets (< 10 000 rows) or
where jumping to arbitrary page numbers is a hard requirement.
In most modern databases (Postgres, MySQL 8+, SQL Server), the optimizer can rewrite correlated subqueries as joins automatically. However, correlated subqueries that reference the outer query run once per outer row — O(N) inner scans — and may not be rewritten:
-- Correlated subquery: potentially O(N) inner scans
SELECT id, total,
(SELECT name FROM customers WHERE id = o.customer_id) AS customer_name
FROM orders o;
-- Equivalent JOIN: one pass, one lookup
SELECT o.id, o.total, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
-- Check with EXPLAIN: if you see "SubPlan" or "Nested Loop" with inner
-- rows = outer rows, the subquery is not being optimised away.
Non-correlated subqueries in IN (SELECT …) are usually optimised to a
semi-join and are equivalent in performance to a JOIN.
Rule of thumb: if EXPLAIN shows a SubPlan node repeating for every
outer row, rewrite it as a JOIN or a lateral join. For EXISTS / IN
checks, the optimizer almost always handles them correctly on its own.
An OR across different columns often prevents the optimizer from using a
single index efficiently because no single index covers both branches.
-- This may cause a Seq Scan even if both columns are indexed separately
SELECT * FROM users WHERE email = 'alice@example.com' OR phone = '555-1234';
-- Fix 1: UNION ALL (each branch uses its own index)
SELECT * FROM users WHERE email = 'alice@example.com'
UNION ALL
SELECT * FROM users WHERE phone = '555-1234' AND email <> 'alice@example.com';
-- Fix 2: Postgres bitmap index scan (auto-handles OR over different indexes)
-- Postgres may already do this — check EXPLAIN for "BitmapAnd"/"BitmapOr"
-- Fix 3: denormalise into a single search column / use full-text search
Rule of thumb: use EXPLAIN to check whether an OR query is doing
a full scan. If so, split it into a UNION ALL so each branch can exploit
its own index independently.
SELECT * fetches every column from the table, including large TEXT,
BYTEA, or JSONB columns that the query may not need. This increases
network transfer, memory use, and makes covering-index optimisations
impossible.
-- BAD: fetches all 40 columns including a 1 MB blob column
SELECT * FROM products WHERE category_id = 5;
-- GOOD: only the columns the caller actually needs
SELECT id, name, price, stock FROM products WHERE category_id = 5;
Additional reasons to avoid SELECT *:
- Adding a column to the table silently changes what the query returns, breaking application code that expects a fixed schema.
- Prevents the planner from choosing an index-only scan.
- Makes query intent unclear to future readers.
Rule of thumb: always list columns explicitly in production queries.
SELECT * is fine for ad-hoc exploration but should never appear in
application code or stored procedures.
In Postgres pre-12, CTEs were optimisation fences — the planner materialised (executed and stored) the CTE result before running the outer query, preventing predicates from being pushed inside. This could cause full scans on the CTE that a plain subquery would have avoided.
-- Postgres < 12: this CTE is materialised; the WHERE id = 42 is applied
-- AFTER the full scan of orders inside the CTE
WITH recent AS (
SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'
)
SELECT * FROM recent WHERE id = 42;
-- Postgres 12+: CTEs are inlined by default (no longer an optimisation fence)
-- Force materialisation when you WANT the fence (e.g., to prevent repeated execution):
WITH recent AS MATERIALIZED (
SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'
)
SELECT * FROM recent WHERE id = 42;
MySQL and SQL Server have always inlined non-recursive CTEs.
Rule of thumb: on Postgres 12+, CTEs behave like subqueries and are not
a performance concern. On older Postgres, replace CTEs with subqueries in
the FROM clause if EXPLAIN shows the CTE is preventing index use.
In Postgres's MVCC model, UPDATE and DELETE do not overwrite rows —
they mark old row versions as dead and write new versions. Dead tuples
accumulate until VACUUM reclaims their space. Without regular vacuuming,
the table grows (bloat), sequential scans slow down, and indexes carry dead
entries.
-- Check dead tuple accumulation
SELECT relname, n_live_tup, n_dead_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)
AS dead_pct,
last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- Manual vacuum (reclaims space for reuse; does not shrink the file)
VACUUM orders;
-- Full vacuum (reclaims and shrinks the file; locks the table)
VACUUM FULL orders;
-- Rebuild indexes alongside
VACUUM (ANALYZE, VERBOSE) orders;
Rule of thumb: rely on autovacuum for routine maintenance. Run
VACUUM ANALYZE manually after large bulk deletes or updates. Only use
VACUUM FULL on heavily bloated tables during maintenance windows — it
acquires an exclusive lock.
Long-running queries can exhaust connection pools, hold locks, and degrade the whole database. Most databases allow a maximum query duration:
-- Postgres: statement timeout (raises error if exceeded)
SET statement_timeout = '5s'; -- session level
SET LOCAL statement_timeout = '2s'; -- transaction level only
-- Postgres: lock wait timeout (fail fast rather than queue behind a blocker)
SET lock_timeout = '500ms';
-- MySQL: per-query timeout (optimizer hint)
SELECT /*+ MAX_EXECUTION_TIME(3000) */ * FROM orders WHERE customer_id = 42;
-- SQL Server: per-connection timeout (set by client driver)
-- In T-SQL:
SET QUERY_GOVERNOR_COST_LIMIT 1000; -- abort if estimated cost > 1000
In application code, always set a reasonable timeout at the connection or query level — never leave it at infinity (the default).
Rule of thumb: set statement_timeout to a value appropriate for the
context: 5–30 s for OLTP API queries; longer for batch jobs. Use
lock_timeout separately to fail fast on lock contention rather than
queuing indefinitely.
-- Postgres: pg_stat_statements (requires extension) — top slow queries
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT query,
calls,
round(total_exec_time::numeric / calls, 2) AS avg_ms,
round(total_exec_time::numeric, 0) AS total_ms,
rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Postgres: pg_stat_user_tables — tables with heavy sequential scans
SELECT relname, seq_scan, seq_tup_read,
idx_scan,
round(seq_scan::numeric / NULLIF(seq_scan + idx_scan, 0) * 100, 1) AS seq_pct
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
-- MySQL: slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries > 1 second
Rule of thumb: enable pg_stat_statements (Postgres) or the slow
query log (MySQL/SQL Server) in production from day one. Review the top-10
queries by total time weekly — optimising one heavily-called query often
has more impact than tuning ten rarely-run ones.
Partition pruning is the optimizer's ability to skip entire table
partitions that cannot contain rows matching the query's WHERE clause.
Instead of scanning all partitions, it reads only the ones that could
have relevant data.
-- Partitioned table (Postgres)
CREATE TABLE events (
id BIGINT NOT NULL,
created_at DATE NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_q1 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026_q2 PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
-- Query: optimizer prunes events_2026_q1 — only scans events_2026_q2
EXPLAIN SELECT * FROM events WHERE created_at >= '2026-04-01';
-- Plan shows: Seq Scan on events_2026_q2 (events_2026_q1 not mentioned)
Pruning only works when the WHERE clause filters on the partition key
with a constant (not a function call or a join column).
Rule of thumb: for pruning to work, the WHERE predicate on the
partition key must be a literal or a parameter — not a function like
DATE_TRUNC(...). Check EXPLAIN to confirm partitions are being pruned.
An Index Only Scan reads all needed data directly from the index without touching the main table (heap). It is the fastest read path — no random heap I/O at all.
For an Index Only Scan to be chosen:
- All columns in
SELECT,WHERE, andORDER BYmust be in the index. - The table's visibility map must show that pages are all-visible (recently vacuumed). If many pages are not all-visible, Postgres falls back to heap fetches.
-- Query: SELECT email FROM users WHERE created_at > '2026-01-01'
-- Index needed: (created_at) INCLUDE (email)
CREATE INDEX idx_users_created_email
ON users (created_at DESC)
INCLUDE (email);
EXPLAIN (ANALYZE, BUFFERS)
SELECT email FROM users WHERE created_at > '2026-01-01';
-- → Index Only Scan using idx_users_created_email (Heap Fetches: 0)
-- If Heap Fetches > 0, run VACUUM to update the visibility map:
VACUUM users;
Rule of thumb: convert an Index Scan to an Index Only Scan by
adding the SELECTed columns to the index via INCLUDE. Then ensure the
table is regularly vacuumed so the visibility map stays current.
More Indexes & Performance interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.