An index is a separate data structure (usually a B-tree) that the database maintains alongside a table. It stores copies of one or more column values in sorted order together with pointers to the full row, allowing the engine to locate matching rows in O(log n) time instead of scanning every row (O(n)).
-- Without an index: full table scan — reads every row
SELECT * FROM orders WHERE customer_id = 42;
-- After adding an index: index seek — reads only the matching branch
CREATE INDEX idx_orders_customer ON orders (customer_id);
SELECT * FROM orders WHERE customer_id = 42;
-- Execution plan changes from Seq Scan to Index Scan
The trade-off: indexes consume disk space and must be updated on every
INSERT, UPDATE, or DELETE on the indexed columns — adding write
overhead.
Rule of thumb: add an index on any column that appears frequently in
WHERE, JOIN ON, or ORDER BY clauses of slow queries. Verify with
EXPLAIN before and after.
A B-tree (Balanced-tree) index is the default index type in all major databases. It keeps values in sorted order across a balanced tree of pages, making it efficient for equality, range, and sort operations.
Supported query types:
- Equality:
WHERE col = value - Range:
WHERE col > value,WHERE col BETWEEN a AND b - Prefix:
WHERE col LIKE 'abc%'(but not'%abc') - Sorting:
ORDER BY col(the optimizer may use the index to avoid a sort) - Prefix of a composite index:
WHERE (a, b)when index is on(a, b, c)
-- B-tree index covers all of these:
CREATE INDEX idx_users_email ON users (email);
SELECT * FROM users WHERE email = 'alice@example.com'; -- equality
SELECT * FROM users WHERE email > 'm@example.com'; -- range
SELECT * FROM users WHERE email LIKE 'alice%'; -- prefix
SELECT * FROM users ORDER BY email LIMIT 10; -- sort
Rule of thumb: always start with a B-tree index. Only reach for specialised types (hash, GIN, GiST) when B-tree cannot satisfy the access pattern (e.g., full-text search, containment on arrays/JSON).
A composite index covers two or more columns. The database sorts rows by the first column, then by the second within each first-column group, and so on. The optimizer can only use the index starting from the leftmost column — this is the left-prefix rule.
CREATE INDEX idx_orders_status_date ON orders (status, created_at);
-- Uses the index (status is the leftmost column)
SELECT * FROM orders WHERE status = 'pending';
-- Uses the index (both columns used, in order)
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
-- CANNOT use the index (skips the first column)
SELECT * FROM orders WHERE created_at > '2026-01-01';
-- → falls back to Seq Scan
Column order in the index matters: put the column used in equality filters first, then range-filtered columns, then sort columns.
Rule of thumb: design composite indexes as (equality_cols, range_col, sort_col). The most selective equality column goes first. A query that
skips the leftmost column cannot use the index.
A covering index contains all columns a query needs — the database can answer the query entirely from the index without touching the main table (the "heap"). This eliminates the extra I/O of the table lookup (also called a "heap fetch" or "bookmark lookup").
-- Query needs id, status, total — all three must be in the index
CREATE INDEX idx_orders_covering
ON orders (customer_id)
INCLUDE (status, total); -- Postgres 11+ / SQL Server: INCLUDE clause
SELECT id, status, total
FROM orders
WHERE customer_id = 42;
-- Execution plan: Index Only Scan (no heap access)
In MySQL, covering indexes work without an INCLUDE clause — all columns
in the SELECT list just need to be part of the index definition.
Rule of thumb: when EXPLAIN shows an Index Scan on a high-traffic
query, check whether adding the SELECTed columns to the index via
INCLUDE can convert it to an Index Only Scan.
A partial index is an index built on a subset of rows — those satisfying
a WHERE clause in the index definition. It is smaller, faster to update,
and more selective than a full-column index.
-- Only index pending orders (the rows that are actually queried)
CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';
-- This query uses the partial index (matches the WHERE condition)
SELECT * FROM orders WHERE status = 'pending' AND created_at < now() - INTERVAL '1 day';
-- This query cannot use the partial index (status ≠ 'pending')
SELECT * FROM orders WHERE status = 'shipped' AND created_at < now() - INTERVAL '1 day';
Also valid for partial unique indexes (see the constraints topic).
Rule of thumb: use partial indexes when a large table has a small "hot" subset that most queries filter on (e.g., active records, unprocessed jobs, non-deleted rows). The index shrinks dramatically and fits in cache more easily.
A hash index maps each column value to a hash bucket, giving O(1) average-case lookup for equality-only queries. It cannot support range queries, sorting, or prefix matches.
-- Postgres: explicit hash index
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
-- Useful for: WHERE token = 'abc123' (pure equality)
-- Useless for: WHERE token > 'abc123' (range)
-- Useless for: ORDER BY token (sort)
In older Postgres versions (< 10), hash indexes were not WAL-logged and were lost on crash. Since Postgres 10, they are crash-safe. MySQL and SQL Server do not offer explicit hash indexes on disk (MySQL Memory engine does).
Rule of thumb: prefer B-tree in almost all cases — it handles equality too and adds range/sort support for free. Only consider a hash index when profiling shows that a very high-throughput equality-only lookup would measurably benefit from the marginal O(1) vs O(log n) difference.
GIN (Generalized Inverted Index) is optimised for columns that contain
multiple values per row — arrays, JSONB, tsvector (full-text search).
It maps each individual element (word, key, array item) to the set of rows
containing it.
-- Full-text search index
CREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));
SELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & index');
-- JSONB containment index
CREATE INDEX idx_events_payload ON events USING GIN (payload);
SELECT * FROM events WHERE payload @> '{"type": "click"}';
-- Array containment index
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['sql', 'performance'];
GIN indexes are large and slow to build but very fast for containment
queries (@>, @@, ?).
Rule of thumb: use GIN for full-text search and JSONB/array containment
queries. Use a regular B-tree for simple = or range filters on JSONB
extracted scalar values ((payload->>'user_id')::int).
Index bloat occurs when an index grows much larger than the live data it
covers, usually because DELETE and UPDATE operations leave dead index
entries that accumulate faster than VACUUM can reclaim them.
Symptoms: index scans slow down, the index file on disk is much larger than
expected, and VACUUM verbose output shows many dead tuples.
-- Postgres: check index bloat (pgstattuple extension)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
round(leaf_fragmentation::numeric, 2) AS fragmentation_pct
FROM pgstattuple_approx('idx_orders_customer') t,
pg_indexes WHERE indexname = 'idx_orders_customer';
-- Fix: rebuild the index (locks table — use CONCURRENTLY for large tables)
REINDEX INDEX idx_orders_customer;
-- Non-blocking rebuild (Postgres 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer;
Rule of thumb: monitor index sizes relative to table sizes. If an index
is more than 2–3× the expected size, run REINDEX CONCURRENTLY. Tune
autovacuum to run more aggressively on high-churn tables.
Indexes are not free — they slow down writes and consume disk space. Avoid adding an index when:
- The table is tiny — a full scan of a 500-row table is faster than an index lookup because the whole table fits in one or two pages.
- The column has very low selectivity — an index on a
statuscolumn with only two values (active/inactive) where 90 % of rows areactivegives the optimizer no benefit forWHERE status = 'active'. - The table is write-heavy — a table with hundreds of inserts/second pays a high cost to maintain indexes. Batch-load tables often drop indexes before the load and rebuild them after.
- The column is rarely queried — unused indexes consume space and slow every write without ever benefiting a read.
-- Postgres: find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE '%pkey%' -- skip PKs
ORDER BY pg_relation_size(indexrelid) DESC;
Rule of thumb: only add an index when you can show — via EXPLAIN ANALYZE — that it is used and that it reduces query time. Drop unused
indexes; they are pure overhead.
The query optimizer uses cost-based planning to choose between a sequential scan and an index scan. It chooses sequential scan when:
- Many rows match — if 30 %+ of table rows match the
WHEREclause, reading the table sequentially in disk order is cheaper than the random I/O of following index pointers row-by-row. - Table statistics are stale — if
ANALYZEhas not run recently, the planner may underestimate or overestimate selectivity. - Small table — the whole table fits in a few pages; sequential I/O is faster.
- Cost configuration —
random_page_cost(Postgres) affects the relative cost of index vs sequential I/O. Lowering it (e.g. to 1.1 for SSDs) makes index scans more attractive.
-- Check the plan and actual vs estimated rows
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';
-- If 'Rows Removed by Filter' is huge → Seq Scan is correct
-- If 'Rows Removed by Filter' is small → missing index or stale stats
-- Update statistics
ANALYZE orders;
Rule of thumb: trust the optimizer; a sequential scan on 40 % of rows
IS faster than an index scan. If a plan looks wrong, check statistics with
ANALYZE before forcing an index with a hint.
A functional index (expression index) indexes the result of a
function or expression applied to a column rather than the raw column value.
This allows the optimizer to use the index when the same expression appears
in a WHERE clause.
-- Without expression index: full scan (function applied to every row)
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- Create an index on the expression
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Now this uses the index
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- Also useful for JSON extraction
CREATE INDEX idx_events_user ON events ((payload->>'user_id'));
SELECT * FROM events WHERE payload->>'user_id' = '42';
The expression in the WHERE clause must match the expression in the index
exactly for the planner to use it.
Rule of thumb: create a functional index whenever a WHERE clause
applies a deterministic function to a column (case-insensitive email,
JSON field extraction, date truncation). Run ANALYZE after creating it
so the planner sees up-to-date statistics.
-- Postgres: find unused indexes (not scanned since last stats reset)
SELECT schemaname, tablename, indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Postgres: find duplicate indexes (same columns, same table)
SELECT indrelid::regclass AS table,
array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
-- Drop a redundant index (non-blocking in Postgres)
DROP INDEX CONCURRENTLY idx_orders_old_customer;
An index is redundant when another index on the same table starts with the
same column(s). For example, an index on (customer_id) is made redundant
by a composite index on (customer_id, created_at) for equality lookups.
Rule of thumb: audit indexes quarterly. Drop unused ones — they slow writes and mislead developers into thinking a column is important for lookups. Keep an index removal in a migration script so it can be re-added if monitoring reveals it is needed.
- Clustered index: the table data is physically stored in the order of
the index. There can be only one per table. In SQL Server and MySQL
InnoDB, the primary key is always the clustered index. In Postgres, there
is no automatic clustering, but
CLUSTER table USING indexphysically reorders the table once (not maintained dynamically). - Non-clustered index: a separate structure that stores the indexed values and pointers (row IDs / PKs) back to the heap. Multiple non- clustered indexes can exist per table.
-- SQL Server: clustered index (the PK is clustered by default)
CREATE TABLE orders (
id INT PRIMARY KEY CLUSTERED, -- data pages sorted by id
customer_id INT NOT NULL
);
-- Non-clustered index
CREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);
-- Postgres: one-time physical sort (does NOT stay sorted after future writes)
CLUSTER orders USING idx_orders_customer_date;
Rule of thumb: in SQL Server and MySQL, choose the clustered index (usually the PK) carefully — sequential integer PKs cause minimal page splits. Random UUIDs as clustered keys cause fragmentation and slow inserts.
Yes, in almost all cases. Foreign key columns are used in JOIN ON
conditions and in cascade operations (ON DELETE CASCADE). Without an index,
both joins and FK enforcement scans become full table scans.
-- Child table without an index on the FK column:
-- DELETE FROM customers WHERE id = 1
-- → DB must scan ALL order rows to find children — O(n)
-- With an index on the FK column:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- DELETE FROM customers WHERE id = 1
-- → Index lookup to find children — O(log n)
Postgres does NOT automatically create an index on FK columns (unlike the PK side). MySQL InnoDB DOES create one automatically. SQL Server does not.
Rule of thumb: after declaring a REFERENCES constraint in Postgres or
SQL Server, immediately add a CREATE INDEX on the child's FK column unless
the child table is very small or the FK is never used in queries.
| Type | Best for | Notes |
|---|---|---|
BTREE |
equality, range, sort, prefix LIKE | Default; handles 95 % of use cases |
HASH |
equality only | Marginally faster than B-tree for pure equality; no range |
GIN |
arrays, JSONB containment, full-text | Large index; slow build; fast @> / @@ |
GiST |
geometry, ranges, nearest-neighbour | PostGIS spatial; && / <-> operators |
BRIN |
huge tables with natural physical ordering | Very small index; only useful for append-only tables (time-series) |
SP-GiST |
non-balanced partitioned structures | Quad-trees, radix trees; niche use |
-- BRIN: tiny index on a massive append-only events table
-- (works because newer rows have higher timestamps and are stored later on disk)
CREATE INDEX idx_events_brin ON events USING BRIN (created_at);
Rule of thumb: use BTREE by default. Use GIN for arrays/JSONB/FTS.
Use BRIN only on truly append-only tables (logs, IoT data) where the
indexed column correlates with physical storage order — otherwise it will
not be used.
More Indexes & Performance interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.