Skip to content

Indexes Interview Questions & Answers

15 questions Updated 2026-06-20 Share:

SQL indexes interview questions — B-tree, hash, GIN, GiST, partial and composite indexes, covering indexes, index bloat, VACUUM, and when not to index across Postgres, MySQL, and SQL Server.

Read the in-depth guideSQL Indexes — B-tree, Composite, Partial, and Covering Indexes(opens in new tab)
15 of 15

An index is a separate data structure (usually a B-tree) that the database maintains alongside a table. It stores copies of one or more column values in sorted order together with pointers to the full row, allowing the engine to locate matching rows in O(log n) time instead of scanning every row (O(n)).

-- Without an index: full table scan — reads every row
SELECT * FROM orders WHERE customer_id = 42;

-- After adding an index: index seek — reads only the matching branch
CREATE INDEX idx_orders_customer ON orders (customer_id);
SELECT * FROM orders WHERE customer_id = 42;
-- Execution plan changes from Seq Scan to Index Scan

The trade-off: indexes consume disk space and must be updated on every INSERT, UPDATE, or DELETE on the indexed columns — adding write overhead.

Rule of thumb: add an index on any column that appears frequently in WHERE, JOIN ON, or ORDER BY clauses of slow queries. Verify with EXPLAIN before and after.

A B-tree (Balanced-tree) index is the default index type in all major databases. It keeps values in sorted order across a balanced tree of pages, making it efficient for equality, range, and sort operations.

Supported query types:

  • Equality: WHERE col = value
  • Range: WHERE col > value, WHERE col BETWEEN a AND b
  • Prefix: WHERE col LIKE 'abc%' (but not '%abc')
  • Sorting: ORDER BY col (the optimizer may use the index to avoid a sort)
  • Prefix of a composite index: WHERE (a, b) when index is on (a, b, c)
-- B-tree index covers all of these:
CREATE INDEX idx_users_email ON users (email);

SELECT * FROM users WHERE email = 'alice@example.com';       -- equality
SELECT * FROM users WHERE email > 'm@example.com';           -- range
SELECT * FROM users WHERE email LIKE 'alice%';               -- prefix
SELECT * FROM users ORDER BY email LIMIT 10;                 -- sort

Rule of thumb: always start with a B-tree index. Only reach for specialised types (hash, GIN, GiST) when B-tree cannot satisfy the access pattern (e.g., full-text search, containment on arrays/JSON).

A composite index covers two or more columns. The database sorts rows by the first column, then by the second within each first-column group, and so on. The optimizer can only use the index starting from the leftmost column — this is the left-prefix rule.

CREATE INDEX idx_orders_status_date ON orders (status, created_at);

-- Uses the index (status is the leftmost column)
SELECT * FROM orders WHERE status = 'pending';

-- Uses the index (both columns used, in order)
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';

-- CANNOT use the index (skips the first column)
SELECT * FROM orders WHERE created_at > '2026-01-01';
-- → falls back to Seq Scan

Column order in the index matters: put the column used in equality filters first, then range-filtered columns, then sort columns.

Rule of thumb: design composite indexes as (equality_cols, range_col, sort_col). The most selective equality column goes first. A query that skips the leftmost column cannot use the index.

A covering index contains all columns a query needs — the database can answer the query entirely from the index without touching the main table (the "heap"). This eliminates the extra I/O of the table lookup (also called a "heap fetch" or "bookmark lookup").

-- Query needs id, status, total — all three must be in the index
CREATE INDEX idx_orders_covering
  ON orders (customer_id)
  INCLUDE (status, total);   -- Postgres 11+ / SQL Server: INCLUDE clause

SELECT id, status, total
FROM   orders
WHERE  customer_id = 42;
-- Execution plan: Index Only Scan (no heap access)

In MySQL, covering indexes work without an INCLUDE clause — all columns in the SELECT list just need to be part of the index definition.

Rule of thumb: when EXPLAIN shows an Index Scan on a high-traffic query, check whether adding the SELECTed columns to the index via INCLUDE can convert it to an Index Only Scan.

A partial index is an index built on a subset of rows — those satisfying a WHERE clause in the index definition. It is smaller, faster to update, and more selective than a full-column index.

-- Only index pending orders (the rows that are actually queried)
CREATE INDEX idx_orders_pending
  ON orders (created_at)
  WHERE status = 'pending';

-- This query uses the partial index (matches the WHERE condition)
SELECT * FROM orders WHERE status = 'pending' AND created_at < now() - INTERVAL '1 day';

-- This query cannot use the partial index (status ≠ 'pending')
SELECT * FROM orders WHERE status = 'shipped' AND created_at < now() - INTERVAL '1 day';

Also valid for partial unique indexes (see the constraints topic).

Rule of thumb: use partial indexes when a large table has a small "hot" subset that most queries filter on (e.g., active records, unprocessed jobs, non-deleted rows). The index shrinks dramatically and fits in cache more easily.

A hash index maps each column value to a hash bucket, giving O(1) average-case lookup for equality-only queries. It cannot support range queries, sorting, or prefix matches.

-- Postgres: explicit hash index
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);

-- Useful for: WHERE token = 'abc123'  (pure equality)
-- Useless for: WHERE token > 'abc123'  (range)
-- Useless for: ORDER BY token          (sort)

In older Postgres versions (< 10), hash indexes were not WAL-logged and were lost on crash. Since Postgres 10, they are crash-safe. MySQL and SQL Server do not offer explicit hash indexes on disk (MySQL Memory engine does).

Rule of thumb: prefer B-tree in almost all cases — it handles equality too and adds range/sort support for free. Only consider a hash index when profiling shows that a very high-throughput equality-only lookup would measurably benefit from the marginal O(1) vs O(log n) difference.

GIN (Generalized Inverted Index) is optimised for columns that contain multiple values per row — arrays, JSONB, tsvector (full-text search). It maps each individual element (word, key, array item) to the set of rows containing it.

-- Full-text search index
CREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));
SELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & index');

-- JSONB containment index
CREATE INDEX idx_events_payload ON events USING GIN (payload);
SELECT * FROM events WHERE payload @> '{"type": "click"}';

-- Array containment index
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['sql', 'performance'];

GIN indexes are large and slow to build but very fast for containment queries (@>, @@, ?).

Rule of thumb: use GIN for full-text search and JSONB/array containment queries. Use a regular B-tree for simple = or range filters on JSONB extracted scalar values ((payload->>'user_id')::int).

Index bloat occurs when an index grows much larger than the live data it covers, usually because DELETE and UPDATE operations leave dead index entries that accumulate faster than VACUUM can reclaim them.

Symptoms: index scans slow down, the index file on disk is much larger than expected, and VACUUM verbose output shows many dead tuples.

-- Postgres: check index bloat (pgstattuple extension)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       round(leaf_fragmentation::numeric, 2)       AS fragmentation_pct
FROM   pgstattuple_approx('idx_orders_customer') t,
       pg_indexes WHERE indexname = 'idx_orders_customer';

-- Fix: rebuild the index (locks table — use CONCURRENTLY for large tables)
REINDEX INDEX idx_orders_customer;

-- Non-blocking rebuild (Postgres 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer;

Rule of thumb: monitor index sizes relative to table sizes. If an index is more than 2–3× the expected size, run REINDEX CONCURRENTLY. Tune autovacuum to run more aggressively on high-churn tables.

Indexes are not free — they slow down writes and consume disk space. Avoid adding an index when:

  1. The table is tiny — a full scan of a 500-row table is faster than an index lookup because the whole table fits in one or two pages.
  2. The column has very low selectivity — an index on a status column with only two values (active/inactive) where 90 % of rows are active gives the optimizer no benefit for WHERE status = 'active'.
  3. The table is write-heavy — a table with hundreds of inserts/second pays a high cost to maintain indexes. Batch-load tables often drop indexes before the load and rebuild them after.
  4. The column is rarely queried — unused indexes consume space and slow every write without ever benefiting a read.
-- Postgres: find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
  AND  indexname NOT LIKE '%pkey%'   -- skip PKs
ORDER  BY pg_relation_size(indexrelid) DESC;

Rule of thumb: only add an index when you can show — via EXPLAIN ANALYZE — that it is used and that it reduces query time. Drop unused indexes; they are pure overhead.

The query optimizer uses cost-based planning to choose between a sequential scan and an index scan. It chooses sequential scan when:

  1. Many rows match — if 30 %+ of table rows match the WHERE clause, reading the table sequentially in disk order is cheaper than the random I/O of following index pointers row-by-row.
  2. Table statistics are stale — if ANALYZE has not run recently, the planner may underestimate or overestimate selectivity.
  3. Small table — the whole table fits in a few pages; sequential I/O is faster.
  4. Cost configurationrandom_page_cost (Postgres) affects the relative cost of index vs sequential I/O. Lowering it (e.g. to 1.1 for SSDs) makes index scans more attractive.
-- Check the plan and actual vs estimated rows
EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM orders WHERE status = 'pending';
-- If 'Rows Removed by Filter' is huge → Seq Scan is correct
-- If 'Rows Removed by Filter' is small → missing index or stale stats

-- Update statistics
ANALYZE orders;

Rule of thumb: trust the optimizer; a sequential scan on 40 % of rows IS faster than an index scan. If a plan looks wrong, check statistics with ANALYZE before forcing an index with a hint.

A functional index (expression index) indexes the result of a function or expression applied to a column rather than the raw column value. This allows the optimizer to use the index when the same expression appears in a WHERE clause.

-- Without expression index: full scan (function applied to every row)
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- Create an index on the expression
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- Now this uses the index
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- Also useful for JSON extraction
CREATE INDEX idx_events_user ON events ((payload->>'user_id'));
SELECT * FROM events WHERE payload->>'user_id' = '42';

The expression in the WHERE clause must match the expression in the index exactly for the planner to use it.

Rule of thumb: create a functional index whenever a WHERE clause applies a deterministic function to a column (case-insensitive email, JSON field extraction, date truncation). Run ANALYZE after creating it so the planner sees up-to-date statistics.

-- Postgres: find unused indexes (not scanned since last stats reset)
SELECT schemaname, tablename, indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size,
       idx_scan
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
ORDER  BY pg_relation_size(indexrelid) DESC;

-- Postgres: find duplicate indexes (same columns, same table)
SELECT indrelid::regclass AS table,
       array_agg(indexrelid::regclass) AS duplicate_indexes
FROM   pg_index
GROUP  BY indrelid, indkey
HAVING COUNT(*) > 1;

-- Drop a redundant index (non-blocking in Postgres)
DROP INDEX CONCURRENTLY idx_orders_old_customer;

An index is redundant when another index on the same table starts with the same column(s). For example, an index on (customer_id) is made redundant by a composite index on (customer_id, created_at) for equality lookups.

Rule of thumb: audit indexes quarterly. Drop unused ones — they slow writes and mislead developers into thinking a column is important for lookups. Keep an index removal in a migration script so it can be re-added if monitoring reveals it is needed.

  • Clustered index: the table data is physically stored in the order of the index. There can be only one per table. In SQL Server and MySQL InnoDB, the primary key is always the clustered index. In Postgres, there is no automatic clustering, but CLUSTER table USING index physically reorders the table once (not maintained dynamically).
  • Non-clustered index: a separate structure that stores the indexed values and pointers (row IDs / PKs) back to the heap. Multiple non- clustered indexes can exist per table.
-- SQL Server: clustered index (the PK is clustered by default)
CREATE TABLE orders (
  id INT PRIMARY KEY CLUSTERED,    -- data pages sorted by id
  customer_id INT NOT NULL
);

-- Non-clustered index
CREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);

-- Postgres: one-time physical sort (does NOT stay sorted after future writes)
CLUSTER orders USING idx_orders_customer_date;

Rule of thumb: in SQL Server and MySQL, choose the clustered index (usually the PK) carefully — sequential integer PKs cause minimal page splits. Random UUIDs as clustered keys cause fragmentation and slow inserts.

Yes, in almost all cases. Foreign key columns are used in JOIN ON conditions and in cascade operations (ON DELETE CASCADE). Without an index, both joins and FK enforcement scans become full table scans.

-- Child table without an index on the FK column:
-- DELETE FROM customers WHERE id = 1
-- → DB must scan ALL order rows to find children — O(n)

-- With an index on the FK column:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- DELETE FROM customers WHERE id = 1
-- → Index lookup to find children — O(log n)

Postgres does NOT automatically create an index on FK columns (unlike the PK side). MySQL InnoDB DOES create one automatically. SQL Server does not.

Rule of thumb: after declaring a REFERENCES constraint in Postgres or SQL Server, immediately add a CREATE INDEX on the child's FK column unless the child table is very small or the FK is never used in queries.

Type Best for Notes
BTREE equality, range, sort, prefix LIKE Default; handles 95 % of use cases
HASH equality only Marginally faster than B-tree for pure equality; no range
GIN arrays, JSONB containment, full-text Large index; slow build; fast @> / @@
GiST geometry, ranges, nearest-neighbour PostGIS spatial; && / <-> operators
BRIN huge tables with natural physical ordering Very small index; only useful for append-only tables (time-series)
SP-GiST non-balanced partitioned structures Quad-trees, radix trees; niche use
-- BRIN: tiny index on a massive append-only events table
-- (works because newer rows have higher timestamps and are stored later on disk)
CREATE INDEX idx_events_brin ON events USING BRIN (created_at);

Rule of thumb: use BTREE by default. Use GIN for arrays/JSONB/FTS. Use BRIN only on truly append-only tables (logs, IoT data) where the indexed column correlates with physical storage order — otherwise it will not be used.

More ways to practice

The self-quiz is live. Get notified when mock interviews and new question packs drop.

or
Join our WhatsApp Channel