Question 1

What is a database index and how does it speed up queries?

Accepted Answer

An index is a separate data structure (usually a B-tree) that the database maintains alongside a table. It stores copies of one or more column values in sorted order together with pointers to the full rows, which lets the engine locate matching rows in O(log n) time instead of scanning every row (O(n)). The trade-off: indexes consume disk space and must be maintained on every INSERT, UPDATE, or DELETE that affects the indexed columns, which adds write overhead. Rule of thumb: add an index on columns that appear frequently in the WHERE, JOIN, or ORDER BY clauses of slow queries. Verify with EXPLAIN ANALYZE before and after.
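The following is a minimal Postgres sketch of the before-and-after check. The `orders` table, its columns, and the index name are hypothetical. Compare the top plan node (for example `Seq Scan` versus `Index Scan` or `Bitmap Heap Scan`) and the execution time between the two `EXPLAIN ANALYZE` runs. On a table holding only a few rows, the planner may reasonably keep the sequential scan.

```sql
-- Hypothetical table used for illustration only.
CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint      NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    total       numeric(10, 2)
);

-- 1. Baseline: with no index on customer_id, expect a sequential scan.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- 2. Add a B-tree index (the default index type) on the filtered column.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- 3. Refresh planner statistics, then re-run the same query and compare plans.
ANALYZE orders;
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
```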

Question 2

What is a B-tree index and what kinds of queries does it support?

Accepted Answer

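A B-tree index is the default index type in Postgres, MySQL (InnoDB), SQL Server, and most other relational databases. It keeps values in sorted order across a balanced tree of pages, which makes it efficient for equality, range, and sort operations. Supported query types:
- Equality: `=` and `IN (...)`
- Range: `<`, `<=`, `>`, `>=`, `BETWEEN`
- Prefix match: `LIKE 'abc%'` (but not `LIKE '%abc'`). In Postgres this needs the C collation or an index built with the `text_pattern_ops` operator class
- Sorting: `ORDER BY` on the indexed column(s), where the optimizer may read the index in order to avoid a separate sort
- Leading columns of a composite index: filters on the first column(s) of a multi-column index
Rule of thumb: always start with a B-tree index. Only reach for specialised types (hash, GIN, GiST) when B-tree cannot satisfy the access pattern (e.g., full-text search, containment on arrays/JSON).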
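This Postgres sketch covers the query shapes listed above. The `customers` table and index names are hypothetical. The `text_pattern_ops` operator class is only needed for `LIKE 'prefix%'` when the database does not use the C collation. An index built with it does not serve an ordinary `ORDER BY last_name`.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE customers (
    id         bigserial PRIMARY KEY,
    last_name  text        NOT NULL,
    created_at timestamptz NOT NULL
);

-- Plain B-tree: serves equality, range and ORDER BY on created_at.
CREATE INDEX customers_created_at_idx ON customers (created_at);

-- text_pattern_ops lets a non-C-collation database use the index for LIKE 'prefix%'.
CREATE INDEX customers_last_name_pattern_idx
    ON customers (last_name text_pattern_ops);

SELECT * FROM customers WHERE created_at = '2024-01-01';          -- equality
SELECT * FROM customers WHERE created_at >= '2024-01-01'
                          AND created_at <  '2024-02-01';         -- range
SELECT * FROM customers ORDER BY created_at DESC LIMIT 20;        -- sort (index read backward)
SELECT * FROM customers WHERE last_name LIKE 'Smi%';              -- anchored prefix: index usable
SELECT * FROM customers WHERE last_name LIKE '%ith';              -- leading wildcard: not usable
```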

Question 3

What is a composite index and what is the left-prefix rule?

Accepted Answer

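A composite index covers two or more columns. The database sorts index entries by the first column, then by the second within each first-column group, and so on. The optimizer can seek into the index only starting from its leftmost column: this is the left-prefix rule. Column order therefore matters. Put the columns used in equality filters first, followed by one range-filtered or sort column. Once the index reaches a range condition, the columns after it can no longer narrow the search or deliver pre-sorted output, although they can still be checked as filters during the scan. Rule of thumb: design composite indexes as (equality columns, then the range or sort column). The most selective equality column goes first. A query that skips the leftmost column generally cannot use the index efficiently, although some engines offer a limited skip-scan optimization.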
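The following Postgres sketch demonstrates the left-prefix rule using a hypothetical `tickets` table. Whether the planner actually chooses the index also depends on table size and statistics. The comments only describe what the index is able to serve.

```sql
-- Hypothetical table for illustration.
CREATE TABLE tickets (
    id         bigserial PRIMARY KEY,
    account_id bigint      NOT NULL,
    status     text        NOT NULL,
    created_at timestamptz NOT NULL
);

-- Equality columns first (account_id, status), then the range/sort column.
CREATE INDEX tickets_account_status_created_idx
    ON tickets (account_id, status, created_at);

-- Can use all three columns: two equalities, then a range on created_at.
SELECT * FROM tickets
WHERE  account_id = 42 AND status = 'open'
  AND  created_at >= now() - interval '7 days';

-- The equalities fix the prefix, so matching entries are already ordered
-- by created_at and no separate sort step is needed.
SELECT * FROM tickets
WHERE  account_id = 42 AND status = 'open'
ORDER  BY created_at DESC
LIMIT  20;

-- Leftmost column only: can still seek on account_id.
SELECT * FROM tickets WHERE account_id = 42;

-- Skips the leftmost column: generally cannot seek into this index.
SELECT * FROM tickets WHERE status = 'open';
```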

Question 4

What is a covering index and how does it eliminate table lookups?

Accepted Answer

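A covering index contains all the columns a query needs, so the database can answer the query entirely from the index without touching the main table (the "heap"). This eliminates the extra I/O of the table lookup (also called a "heap fetch" or "bookmark lookup"). Postgres (11+) and SQL Server let you add non-key payload columns to an index with an INCLUDE clause. In MySQL, covering indexes work without an INCLUDE clause: all columns in the SELECT list just need to be part of the index definition. In Postgres, an Index Only Scan can skip the heap only for pages marked all-visible in the visibility map, so it works best on tables that are vacuumed regularly. Rule of thumb: when EXPLAIN shows an Index Scan (rather than an Index Only Scan) on a high-traffic query, check whether adding the SELECTed columns to the index via INCLUDE can convert it to an Index Only Scan.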
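This is a Postgres 11+ sketch of a covering index built with `INCLUDE`. The `invoices` table and index names are hypothetical. On a realistically sized table, the planner can answer the final query with an `Index Only Scan`. The `Heap Fetches` figure in the `EXPLAIN ANALYZE` output shows how many rows still had to be checked in the heap because their pages were not yet marked all-visible.

```sql
-- Hypothetical table for illustration.
CREATE TABLE invoices (
    id          bigserial PRIMARY KEY,
    customer_id bigint         NOT NULL,
    issued_at   date           NOT NULL,
    amount      numeric(12, 2) NOT NULL,
    notes       text
);

-- customer_id is the searchable key; issued_at and amount are stored
-- only as payload so the query below never needs the heap.
CREATE INDEX invoices_customer_covering_idx
    ON invoices (customer_id) INCLUDE (issued_at, amount);

-- Update the visibility map so the heap can actually be skipped.
VACUUM ANALYZE invoices;

-- Every referenced column is in the index: candidate for an Index Only Scan.
EXPLAIN ANALYZE
SELECT issued_at, amount
FROM   invoices
WHERE  customer_id = 42;
```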

Question 5

What is a partial index and when does it help?

Accepted Answer

A partial index is an index built on a subset of rows: those satisfying a WHERE clause in the index definition. It is smaller, faster to maintain, and more selective than an index on the full column. The same mechanism also gives you partial unique indexes (see the constraints topic). The planner uses a partial index only when the query's WHERE clause implies the index predicate. Rule of thumb: use partial indexes when a large table has a small "hot" subset that most queries filter on (e.g., active records, unprocessed jobs, non-deleted rows). The index shrinks dramatically and fits in cache more easily.
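The following Postgres sketch shows the job-queue case. It assumes a hypothetical `jobs` table where only a small share of rows are `'pending'`. The planner uses the partial index only when it can prove the query's `WHERE` clause implies the index predicate. This is why the second query cannot use it.

```sql
-- Hypothetical job queue: most rows are 'done', only a few are 'pending'.
CREATE TABLE jobs (
    id      bigserial PRIMARY KEY,
    status  text        NOT NULL,
    run_at  timestamptz NOT NULL,
    payload jsonb
);

-- Index only the hot subset; rows with any other status are not stored in it.
CREATE INDEX jobs_pending_run_at_idx
    ON jobs (run_at)
    WHERE status = 'pending';

-- Can use the partial index: the WHERE clause implies status = 'pending'.
SELECT id, payload
FROM   jobs
WHERE  status = 'pending' AND run_at <= now()
ORDER  BY run_at
LIMIT  10;

-- Cannot use it: 'failed' rows are not in the index.
SELECT id FROM jobs WHERE status = 'failed' AND run_at <= now();
```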

Question 6

When should you use a hash index instead of a B-tree?

Accepted Answer

A hash index maps each column value to a hash bucket, giving O(1) average-case lookup for equality-only queries. It cannot support range queries, sorting, or prefix matches. In Postgres versions before 10, hash indexes were not WAL-logged, so they were not replicated to standbys and might need to be rebuilt with REINDEX after a crash. Since Postgres 10 they are WAL-logged and crash-safe. MySQL (InnoDB) and SQL Server do not offer explicit hash indexes on disk-based tables (MySQL's MEMORY engine does, and SQL Server supports them on memory-optimized tables). Rule of thumb: prefer B-tree in almost all cases, since it handles equality too and adds range/sort support for free. Only consider a hash index when profiling shows that a very high-throughput equality-only lookup would measurably benefit from the marginal O(1) vs O(log n) difference.
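This short Postgres sketch assumes a hypothetical `sessions` table that is looked up only by exact token. In Postgres, only B-tree indexes can be declared unique. A hash index therefore cannot back a `UNIQUE` or primary key constraint.

```sql
-- Hypothetical lookup table keyed by an opaque token (equality lookups only).
CREATE TABLE sessions (
    token   text   NOT NULL,
    user_id bigint NOT NULL
);

-- Postgres hash index (WAL-logged and crash-safe since Postgres 10).
CREATE INDEX sessions_token_hash_idx ON sessions USING hash (token);

SELECT user_id FROM sessions WHERE token = 'abc123';   -- equality: hash index usable
SELECT user_id FROM sessions WHERE token > 'abc';      -- range: hash index not usable
SELECT user_id FROM sessions ORDER BY token LIMIT 10;  -- sort: hash index not usable
```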

Question 7

What is a GIN index and when do you use it in Postgres?

Accepted Answer

GIN (Generalized Inverted Index) is optimised for columns that contain multiple values per row: arrays, JSONB, and tsvector (full-text search). It maps each individual element (word, key, array item) to the set of rows containing it. GIN indexes are large and slow to build, but very fast for containment and match queries using operators such as `@>` (contains), `&&` (overlaps), and `@@` (full-text match). Rule of thumb: use GIN for full-text search and JSONB/array containment queries. Use a regular B-tree expression index for simple equality or range filters on a scalar value extracted from JSONB with `->>`.
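This is a Postgres 12+ sketch, since the generated `tsvector` column requires version 12 or later. It covers the three GIN use cases plus the B-tree alternative for an extracted scalar. The `documents` table, its columns, and the index names are hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE documents (
    id       bigserial PRIMARY KEY,
    tags     text[] NOT NULL DEFAULT '{}',
    attrs    jsonb  NOT NULL DEFAULT '{}',
    body     text   NOT NULL,
    body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);

-- GIN indexes for the multi-valued columns.
CREATE INDEX documents_tags_gin  ON documents USING gin (tags);
CREATE INDEX documents_attrs_gin ON documents USING gin (attrs);
CREATE INDEX documents_fts_gin   ON documents USING gin (body_tsv);

-- B-tree expression index for one scalar pulled out of the JSONB.
CREATE INDEX documents_attrs_status_idx ON documents ((attrs ->> 'status'));

SELECT id FROM documents WHERE tags && ARRAY['postgres', 'index'];        -- array overlap (GIN)
SELECT id FROM documents WHERE attrs @> '{"lang": "en"}';                 -- JSONB containment (GIN)
SELECT id FROM documents
WHERE  body_tsv @@ to_tsquery('english', 'index & bloat');                -- full-text match (GIN)
SELECT id FROM documents WHERE attrs ->> 'status' = 'published';          -- scalar equality (B-tree)
```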

Question 8

What is index bloat and how do you fix it?

Accepted Answer

Index bloat occurs when an index grows much larger than the live data it covers. This usually happens because UPDATE and DELETE operations leave dead index entries that accumulate faster than VACUUM (autovacuum) can reclaim them. VACUUM makes that space reusable but does not shrink the index file. Symptoms: index scans slow down, the index file on disk is much larger than expected, and VACUUM VERBOSE output shows many dead tuples. Rule of thumb: monitor index sizes relative to table sizes. If an index is more than 2–3× the expected size, rebuild it with REINDEX (on Postgres 12+, REINDEX CONCURRENTLY avoids blocking writes). Tune autovacuum to run more aggressively on high-churn tables.
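This Postgres maintenance sketch runs against a hypothetical `page_views` table and `page_views_url_idx` index. `pgstatindex` comes from the `pgstattuple` contrib extension, which must be available and requires suitable privileges. A low `avg_leaf_density` is one sign of bloat. The `0.02` scale factor is only an illustrative value to tune for your workload.

```sql
-- Hypothetical high-churn table and index, for illustration only.
CREATE TABLE page_views (
    id        bigserial PRIMARY KEY,
    url       text        NOT NULL,
    viewed_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX page_views_url_idx ON page_views (url);

-- 1. Compare index size with table size.
SELECT pg_size_pretty(pg_relation_size('page_views'))         AS table_size,
       pg_size_pretty(pg_relation_size('page_views_url_idx')) AS index_size;

-- 2. Optional: leaf-page density of a B-tree index (pgstattuple extension).
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('page_views_url_idx');

-- 3. Rebuild without blocking writes (Postgres 12+; not inside a transaction block).
REINDEX INDEX CONCURRENTLY page_views_url_idx;

-- 4. Trigger autovacuum after ~2 % of rows change instead of the default 20 %.
ALTER TABLE page_views SET (autovacuum_vacuum_scale_factor = 0.02);
```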

Question 9

When should you NOT add an index?

Accepted Answer

Indexes are not free — they slow down writes and consume disk space. Avoid adding an index when:

1. The table is tiny: a full scan of a 500-row table can be faster than an index lookup because the whole table fits in a handful of pages.
2. The column has very low selectivity: an index on a status column with only two values (active/inactive) where 90 % of rows are active gives the optimizer no benefit for WHERE status = 'active'.
3. The table is write-heavy: a table with hundreds of inserts per second pays a high cost to maintain indexes. Batch-load tables often drop indexes before the load and rebuild them afterwards.
4. The column is rarely queried: unused indexes consume space and slow every write without ever benefiting a read.

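-- Postgres: find non-unique indexes never scanned since statistics were last reset
SELECT s.schemaname,
       s.relname      AS tablename,
       s.indexrelname AS indexname,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM   pg_stat_user_indexes s
JOIN   pg_index i ON i.indexrelid = s.indexrelid
WHERE  s.idx_scan = 0
  AND  NOT i.indisunique   -- skip PKs and unique indexes: they enforce constraints
ORDER  BY pg_relation_size(s.indexrelid) DESC;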

Rule of thumb: only add an index when you can show — via EXPLAIN ANALYZE — that it is used and that it reduces query time. Drop unused indexes; they are pure overhead.
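This Postgres sketch shows the drop-load-rebuild pattern used for batch-load tables. The `staging_events` table is hypothetical. `generate_series` stands in for a real `COPY` from a data file so that the example runs on its own. Building the index once after the load avoids updating it row by row during the insert.

```sql
-- Hypothetical staging table for a bulk load.
CREATE TABLE staging_events (
    id         bigint      NOT NULL,
    event_type text        NOT NULL,
    created_at timestamptz NOT NULL
);
CREATE INDEX staging_events_created_at_idx ON staging_events (created_at);

-- 1. Drop the secondary index so inserted rows skip index maintenance.
DROP INDEX staging_events_created_at_idx;

-- 2. Bulk load (generate_series stands in for COPY from a real file).
INSERT INTO staging_events (id, event_type, created_at)
SELECT g, 'click', now() - g * interval '1 second'
FROM   generate_series(1, 1000000) AS g;

-- 3. Rebuild the index in one pass and refresh planner statistics.
CREATE INDEX staging_events_created_at_idx ON staging_events (created_at);
ANALYZE staging_events;
```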

Question 10

When does the optimizer choose a sequential scan over an index scan?

Accepted Answer

The query optimizer uses cost-based planning to choose between a sequential scan and an index scan. It chooses a sequential scan when:
1. Many rows match: if a large fraction of the table matches the WHERE clause, reading the table sequentially in disk order is cheaper than the random I/O of following index pointers row by row. A fraction of 30 % or more is a common ballpark, but the break-even point depends on row width, physical ordering, and cost settings.
2. Table statistics are stale: if ANALYZE has not run recently, the planner may underestimate or overestimate selectivity.
3. The table is small: the whole table fits in a few pages, so sequential I/O is faster.
4. Cost configuration: random_page_cost (Postgres) sets the cost of random index I/O relative to sequential I/O. Lowering it (e.g. to 1.1 for SSDs) makes index scans more attractive.
Rule of thumb: trust the optimizer; a sequential scan over 40 % of rows is usually faster than an index scan. If a plan looks wrong, refresh statistics with ANALYZE and compare estimated versus actual row counts in EXPLAIN ANALYZE before forcing an index with a hint. Postgres has no built-in hints; the pg_hint_plan extension adds them.
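This Postgres sketch walks through the diagnosis steps using a hypothetical `audit_log` table. `SET` only affects the current session. `enable_seqscan = off` is a diagnostic switch that makes sequential scans look very expensive to the planner; it is not meant as a production setting.

```sql
-- Hypothetical table for illustration.
CREATE TABLE audit_log (
    id         bigserial PRIMARY KEY,
    actor_id   bigint      NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX audit_log_actor_idx ON audit_log (actor_id);

-- 1. Refresh statistics, then compare estimated rows with actual rows.
ANALYZE audit_log;
EXPLAIN ANALYZE SELECT * FROM audit_log WHERE actor_id = 7;

-- 2. Session-level experiment: cheaper random I/O (e.g. SSD storage).
SET random_page_cost = 1.1;   -- default is 4.0
EXPLAIN ANALYZE SELECT * FROM audit_log WHERE actor_id = 7;

-- 3. Diagnostic only: discourage sequential scans to see the index plan's real cost.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM audit_log WHERE actor_id = 7;

-- Restore the session defaults.
RESET enable_seqscan;
RESET random_page_cost;
```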

Question 11

What is a functional (expression) index?

Accepted Answer

A functional index (expression index) indexes the result of a function or expression applied to a column rather than the raw column value. This allows the optimizer to use the index when the same expression appears in a WHERE clause. The expression in the WHERE clause must match the expression in the index exactly for the planner to use it. Rule of thumb: create a functional index whenever a WHERE clause applies a deterministic function to a column (case-insensitive email, JSON field extraction, date truncation). In Postgres, the function must be marked IMMUTABLE. Run ANALYZE after creating the index so the planner sees up-to-date statistics for the expression.
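This Postgres sketch illustrates the case-insensitive email example using a hypothetical `users` table. `lower()` is marked `IMMUTABLE` in Postgres, which is what allows it in an index expression.

```sql
-- Hypothetical users table.
CREATE TABLE users (
    id    bigserial PRIMARY KEY,
    email text NOT NULL
);

-- Index the lower-cased value; lower() is IMMUTABLE, so it is allowed here.
CREATE INDEX users_email_lower_idx ON users (lower(email));
ANALYZE users;   -- also gathers statistics on the indexed expression

-- Matches the indexed expression exactly: can use the index.
SELECT id FROM users WHERE lower(email) = lower('Alice@Example.com');

-- Different expression (raw column): cannot use users_email_lower_idx.
SELECT id FROM users WHERE email = 'alice@example.com';
```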

Question 12

How do you find and remove duplicate or unused indexes?

Accepted Answer

An index is redundant when another index on the same table starts with the same column(s). For example, for equality lookups, an index on a single column is made redundant by a composite index whose leading column is that same column (unless the single-column index also enforces a UNIQUE constraint). Rule of thumb: audit indexes quarterly. Drop unused ones: they slow writes and mislead developers into thinking a column is important for lookups. Perform index removals through migration scripts, so an index can be re-created if monitoring reveals it is needed.
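The following is a rough first-pass Postgres query for prefix-redundant indexes. Treat it as a sketch, not a definitive audit tool. It compares only key column numbers, so it does not account for operator classes, collations, access methods, or the difference between key columns and `INCLUDE` columns. Inspect every candidate before dropping it, for example with `\d table_name` in psql.

```sql
-- Postgres sketch: pairs of indexes on the same table where the column list of
-- index a is a leading prefix of (or identical to) the column list of index b.
-- indkey is an int2vector of column numbers such as '2 5'; comparing its text
-- form with a trailing space avoids matching column 1 against column 12.
SELECT a.indrelid::regclass   AS table_name,
       a.indexrelid::regclass AS possibly_redundant,
       b.indexrelid::regclass AS covered_by,
       pg_size_pretty(pg_relation_size(a.indexrelid)) AS index_size
FROM   pg_index a
JOIN   pg_index b
  ON   b.indrelid   = a.indrelid
 AND   b.indexrelid <> a.indexrelid
 AND   (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
WHERE  NOT a.indisunique                          -- unique indexes enforce constraints
  AND  a.indexprs IS NULL AND b.indexprs IS NULL  -- skip expression indexes
  AND  a.indpred  IS NULL AND b.indpred  IS NULL  -- skip partial indexes
ORDER  BY pg_relation_size(a.indexrelid) DESC;

-- After review, drop without blocking writes (not inside a transaction block):
-- DROP INDEX CONCURRENTLY some_index_name;
```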

Question 13

What is the difference between a clustered and a non-clustered index?

Accepted Answer

- Clustered index: the table data is physically stored in the order of the index, so there can be only one per table. In MySQL InnoDB, the primary key is always the clustered index. In SQL Server, the primary key becomes the clustered index by default, but you can declare it NONCLUSTERED and cluster on a different index (or leave the table as a heap). Postgres has no automatically maintained clustering. Its CLUSTER command physically reorders the table once, and the order is not maintained as rows change.
- Non-clustered index: a separate structure that stores the indexed values and pointers (row IDs or PKs) back to the table rows. Multiple non-clustered indexes can exist per table.
Rule of thumb: in SQL Server and MySQL, choose the clustered index (usually the PK) carefully. Sequential integer PKs cause minimal page splits, while random UUIDs as clustered keys cause fragmentation and slow inserts.
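This Postgres sketch shows a one-off `CLUSTER` reorder using a hypothetical `readings` table. `CLUSTER` rewrites the whole table while holding an `ACCESS EXCLUSIVE` lock, so it is usually scheduled in a maintenance window.

```sql
-- Hypothetical time-series table in Postgres (no maintained clustering).
CREATE TABLE readings (
    sensor_id   int              NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision
);
CREATE INDEX readings_sensor_time_idx ON readings (sensor_id, recorded_at);

-- One-off physical reorder by the index; blocks reads and writes while it runs.
CLUSTER readings USING readings_sensor_time_idx;
ANALYZE readings;

-- Rows inserted later are NOT kept in this order. A later plain CLUSTER
-- re-sorts the table using the index remembered from the previous run.
CLUSTER readings;
```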

Question 14

Should you always index a foreign key column?

Accepted Answer

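Yes, in almost all cases. Foreign key columns are used in JOIN conditions and in referential actions (ON DELETE CASCADE, ON UPDATE CASCADE). Without an index on the child column, joins may have to scan the whole child table. In addition, every delete or key update on a parent row forces a full scan of the child table to check for (or cascade to) referencing rows. Postgres does NOT automatically create an index on FK columns. The referenced side is always indexed, because it must carry a primary key or unique constraint. MySQL InnoDB DOES create one automatically if none exists. SQL Server does not. Rule of thumb: after declaring a FOREIGN KEY constraint in Postgres or SQL Server, immediately add an index (CREATE INDEX) on the child's FK column unless the child table is very small or the FK is never used in queries.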
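This Postgres sketch shows a parent/child pair with the FK index added by hand. The `authors` and `books` tables are hypothetical.

```sql
-- Hypothetical parent/child tables.
CREATE TABLE authors (
    id   bigserial PRIMARY KEY,   -- referenced side: indexed automatically by the PK
    name text NOT NULL
);

CREATE TABLE books (
    id        bigserial PRIMARY KEY,
    author_id bigint NOT NULL REFERENCES authors (id) ON DELETE CASCADE,
    title     text   NOT NULL
);

-- Postgres does not create this index for the FK; add it explicitly.
CREATE INDEX books_author_id_idx ON books (author_id);

-- Both statements can now find matching books rows through the index
-- instead of scanning the whole books table.
SELECT b.title FROM authors a JOIN books b ON b.author_id = a.id WHERE a.id = 1;
DELETE FROM authors WHERE id = 1;   -- cascades to the matching books rows
```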

Question 15

What index types are available in Postgres and when do you use each?

Accepted Answer

| Type | Best for | Notes |
|---|---|---|
| `BTREE` | equality, range, sort, prefix LIKE | Default; handles 95 % of use cases |
| `HASH` | equality only | Marginally faster than B-tree for pure equality; no range |
| `GIN` | arrays, JSONB containment, full-text | Large index; slow build; fast `@>` / `@@` |
| `GiST` | geometry, ranges, nearest-neighbour | PostGIS spatial; `&&` / `<->` operators |
| `BRIN` | huge tables with natural physical ordering | Very small index; only useful for append-only tables (time-series) |
| `SP-GiST` | non-balanced partitioned structures | Quad-trees, radix trees; niche use |

Rule of thumb: use B-tree by default. Use GIN for arrays/JSONB/FTS. Use BRIN only on truly append-only tables (logs, IoT data) where the indexed column correlates with physical storage order; otherwise it will not help.
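This Postgres sketch builds a BRIN index on a hypothetical append-only `sensor_log` table. `pages_per_range = 64` is an illustrative value (the default is 128). Smaller ranges give finer pruning at the cost of a slightly larger index. The `pg_stats.correlation` check shows whether the column really follows physical storage order.

```sql
-- Hypothetical append-only log: rows arrive in roughly created_at order,
-- so created_at correlates with physical position on disk.
CREATE TABLE sensor_log (
    id         bigserial,
    created_at timestamptz NOT NULL DEFAULT now(),
    reading    double precision
);

-- BRIN stores one min/max summary per block range, so it stays tiny.
CREATE INDEX sensor_log_created_brin
    ON sensor_log USING brin (created_at) WITH (pages_per_range = 64);

-- Range scans skip every block range whose min/max cannot match.
SELECT count(*)
FROM   sensor_log
WHERE  created_at >= '2024-06-01' AND created_at < '2024-06-02';

-- Check how closely the column tracks physical order (near 1 or -1 is good).
ANALYZE sensor_log;
SELECT correlation
FROM   pg_stats
WHERE  tablename = 'sensor_log' AND attname = 'created_at';
```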

