[{"data":1,"prerenderedAt":102},["ShallowReactive",2],{"qa-\u002Fsql\u002Fperformance\u002Findexes":3},{"page":4,"siblings":94,"blog":99},{"id":5,"title":6,"body":7,"description":11,"difficulty":14,"extension":15,"framework":16,"frameworkSlug":17,"meta":18,"navigation":19,"order":20,"path":21,"questions":22,"questionsCount":85,"related":86,"seo":87,"seoDescription":88,"stem":89,"subtopic":6,"topic":90,"topicSlug":91,"updated":92,"__hash__":93},"qa\u002Fsql\u002Fperformance\u002Findexes.md","Indexes",{"type":8,"value":9,"toc":10},"minimark",[],{"title":11,"searchDepth":12,"depth":12,"links":13},"",2,[],"medium","md","SQL","sql",{},true,1,"\u002Fsql\u002Fperformance\u002Findexes",[23,28,32,36,40,44,48,53,57,61,65,69,73,77,81],{"id":24,"difficulty":25,"q":26,"a":27},"what-is-index","easy","What is a database index and how does it speed up queries?","An **index** is a separate data structure (usually a **B-tree**) that the\ndatabase maintains alongside a table. It stores copies of one or more column\nvalues in sorted order together with pointers to the full row, allowing the\nengine to locate matching rows in **O(log n)** time instead of scanning\nevery row (O(n)).\n\n```sql\n-- Without an index: full table scan — reads every row\nSELECT * FROM orders WHERE customer_id = 42;\n\n-- After adding an index: index seek — reads only the matching branch\nCREATE INDEX idx_orders_customer ON orders (customer_id);\nSELECT * FROM orders WHERE customer_id = 42;\n-- Execution plan changes from Seq Scan to Index Scan\n```\n\nThe trade-off: indexes consume disk space and must be updated on every\n`INSERT`, `UPDATE`, or `DELETE` on the indexed columns — adding write\noverhead.\n\n**Rule of thumb:** add an index on any column that appears frequently in\n`WHERE`, `JOIN ON`, or `ORDER BY` clauses of slow queries. 
Verify with\n`EXPLAIN` before and after.\n",{"id":29,"difficulty":25,"q":30,"a":31},"btree-index","What is a B-tree index and what kinds of queries does it support?","A **B-tree** (a self-balancing tree) index is the default index type in all\nmajor databases. It keeps values in sorted order across a balanced tree of pages,\nmaking it efficient for equality, range, and sort operations.\n\nSupported query types:\n- Equality: `WHERE col = value`\n- Range: `WHERE col > value`, `WHERE col BETWEEN a AND b`\n- Prefix: `WHERE col LIKE 'abc%'` (but not `'%abc'`)\n- Sorting: `ORDER BY col` (the optimizer may use the index to avoid a sort)\n- Leading columns of a composite index: `WHERE a = 1 AND b = 2` when the index is on `(a, b, c)`\n\n```sql\n-- B-tree index covers all of these:\nCREATE INDEX idx_users_email ON users (email);\n\nSELECT * FROM users WHERE email = 'alice@example.com';       -- equality\nSELECT * FROM users WHERE email > 'm@example.com';           -- range\nSELECT * FROM users WHERE email LIKE 'alice%';               -- prefix\nSELECT * FROM users ORDER BY email LIMIT 10;                 -- sort\n```\n\n**Rule of thumb:** always start with a B-tree index. Only reach for\nspecialised types (hash, GIN, GiST) when B-tree cannot satisfy the\naccess pattern (e.g., full-text search, containment on arrays\u002FJSON).\n",{"id":33,"difficulty":14,"q":34,"a":35},"composite-index","What is a composite index and what is the left-prefix rule?","A **composite index** covers two or more columns. The index sorts entries\nby the first column, then by the second within each first-column group, and\nso on. 
The optimizer can only use the index starting from the leftmost column\n— this is the **left-prefix rule**.\n\n```sql\nCREATE INDEX idx_orders_status_date ON orders (status, created_at);\n\n-- Uses the index (status is the leftmost column)\nSELECT * FROM orders WHERE status = 'pending';\n\n-- Uses the index (both columns used, in order)\nSELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';\n\n-- CANNOT use the index (skips the first column)\nSELECT * FROM orders WHERE created_at > '2026-01-01';\n-- → falls back to Seq Scan\n```\n\nColumn order in the index matters: put the column used in equality filters\nfirst, then range-filtered columns, then sort columns.\n\n**Rule of thumb:** design composite indexes as `(equality_cols, range_col,\nsort_col)`. The most selective equality column goes first. A query that\nskips the leftmost column cannot use the index.\n",{"id":37,"difficulty":14,"q":38,"a":39},"covering-index","What is a covering index and how does it eliminate table lookups?","A **covering index** contains all columns a query needs — the database can\nanswer the query entirely from the index without touching the main table\n(the \"heap\"). 
This eliminates the extra I\u002FO of the **table lookup** (also\ncalled a \"heap fetch\" or \"bookmark lookup\").\n\n```sql\n-- Query needs id, status, total: all three must be in the index\nCREATE INDEX idx_orders_covering\n  ON orders (customer_id)\n  INCLUDE (id, status, total);   -- Postgres 11+ \u002F SQL Server: INCLUDE clause\n\nSELECT id, status, total\nFROM   orders\nWHERE  customer_id = 42;\n-- Execution plan (Postgres): Index Only Scan (heap skipped for all-visible pages)\n```\n\nIn MySQL, covering indexes work without an `INCLUDE` clause — all columns\nin the `SELECT` list just need to be part of the index definition.\n\n**Rule of thumb:** when `EXPLAIN` shows an `Index Scan` on a high-traffic\nquery, check whether adding the `SELECT`ed columns to the index via\n`INCLUDE` can convert it to an `Index Only Scan`.\n",{"id":41,"difficulty":14,"q":42,"a":43},"partial-index","What is a partial index and when does it help?","A **partial index** is an index built on a subset of rows — those satisfying\na `WHERE` clause in the index definition. It is smaller, faster to update,\nand more selective than a full-column index.\n\n```sql\n-- Only index pending orders (the rows that are actually queried)\nCREATE INDEX idx_orders_pending\n  ON orders (created_at)\n  WHERE status = 'pending';\n\n-- This query uses the partial index (matches the WHERE condition)\nSELECT * FROM orders WHERE status = 'pending' AND created_at \u003C now() - INTERVAL '1 day';\n\n-- This query cannot use the partial index (status ≠ 'pending')\nSELECT * FROM orders WHERE status = 'shipped' AND created_at \u003C now() - INTERVAL '1 day';\n```\n\nA partial index can also be declared `UNIQUE` (see the constraints topic).\n\n**Rule of thumb:** use partial indexes when a large table has a small\n\"hot\" subset that most queries filter on (e.g., active records, unprocessed\njobs, non-deleted rows). 
The index shrinks dramatically and fits in cache\nmore easily.\n",{"id":45,"difficulty":14,"q":46,"a":47},"hash-index","When should you use a hash index instead of a B-tree?","A **hash index** maps each column value to a hash bucket, giving O(1)\naverage-case lookup for **equality-only** queries. It cannot support range\nqueries, sorting, or prefix matches.\n\n```sql\n-- Postgres: explicit hash index\nCREATE INDEX idx_sessions_token ON sessions USING HASH (token);\n\n-- Useful for: WHERE token = 'abc123'  (pure equality)\n-- Useless for: WHERE token > 'abc123'  (range)\n-- Useless for: ORDER BY token          (sort)\n```\n\nIn older Postgres versions (\u003C 10), hash indexes were not WAL-logged and\nwere lost on crash. Since Postgres 10, they are crash-safe. MySQL and SQL\nServer do not offer explicit hash indexes on disk (MySQL Memory engine does).\n\n**Rule of thumb:** prefer B-tree in almost all cases — it handles equality\ntoo and adds range\u002Fsort support for free. Only consider a hash index when\nprofiling shows that a very high-throughput equality-only lookup would\nmeasurably benefit from the marginal O(1) vs O(log n) difference.\n",{"id":49,"difficulty":50,"q":51,"a":52},"gin-index","hard","What is a GIN index and when do you use it in Postgres?","**GIN** (Generalized Inverted Index) is optimised for columns that contain\nmultiple values per row — arrays, `JSONB`, `tsvector` (full-text search).\nIt maps each individual element (word, key, array item) to the set of rows\ncontaining it.\n\n```sql\n-- Full-text search index\nCREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));\nSELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & index');\n\n-- JSONB containment index\nCREATE INDEX idx_events_payload ON events USING GIN (payload);\nSELECT * FROM events WHERE payload @> '{\"type\": \"click\"}';\n\n-- Array containment index\nCREATE INDEX idx_posts_tags ON posts USING GIN (tags);\nSELECT * FROM 
posts WHERE tags @> ARRAY['sql', 'performance'];\n```\n\nGIN indexes are large and slow to build but very fast for containment,\nmatch, and key-existence queries (`@>`, `@@`, `?`).\n\n**Rule of thumb:** use GIN for full-text search and JSONB\u002Farray containment\nqueries. Use a regular B-tree for simple `=` or range filters on scalar\nvalues extracted from JSONB (`(payload->>'user_id')::int`).\n",{"id":54,"difficulty":50,"q":55,"a":56},"index-bloat","What is index bloat and how do you fix it?","**Index bloat** occurs when an index grows much larger than the live data it\ncovers, usually because `DELETE` and `UPDATE` operations leave dead index\nentries that accumulate faster than `VACUUM` can reclaim them.\n\nSymptoms: index scans slow down, the index file on disk is much larger than\nexpected, and `VACUUM VERBOSE` output shows many dead tuples.\n\n```sql\n-- Postgres: check B-tree index bloat (pgstattuple extension)\nCREATE EXTENSION IF NOT EXISTS pgstattuple;\nSELECT pg_size_pretty(pg_relation_size('idx_orders_customer')) AS index_size,\n       avg_leaf_density,                              -- low density suggests bloat\n       round(leaf_fragmentation::numeric, 2)       AS fragmentation_pct\nFROM   pgstatindex('idx_orders_customer');\n\n-- Fix: rebuild the index (blocks writes to the table while it runs)\nREINDEX INDEX idx_orders_customer;\n\n-- Non-blocking rebuild (Postgres 12+)\nREINDEX INDEX CONCURRENTLY idx_orders_customer;\n```\n\n**Rule of thumb:** monitor index sizes relative to table sizes. If an index\nis more than 2–3× the expected size, run `REINDEX CONCURRENTLY`. Tune\n`autovacuum` to run more aggressively on high-churn tables.\n",{"id":58,"difficulty":14,"q":59,"a":60},"when-not-to-index","When should you NOT add an index?","Indexes are not free — they slow down writes and consume disk space. Avoid\nadding an index when:\n\n1. **The table is tiny** — a full scan of a 500-row table is faster than\n   an index lookup because the whole table fits in a handful of pages.\n2. 
**The column has very low selectivity** — an index on a `status` column\n   with only two values (`active`\u002F`inactive`) where 90 % of rows are\n   `active` gives the optimizer no benefit for `WHERE status = 'active'`.\n3. **The table is write-heavy** — a table with hundreds of inserts\u002Fsecond\n   pays a high cost to maintain indexes. Batch-load tables often drop\n   indexes before the load and rebuild them after.\n4. **The column is rarely queried** — unused indexes consume space and slow\n   every write without ever benefiting a read.\n\n```sql\n-- Postgres: find unused indexes\n-- (pg_stat_user_indexes names the table relname and the index indexrelname)\nSELECT schemaname, relname, indexrelname, idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\n  AND  indexrelname NOT LIKE '%pkey%'   -- skip PKs\nORDER  BY pg_relation_size(indexrelid) DESC;\n```\n\n**Rule of thumb:** only add an index when you can show — via `EXPLAIN\nANALYZE` — that it is used and that it reduces query time. Drop unused\nindexes; they are pure overhead.\n",{"id":62,"difficulty":14,"q":63,"a":64},"index-scan-vs-seq-scan","When does the optimizer choose a sequential scan over an index scan?","The query optimizer uses **cost-based planning** to choose between a\nsequential scan and an index scan. It chooses sequential scan when:\n\n1. **Many rows match** — if a large share of table rows match the `WHERE`\n   clause (for example 30 % or more; the exact break-even point depends on\n   row width, caching, and cost settings), reading the table sequentially in\n   disk order is cheaper than the random I\u002FO of following index pointers\n   row-by-row.\n2. **Table statistics are stale** — if `ANALYZE` has not run recently, the\n   planner may underestimate or overestimate selectivity.\n3. **Small table** — the whole table fits in a few pages; sequential I\u002FO\n   is faster.\n4. **Cost configuration** — `random_page_cost` (Postgres) affects the\n   relative cost of index vs sequential I\u002FO. Lowering it (e.g. 
to 1.1 for\n   SSDs) makes index scans more attractive.\n\n```sql\n-- Check the plan and actual vs estimated rows\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT * FROM orders WHERE status = 'pending';\n-- If 'Rows Removed by Filter' is small next to the rows returned → most rows match, Seq Scan is correct\n-- If 'Rows Removed by Filter' is huge → few rows match: look for a missing index or stale stats\n\n-- Update statistics\nANALYZE orders;\n```\n\n**Rule of thumb:** trust the optimizer; a sequential scan that returns 40 %\nof rows is usually faster than an index scan. If a plan looks wrong, refresh\nstatistics with `ANALYZE` before forcing an index with a hint.\n",{"id":66,"difficulty":50,"q":67,"a":68},"expression-index","What is a functional (expression) index?","A **functional index** (expression index) indexes the *result* of a\nfunction or expression applied to a column rather than the raw column value.\nThis allows the optimizer to use the index when the same expression appears\nin a `WHERE` clause.\n\n```sql\n-- Without expression index: full scan (function applied to every row)\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Create an index on the expression\nCREATE INDEX idx_users_email_lower ON users (lower(email));\n\n-- Now this uses the index\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Also useful for JSON extraction\nCREATE INDEX idx_events_user ON events ((payload->>'user_id'));\nSELECT * FROM events WHERE payload->>'user_id' = '42';\n```\n\nThe expression in the `WHERE` clause must match the expression in the index\nexactly for the planner to use it.\n\n**Rule of thumb:** create a functional index whenever a `WHERE` clause\napplies a deterministic function (in Postgres, one marked `IMMUTABLE`) to a\ncolumn (case-insensitive email, JSON field extraction, date truncation). 
Run `ANALYZE` after creating it\nso the planner sees up-to-date statistics.\n",{"id":70,"difficulty":14,"q":71,"a":72},"index-maintenance","How do you find and remove duplicate or unused indexes?","```sql\n-- Postgres: find unused indexes (not scanned since last stats reset)\nSELECT schemaname, relname, indexrelname,\n       pg_size_pretty(pg_relation_size(indexrelid)) AS size,\n       idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\nORDER  BY pg_relation_size(indexrelid) DESC;\n\n-- Postgres: find candidate duplicate indexes (same columns, same table)\n-- Ignores operator classes and partial-index predicates, so verify before dropping\nSELECT indrelid::regclass AS table_name,\n       array_agg(indexrelid::regclass) AS duplicate_indexes\nFROM   pg_index\nGROUP  BY indrelid, indkey::text\nHAVING COUNT(*) > 1;\n\n-- Drop a redundant index (non-blocking in Postgres)\nDROP INDEX CONCURRENTLY idx_orders_old_customer;\n```\n\nAn index is redundant when another index on the same table starts with the\nsame column(s). For example, an index on `(customer_id)` is made redundant\nby a composite index on `(customer_id, created_at)` for equality lookups.\n\n**Rule of thumb:** audit indexes quarterly. Drop unused ones — they slow\nwrites and mislead developers into thinking a column is important for\nlookups. Keep each index removal in a migration script so the index can be\nre-added if monitoring reveals it is needed.\n",{"id":74,"difficulty":14,"q":75,"a":76},"clustered-vs-nonclustered","What is the difference between a clustered and a non-clustered index?","- **Clustered index**: the table data is physically stored in the order of\n  the index. There can be only **one** per table. In MySQL InnoDB, the\n  primary key is always the clustered index. In SQL Server, the primary key\n  is clustered by default but can be declared `NONCLUSTERED`. In Postgres,\n  there is no automatic clustering, but `CLUSTER table USING index` physically\n  reorders the table once (not maintained dynamically).\n- **Non-clustered index**: a separate structure that stores the indexed\n  values and pointers (row IDs \u002F PKs) back to the heap. 
Multiple\n  non-clustered indexes can exist per table.\n\n```sql\n-- SQL Server: clustered index (the PK is clustered by default)\nCREATE TABLE orders (\n  id INT PRIMARY KEY CLUSTERED,    -- data pages sorted by id\n  customer_id INT NOT NULL\n);\n\n-- Non-clustered index\nCREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);\n\n-- Postgres: one-time physical sort (does NOT stay sorted after future writes)\nCLUSTER orders USING idx_orders_customer_date;\n```\n\n**Rule of thumb:** in SQL Server and MySQL, choose the clustered index\n(usually the PK) carefully — sequential integer PKs cause minimal page\nsplits. Random UUIDs as clustered keys cause fragmentation and slow inserts.\n",{"id":78,"difficulty":25,"q":79,"a":80},"index-on-foreign-key","Should you always index a foreign key column?","**Yes, in almost all cases.** Foreign key columns are used in `JOIN ON`\nconditions and in cascade operations (`ON DELETE CASCADE`). Without an index,\nboth joins and FK enforcement checks can require full scans of the child table.\n\n```sql\n-- Child table without an index on the FK column:\n-- DELETE FROM customers WHERE id = 1\n-- → DB must scan ALL order rows to find children — O(n)\n\n-- With an index on the FK column:\nCREATE INDEX idx_orders_customer_id ON orders (customer_id);\n-- DELETE FROM customers WHERE id = 1\n-- → Index lookup to find children — O(log n)\n```\n\nPostgres does NOT automatically create an index on FK columns (unlike the\nPK side). MySQL InnoDB DOES create one automatically. 
SQL Server does not.\n\n**Rule of thumb:** after declaring a `REFERENCES` constraint in Postgres or\nSQL Server, immediately add a `CREATE INDEX` on the child's FK column unless\nthe child table is very small or the FK is never used in queries.\n",{"id":82,"difficulty":25,"q":83,"a":84},"index-types-summary","What index types are available in Postgres and when do you use each?","| Type | Best for | Notes |\n|---|---|---|\n| `BTREE` | equality, range, sort, prefix LIKE | Default; handles 95 % of use cases |\n| `HASH` | equality only | Marginally faster than B-tree for pure equality; no range |\n| `GIN` | arrays, JSONB containment, full-text | Large index; slow build; fast `@>` \u002F `@@` |\n| `GiST` | geometry, ranges, nearest-neighbour | PostGIS spatial; `&&` \u002F `\u003C->` operators |\n| `BRIN` | huge tables with natural physical ordering | Very small index; only useful for append-only tables (time-series) |\n| `SP-GiST` | non-balanced partitioned structures | Quad-trees, radix trees; niche use |\n\n```sql\n-- BRIN: tiny index on a massive append-only events table\n-- (works because newer rows have higher timestamps and are stored later on disk)\nCREATE INDEX idx_events_brin ON events USING BRIN (created_at);\n```\n\n**Rule of thumb:** use `BTREE` by default. 
Use `GIN` for arrays\u002FJSONB\u002FFTS.\nUse `BRIN` only on truly append-only tables (logs, IoT data) where the\nindexed column correlates with physical storage order — otherwise it will\nnot be used.\n",15,null,{"description":11},"SQL indexes interview questions — B-tree, hash, GIN, GiST, partial and composite indexes, covering indexes, index bloat, VACUUM, and when not to index across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Findexes","Indexes & Performance","performance","2026-06-20","IyWMTZsQsuPFpcSDdisIDYupSCT7mg2HwdOjRH8cwNw",[95,96],{"subtopic":6,"path":21,"order":20},{"subtopic":97,"path":98,"order":12},"Query Optimization","\u002Fsql\u002Fperformance\u002Fquery-optimization",{"path":100,"title":101},"\u002Fblog\u002Fsql-indexes-btree-performance","SQL Indexes — B-tree, Composite, Partial, and Covering Indexes",1782244107356]