[{"data":1,"prerenderedAt":185},["ShallowReactive",2],{"topic-sql-performance":3},{"framework":4,"topic":16,"subtopics":25},{"id":5,"description":6,"extension":7,"icon":8,"meta":9,"name":10,"order":11,"slug":12,"stem":13,"tier":14,"__hash__":15},"frameworks\u002Fframeworks\u002Fsql.yml","SQL interview questions on queries, joins and aggregation — essential for every backend, data and analytics interview.","yml","database",{},"SQL",4,"sql","frameworks\u002Fsql",1,"lpzsOj2p9p9W0Tctwc61nP-ulZAA80R5gJiyaZS6ZeI",{"id":17,"description":18,"extension":7,"frameworkSlug":12,"meta":19,"name":20,"order":21,"slug":22,"stem":23,"__hash__":24},"topics\u002Ftopics\u002Fsql-performance.yml","Indexes, EXPLAIN, query plans and optimization — why a query is slow and how to make it fast.",{},"Indexes & Performance",7,"performance","topics\u002Fsql-performance","t18iPx3n6b0VydJSeyDjwtk6Ips8LGJRk1xij8-HJK0",[26,111],{"id":27,"title":28,"body":29,"description":33,"difficulty":36,"extension":37,"framework":10,"frameworkSlug":12,"meta":38,"navigation":39,"order":14,"path":40,"questions":41,"questionsCount":104,"related":105,"seo":106,"seoDescription":107,"stem":108,"subtopic":28,"topic":20,"topicSlug":22,"updated":109,"__hash__":110},"qa\u002Fsql\u002Fperformance\u002Findexes.md","Indexes",{"type":30,"value":31,"toc":32},"minimark",[],{"title":33,"searchDepth":34,"depth":34,"links":35},"",2,[],"medium","md",{},true,"\u002Fsql\u002Fperformance\u002Findexes",[42,47,51,55,59,63,67,72,76,80,84,88,92,96,100],{"id":43,"difficulty":44,"q":45,"a":46},"what-is-index","easy","What is a database index and how does it speed up queries?","An **index** is a separate data structure (usually a **B-tree**) that the\ndatabase maintains alongside a table. 
It stores copies of one or more column\nvalues in sorted order together with pointers to the full row, allowing the\nengine to locate matching rows in **O(log n)** time instead of scanning\nevery row (O(n)).\n\n```sql\n-- Without an index: full table scan — reads every row\nSELECT * FROM orders WHERE customer_id = 42;\n\n-- After adding an index: index seek — reads only the matching branch\nCREATE INDEX idx_orders_customer ON orders (customer_id);\nSELECT * FROM orders WHERE customer_id = 42;\n-- Execution plan changes from Seq Scan to Index Scan\n```\n\nThe trade-off: indexes consume disk space and must be updated on every\n`INSERT`, `UPDATE`, or `DELETE` on the indexed columns — adding write\noverhead.\n\n**Rule of thumb:** add an index on any column that appears frequently in\n`WHERE`, `JOIN ON`, or `ORDER BY` clauses of slow queries. Verify with\n`EXPLAIN` before and after.\n",{"id":48,"difficulty":44,"q":49,"a":50},"btree-index","What is a B-tree index and what kinds of queries does it support?","A **B-tree** (Balanced-tree) index is the default index type in all major\ndatabases. 
It keeps values in sorted order across a balanced tree of pages,\nmaking it efficient for equality, range, and sort operations.\n\nSupported query types:\n- Equality: `WHERE col = value`\n- Range: `WHERE col > value`, `WHERE col BETWEEN a AND b`\n- Prefix: `WHERE col LIKE 'abc%'` (but not `'%abc'`)\n- Sorting: `ORDER BY col` (the optimizer may use the index to avoid a sort)\n- Prefix of a composite index: `WHERE (a, b)` when index is on `(a, b, c)`\n\n```sql\n-- B-tree index covers all of these:\nCREATE INDEX idx_users_email ON users (email);\n\nSELECT * FROM users WHERE email = 'alice@example.com';       -- equality\nSELECT * FROM users WHERE email > 'm@example.com';           -- range\nSELECT * FROM users WHERE email LIKE 'alice%';               -- prefix\nSELECT * FROM users ORDER BY email LIMIT 10;                 -- sort\n```\n\n**Rule of thumb:** always start with a B-tree index. Only reach for\nspecialised types (hash, GIN, GiST) when B-tree cannot satisfy the\naccess pattern (e.g., full-text search, containment on arrays\u002FJSON).\n",{"id":52,"difficulty":36,"q":53,"a":54},"composite-index","What is a composite index and what is the left-prefix rule?","A **composite index** covers two or more columns. The database sorts rows\nby the first column, then by the second within each first-column group, and\nso on. 
The optimizer can only use the index starting from the leftmost column\n— this is the **left-prefix rule**.\n\n```sql\nCREATE INDEX idx_orders_status_date ON orders (status, created_at);\n\n-- Uses the index (status is the leftmost column)\nSELECT * FROM orders WHERE status = 'pending';\n\n-- Uses the index (both columns used, in order)\nSELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';\n\n-- CANNOT use the index (skips the first column)\nSELECT * FROM orders WHERE created_at > '2026-01-01';\n-- → falls back to Seq Scan\n```\n\nColumn order in the index matters: put the column used in equality filters\nfirst, then range-filtered columns, then sort columns.\n\n**Rule of thumb:** design composite indexes as `(equality_cols, range_col,\nsort_col)`. The most selective equality column goes first. A query that\nskips the leftmost column cannot use the index.\n",{"id":56,"difficulty":36,"q":57,"a":58},"covering-index","What is a covering index and how does it eliminate table lookups?","A **covering index** contains all columns a query needs — the database can\nanswer the query entirely from the index without touching the main table\n(the \"heap\"). 
This eliminates the extra I\u002FO of the **table lookup** (also\ncalled a \"heap fetch\" or \"bookmark lookup\").\n\n```sql\n-- Query needs id, status, total — all three must be in the index\nCREATE INDEX idx_orders_covering\n  ON orders (customer_id)\n  INCLUDE (id, status, total);   -- Postgres 11+ \u002F SQL Server: INCLUDE clause\n\nSELECT id, status, total\nFROM   orders\nWHERE  customer_id = 42;\n-- Execution plan: Index Only Scan (no heap access)\n```\n\nIn MySQL, covering indexes work without an `INCLUDE` clause — all columns\nin the `SELECT` list just need to be part of the index definition.\n\n**Rule of thumb:** when `EXPLAIN` shows an `Index Scan` on a high-traffic\nquery, check whether adding the `SELECT`ed columns to the index via\n`INCLUDE` can convert it to an `Index Only Scan`.\n",{"id":60,"difficulty":36,"q":61,"a":62},"partial-index","What is a partial index and when does it help?","A **partial index** is an index built on a subset of rows — those satisfying\na `WHERE` clause in the index definition. It is smaller, faster to update,\nand more selective than a full-column index.\n\n```sql\n-- Only index pending orders (the rows that are actually queried)\nCREATE INDEX idx_orders_pending\n  ON orders (created_at)\n  WHERE status = 'pending';\n\n-- This query uses the partial index (matches the WHERE condition)\nSELECT * FROM orders WHERE status = 'pending' AND created_at \u003C now() - INTERVAL '1 day';\n\n-- This query cannot use the partial index (status ≠ 'pending')\nSELECT * FROM orders WHERE status = 'shipped' AND created_at \u003C now() - INTERVAL '1 day';\n```\n\nA partial index can also be declared `UNIQUE` (see the constraints topic).\n\n**Rule of thumb:** use partial indexes when a large table has a small\n\"hot\" subset that most queries filter on (e.g., active records, unprocessed\njobs, non-deleted rows). 
The index shrinks dramatically and fits in cache\nmore easily.\n",{"id":64,"difficulty":36,"q":65,"a":66},"hash-index","When should you use a hash index instead of a B-tree?","A **hash index** maps each column value to a hash bucket, giving O(1)\naverage-case lookup for **equality-only** queries. It cannot support range\nqueries, sorting, or prefix matches.\n\n```sql\n-- Postgres: explicit hash index\nCREATE INDEX idx_sessions_token ON sessions USING HASH (token);\n\n-- Useful for: WHERE token = 'abc123'  (pure equality)\n-- Useless for: WHERE token > 'abc123'  (range)\n-- Useless for: ORDER BY token          (sort)\n```\n\nIn older Postgres versions (\u003C 10), hash indexes were not WAL-logged and\nwere lost on crash. Since Postgres 10, they are crash-safe. MySQL and SQL\nServer do not offer explicit hash indexes on disk (MySQL Memory engine does).\n\n**Rule of thumb:** prefer B-tree in almost all cases — it handles equality\ntoo and adds range\u002Fsort support for free. Only consider a hash index when\nprofiling shows that a very high-throughput equality-only lookup would\nmeasurably benefit from the marginal O(1) vs O(log n) difference.\n",{"id":68,"difficulty":69,"q":70,"a":71},"gin-index","hard","What is a GIN index and when do you use it in Postgres?","**GIN** (Generalized Inverted Index) is optimised for columns that contain\nmultiple values per row — arrays, `JSONB`, `tsvector` (full-text search).\nIt maps each individual element (word, key, array item) to the set of rows\ncontaining it.\n\n```sql\n-- Full-text search index\nCREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));\nSELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & index');\n\n-- JSONB containment index\nCREATE INDEX idx_events_payload ON events USING GIN (payload);\nSELECT * FROM events WHERE payload @> '{\"type\": \"click\"}';\n\n-- Array containment index\nCREATE INDEX idx_posts_tags ON posts USING GIN (tags);\nSELECT * FROM 
posts WHERE tags @> ARRAY['sql', 'performance'];\n```\n\nGIN indexes are large and slow to build but very fast for containment\nqueries (`@>`, `@@`, `?`).\n\n**Rule of thumb:** use GIN for full-text search and JSONB\u002Farray containment\nqueries. Use a regular B-tree for simple `=` or range filters on JSONB\nextracted scalar values (`(payload->>'user_id')::int`).\n",{"id":73,"difficulty":69,"q":74,"a":75},"index-bloat","What is index bloat and how do you fix it?","**Index bloat** occurs when an index grows much larger than the live data it\ncovers, usually because `DELETE` and `UPDATE` operations leave dead index\nentries that accumulate faster than `VACUUM` can reclaim them.\n\nSymptoms: index scans slow down, the index file on disk is much larger than\nexpected, and `VACUUM VERBOSE` output shows many dead tuples.\n\n```sql\n-- Postgres: check B-tree index bloat (pgstatindex from the pgstattuple extension)\nCREATE EXTENSION IF NOT EXISTS pgstattuple;\nSELECT pg_size_pretty(pg_relation_size('idx_orders_customer'::regclass)) AS index_size,\n       avg_leaf_density,\n       round(leaf_fragmentation::numeric, 2) AS fragmentation_pct\nFROM   pgstatindex('idx_orders_customer');\n\n-- Fix: rebuild the index (blocks writes to the table; use CONCURRENTLY for large tables)\nREINDEX INDEX idx_orders_customer;\n\n-- Non-blocking rebuild (Postgres 12+)\nREINDEX INDEX CONCURRENTLY idx_orders_customer;\n```\n\n**Rule of thumb:** monitor index sizes relative to table sizes. If an index\nis more than 2–3× the expected size, run `REINDEX CONCURRENTLY`. Tune\n`autovacuum` to run more aggressively on high-churn tables.\n",{"id":77,"difficulty":36,"q":78,"a":79},"when-not-to-index","When should you NOT add an index?","Indexes are not free — they slow down writes and consume disk space. Avoid\nadding an index when:\n\n1. **The table is tiny** — a full scan of a 500-row table is faster than\n   an index lookup because the whole table fits in one or two pages.\n2. 
**The column has very low selectivity** — an index on a `status` column\n   with only two values (`active`\u002F`inactive`) where 90 % of rows are\n   `active` gives the optimizer no benefit for `WHERE status = 'active'`.\n3. **The table is write-heavy** — a table with hundreds of inserts\u002Fsecond\n   pays a high cost to maintain indexes. Batch-load tables often drop\n   indexes before the load and rebuild them after.\n4. **The column is rarely queried** — unused indexes consume space and slow\n   every write without ever benefiting a read.\n\n```sql\n-- Postgres: find unused indexes\nSELECT schemaname, relname, indexrelname, idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\n  AND  indexrelname NOT LIKE '%pkey%'   -- skip PKs\nORDER  BY pg_relation_size(indexrelid) DESC;\n```\n\n**Rule of thumb:** only add an index when you can show — via `EXPLAIN\nANALYZE` — that it is used and that it reduces query time. Drop unused\nindexes; they are pure overhead.\n",{"id":81,"difficulty":36,"q":82,"a":83},"index-scan-vs-seq-scan","When does the optimizer choose a sequential scan over an index scan?","The query optimizer uses **cost-based planning** to choose between a\nsequential scan and an index scan. It chooses sequential scan when:\n\n1. **Many rows match** — if 30 %+ of table rows match the `WHERE` clause,\n   reading the table sequentially in disk order is cheaper than the random\n   I\u002FO of following index pointers row-by-row.\n2. **Table statistics are stale** — if `ANALYZE` has not run recently, the\n   planner may underestimate or overestimate selectivity.\n3. **Small table** — the whole table fits in a few pages; sequential I\u002FO\n   is faster.\n4. **Cost configuration** — `random_page_cost` (Postgres) affects the\n   relative cost of index vs sequential I\u002FO. Lowering it (e.g. 
to 1.1 for\n   SSDs) makes index scans more attractive.\n\n```sql\n-- Check the plan and actual vs estimated rows\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT * FROM orders WHERE status = 'pending';\n-- If 'Rows Removed by Filter' is small → most rows match, so Seq Scan is correct\n-- If 'Rows Removed by Filter' is huge → few rows match: missing index or stale stats\n\n-- Update statistics\nANALYZE orders;\n```\n\n**Rule of thumb:** trust the optimizer; a sequential scan on 40 % of rows\nis usually faster than an index scan. If a plan looks wrong, check statistics with\n`ANALYZE` before forcing an index with a hint.\n",{"id":85,"difficulty":69,"q":86,"a":87},"expression-index","What is a functional (expression) index?","A **functional index** (expression index) indexes the *result* of a\nfunction or expression applied to a column rather than the raw column value.\nThis allows the optimizer to use the index when the same expression appears\nin a `WHERE` clause.\n\n```sql\n-- Without expression index: full scan (function applied to every row)\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Create an index on the expression\nCREATE INDEX idx_users_email_lower ON users (lower(email));\n\n-- Now this uses the index\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Also useful for JSON extraction\nCREATE INDEX idx_events_user ON events ((payload->>'user_id'));\nSELECT * FROM events WHERE payload->>'user_id' = '42';\n```\n\nThe expression in the `WHERE` clause must match the expression in the index\nexactly for the planner to use it.\n\n**Rule of thumb:** create a functional index whenever a `WHERE` clause\napplies a deterministic (in Postgres, `IMMUTABLE`) function to a column (case-insensitive email,\nJSON field extraction, date truncation). 
Run `ANALYZE` after creating it\nso the planner sees up-to-date statistics.\n",{"id":89,"difficulty":36,"q":90,"a":91},"index-maintenance","How do you find and remove duplicate or unused indexes?","```sql\n-- Postgres: find unused indexes (not scanned since last stats reset)\nSELECT schemaname, relname, indexrelname,\n       pg_size_pretty(pg_relation_size(indexrelid)) AS size,\n       idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\nORDER  BY pg_relation_size(indexrelid) DESC;\n\n-- Postgres: find duplicate indexes (same columns, same table)\nSELECT indrelid::regclass AS table_name,\n       array_agg(indexrelid::regclass) AS duplicate_indexes\nFROM   pg_index\nGROUP  BY indrelid, indkey::text\nHAVING COUNT(*) > 1;\n\n-- Drop a redundant index (non-blocking in Postgres)\nDROP INDEX CONCURRENTLY idx_orders_old_customer;\n```\n\nAn index is redundant when another index on the same table starts with the\nsame column(s). For example, an index on `(customer_id)` is made redundant\nby a composite index on `(customer_id, created_at)` for equality lookups.\n\n**Rule of thumb:** audit indexes quarterly. Drop unused ones — they slow\nwrites and mislead developers into thinking a column is important for\nlookups. Keep an index removal in a migration script so it can be re-added\nif monitoring reveals it is needed.\n",{"id":93,"difficulty":36,"q":94,"a":95},"clustered-vs-nonclustered","What is the difference between a clustered and a non-clustered index?","- **Clustered index**: the table data is physically stored in the order of\n  the index. There can be only **one** per table. In MySQL InnoDB, the\n  primary key is always the clustered index; in SQL Server, the primary key\n  is clustered by default but can be declared `NONCLUSTERED`. In Postgres, there\n  is no automatic clustering, but `CLUSTER table USING index` physically\n  reorders the table once (not maintained dynamically).\n- **Non-clustered index**: a separate structure that stores the indexed\n  values and pointers (row IDs \u002F PKs) back to the heap. 
Multiple non-\n  clustered indexes can exist per table.\n\n```sql\n-- SQL Server: clustered index (the PK is clustered by default)\nCREATE TABLE orders (\n  id INT PRIMARY KEY CLUSTERED,    -- data pages sorted by id\n  customer_id INT NOT NULL\n);\n\n-- Non-clustered index\nCREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);\n\n-- Postgres: one-time physical sort (does NOT stay sorted after future writes)\nCLUSTER orders USING idx_orders_customer_date;\n```\n\n**Rule of thumb:** in SQL Server and MySQL, choose the clustered index\n(usually the PK) carefully — sequential integer PKs cause minimal page\nsplits. Random UUIDs as clustered keys cause fragmentation and slow inserts.\n",{"id":97,"difficulty":44,"q":98,"a":99},"index-on-foreign-key","Should you always index a foreign key column?","**Yes, in almost all cases.** Foreign key columns are used in `JOIN ON`\nconditions and in cascade operations (`ON DELETE CASCADE`). Without an index,\nboth joins and FK enforcement scans become full table scans.\n\n```sql\n-- Child table without an index on the FK column:\n-- DELETE FROM customers WHERE id = 1\n-- → DB must scan ALL order rows to find children — O(n)\n\n-- With an index on the FK column:\nCREATE INDEX idx_orders_customer_id ON orders (customer_id);\n-- DELETE FROM customers WHERE id = 1\n-- → Index lookup to find children — O(log n)\n```\n\nPostgres does NOT automatically create an index on FK columns (unlike the\nPK side). MySQL InnoDB DOES create one automatically. 
SQL Server does not.\n\n**Rule of thumb:** after declaring a `REFERENCES` constraint in Postgres or\nSQL Server, immediately add a `CREATE INDEX` on the child's FK column unless\nthe child table is very small or the FK is never used in queries.\n",{"id":101,"difficulty":44,"q":102,"a":103},"index-types-summary","What index types are available in Postgres and when do you use each?","| Type | Best for | Notes |\n|---|---|---|\n| `BTREE` | equality, range, sort, prefix LIKE | Default; handles 95 % of use cases |\n| `HASH` | equality only | Marginally faster than B-tree for pure equality; no range |\n| `GIN` | arrays, JSONB containment, full-text | Large index; slow build; fast `@>` \u002F `@@` |\n| `GiST` | geometry, ranges, nearest-neighbour | PostGIS spatial; `&&` \u002F `\u003C->` operators |\n| `BRIN` | huge tables with natural physical ordering | Very small index; only useful for append-only tables (time-series) |\n| `SP-GiST` | non-balanced partitioned structures | Quad-trees, radix trees; niche use |\n\n```sql\n-- BRIN: tiny index on a massive append-only events table\n-- (works because newer rows have higher timestamps and are stored later on disk)\nCREATE INDEX idx_events_brin ON events USING BRIN (created_at);\n```\n\n**Rule of thumb:** use `BTREE` by default. 
Use `GIN` for arrays\u002FJSONB\u002FFTS.\nUse `BRIN` only on truly append-only tables (logs, IoT data) where the\nindexed column correlates with physical storage order — otherwise it will\nnot be used.\n",15,null,{"description":33},"SQL indexes interview questions — B-tree, hash, GIN, GiST, partial and composite indexes, covering indexes, index bloat, VACUUM, and when not to index across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Findexes","2026-06-20","IyWMTZsQsuPFpcSDdisIDYupSCT7mg2HwdOjRH8cwNw",{"id":112,"title":113,"body":114,"description":33,"difficulty":69,"extension":37,"framework":10,"frameworkSlug":12,"meta":118,"navigation":39,"order":34,"path":119,"questions":120,"questionsCount":104,"related":105,"seo":181,"seoDescription":182,"stem":183,"subtopic":113,"topic":20,"topicSlug":22,"updated":109,"__hash__":184},"qa\u002Fsql\u002Fperformance\u002Fquery-optimization.md","Query Optimization",{"type":30,"value":115,"toc":116},[],{"title":33,"searchDepth":34,"depth":34,"links":117},[],{},"\u002Fsql\u002Fperformance\u002Fquery-optimization",[121,125,129,133,137,141,145,149,153,157,161,165,169,173,177],{"id":122,"difficulty":44,"q":123,"a":124},"what-is-explain","What does EXPLAIN do and how do you read its output?","`EXPLAIN` shows the **query execution plan** the database chose — which\nindexes are used, what join strategies are applied, and the estimated cost\nand row counts at each step. 
`EXPLAIN ANALYZE` actually runs the query and\nadds real timings and row counts alongside the estimates.\n\n```sql\n-- Postgres: estimated plan only (does not run the query)\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\n\n-- Postgres: run the query, show actual vs estimated\nEXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)\n  SELECT * FROM orders WHERE customer_id = 42;\n\n-- MySQL\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- MySQL 8.0.18+\n\n-- SQL Server: estimated plan only (SET SHOWPLAN_TEXT must be alone in its batch)\nSET SHOWPLAN_TEXT ON;\nGO\nSELECT * FROM orders WHERE customer_id = 42;\nGO\n```\n\nKey fields to read in Postgres output:\n- `Seq Scan` \u002F `Index Scan` \u002F `Index Only Scan` — access method\n- `rows=N` — estimated rows; compare to `actual rows=N` in ANALYZE\n- `cost=start..total` — planner's cost units (not wall-clock ms)\n- `Buffers: shared hit=N read=N` — cache hits vs disk reads\n\n**Rule of thumb:** always use `EXPLAIN ANALYZE` (not just `EXPLAIN`) on\nslow queries — large discrepancies between estimated and actual rows reveal\nstale statistics, which is the root cause of most bad query plans.\n",{"id":126,"difficulty":36,"q":127,"a":128},"seq-scan-optimization","You see a Seq Scan on a large table — what do you check first?","A sequential scan on a large table is the most common source of slow queries.\nWork through this checklist:\n\n1. **Is there an index on the WHERE column?** If not, create one.\n2. **Is the index being used?** Check `EXPLAIN` — if not, the planner may\n   think a seq scan is cheaper (see next steps).\n3. **Are statistics fresh?** Run `ANALYZE table_name` and re-check the plan.\n4. **Is selectivity high?** An index on a low-cardinality column (e.g. a\n   boolean) won't help if 80 % of rows match.\n5. **Is there a function in the WHERE clause?** `WHERE lower(email) = '...'`\n   won't use an index on `email` — create a functional index.\n6. 
**Is `random_page_cost` tuned for SSDs?** Default is 4.0 (HDD); set to\n   1.1 for SSD storage to make index scans more attractive.\n\n```sql\n-- Run ANALYZE to refresh statistics\nANALYZE orders;\n\n-- Tune cost parameters for SSD (session level)\nSET random_page_cost = 1.1;\nSET effective_cache_size = '4GB';\n\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;\n```\n\n**Rule of thumb:** stale statistics cause the planner to choose bad plans\nmore often than missing indexes. Always `ANALYZE` before concluding that an\nindex isn't working.\n",{"id":130,"difficulty":36,"q":131,"a":132},"n-plus-one","What is the N+1 query problem and how do you fix it in SQL?","The **N+1 problem** occurs when code fetches a list of N parent records and\nthen issues one additional query per record to load its children — totalling\nN+1 round-trips instead of 1.\n\n```sql\n-- N+1: 1 query for customers + N queries for orders (one per customer)\nSELECT id, name FROM customers WHERE active = TRUE;  -- → N rows\n-- then in a loop:\nSELECT * FROM orders WHERE customer_id = ?;  -- N times\n\n-- Fix: JOIN or subquery — 1 round-trip total\nSELECT c.id, c.name, o.id AS order_id, o.total\nFROM   customers c\nLEFT JOIN orders o ON o.customer_id = c.id\nWHERE  c.active = TRUE;\n\n-- Or use a lateral join to get the latest order per customer\nSELECT c.id, c.name, latest.total\nFROM   customers c\nLEFT JOIN LATERAL (\n  SELECT total FROM orders WHERE customer_id = c.id\n  ORDER BY created_at DESC LIMIT 1\n) latest ON TRUE\nWHERE  c.active = TRUE;\n```\n\n**Rule of thumb:** if application logs show many nearly-identical queries\ndiffering only by a primary key value, you have an N+1 problem. Fix it with\na `JOIN`, an `IN (...)` batch fetch, or a lateral join.\n",{"id":134,"difficulty":69,"q":135,"a":136},"join-strategies","What are the three join strategies and when does the optimizer choose each?","The query planner chooses from three physical join algorithms:\n\n1. 
**Nested Loop Join** — for each row in the outer table, scan the inner\n   table (optionally using an index). O(N × M) worst case. Best when the\n   outer set is very small or an index on the inner table makes the inner\n   scan cheap.\n2. **Hash Join** — build a hash table from the smaller side, then probe it\n   with each row from the larger side. O(N + M). Best for large unsorted\n   inputs with no useful index.\n3. **Merge Join** — sort both sides by the join key, then merge in O(N + M).\n   Best when both sides are already sorted (e.g., both have B-tree index\n   scans in join-key order).\n\n```sql\n-- Force a specific strategy (Postgres — for testing only)\nSET enable_hashjoin = off;\nSET enable_mergejoin = off;\nEXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;\n-- Now forced to Nested Loop\n```\n\n**Rule of thumb:** trust the optimizer to pick the right join strategy.\nAdd an index on the join column of the larger table to make nested-loop\njoins efficient, and ensure statistics are current so the planner estimates\nset sizes correctly.\n",{"id":138,"difficulty":36,"q":139,"a":140},"statistics-analyze","What are table statistics and how do you update them?","**Statistics** are metadata the query planner uses to estimate how many\nrows a filter will return — column histograms, most common values, null\nfractions, and row counts. 
Stale statistics lead to bad execution plans.\n\n```sql\n-- Postgres: update statistics for a table (fast, non-blocking)\nANALYZE orders;\n\n-- Update all tables in the database\nANALYZE;\n\n-- Check when statistics were last collected\nSELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup\nFROM   pg_stat_user_tables\nWHERE  relname = 'orders';\n\n-- Increase statistics target for a column with non-uniform distribution\nALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;\n-- Default is 100 (100 histogram buckets); raise for skewed data.\nANALYZE orders;\n```\n\nPostgres runs `autovacuum` which calls `ANALYZE` automatically, but it may\nlag behind on high-churn tables.\n\n**Rule of thumb:** if a query plan suddenly gets worse after a large data\nload or bulk delete, run `ANALYZE table_name` immediately. For columns with\nvery skewed distributions (e.g., `status` with 99 % of rows as `active`),\nraise the statistics target.\n",{"id":142,"difficulty":36,"q":143,"a":144},"pagination-offset","Why is OFFSET-based pagination slow and what is the alternative?","`LIMIT N OFFSET M` forces the database to scan and discard the first M rows\nbefore returning N. As M grows, the query gets slower — page 1 000 of 50\nresults means scanning 50 000 rows.\n\n```sql\n-- Slow: O(offset) scan for every page\nSELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 10000;\n\n-- Fast: keyset (cursor) pagination — O(log n) via index\n-- Page 1:\nSELECT id, title, created_at FROM posts\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- → last row: created_at = '2026-06-01 12:00:00', id = 9834\n\n-- Next page: use the last row's values as the cursor\nSELECT id, title, created_at FROM posts\nWHERE (created_at, id) \u003C ('2026-06-01 12:00:00', 9834)\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- Index on (created_at DESC, id DESC) makes this O(log n)\n```\n\n**Rule of thumb:** use keyset (cursor) pagination for any list that can\ngrow large. 
Reserve `OFFSET` only for small datasets (\u003C 10 000 rows) or\nwhere jumping to arbitrary page numbers is a hard requirement.\n",{"id":146,"difficulty":36,"q":147,"a":148},"subquery-vs-join-perf","When does a subquery perform worse than a JOIN and how do you fix it?","In most modern databases (Postgres, MySQL 8+, SQL Server), the optimizer\ncan rewrite correlated subqueries as joins automatically. However,\n**correlated subqueries** that reference the outer query run once per outer\nrow — O(N) inner scans — and may not be rewritten:\n\n```sql\n-- Correlated subquery: potentially O(N) inner scans\nSELECT id, total,\n  (SELECT name FROM customers WHERE id = o.customer_id) AS customer_name\nFROM orders o;\n\n-- Equivalent JOIN: one pass, one lookup\nSELECT o.id, o.total, c.name AS customer_name\nFROM   orders o\nJOIN   customers c ON c.id = o.customer_id;\n\n-- Check with EXPLAIN: if you see \"SubPlan\" or \"Nested Loop\" with inner\n-- rows = outer rows, the subquery is not being optimised away.\n```\n\nNon-correlated subqueries in `IN (SELECT …)` are usually optimised to a\nsemi-join and are equivalent in performance to a `JOIN`.\n\n**Rule of thumb:** if `EXPLAIN` shows a `SubPlan` node repeating for every\nouter row, rewrite it as a `JOIN` or a lateral join. 
For `EXISTS` \u002F `IN`\nchecks, the optimizer almost always handles them correctly on its own.\n",{"id":150,"difficulty":36,"q":151,"a":152},"query-rewrite-or","How can OR in a WHERE clause hurt performance and how do you fix it?","An `OR` across different columns often prevents the optimizer from using a\nsingle index efficiently because no single index covers both branches.\n\n```sql\n-- This may cause a Seq Scan even if both columns are indexed separately\nSELECT * FROM users WHERE email = 'alice@example.com' OR phone = '555-1234';\n\n-- Fix 1: UNION ALL (each branch uses its own index)\n-- The IS NULL check keeps phone matches whose email is NULL\nSELECT * FROM users WHERE email = 'alice@example.com'\nUNION ALL\nSELECT * FROM users WHERE phone = '555-1234'\n  AND (email \u003C> 'alice@example.com' OR email IS NULL);\n\n-- Fix 2: Postgres bitmap index scan (auto-handles OR over different indexes)\n-- Postgres may already do this — check EXPLAIN for \"BitmapAnd\"\u002F\"BitmapOr\"\n\n-- Fix 3: denormalise into a single search column \u002F use full-text search\n```\n\n**Rule of thumb:** use `EXPLAIN` to check whether an `OR` query is doing\na full scan. If so, split it into a `UNION ALL` so each branch can exploit\nits own index independently.\n",{"id":154,"difficulty":44,"q":155,"a":156},"avoid-select-star","Why should you avoid SELECT * in production queries?","`SELECT *` fetches every column from the table, including large `TEXT`,\n`BYTEA`, or `JSONB` columns that the query may not need. 
This increases\nnetwork transfer, memory use, and makes covering-index optimisations\nimpossible.\n\n```sql\n-- BAD: fetches all 40 columns including a 1 MB blob column\nSELECT * FROM products WHERE category_id = 5;\n\n-- GOOD: only the columns the caller actually needs\nSELECT id, name, price, stock FROM products WHERE category_id = 5;\n```\n\nAdditional reasons to avoid `SELECT *`:\n- Adding a column to the table silently changes what the query returns,\n  breaking application code that expects a fixed schema.\n- Prevents the planner from choosing an index-only scan.\n- Makes query intent unclear to future readers.\n\n**Rule of thumb:** always list columns explicitly in production queries.\n`SELECT *` is fine for ad-hoc exploration but should never appear in\napplication code or stored procedures.\n",{"id":158,"difficulty":69,"q":159,"a":160},"cte-performance-opt","Can CTEs hurt query performance and how?","In **Postgres pre-12**, CTEs were **optimisation fences** — the planner\nmaterialised (executed and stored) the CTE result before running the outer\nquery, preventing predicates from being pushed inside. This could cause full\nscans on the CTE that a plain subquery would have avoided.\n\n```sql\n-- Postgres \u003C 12: this CTE is materialised; the WHERE id = 42 is applied\n-- AFTER the full scan of orders inside the CTE\nWITH recent AS (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n\n-- Postgres 12+: CTEs are inlined by default (no longer an optimisation fence)\n-- Force materialisation when you WANT the fence (e.g., to prevent repeated execution):\nWITH recent AS MATERIALIZED (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n```\n\nMySQL and SQL Server have always inlined non-recursive CTEs.\n\n**Rule of thumb:** on Postgres 12+, CTEs behave like subqueries and are not\na performance concern. 
On older Postgres, replace CTEs with subqueries in\nthe `FROM` clause if `EXPLAIN` shows the CTE is preventing index use.\n",{"id":162,"difficulty":69,"q":163,"a":164},"vacuum-and-bloat","What is table bloat and how does VACUUM address it?","In Postgres's MVCC model, `UPDATE` and `DELETE` do not overwrite rows in\nplace: they mark the old row version as dead, and `UPDATE` additionally\nwrites a new version. **Dead tuples**\naccumulate until `VACUUM` reclaims their space. Without regular vacuuming,\nthe table grows (bloat), sequential scans slow down, and indexes carry dead\nentries.\n\n```sql\n-- Check dead tuple accumulation\nSELECT relname, n_live_tup, n_dead_tup,\n       round(n_dead_tup::numeric \u002F NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)\n         AS dead_pct,\n       last_vacuum, last_autovacuum\nFROM   pg_stat_user_tables\nORDER  BY n_dead_tup DESC\nLIMIT  10;\n\n-- Manual vacuum (reclaims space for reuse; does not shrink the file)\nVACUUM orders;\n\n-- Full vacuum (reclaims and shrinks the file; locks the table)\nVACUUM FULL orders;\n\n-- Vacuum and refresh planner statistics in one pass, with progress output\n-- (this does not rebuild indexes; use REINDEX for that)\nVACUUM (ANALYZE, VERBOSE) orders;\n```\n\n**Rule of thumb:** rely on `autovacuum` for routine maintenance. Run\n`VACUUM ANALYZE` manually after large bulk deletes or updates. Only use\n`VACUUM FULL` on heavily bloated tables during maintenance windows — it\nacquires an exclusive lock.\n",{"id":166,"difficulty":36,"q":167,"a":168},"query-timeout","How do you set and enforce query timeouts in SQL?","Long-running queries can exhaust connection pools, hold locks, and degrade\nthe whole database. 
Most databases let you cap query duration:\n\n```sql\n-- Postgres: statement timeout (raises error if exceeded)\nSET statement_timeout = '5s';        -- session level\nSET LOCAL statement_timeout = '2s';  -- transaction level only\n\n-- Postgres: lock wait timeout (fail fast rather than queue behind a blocker)\nSET lock_timeout = '500ms';\n\n-- MySQL: per-query timeout in milliseconds (optimizer hint; SELECT statements only)\nSELECT \u002F*+ MAX_EXECUTION_TIME(3000) *\u002F * FROM orders WHERE customer_id = 42;\n\n-- SQL Server: the query timeout is normally set by the client driver\n-- In T-SQL, the closest server-side control is a cost limit, not a wall-clock timeout:\nSET QUERY_GOVERNOR_COST_LIMIT 1000;  -- refuse queries whose estimated cost exceeds 1000\n```\n\nIn application code, always set a reasonable timeout at the connection\nor query level — never leave it at infinity (the default).\n\n**Rule of thumb:** set `statement_timeout` to a value appropriate for the\ncontext: 5–30 s for OLTP API queries; longer for batch jobs. Use\n`lock_timeout` separately to fail fast on lock contention rather than\nqueuing indefinitely.\n",{"id":170,"difficulty":36,"q":171,"a":172},"missing-index-detection","How do you proactively find slow queries and missing indexes in production?","```sql\n-- Postgres: pg_stat_statements — top slow queries\n-- (the module must also be listed in shared_preload_libraries;\n--  total_exec_time is the Postgres 13+ column name, total_time before that)\nCREATE EXTENSION IF NOT EXISTS pg_stat_statements;\n\nSELECT query,\n       calls,\n       round(total_exec_time::numeric \u002F calls, 2) AS avg_ms,\n       round(total_exec_time::numeric, 0)         AS total_ms,\n       rows \u002F calls                               AS avg_rows\nFROM   pg_stat_statements\nORDER  BY total_exec_time DESC\nLIMIT  20;\n\n-- Postgres: pg_stat_user_tables — tables with heavy sequential scans\n-- idx_scan is NULL for tables with no indexes, hence the COALESCE\nSELECT relname, seq_scan, seq_tup_read,\n       idx_scan,\n       round(seq_scan::numeric \u002F NULLIF(seq_scan + COALESCE(idx_scan, 0), 0) * 100, 1) AS seq_pct\nFROM   pg_stat_user_tables\nWHERE  seq_scan > 0\nORDER  BY seq_tup_read DESC\nLIMIT  10;\n\n-- MySQL: slow query log\nSET GLOBAL slow_query_log = 'ON';\nSET GLOBAL long_query_time = 
1;  -- log queries > 1 second\n```\n\n**Rule of thumb:** enable `pg_stat_statements` (Postgres), the slow\nquery log (MySQL), or Query Store (SQL Server) in production from day one. Review the top-10\nqueries by total time weekly — optimising one heavily-called query often\nhas more impact than tuning ten rarely-run ones.\n",{"id":174,"difficulty":69,"q":175,"a":176},"partition-pruning","What is partition pruning and how does it improve query performance?","**Partition pruning** is the optimizer's ability to skip entire table\npartitions that cannot contain rows matching the query's `WHERE` clause.\nInstead of scanning all partitions, it reads only the ones that could\nhave relevant data.\n\n```sql\n-- Partitioned table (Postgres)\nCREATE TABLE events (\n  id         BIGINT NOT NULL,\n  created_at DATE   NOT NULL,\n  payload    JSONB\n) PARTITION BY RANGE (created_at);\n\nCREATE TABLE events_2026_q1 PARTITION OF events\n  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');\nCREATE TABLE events_2026_q2 PARTITION OF events\n  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');\n\n-- Query: optimizer prunes events_2026_q1 — only scans events_2026_q2\nEXPLAIN SELECT * FROM events WHERE created_at >= '2026-04-01';\n-- Plan shows: Seq Scan on events_2026_q2 (events_2026_q1 not mentioned)\n```\n\nPlan-time pruning needs the `WHERE` clause to compare the bare partition\nkey with a constant. Postgres 11+ can also prune at execution time when the\nvalue is a bound parameter.\n\n**Rule of thumb:** for pruning to work, compare the partition key directly\nwith a literal or a parameter, and never wrap the key itself in a function like\n`DATE_TRUNC(...)`. Check `EXPLAIN` to confirm partitions are being pruned.\n",{"id":178,"difficulty":36,"q":179,"a":180},"index-only-scan","What is an Index Only Scan and how do you enable it?","An **Index Only Scan** reads all needed data directly from the index without\ntouching the main table (heap). It is the fastest read path — no random\nheap I\u002FO at all.\n\nFor an Index Only Scan to be chosen:\n1. 
All columns in `SELECT`, `WHERE`, and `ORDER BY` must be in the index.\n2. The table's **visibility map** must show that pages are all-visible\n   (recently vacuumed). If many pages are not all-visible, Postgres falls\n   back to heap fetches.\n\n```sql\n-- Query: SELECT email FROM users WHERE created_at > '2026-01-01'\n-- Index needed: (created_at) INCLUDE (email)\nCREATE INDEX idx_users_created_email\n  ON users (created_at DESC)\n  INCLUDE (email);\n\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT email FROM users WHERE created_at > '2026-01-01';\n-- → Index Only Scan using idx_users_created_email (Heap Fetches: 0)\n\n-- If Heap Fetches > 0, run VACUUM to update the visibility map:\nVACUUM users;\n```\n\n**Rule of thumb:** convert an `Index Scan` to an `Index Only Scan` by\nadding the `SELECT`ed columns to the index via `INCLUDE`. Then ensure the\ntable is regularly vacuumed so the visibility map stays current.\n",{"description":33},"SQL query optimization interview questions — EXPLAIN ANALYZE, execution plans, statistics, join strategies, N+1 problem, pagination patterns, and tuning tips across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Fquery-optimization","avdAD0y5v9vjkEh0IgAcWaowCwAcxFPScWt0qrM-2gw",1782244099021]