[{"data":1,"prerenderedAt":102},["ShallowReactive",2],{"qa-\u002Fsql\u002Fperformance\u002Fquery-optimization":3},{"page":4,"siblings":93,"blog":99},{"id":5,"title":6,"body":7,"description":11,"difficulty":14,"extension":15,"framework":16,"frameworkSlug":17,"meta":18,"navigation":19,"order":12,"path":20,"questions":21,"questionsCount":84,"related":85,"seo":86,"seoDescription":87,"stem":88,"subtopic":6,"topic":89,"topicSlug":90,"updated":91,"__hash__":92},"qa\u002Fsql\u002Fperformance\u002Fquery-optimization.md","Query Optimization",{"type":8,"value":9,"toc":10},"minimark",[],{"title":11,"searchDepth":12,"depth":12,"links":13},"",2,[],"hard","md","SQL","sql",{},true,"\u002Fsql\u002Fperformance\u002Fquery-optimization",[22,27,32,36,40,44,48,52,56,60,64,68,72,76,80],{"id":23,"difficulty":24,"q":25,"a":26},"what-is-explain","easy","What does EXPLAIN do and how do you read its output?","`EXPLAIN` shows the **query execution plan** the database chose — which\nindexes are used, what join strategies are applied, and the estimated cost\nand row counts at each step. 
`EXPLAIN ANALYZE` actually runs the query and\nadds real timings and row counts alongside the estimates.\n\n```sql\n-- Postgres: estimated plan only (does not run the query)\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\n\n-- Postgres: run the query, show actual vs estimated\nEXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)\n  SELECT * FROM orders WHERE customer_id = 42;\n\n-- MySQL\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- MySQL 8.0.18+\n\n-- SQL Server: estimated plan only (SET SHOWPLAN_TEXT must be alone in its batch)\nSET SHOWPLAN_TEXT ON;\nGO\nSELECT * FROM orders WHERE customer_id = 42;\nGO\nSET SHOWPLAN_TEXT OFF;\nGO\n-- SQL Server: run the query and report logical\u002Fphysical reads\nSET STATISTICS IO ON;\nSELECT * FROM orders WHERE customer_id = 42;\n```\n\nKey fields to read in Postgres output:\n- `Seq Scan` \u002F `Index Scan` \u002F `Index Only Scan` — access method\n- `rows=N` — estimated rows; compare to `actual rows=N` in ANALYZE\n- `cost=start..total` — planner's cost units (not wall-clock ms)\n- `Buffers: shared hit=N read=N` — cache hits vs disk reads\n\n**Rule of thumb:** use `EXPLAIN ANALYZE` (not just `EXPLAIN`) on\nslow queries. Large gaps between estimated and actual row counts usually\npoint to stale or insufficient statistics, a common cause of bad plans.\nBecause `EXPLAIN ANALYZE` really executes the statement, wrap data-modifying\nstatements in a transaction and roll it back.\n",{"id":28,"difficulty":29,"q":30,"a":31},"seq-scan-optimization","medium","You see a Seq Scan on a large table — what do you check first?","A sequential scan on a large table is one of the most common sources of slow\nqueries. Work through this checklist:\n\n1. **Is there an index on the WHERE column?** If not, create one.\n2. **Is the index being used?** Check `EXPLAIN` — if not, the planner may\n   think a seq scan is cheaper (see next steps).\n3. **Are statistics fresh?** Run `ANALYZE table_name` and re-check the plan.\n4. **Is the predicate selective enough?** An index on a low-cardinality\n   column (e.g. a boolean) won't help if 80 % of rows match.\n5. **Is there a function in the WHERE clause?** `WHERE lower(email) = '...'`\n   won't use a plain index on `email`; create an expression index on\n   `lower(email)` instead.\n6. 
**Is `random_page_cost` tuned for SSDs?** Default is 4.0 (HDD); set to\n   1.1 for SSD storage to make index scans more attractive.\n\n```sql\n-- Run ANALYZE to refresh statistics\nANALYZE orders;\n\n-- Tune cost parameters for SSD (session level)\nSET random_page_cost = 1.1;\nSET effective_cache_size = '4GB';\n\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;\n```\n\n**Rule of thumb:** stale statistics cause the planner to choose bad plans\nmore often than missing indexes. Always `ANALYZE` before concluding that an\nindex isn't working.\n",{"id":33,"difficulty":29,"q":34,"a":35},"n-plus-one","What is the N+1 query problem and how do you fix it in SQL?","The **N+1 problem** occurs when code fetches a list of N parent records and\nthen issues one additional query per record to load its children — totalling\nN+1 round-trips instead of 1.\n\n```sql\n-- N+1: 1 query for customers + N queries for orders (one per customer)\nSELECT id, name FROM customers WHERE active = TRUE;  -- → N rows\n-- then in a loop:\nSELECT * FROM orders WHERE customer_id = ?;  -- N times\n\n-- Fix: JOIN or subquery — 1 round-trip total\nSELECT c.id, c.name, o.id AS order_id, o.total\nFROM   customers c\nLEFT JOIN orders o ON o.customer_id = c.id\nWHERE  c.active = TRUE;\n\n-- Or use a lateral join to get the latest order per customer\nSELECT c.id, c.name, latest.total\nFROM   customers c\nLEFT JOIN LATERAL (\n  SELECT total FROM orders WHERE customer_id = c.id\n  ORDER BY created_at DESC LIMIT 1\n) latest ON TRUE\nWHERE  c.active = TRUE;\n```\n\n**Rule of thumb:** if application logs show many nearly-identical queries\ndiffering only by a primary key value, you have an N+1 problem. Fix it with\na `JOIN`, an `IN (...)` batch fetch, or a lateral join.\n",{"id":37,"difficulty":14,"q":38,"a":39},"join-strategies","What are the three join strategies and when does the optimizer choose each?","The query planner chooses from three physical join algorithms:\n\n1. 
**Nested Loop Join** — for each row in the outer table, scan the inner\n   table (optionally using an index). O(N × M) worst case. Best when the\n   outer set is very small or an index on the inner table makes the inner\n   scan cheap.\n2. **Hash Join** — build a hash table from the smaller side, then probe it\n   with each row from the larger side. O(N + M). Best for large unsorted\n   inputs with no useful index.\n3. **Merge Join** — sort both sides by the join key, then merge in O(N + M).\n   Best when both sides are already sorted (e.g., both have B-tree index\n   scans in join-key order).\n\n```sql\n-- Force a specific strategy (Postgres — for testing only)\nSET enable_hashjoin = off;\nSET enable_mergejoin = off;\nEXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;\n-- Now forced to Nested Loop\n```\n\n**Rule of thumb:** trust the optimizer to pick the right join strategy.\nAdd an index on the join column of the larger table to make nested-loop\njoins efficient, and ensure statistics are current so the planner estimates\nset sizes correctly.\n",{"id":41,"difficulty":29,"q":42,"a":43},"statistics-analyze","What are table statistics and how do you update them?","**Statistics** are metadata the query planner uses to estimate how many\nrows a filter will return — column histograms, most common values, null\nfractions, and row counts. 
Stale statistics lead to bad execution plans.\n\n```sql\n-- Postgres: update statistics for a table (fast, non-blocking)\nANALYZE orders;\n\n-- Update all tables in the database\nANALYZE;\n\n-- Check when statistics were last collected\nSELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup\nFROM   pg_stat_user_tables\nWHERE  relname = 'orders';\n\n-- Increase statistics target for a column with non-uniform distribution\nALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;\n-- Default is 100 (100 histogram buckets); raise for skewed data.\nANALYZE orders;\n```\n\nPostgres runs `autovacuum` which calls `ANALYZE` automatically, but it may\nlag behind on high-churn tables.\n\n**Rule of thumb:** if a query plan suddenly gets worse after a large data\nload or bulk delete, run `ANALYZE table_name` immediately. For columns with\nvery skewed distributions (e.g., `status` with 99 % of rows as `active`),\nraise the statistics target.\n",{"id":45,"difficulty":29,"q":46,"a":47},"pagination-offset","Why is OFFSET-based pagination slow and what is the alternative?","`LIMIT N OFFSET M` forces the database to scan and discard the first M rows\nbefore returning N. As M grows, the query gets slower — page 1 000 of 50\nresults means scanning 50 000 rows.\n\n```sql\n-- Slow: O(offset) scan for every page\nSELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 10000;\n\n-- Fast: keyset (cursor) pagination — O(log n) via index\n-- Page 1:\nSELECT id, title, created_at FROM posts\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- → last row: created_at = '2026-06-01 12:00:00', id = 9834\n\n-- Next page: use the last row's values as the cursor\nSELECT id, title, created_at FROM posts\nWHERE (created_at, id) \u003C ('2026-06-01 12:00:00', 9834)\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- Index on (created_at DESC, id DESC) makes this O(log n)\n```\n\n**Rule of thumb:** use keyset (cursor) pagination for any list that can\ngrow large. 
Reserve `OFFSET` only for small datasets (\u003C 10 000 rows) or\nwhere jumping to arbitrary page numbers is a hard requirement.\n",{"id":49,"difficulty":29,"q":50,"a":51},"subquery-vs-join-perf","When does a subquery perform worse than a JOIN and how do you fix it?","In most modern databases (Postgres, MySQL 8+, SQL Server), the optimizer\ncan often rewrite simple subqueries as joins automatically. However, a\n**correlated subquery** (one that references the outer query) that is not\nrewritten runs once per outer row, giving O(N) inner scans:\n\n```sql\n-- Correlated subquery: potentially O(N) inner scans\nSELECT id, total,\n  (SELECT name FROM customers WHERE id = o.customer_id) AS customer_name\nFROM orders o;\n\n-- Equivalent LEFT JOIN: one pass; orders without a matching customer\n-- still appear with a NULL name, as with the scalar subquery\nSELECT o.id, o.total, c.name AS customer_name\nFROM   orders o\nLEFT JOIN customers c ON c.id = o.customer_id;\n\n-- Check with EXPLAIN: if you see \"SubPlan\" or \"Nested Loop\" with inner\n-- rows = outer rows, the subquery is not being optimised away.\n```\n\nNon-correlated subqueries in `IN (SELECT …)` are usually optimised to a\nsemi-join and are equivalent in performance to a `JOIN`.\n\n**Rule of thumb:** if `EXPLAIN` shows a `SubPlan` node repeating for every\nouter row, rewrite it as a `JOIN` or a lateral join. 
For `EXISTS` \u002F `IN`\nchecks, the optimizer usually plans them efficiently on its own.\n",{"id":53,"difficulty":29,"q":54,"a":55},"query-rewrite-or","How can OR in a WHERE clause hurt performance and how do you fix it?","An `OR` across different columns often prevents the optimizer from using a\nsingle index efficiently because no single index covers both branches.\n\n```sql\n-- This may cause a Seq Scan even if both columns are indexed separately\nSELECT * FROM users WHERE email = 'alice@example.com' OR phone = '555-1234';\n\n-- Fix 1: UNION ALL (each branch uses its own index)\n-- IS DISTINCT FROM keeps rows whose email is NULL (a plain inequality drops them)\nSELECT * FROM users WHERE email = 'alice@example.com'\nUNION ALL\nSELECT * FROM users WHERE phone = '555-1234' AND email IS DISTINCT FROM 'alice@example.com';\n\n-- Fix 2: Postgres bitmap index scan (auto-handles OR over different indexes)\n-- Postgres may already do this — check EXPLAIN for \"BitmapOr\"\n\n-- Fix 3: denormalise into a single search column \u002F use full-text search\n```\n\n**Rule of thumb:** use `EXPLAIN` to check whether an `OR` query is doing\na full scan. If so, split it into a `UNION ALL` so each branch can exploit\nits own index independently.\n",{"id":57,"difficulty":24,"q":58,"a":59},"avoid-select-star","Why should you avoid SELECT * in production queries?","`SELECT *` fetches every column from the table, including large `TEXT`,\n`BYTEA`, or `JSONB` columns that the query may not need. 
This increases\nnetwork transfer and memory use, and makes covering-index optimisations\nimpossible.\n\n```sql\n-- BAD: fetches all 40 columns including a 1 MB blob column\nSELECT * FROM products WHERE category_id = 5;\n\n-- GOOD: only the columns the caller actually needs\nSELECT id, name, price, stock FROM products WHERE category_id = 5;\n```\n\nAdditional reasons to avoid `SELECT *`:\n- Adding a column to the table silently changes what the query returns,\n  breaking application code that expects a fixed schema.\n- Prevents the planner from choosing an index-only scan.\n- Makes query intent unclear to future readers.\n\n**Rule of thumb:** always list columns explicitly in production queries.\n`SELECT *` is fine for ad-hoc exploration but should never appear in\napplication code or stored procedures.\n",{"id":61,"difficulty":14,"q":62,"a":63},"cte-performance-opt","Can CTEs hurt query performance and how?","In **Postgres pre-12**, CTEs were **optimisation fences** — the planner\nmaterialised (executed and stored) the CTE result before running the outer\nquery, preventing predicates from being pushed inside. This could cause full\nscans on the CTE that a plain subquery would have avoided.\n\n```sql\n-- Postgres \u003C 12: this CTE is materialised; the WHERE id = 42 is applied\n-- AFTER the full scan of orders inside the CTE\nWITH recent AS (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n\n-- Postgres 12+: a non-recursive, side-effect-free CTE referenced once is\n-- inlined by default (no longer an optimisation fence)\n-- Force materialisation when you WANT the fence (e.g., to prevent repeated execution):\nWITH recent AS MATERIALIZED (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n```\n\nSQL Server inlines non-recursive CTEs. MySQL 8.0+ can either merge a CTE\ninto the outer query or materialise it, depending on the query.\n\n**Rule of thumb:** on Postgres 12+, a CTE referenced once behaves like a\nsubquery and is rarely a performance concern. 
On older Postgres, replace CTEs with subqueries in\nthe `FROM` clause if `EXPLAIN` shows the CTE is preventing index use.\n",{"id":65,"difficulty":14,"q":66,"a":67},"vacuum-and-bloat","What is table bloat and how does VACUUM address it?","In Postgres's MVCC model, `UPDATE` and `DELETE` do not overwrite rows in\nplace: both leave the old row version behind as a dead tuple, and `UPDATE`\nalso writes a new version. **Dead tuples**\naccumulate until `VACUUM` reclaims their space. Without regular vacuuming,\nthe table grows (bloat), sequential scans slow down, and indexes carry dead\nentries.\n\n```sql\n-- Check dead tuple accumulation\nSELECT relname, n_live_tup, n_dead_tup,\n       round(n_dead_tup::numeric \u002F NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)\n         AS dead_pct,\n       last_vacuum, last_autovacuum\nFROM   pg_stat_user_tables\nORDER  BY n_dead_tup DESC\nLIMIT  10;\n\n-- Manual vacuum (reclaims space for reuse; does not shrink the file)\nVACUUM orders;\n\n-- Full vacuum (reclaims and shrinks the file; locks the table)\nVACUUM FULL orders;\n\n-- Vacuum and refresh planner statistics in one pass, with progress output\n-- (this does not rebuild indexes; use REINDEX for that)\nVACUUM (ANALYZE, VERBOSE) orders;\n```\n\n**Rule of thumb:** rely on `autovacuum` for routine maintenance. Run\n`VACUUM ANALYZE` manually after large bulk deletes or updates. Only use\n`VACUUM FULL` on heavily bloated tables during maintenance windows, because\nit acquires an `ACCESS EXCLUSIVE` lock that blocks reads and writes.\n",{"id":69,"difficulty":29,"q":70,"a":71},"query-timeout","How do you set and enforce query timeouts in SQL?","Long-running queries can exhaust connection pools, hold locks, and degrade\nthe whole database. 
Most databases allow a maximum query duration:\n\n```sql\n-- Postgres: statement timeout (raises error if exceeded)\nSET statement_timeout = '5s';        -- session level\nSET LOCAL statement_timeout = '2s';  -- transaction level only\n\n-- Postgres: lock wait timeout (fail fast rather than queue behind a blocker)\nSET lock_timeout = '500ms';\n\n-- MySQL: per-query timeout (optimizer hint)\nSELECT \u002F*+ MAX_EXECUTION_TIME(3000) *\u002F * FROM orders WHERE customer_id = 42;\n\n-- SQL Server: per-connection timeout (set by client driver)\n-- In T-SQL:\nSET QUERY_GOVERNOR_COST_LIMIT 1000;  -- abort if estimated cost > 1000\n```\n\nIn application code, always set a reasonable timeout at the connection\nor query level — never leave it at infinity (the default).\n\n**Rule of thumb:** set `statement_timeout` to a value appropriate for the\ncontext: 5–30 s for OLTP API queries; longer for batch jobs. Use\n`lock_timeout` separately to fail fast on lock contention rather than\nqueuing indefinitely.\n",{"id":73,"difficulty":29,"q":74,"a":75},"missing-index-detection","How do you proactively find slow queries and missing indexes in production?","```sql\n-- Postgres: pg_stat_statements (requires extension) — top slow queries\nCREATE EXTENSION IF NOT EXISTS pg_stat_statements;\n\nSELECT query,\n       calls,\n       round(total_exec_time::numeric \u002F calls, 2) AS avg_ms,\n       round(total_exec_time::numeric, 0)         AS total_ms,\n       rows \u002F calls                               AS avg_rows\nFROM   pg_stat_statements\nORDER  BY total_exec_time DESC\nLIMIT  20;\n\n-- Postgres: pg_stat_user_tables — tables with heavy sequential scans\nSELECT relname, seq_scan, seq_tup_read,\n       idx_scan,\n       round(seq_scan::numeric \u002F NULLIF(seq_scan + idx_scan, 0) * 100, 1) AS seq_pct\nFROM   pg_stat_user_tables\nWHERE  seq_scan > 0\nORDER  BY seq_tup_read DESC\nLIMIT  10;\n\n-- MySQL: slow query log\nSET GLOBAL slow_query_log = 'ON';\nSET GLOBAL long_query_time = 1;  
-- log queries > 1 second\n```\n\n**Rule of thumb:** enable `pg_stat_statements` (Postgres), the slow\nquery log (MySQL), or Query Store (SQL Server) in production from day one.\nReview the top-10\nqueries by total time weekly — optimising one heavily-called query often\nhas more impact than tuning ten rarely-run ones.\n",{"id":77,"difficulty":14,"q":78,"a":79},"partition-pruning","What is partition pruning and how does it improve query performance?","**Partition pruning** is the optimizer's ability to skip entire table\npartitions that cannot contain rows matching the query's `WHERE` clause.\nInstead of scanning all partitions, it reads only the ones that could\nhave relevant data.\n\n```sql\n-- Partitioned table (Postgres)\nCREATE TABLE events (\n  id         BIGINT NOT NULL,\n  created_at DATE   NOT NULL,\n  payload    JSONB\n) PARTITION BY RANGE (created_at);\n\nCREATE TABLE events_2026_q1 PARTITION OF events\n  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');\nCREATE TABLE events_2026_q2 PARTITION OF events\n  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');\n\n-- Query: optimizer prunes events_2026_q1 — only scans events_2026_q2\nEXPLAIN SELECT * FROM events WHERE created_at >= '2026-04-01';\n-- Plan shows: Seq Scan on events_2026_q2 (events_2026_q1 not mentioned)\n```\n\nPlan-time pruning works when the `WHERE` clause compares the bare partition\nkey with a constant. Postgres 11+ can also prune at execution time for\nparameters and stable expressions such as `now()`, but wrapping the\npartition key column itself in a function defeats pruning.\n\n**Rule of thumb:** for pruning to work, compare the partition key column\ndirectly against a literal or a parameter; do not wrap the column in a\nfunction like `DATE_TRUNC(...)`. Check `EXPLAIN` to confirm partitions are being pruned.\n",{"id":81,"difficulty":29,"q":82,"a":83},"index-only-scan","What is an Index Only Scan and how do you enable it?","An **Index Only Scan** reads all needed data directly from the index without\ntouching the main table (heap). It is usually the fastest read path because\nit avoids most or all random heap I\u002FO.\n\nFor an Index Only Scan to be chosen:\n1. 
All columns in `SELECT`, `WHERE`, and `ORDER BY` must be in the index.\n2. The table's **visibility map** must show that pages are all-visible\n   (recently vacuumed). If many pages are not all-visible, Postgres falls\n   back to heap fetches.\n\n```sql\n-- Query: SELECT email FROM users WHERE created_at > '2026-01-01'\n-- Index needed: (created_at) INCLUDE (email)\nCREATE INDEX idx_users_created_email\n  ON users (created_at DESC)\n  INCLUDE (email);\n\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT email FROM users WHERE created_at > '2026-01-01';\n-- → Index Only Scan using idx_users_created_email (Heap Fetches: 0)\n\n-- If Heap Fetches > 0, run VACUUM to update the visibility map:\nVACUUM users;\n```\n\n**Rule of thumb:** convert an `Index Scan` to an `Index Only Scan` by\nadding the `SELECT`ed columns to the index via `INCLUDE`. Then ensure the\ntable is regularly vacuumed so the visibility map stays current.\n",15,null,{"description":11},"SQL query optimization interview questions — EXPLAIN ANALYZE, execution plans, statistics, join strategies, N+1 problem, pagination patterns, and tuning tips across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Fquery-optimization","Indexes & Performance","performance","2026-06-20","avdAD0y5v9vjkEh0IgAcWaowCwAcxFPScWt0qrM-2gw",[94,98],{"subtopic":95,"path":96,"order":97},"Indexes","\u002Fsql\u002Fperformance\u002Findexes",1,{"subtopic":6,"path":20,"order":12},{"path":100,"title":101},"\u002Fblog\u002Fsql-query-optimization-explain","SQL Query Optimization — Reading EXPLAIN, Fixing Slow Queries",1782244107394]