Question 1

What does EXPLAIN do and how do you read its output?

Accepted Answer

EXPLAIN shows the query execution plan the database chose: which indexes are used, what join strategies are applied, and the estimated cost and row counts at each step. EXPLAIN ANALYZE actually runs the query and adds real timings and row counts alongside the estimates.

Key fields to read in Postgres output:
- Seq Scan / Index Scan / Index Only Scan: access method
- rows: estimated rows; compare to the actual rows reported by EXPLAIN ANALYZE
- cost: planner's cost units (not wall-clock ms)
- Buffers (shown with the BUFFERS option): cache hits vs disk reads

Rule of thumb: always use EXPLAIN ANALYZE (not just EXPLAIN) on slow queries. Large discrepancies between estimated and actual rows usually point to stale or insufficient statistics, a common root cause of bad query plans.
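As a minimal sketch of the two forms, assuming Postgres and a hypothetical orders table (the table and column names are illustrative, not from a real schema):

```sql
-- Hypothetical table: orders(id, customer_id, total, created_at).

-- Plan only: the query is NOT executed, so every number is an estimate.
EXPLAIN
SELECT id, total
FROM orders
WHERE customer_id = 42;

-- Plan plus real execution: the query IS executed.
-- BUFFERS adds shared-buffer hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
WHERE customer_id = 42;

-- EXPLAIN ANALYZE really runs data-modifying statements too,
-- so wrap them in a transaction and roll back.
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET total = total WHERE customer_id = 42;
ROLLBACK;
```

When reading the second plan, compare the estimated rows value on each node with the actual rows value next to it; a gap of an order of magnitude or more is the signal to look at statistics first.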

Question 2

You see a Seq Scan on a large table — what do you check first?

Accepted Answer

A sequential scan on a large table is one of the most common sources of slow queries. Work through this checklist:

Is there an index on the WHERE column? If not, create one.
Is the index being used? Check EXPLAIN — if not, the planner may think a seq scan is cheaper (see next steps).
Are statistics fresh? Run ANALYZE table_name and re-check the plan.
Is the predicate selective enough? An index on a low-cardinality column (e.g. a boolean) won't help if 80 % of rows match; a sequential scan is genuinely cheaper in that case.
Is there a function in the WHERE clause? WHERE lower(email) = '...' won't use a plain index on email. Create an expression index on lower(email) instead.
Is random_page_cost tuned for SSDs? Default is 4.0 (HDD); set to 1.1 for SSD storage to make index scans more attractive.

-- Run ANALYZE to refresh statistics
ANALYZE orders;

-- Tune cost parameters for SSD (session level)
SET random_page_cost = 1.1;
SET effective_cache_size = '4GB';

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Rule of thumb: stale statistics are a frequent and easily overlooked cause of bad plans. Always run ANALYZE before concluding that an index isn't working.
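The function-in-WHERE item from the checklist could be handled roughly like this. It is a sketch that assumes Postgres and a hypothetical users table; the index name is made up for illustration:

```sql
-- Hypothetical table: users(id, email).
-- A plain index on email cannot serve WHERE lower(email) = ...,
-- so index the expression itself.
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- Refresh statistics so the planner has data for the new expression.
ANALYZE users;

-- The predicate must use the same expression as the index definition.
EXPLAIN ANALYZE
SELECT id, email
FROM users
WHERE lower(email) = 'alice@example.com';
```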

Question 3

What is the N+1 query problem and how do you fix it in SQL?

Accepted Answer

The N+1 problem occurs when code fetches a list of N parent records and then issues one additional query per record to load its children, for a total of N+1 round-trips instead of 1. Rule of thumb: if application logs show many nearly identical queries differing only by a primary key value, you have an N+1 problem. Fix it with a JOIN, an IN (...) batch fetch, or a lateral join.
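To make the pattern concrete, here is a sketch of the problem and the three fixes, assuming Postgres and a hypothetical customers/orders schema (all names and literal values are illustrative):

```sql
-- Hypothetical schema:
--   customers(id, name, region)
--   orders(id, customer_id, total, created_at)

-- N+1 pattern: one query for the parents ...
SELECT id, name FROM customers WHERE region = 'EU';
-- ... then one query per parent, issued from an application loop:
SELECT id, total FROM orders WHERE customer_id = 1;
SELECT id, total FROM orders WHERE customer_id = 2;
-- (and so on, N times)

-- Fix 1: a single JOIN returns parents and children in one round-trip.
SELECT c.id AS customer_id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EU';

-- Fix 2: batch fetch, two round-trips regardless of N.
SELECT id, customer_id, total
FROM orders
WHERE customer_id IN (1, 2, 3);  -- ids collected from the first query

-- Fix 3: lateral join, here limited to the 3 most recent orders per customer.
SELECT c.id AS customer_id, c.name, recent.id AS order_id, recent.total
FROM customers c
LEFT JOIN LATERAL (
    SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) AS recent ON true
WHERE c.region = 'EU';
```

The lateral form is mainly useful when each parent needs a bounded or ordered slice of its children, which a plain JOIN cannot express directly.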

Question 4

What are the three join strategies and when does the optimizer choose each?

Accepted Answer

The query planner chooses from three physical join algorithms:

Nested Loop Join: for each row in the outer table, scan the inner table (optionally using an index). O(N × M) worst case. Best when the outer set is very small or an index on the inner table makes each inner lookup cheap.
Hash Join: build a hash table from the smaller side, then probe it with each row from the larger side. O(N + M). Best for large unsorted inputs with no useful index; it requires an equality join condition.
Merge Join: sort both sides by the join key, then merge them in a single O(N + M) pass. The sort itself costs O(N log N + M log M) unless the input is already ordered, so it is best when both sides arrive sorted (e.g., from B-tree index scans in join-key order).

-- Force a specific strategy (Postgres — for testing only)
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
-- Now forced to Nested Loop

Rule of thumb: trust the optimizer to pick the right join strategy. Add an index on the join column of the larger table to make nested-loop joins efficient, and ensure statistics are current so the planner estimates set sizes correctly.

Question 5

What are table statistics and how do you update them?

Accepted Answer

Statistics are metadata the query planner uses to estimate how many rows a filter will return: column histograms, most common values, null fractions, and row counts. Stale statistics lead to bad execution plans. Postgres runs autovacuum, which calls ANALYZE automatically, but it may lag behind on high-churn tables. Rule of thumb: if a query plan suddenly gets worse after a large data load or bulk delete, run ANALYZE immediately. For columns with very skewed distributions (e.g., a status column where 99 % of rows share one value), raise the statistics target.
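A sketch of the corresponding commands, assuming Postgres and a hypothetical orders table with a skewed status column (the names and the target value of 1000 are illustrative choices):

```sql
-- Hypothetical table: orders(id, customer_id, status, created_at).

-- Refresh planner statistics for one table.
ANALYZE orders;

-- See when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Collect a larger sample for a skewed column (the default target is 100),
-- then re-analyze so the new target takes effect.
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- Inspect what the planner now knows about the column.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
```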

Question 6

Why is OFFSET-based pagination slow and what is the alternative?

Accepted Answer

LIMIT N OFFSET M forces the database to scan and discard the first M rows before returning N. As M grows, the query gets slower: page 1 000 at 50 rows per page means reading about 50 000 rows to return 50. Rule of thumb: use keyset (cursor) pagination for any list that can grow large. Reserve OFFSET only for small datasets (< 10 000 rows) or where jumping to arbitrary page numbers is a hard requirement.
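The difference between the two approaches can be sketched as follows, assuming Postgres and a hypothetical orders table; the cursor values in the keyset query are placeholders for whatever the last row of the previous page contained:

```sql
-- Hypothetical table: orders(id, created_at, total), with an index
-- that matches the sort order of the listing.
CREATE INDEX idx_orders_created_id ON orders (created_at DESC, id DESC);

-- OFFSET pagination: the database walks past 49 950 rows to return 50.
SELECT id, created_at, total
FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 50 OFFSET 49950;

-- Keyset pagination: pass the last row of the previous page as a cursor.
-- The row-value comparison lets the index seek straight to the next page.
-- id is included as a tie-breaker so rows with equal timestamps are not skipped.
SELECT id, created_at, total
FROM orders
WHERE (created_at, id) < (TIMESTAMP '2024-05-01 12:00:00', 918273)
ORDER BY created_at DESC, id DESC
LIMIT 50;
```

The trade-off is that keyset pagination can only move to the next or previous page; it cannot jump to an arbitrary page number.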

Question 7

When does a subquery perform worse than a JOIN and how do you fix it?

Accepted Answer

In most modern databases (Postgres, MySQL 8+, SQL Server), the optimizer can often rewrite subqueries as joins automatically. However, a correlated subquery (one that references the outer query) that is not rewritten runs once per outer row, giving O(N) inner scans; a scalar subquery in the SELECT list is a typical case. Non-correlated subqueries in IN (...) are usually optimised to a semi-join and are equivalent in performance to a JOIN. Rule of thumb: if EXPLAIN shows a SubPlan node repeating for every outer row, rewrite it as a JOIN or a lateral join. For EXISTS / IN checks, the optimizer almost always handles them correctly on its own.
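One possible shape of the problem and its rewrite, assuming Postgres and a hypothetical customers/orders schema (names are illustrative; whether a given query is rewritten automatically depends on the database and version):

```sql
-- Hypothetical schema: customers(id, name), orders(id, customer_id, total).

-- Correlated scalar subquery in the SELECT list: in Postgres this typically
-- shows up in EXPLAIN as a SubPlan evaluated once per customer row.
SELECT c.id,
       c.name,
       (SELECT max(o.total)
        FROM orders o
        WHERE o.customer_id = c.id) AS largest_order
FROM customers c;

-- Rewrite: aggregate once, then join. Customers without orders
-- still appear, with largest_order as NULL, same as above.
SELECT c.id, c.name, agg.largest_order
FROM customers c
LEFT JOIN (
    SELECT customer_id, max(total) AS largest_order
    FROM orders
    GROUP BY customer_id
) AS agg ON agg.customer_id = c.id;

-- Semi-join form that planners usually handle well without any rewrite.
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```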

Question 8

How can OR in a WHERE clause hurt performance and how do you fix it?

Accepted Answer

An OR across different columns often prevents the optimizer from using a single index efficiently because no single index covers both branches. Rule of thumb: use EXPLAIN to check whether an OR query is doing a full scan. If so, split it into a UNION so each branch can exploit its own index independently.
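A sketch of the rewrite, assuming Postgres and a hypothetical users table with one index per filtered column (names and values are illustrative):

```sql
-- Hypothetical table: users(id, email, phone), where id is the primary key.
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_phone ON users (phone);

-- Original form: check the plan for a Seq Scan.
EXPLAIN
SELECT id, email, phone
FROM users
WHERE email = 'alice@example.com' OR phone = '555-0100';

-- Rewrite: each branch can use its own index. UNION (not UNION ALL)
-- removes the duplicate when one row matches both conditions; selecting
-- the primary key keeps genuinely different rows distinct.
SELECT id, email, phone FROM users WHERE email = 'alice@example.com'
UNION
SELECT id, email, phone FROM users WHERE phone = '555-0100';
```

Check the plan before rewriting: Postgres can sometimes combine both indexes on its own with a BitmapOr node, in which case the original OR form is already fine.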

Question 9

Why should you avoid SELECT * in production queries?

Accepted Answer

SELECT * fetches every column from the table, including large TEXT, JSON, or BLOB columns that the query may not need. This increases network transfer and memory use.

Additional reasons to avoid SELECT *:
- Adding a column to the table silently changes what the query returns, breaking application code that expects a fixed schema.
- It prevents the planner from choosing an index-only scan unless an index happens to cover every column.
- It makes query intent unclear to future readers.

Rule of thumb: always list columns explicitly in production queries. SELECT * is fine for ad-hoc exploration but should never appear in application code or stored procedures.
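For illustration, assuming a hypothetical articles table whose body column holds large text values:

```sql
-- Hypothetical table: articles(id, author_id, title, published_at, body).

-- Ad-hoc exploration only: drags the large body column along on every row.
SELECT * FROM articles WHERE author_id = 7;

-- Application query: only the columns the caller actually uses.
-- The result shape stays stable if columns are later added to the table.
SELECT id, title, published_at
FROM articles
WHERE author_id = 7
ORDER BY published_at DESC;
```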

Question 10

Can CTEs hurt query performance and how?

Accepted Answer

In Postgres pre-12, CTEs were optimisation fences: the planner materialised (executed and stored) the CTE result before running the outer query, preventing predicates from being pushed inside. This could cause full scans on the CTE that a plain subquery would have avoided. SQL Server inlines non-recursive CTEs, and MySQL 8.0+ merges them into the outer query where it can. Rule of thumb: on Postgres 12+, a non-recursive, side-effect-free CTE that is referenced once is inlined like a subquery and is not a performance concern; a CTE referenced more than once is still materialised unless you mark it NOT MATERIALIZED. On older Postgres, replace CTEs with subqueries in the FROM clause if EXPLAIN shows the CTE is preventing index use.
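The three variants can be sketched like this, assuming Postgres and a hypothetical orders table (the MATERIALIZED keywords require Postgres 12 or later):

```sql
-- Hypothetical table: orders(id, customer_id, total, created_at).

-- Postgres 12+: referenced once and side-effect free, so the CTE is inlined
-- and the customer_id filter can reach an index on orders.
WITH recent AS (
    SELECT id, customer_id, total
    FROM orders
    WHERE created_at >= now() - interval '30 days'
)
SELECT id, total FROM recent WHERE customer_id = 42;

-- Postgres 12+: explicitly keep the old fence behaviour, for example
-- to compute an expensive result exactly once.
WITH recent AS MATERIALIZED (
    SELECT id, customer_id, total
    FROM orders
    WHERE created_at >= now() - interval '30 days'
)
SELECT id, total FROM recent WHERE customer_id = 42;

-- Pre-12 workaround: the same logic as a subquery in FROM,
-- which the planner can always flatten.
SELECT id, total
FROM (
    SELECT id, customer_id, total
    FROM orders
    WHERE created_at >= now() - interval '30 days'
) AS recent
WHERE customer_id = 42;
```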

Question 11

What is table bloat and how does VACUUM address it?

Accepted Answer

In Postgres's MVCC model, UPDATE and DELETE do not overwrite rows in place: the old row version is marked dead, and an UPDATE also writes a new version. Dead tuples accumulate until VACUUM reclaims their space. Without regular vacuuming, the table grows (bloat), sequential scans slow down, and indexes carry dead entries. Rule of thumb: rely on autovacuum for routine maintenance. Run VACUUM ANALYZE manually after large bulk deletes or updates. Only use VACUUM FULL on heavily bloated tables during maintenance windows, because it rewrites the table and holds an exclusive lock while it runs.
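A sketch of the usual checks and commands, assuming Postgres and a hypothetical orders table (the 0.05 scale factor is an illustrative value, not a recommendation for every workload):

```sql
-- Hypothetical table: orders.

-- How many dead tuples are there, and when was the table last vacuumed?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- After a bulk delete or update: mark dead tuples reusable and refresh stats.
-- Plain VACUUM does not block normal reads and writes.
VACUUM (VERBOSE, ANALYZE) orders;

-- Make autovacuum trigger earlier on a high-churn table
-- (the default scale factor is 0.2, i.e. 20 % of the table).
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);

-- Maintenance window only: rewrites the whole table and holds an
-- ACCESS EXCLUSIVE lock for the duration.
VACUUM FULL orders;
```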

Question 12

How do you set and enforce query timeouts in SQL?

Accepted Answer

Long-running queries can exhaust connection pools, hold locks, and degrade the whole database. Most databases let you cap the maximum query duration. In application code, always set a reasonable timeout at the connection or query level; never leave it at the default, which is typically no limit. Rule of thumb: set statement_timeout (Postgres) to a value appropriate for the context: 5–30 s for OLTP API queries; longer for batch jobs. Use lock_timeout separately to fail fast on lock contention rather than queuing indefinitely.
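In Postgres, the settings above could be applied at several scopes, roughly as follows. The role name api_user and the specific durations are assumptions for illustration:

```sql
-- Session level: cancel any statement that runs longer than 5 seconds.
SET statement_timeout = '5s';

-- Fail fast instead of waiting more than 2 seconds for a lock.
SET lock_timeout = '2s';

-- Transaction level: a longer limit for one batch job only.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL statement_timeout = '10min';
-- ... long-running batch statement goes here ...
COMMIT;

-- Role level: a default for every new session of a hypothetical role.
ALTER ROLE api_user SET statement_timeout = '30s';

-- Check the effective value in the current session.
SHOW statement_timeout;
```

Setting the limit per role or per session keeps interactive traffic strict while leaving room for batch jobs, instead of forcing one global value on every workload.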

Question 13

How do you proactively find slow queries and missing indexes in production?

Accepted Answer

Rule of thumb: enable pg_stat_statements (Postgres), the slow query log (MySQL), or Query Store (SQL Server) in production from day one. Review the top 10 queries by total time weekly. Optimising one heavily called query often has more impact than tuning ten rarely run ones.
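For Postgres, a weekly review might start from queries like these. This is a sketch: it assumes the pg_stat_statements module is already listed in shared_preload_libraries, and the column names shown apply to Postgres 13 and later:

```sql
-- Once per database, after the module has been preloaded by the server:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total execution time.
-- Postgres 12 and earlier name these columns total_time and mean_time.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80) AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Candidate tables for a missing index: many sequential scans
-- that read many rows, with few or no index scans.
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
```

Sorting by total time rather than mean time surfaces cheap queries that are called very often, which is usually where the aggregate load comes from.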

Question 14

What is partition pruning and how does it improve query performance?

Accepted Answer

Partition pruning is the optimizer's ability to skip entire table partitions that cannot contain rows matching the query's WHERE clause. Instead of scanning all partitions, it reads only the ones that could have relevant data. Pruning only works when the WHERE clause compares the partition key directly with a constant (not a function call or a join column). Rule of thumb: for pruning to work, the WHERE predicate on the partition key must use a literal or a parameter, and the key itself must not be wrapped in a function call (for example, applying date_trunc to the partition column). Check EXPLAIN to confirm partitions are being pruned.
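A small sketch that shows both cases, assuming Postgres declarative partitioning and a hypothetical events table (table, partition, and column names are illustrative):

```sql
-- Hypothetical range-partitioned table.
CREATE TABLE events (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Prunable: the partition key is compared directly with constants,
-- so the plan should list only events_2024_01.
EXPLAIN
SELECT id FROM events
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';

-- Not prunable: the key is wrapped in a function, so the planner cannot
-- map the predicate onto partition bounds and scans every partition.
EXPLAIN
SELECT id FROM events
WHERE date_trunc('month', created_at) = '2024-01-01';
```

Rewriting the second query as a half-open range on created_at, like the first one, restores pruning without changing the result.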

Question 15

What is an Index Only Scan and how do you enable it?

Accepted Answer

An Index Only Scan reads all needed data directly from the index without touching the main table (heap). It is usually the fastest read path because it avoids random heap I/O.

For an Index Only Scan to be chosen:
1. All columns in SELECT, WHERE, and ORDER BY must be in the index.
2. The table's visibility map must show that pages are all-visible (recently vacuumed). If many pages are not all-visible, Postgres falls back to heap fetches.

Rule of thumb: convert an Index Scan to an Index Only Scan by adding the SELECTed columns to the index via INCLUDE. Then ensure the table is regularly vacuumed so the visibility map stays current.
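Both conditions can be sketched together, assuming Postgres 11 or later (needed for INCLUDE) and a hypothetical orders table; the index name is made up for illustration:

```sql
-- Hypothetical table: orders(id, customer_id, total, created_at).

-- Covering index: customer_id is the search key; total and created_at
-- are carried as non-key payload columns.
CREATE INDEX idx_orders_customer_covering
    ON orders (customer_id) INCLUDE (total, created_at);

-- Keep the visibility map current so heap fetches stay low.
VACUUM (ANALYZE) orders;

-- Every referenced column is in the index, so an Index Only Scan is possible.
-- In the output, a Heap Fetches value of 0 means the heap was not visited.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total, created_at
FROM orders
WHERE customer_id = 42;
```

INCLUDE columns make the index larger and add write overhead, so it is worth covering only the queries that are both hot and narrow.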

Query Optimization Interview Questions & Answers
