[{"data":1,"prerenderedAt":2328},["ShallowReactive",2],{"hub-sql":3},{"framework":4,"topics":16,"qa":96},{"id":5,"description":6,"extension":7,"icon":8,"meta":9,"name":10,"order":11,"slug":12,"stem":13,"tier":14,"__hash__":15},"frameworks\u002Fframeworks\u002Fsql.yml","SQL interview questions on queries, joins and aggregation — essential for every backend, data and analytics interview.","yml","database",{},"SQL",4,"sql","frameworks\u002Fsql",1,"lpzsOj2p9p9W0Tctwc61nP-ulZAA80R5gJiyaZS6ZeI",[17,25,34,43,51,60,69,78,87],{"id":18,"description":19,"extension":7,"frameworkSlug":12,"meta":20,"name":21,"order":14,"slug":22,"stem":23,"__hash__":24},"topics\u002Ftopics\u002Fsql-basics.yml","SELECT, WHERE, JOINs and aggregation — the core SQL every interview expects you to know cold.",{},"Query Basics","basics","topics\u002Fsql-basics","HCjWR2CEk8Qa71ZK3hKymDtquizhpXvBFklpn4fxcuM",{"id":26,"description":27,"extension":7,"frameworkSlug":12,"meta":28,"name":29,"order":30,"slug":31,"stem":32,"__hash__":33},"topics\u002Ftopics\u002Fsql-subqueries.yml","Scalar, correlated and nested subqueries, IN\u002FEXISTS, derived tables and common table expressions — composing queries out of other queries.",{},"Subqueries & CTEs",2,"subqueries","topics\u002Fsql-subqueries","FnPGLIAjadsROHZZpt6bZrsS7t3mh0gS-lVF9UAW1tY",{"id":35,"description":36,"extension":7,"frameworkSlug":12,"meta":37,"name":38,"order":39,"slug":40,"stem":41,"__hash__":42},"topics\u002Ftopics\u002Fsql-window-functions.yml","OVER and PARTITION BY, ranking, LAG\u002FLEAD and frame clauses — per-row analytics that keep every row instead of collapsing groups.",{},"Window Functions",3,"window-functions","topics\u002Fsql-window-functions","E2b0Zeo68tkWufM6NH3u9_XbLvySYGeWBd_uC4vzY98",{"id":44,"description":45,"extension":7,"frameworkSlug":12,"meta":46,"name":47,"order":11,"slug":48,"stem":49,"__hash__":50},"topics\u002Ftopics\u002Fsql-schema.yml","Data types, CREATE\u002FALTER\u002FDROP, constraints and normalization — designing 
the tables your queries run against.",{},"Schema & Data Types","schema","topics\u002Fsql-schema","1Ub7A8Ja3JDWV7ZB4nWkJo6X-pd9IG0Dx0dLMwnRSvE",{"id":52,"description":53,"extension":7,"frameworkSlug":12,"meta":54,"name":55,"order":56,"slug":57,"stem":58,"__hash__":59},"topics\u002Ftopics\u002Fsql-dml.yml","INSERT, UPDATE, DELETE, UPSERT and views — changing the data and exposing it through saved queries.",{},"Modifying Data",5,"dml","topics\u002Fsql-dml","D7PgAya6q-kLOA4xB2C59Qc1jHN38F-54RXtgnWB_C8",{"id":61,"description":62,"extension":7,"frameworkSlug":12,"meta":63,"name":64,"order":65,"slug":66,"stem":67,"__hash__":68},"topics\u002Ftopics\u002Fsql-transactions.yml","ACID, COMMIT\u002FROLLBACK\u002FSAVEPOINT, isolation levels and locking — keeping data correct when many things happen at once.",{},"Transactions",6,"transactions","topics\u002Fsql-transactions","pG-ozhwxxK2VCNSH9_EBgJnYZq0NZ5rreHakRMgLy-8",{"id":70,"description":71,"extension":7,"frameworkSlug":12,"meta":72,"name":73,"order":74,"slug":75,"stem":76,"__hash__":77},"topics\u002Ftopics\u002Fsql-performance.yml","Indexes, EXPLAIN, query plans and optimization — why a query is slow and how to make it fast.",{},"Indexes & Performance",7,"performance","topics\u002Fsql-performance","t18iPx3n6b0VydJSeyDjwtk6Ips8LGJRk1xij8-HJK0",{"id":79,"description":80,"extension":7,"frameworkSlug":12,"meta":81,"name":82,"order":83,"slug":84,"stem":85,"__hash__":86},"topics\u002Ftopics\u002Fsql-functions.yml","String, numeric, date and conditional functions — the everyday built-ins for transforming values inside a query.",{},"Built-in Functions",8,"functions","topics\u002Fsql-functions","URYP9hB2ywnRDyd-PGG8-IF2ZiWvdYdWVL7D8bC3VZU",{"id":88,"description":89,"extension":7,"frameworkSlug":12,"meta":90,"name":91,"order":92,"slug":93,"stem":94,"__hash__":95},"topics\u002Ftopics\u002Fsql-security.yml","GRANT\u002FREVOKE, roles and SQL injection prevention — controlling who can do what and keeping queries safe from untrusted 
input.",{},"Security & Integrity",9,"security","topics\u002Fsql-security","CulSzfQJU1lvOsDtTE05Ikcb4-IEzgFAZpFzlff0ON0",[97,253,330,405,479,558,633,756,830,952,1036,1110,1185,1259,1334,1409,1516,1591,1724,1803,1878,1953,2097,2180,2254],{"id":98,"title":99,"body":100,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":108,"navigation":109,"order":14,"path":110,"questions":111,"questionsCount":246,"related":247,"seo":248,"seoDescription":249,"stem":250,"subtopic":99,"topic":21,"topicSlug":22,"updated":251,"__hash__":252},"qa\u002Fsql\u002Fbasics\u002Fjoins.md","Joins",{"type":101,"value":102,"toc":103},"minimark",[],{"title":104,"searchDepth":30,"depth":30,"links":105},"",[],"medium","md",{},true,"\u002Fsql\u002Fbasics\u002Fjoins",[112,117,121,125,130,134,138,142,146,150,154,158,162,166,170,174,178,182,186,190,194,198,202,206,210,214,218,222,226,230,234,238,242],{"id":113,"difficulty":114,"q":115,"a":116},"what-is-join","easy","What is a JOIN?","A JOIN combines rows from **two or more tables** into one result set, matching\nthem on a **related column** (typically a foreign key referencing a primary\nkey). Relational databases store data in *normalized* tables to avoid\nduplication; joins are how you stitch that data back together at query time.\n\n```sql\n-- users(id, name)  and  orders(id, user_id, total)\nSELECT users.name, orders.total\nFROM users\nJOIN orders ON orders.user_id = users.id;\n```\n\nThe `ON` clause is the **join condition** — it decides which rows from the left\ntable pair with which rows from the right. The *type* of join (INNER, LEFT,\netc.) 
then decides what to do with rows that have **no match**.\n",{"id":118,"difficulty":106,"q":119,"a":120},"inner-vs-outer","What is the difference between INNER JOIN and OUTER JOIN?","The difference is what happens to **unmatched rows**:\n\n- **`INNER JOIN`** returns **only** rows that have a match in *both* tables.\n  Rows with no counterpart are dropped from the result.\n- **`OUTER JOIN`** (LEFT \u002F RIGHT \u002F FULL) **keeps unmatched rows** from one or\n  both sides, filling the missing columns with **`NULL`**.\n\n```sql\n-- only users who have placed at least one order\nSELECT u.name, o.total\nFROM users u\nINNER JOIN orders o ON o.user_id = u.id;\n\n-- every user, with NULLs for those who never ordered\nSELECT u.name, o.total\nFROM users u\nLEFT OUTER JOIN orders o ON o.user_id = u.id;\n```\n\nThink of INNER as the **intersection** and OUTER as \"intersection **plus** the\nleftovers from the chosen side(s).\" (`INNER` and `OUTER` are optional keywords —\n`JOIN` alone means `INNER JOIN`, and `LEFT JOIN` means `LEFT OUTER JOIN`.)\n",{"id":122,"difficulty":106,"q":123,"a":124},"left-vs-right","What is the difference between LEFT and RIGHT JOIN?","Both are outer joins; they differ only in **which side is preserved**:\n\n- **`LEFT JOIN`** keeps **all rows from the left** (first) table, plus matching\n  rows from the right — unmatched right columns become `NULL`.\n- **`RIGHT JOIN`** keeps **all rows from the right** (second) table, plus\n  matches from the left.\n\nThey're **mirror images**: any RIGHT JOIN can be rewritten as a LEFT JOIN by\nswapping the table order, which is why teams often standardize on LEFT JOIN for\nreadability.\n\n```sql\n-- these two return the same rows\nSELECT u.name, o.id FROM users u LEFT  JOIN orders o ON o.user_id = u.id;\nSELECT u.name, o.id FROM orders o RIGHT JOIN users  u ON o.user_id = u.id;\n```\n",{"id":126,"difficulty":127,"q":128,"a":129},"self-join","hard","What is a self join?","A self join is a table **joined to itself**, 
using **table aliases** to treat\nthe one physical table as two logical ones. It's the standard way to relate\nrows *within* the same table — most classically, **hierarchies** where a row\npoints to another row in the same table.\n\n```sql\n-- employees(id, name, manager_id) where manager_id -> employees.id\nSELECT e.name AS employee, m.name AS manager\nFROM employees e\nLEFT JOIN employees m ON e.manager_id = m.id;\n```\n\nHere `e` is the \"employee\" view of the table and `m` is the \"manager\" view.\nUsing `LEFT JOIN` keeps top-level employees (whose `manager_id` is `NULL`) in\nthe result with a `NULL` manager. Aliases are **mandatory** — without them the\ncolumn references would be ambiguous.\n",{"id":131,"difficulty":106,"q":132,"a":133},"cross-join","What does a CROSS JOIN produce?","A `CROSS JOIN` returns the **Cartesian product**: every row of the first table\npaired with **every** row of the second, with **no `ON` condition**. If the\ntables have *M* and *N* rows, you get *M × N* rows back — so it grows fast.\n\n```sql\n-- generate every size\u002Fcolor combination\nSELECT s.label, c.name\nFROM sizes s\nCROSS JOIN colors c;   -- 4 sizes × 6 colors = 24 rows\n```\n\nUse it deliberately — for generating combinations, building calendars, or\ncreating test data. An **accidental** cross join (forgetting the join\ncondition, the dreaded \"comma join\" `FROM a, b`) is a common cause of\nrunaway result sets and slow queries.\n",{"id":135,"difficulty":127,"q":136,"a":137},"join-nulls","How do you find rows with no match using a join?","Use the **anti-join** pattern: `LEFT JOIN` the second table, then filter where\nits key **`IS NULL`**. 
Because unmatched rows get `NULL` in the right-hand\ncolumns, \"right key is NULL\" precisely selects the rows that had **no match**.\n\n```sql\n-- users who have never placed an order\nSELECT u.*\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id\nWHERE o.id IS NULL;\n```\n\nTwo things to get right: filter on a column that's **non-nullable in the source\ntable** (like the right table's primary key `o.id`) so a `NULL` there truly\nmeans \"no row matched,\" and remember you must use **`IS NULL`**, never\n`= NULL` — in SQL, `anything = NULL` evaluates to `unknown`, never `true`.\n`NOT EXISTS` is an equivalent and often clearer alternative.\n",{"id":139,"difficulty":127,"q":140,"a":141},"on-vs-where","What is the difference between the ON and WHERE clauses in a join?","`ON` defines **how rows are matched** (it runs *during* the join); `WHERE` filters\nthe **result** *after* the join. For **INNER** joins they're often\ninterchangeable, but for **OUTER** joins they behave very differently.\n\n```sql\n-- keeps all users; only joins orders with amount > 100\nSELECT u.name, o.amount\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id AND o.amount > 100;\n\n-- effectively INNER: WHERE drops users whose joined row is NULL\nSELECT u.name, o.amount\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id\nWHERE o.amount > 100;\n```\n\nPut conditions on the **joined (right) table** in `ON` to preserve outer rows; put\nthem in `WHERE` to filter the final result.\n",{"id":143,"difficulty":127,"q":144,"a":145},"left-join-where-trap","Why does filtering the right table in WHERE turn a LEFT JOIN into an INNER JOIN?","An outer join fills unmatched right-table columns with `NULL`. 
A `WHERE` condition\non those columns (other than `IS NULL`) evaluates to `unknown` for the unmatched\nrows and **removes them** — silently converting your LEFT JOIN into an INNER JOIN.\n\n```sql\n-- unmatched users have o.status = NULL -> WHERE drops them\nSELECT u.name, o.status\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id\nWHERE o.status = 'shipped';\n\n-- move the predicate into ON to keep all users\nLEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';\n```\n\nThis is one of the most common SQL bugs. Rule: predicates on the optional side\nbelong in `ON`.\n",{"id":147,"difficulty":106,"q":148,"a":149},"multiple-joins","How do you join three or more tables?","Chain `JOIN` clauses; each `ON` connects the new table to one already in the query.\nThe joins are applied left to right, building a progressively wider result.\n\n```sql\nSELECT u.name, o.id AS order_id, p.name AS product\nFROM users u\nJOIN orders o      ON o.user_id = u.id\nJOIN order_items i ON i.order_id = o.id\nJOIN products p    ON p.id = i.product_id;\n```\n\nEach join multiplies\u002Ffilters rows based on its matches, so a user with many orders\nand items appears on many rows. Mind the cardinality — joining several\none-to-many tables can explode row counts (the \"fan trap\").\n",{"id":151,"difficulty":106,"q":152,"a":153},"using-clause","What does the USING clause do?","`USING (col)` is shorthand for an equi-join when the join columns have the **same\nname** in both tables: `USING (user_id)` ≡ `ON a.user_id = b.user_id`. It also\n**merges** the shared column into one in the output.\n\n```sql\nSELECT name, amount\nFROM users\nJOIN orders USING (user_id);   -- one user_id column in the result\n```\n\nCleaner than repeating the column, but only works when names match exactly, and\nthe coalesced column can't be qualified with a table alias. 
`ON` is more explicit\nand flexible.\n",{"id":155,"difficulty":127,"q":156,"a":157},"natural-join","What is a NATURAL JOIN and why is it risky?","`NATURAL JOIN` automatically joins on **all columns with the same name** in both\ntables — no `ON` needed. It's dangerous because the join condition is **implicit**:\nadding a same-named column later (like `created_at`) silently changes the join.\n\n```sql\nSELECT * FROM users NATURAL JOIN orders;\n-- joins on EVERY shared column name — fragile and surprising\n```\n\nA new `updated_at` column on both tables would suddenly become part of the join\ncondition, breaking results with no error. Most teams **avoid** `NATURAL JOIN` in\nfavor of explicit `ON`\u002F`USING`.\n",{"id":159,"difficulty":106,"q":160,"a":161},"join-groupby","How do you combine a join with aggregation?","Join the tables, then `GROUP BY` the dimension you want to aggregate per, applying\naggregate functions to the joined rows. With outer joins, choose `COUNT(column)`\nvs `COUNT(*)` carefully (next question).\n\n```sql\nSELECT u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount), 0) AS total\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id\nGROUP BY u.id, u.name;\n```\n\nThe `LEFT JOIN` keeps users with zero orders (counted as 0). Every non-aggregated\nselected column must appear in `GROUP BY` (in standard SQL).\n",{"id":163,"difficulty":106,"q":164,"a":165},"duplicate-rows","Why do joins sometimes produce duplicate rows?","A join produces a row for **every matching pair**. If one side matches **multiple**\nrows on the other (a one-to-many relationship), the single-side values repeat\nacross those matches — not true duplicates, but multiplied rows.\n\n```sql\n-- a user with 3 orders appears on 3 rows\nSELECT u.name, o.id\nFROM users u\nJOIN orders o ON o.user_id = u.id;\n```\n\nTo collapse them, aggregate (`GROUP BY` + `COUNT`\u002F`SUM`), use `DISTINCT`, or\npre-aggregate the many-side in a subquery before joining. 
Joining two\none-to-many tables to the same parent multiplies rows (the fan trap).\n",{"id":167,"difficulty":127,"q":168,"a":169},"left-join-count-trap","What is the COUNT trap with LEFT JOIN?","`COUNT(*)` counts **rows**, including the NULL-filled row produced for an unmatched\nLEFT JOIN — so users with no orders wrongly count as 1. `COUNT(column)` ignores\n`NULL`s, giving the correct 0.\n\n```sql\nSELECT u.name,\n       COUNT(*)    AS wrong,   -- counts the NULL row -> 1 for zero-order users\n       COUNT(o.id) AS correct  -- ignores NULLs -> 0\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id\nGROUP BY u.id, u.name;\n```\n\nAlways `COUNT` a **non-nullable column from the joined table** (like its primary\nkey) when counting matches in an outer join.\n",{"id":171,"difficulty":106,"q":172,"a":173},"join-vs-subquery","When should you use a join vs a subquery?","Use a **join** when you need **columns from both tables** in the output. Use a\n**subquery** (especially `EXISTS`\u002F`IN`) when you only need to **filter** by the\nexistence of related rows, not return their columns.\n\n```sql\n-- join: need order data in the result\nSELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;\n\n-- subquery: only filter users who have any order\nSELECT name FROM users u WHERE EXISTS (\n  SELECT 1 FROM orders o WHERE o.user_id = u.id\n);\n```\n\nA join can produce duplicate rows when filtering by existence; `EXISTS` won't.\nModern optimizers often plan them similarly, so favor whichever is clearer.\n",{"id":175,"difficulty":127,"q":176,"a":177},"exists-vs-in","What is the difference between EXISTS and IN?","Both test membership, but: `EXISTS` stops at the **first match** (often faster for\nlarge\u002Fcorrelated subqueries) and handles `NULL`s safely. 
`IN` compares against a\nvalue list and has a **NULL trap** — `NOT IN` with any `NULL` in the list returns\nno rows.\n\n```sql\n-- if any user_id is NULL, NOT IN returns NOTHING\nSELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);\n\n-- NOT EXISTS is NULL-safe\nSELECT * FROM users u WHERE NOT EXISTS (\n  SELECT 1 FROM orders o WHERE o.user_id = u.id\n);\n```\n\nPrefer `EXISTS`\u002F`NOT EXISTS` for correlated existence checks, especially when\n`NULL`s are possible.\n",{"id":179,"difficulty":106,"q":180,"a":181},"not-exists-antijoin","How do you write an anti-join?","An anti-join returns rows from the first table that have **no match** in the\nsecond. Two idioms: `NOT EXISTS`, or `LEFT JOIN ... WHERE right.key IS NULL`.\n\n```sql\n-- products never ordered (NOT EXISTS)\nSELECT p.* FROM products p\nWHERE NOT EXISTS (\n  SELECT 1 FROM order_items i WHERE i.product_id = p.id\n);\n\n-- equivalent LEFT JOIN form\nSELECT p.* FROM products p\nLEFT JOIN order_items i ON i.product_id = p.id\nWHERE i.product_id IS NULL;\n```\n\n`NOT EXISTS` is usually clearest and NULL-safe; the LEFT JOIN form can be faster\nwith the right indexes. Avoid `NOT IN` here due to its NULL trap.\n",{"id":183,"difficulty":127,"q":184,"a":185},"semi-join","What is a semi-join?","A semi-join returns rows from the first table that have **at least one match** in\nthe second — but **without** duplicating them per match and without returning the\nsecond table's columns. `EXISTS`\u002F`IN` express it.\n\n```sql\n-- users who have placed at least one order (each user once)\nSELECT u.* FROM users u\nWHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);\n```\n\nContrast with an inner join, which would repeat a user once per order. 
Semi-join =\n\"filter by existence.\" Anti-join = \"filter by non-existence.\" Databases have\ndedicated semi\u002Fanti-join operators for these.\n",{"id":187,"difficulty":106,"q":188,"a":189},"full-outer-join","What does a FULL OUTER JOIN do?","A `FULL OUTER JOIN` keeps **all rows from both tables**, matching where possible\nand filling the non-matching side with `NULL`s. It's the union of LEFT and RIGHT\nouter joins.\n\n```sql\nSELECT COALESCE(a.id, b.id) AS id, a.val AS left_val, b.val AS right_val\nFROM table_a a\nFULL OUTER JOIN table_b b ON a.id = b.id;\n```\n\nUseful for reconciliation (\"what's in either source\"), like comparing two datasets\nand finding rows present in one but not the other. Supported in PostgreSQL\u002FSQL\nServer\u002FOracle, but **not** in MySQL (which needs an emulation).\n",{"id":191,"difficulty":127,"q":192,"a":193},"emulate-full-outer","How do you emulate a FULL OUTER JOIN in MySQL?","MySQL lacks `FULL OUTER JOIN`, so you **UNION a LEFT JOIN with a RIGHT JOIN** (or a\nLEFT JOIN where the left key is NULL), using `UNION` to dedupe the overlapping\nmatched rows.\n\n```sql\nSELECT a.id, a.val, b.val FROM a LEFT JOIN b ON a.id = b.id\nUNION\nSELECT b.id, a.val, b.val FROM a RIGHT JOIN b ON a.id = b.id;\n```\n\n`UNION` (not `UNION ALL`) removes the duplicate matched rows that appear in both\nhalves. The first half gives all left rows + matches; the second adds the\nunmatched right rows.\n",{"id":195,"difficulty":106,"q":196,"a":197},"equi-non-equi","What is the difference between an equi-join and a non-equi join?","An **equi-join** matches with equality (`a.x = b.x`) — by far the most common. 
A\n**non-equi join** uses other operators (`\u003C`, `>`, `BETWEEN`, `!=`), useful for\nranges, bands, and comparisons.\n\n```sql\n-- non-equi: match each sale to its price tier by range\nSELECT s.amount, t.label\nFROM sales s\nJOIN tiers t ON s.amount BETWEEN t.min_amount AND t.max_amount;\n```\n\nNon-equi joins can match many rows and are costlier (no simple hash join), but\nthey're powerful for bucketing, gaps-and-islands, and \"find rows within a range\"\nproblems.\n",{"id":199,"difficulty":114,"q":200,"a":201},"join-multiple-columns","How do you join on multiple columns?","Combine the conditions in the `ON` clause with `AND` — all must match. This is\ncommon for **composite keys** or matching on several attributes.\n\n```sql\nSELECT *\nFROM order_items i\nJOIN inventory v\n  ON v.product_id = i.product_id\n AND v.warehouse_id = i.warehouse_id;\n```\n\nEvery `AND` condition tightens the match. If the columns share names across both\ntables, `USING (product_id, warehouse_id)` is a shorthand. Make sure composite-key\ncolumns are indexed together for performance.\n",{"id":203,"difficulty":127,"q":204,"a":205},"self-join-consecutive","How do you use a self join to compare consecutive rows?","Join a table to itself on a key offset by one (e.g. matching `id = id + 1`, or\nusing a date difference) to put each row next to its neighbor — useful for\ncomputing differences between sequential records.\n\n```sql\nSELECT a.day, b.sales - a.sales AS daily_change\nFROM daily a\nJOIN daily b ON b.day = a.day + INTERVAL '1 day';\n```\n\nModern SQL often replaces this with **window functions** (`LAG`\u002F`LEAD`), which are\ncleaner and faster, but the self-join technique is the classic approach and still\nappears in interviews.\n",{"id":207,"difficulty":127,"q":208,"a":209},"fan-trap","What is the fan trap (row multiplication) in joins?","The fan trap occurs when you join one parent to **two different one-to-many**\nchild tables. 
Their rows multiply against each other (a partial Cartesian\nproduct), inflating aggregates like `SUM`.\n\n```sql\n-- orders × shipments multiply; SUM(amount) is overcounted\nSELECT o.id, SUM(p.amount), SUM(s.weight)\nFROM orders o\nJOIN payments p  ON p.order_id = o.id\nJOIN shipments s ON s.order_id = o.id\nGROUP BY o.id;\n```\n\nFix by **pre-aggregating** each child in its own subquery before joining, so each\ncontributes a single row per parent. Always sanity-check counts when joining\nmultiple one-to-many tables.\n",{"id":211,"difficulty":127,"q":212,"a":213},"pre-aggregate","Why and how do you pre-aggregate before joining?","Pre-aggregating collapses a one-to-many child to **one row per key** in a subquery,\nso the subsequent join doesn't multiply rows or distort aggregates (avoiding the\nfan trap and double counting).\n\n```sql\nSELECT u.name, o.order_count, o.total\nFROM users u\nLEFT JOIN (\n  SELECT user_id, COUNT(*) AS order_count, SUM(amount) AS total\n  FROM orders GROUP BY user_id\n) o ON o.user_id = u.id;\n```\n\nNow each user joins to exactly one aggregated order row. This pattern also lets you\ncombine **multiple** independent aggregates without them multiplying together.\n",{"id":215,"difficulty":127,"q":216,"a":217},"join-indexes","How do indexes affect join performance?","Joins match rows on the `ON` columns, so an **index on the join key** (typically\nthe foreign key) lets the database find matches quickly instead of scanning the\nwhole table. Without it, joins degrade to slow full scans.\n\n```sql\nCREATE INDEX idx_orders_user_id ON orders(user_id); -- speeds up the join\nSELECT * FROM users u JOIN orders o ON o.user_id = u.id;\n```\n\nPrimary keys are indexed automatically, but **foreign keys often aren't** — a\nfrequent cause of slow joins. 
Index both sides of frequent join conditions, and\ncomposite indexes for multi-column joins.\n",{"id":219,"difficulty":127,"q":220,"a":221},"join-algorithms","What join algorithms do databases use?","The optimizer picks among three physical strategies based on data size and\nindexes:\n\n- **Nested loop join** — for each row in one table, look up matches in the other.\n  Great with an index on the inner table; small inputs.\n- **Hash join** — build a hash table on one input, probe with the other. Best for\n  large, unindexed equi-joins.\n- **Merge join** — sort both inputs on the key, then merge. Efficient when inputs\n  are already sorted\u002Findexed.\n\n```sql\nEXPLAIN SELECT * FROM users u JOIN orders o ON o.user_id = u.id;\n-- shows which join algorithm the planner chose\n```\n\nYou don't pick directly, but understanding them (and reading `EXPLAIN`) explains\nwhy a query is slow.\n",{"id":223,"difficulty":106,"q":224,"a":225},"coalesce-join","How do you handle NULLs from outer joins in the output?","Outer joins produce `NULL`s for unmatched rows. Use **`COALESCE`** to substitute a\ndefault (0, '', 'N\u002FA') so results are clean and aggregates behave.\n\n```sql\nSELECT u.name,\n       COALESCE(o.amount, 0)       AS amount,\n       COALESCE(o.status, 'none')  AS status\nFROM users u\nLEFT JOIN orders o ON o.user_id = u.id;\n```\n\n`COALESCE` returns the first non-NULL argument. It's essential after outer joins\nand aggregates (`COALESCE(SUM(x), 0)`) so missing data shows as a sensible value\nrather than `NULL`.\n",{"id":227,"difficulty":106,"q":228,"a":229},"distinct-dedupe","How do you remove duplicate rows from a join result?","A join that matches one-to-many repeats the single-side rows. 
Options to dedupe:\n`DISTINCT`, aggregation with `GROUP BY`, or restructuring to a semi-join\n(`EXISTS`) when you don't need the joined columns.\n\n```sql\n-- DISTINCT removes exact duplicate rows\nSELECT DISTINCT u.id, u.name\nFROM users u JOIN orders o ON o.user_id = u.id;\n\n-- better when you only want \"users with orders\":\nSELECT u.id, u.name FROM users u\nWHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);\n```\n\nPrefer `EXISTS` over `DISTINCT` when you're really filtering by existence —\n`DISTINCT` does extra sorting\u002Fwork to remove the duplicates the join created.\n",{"id":231,"difficulty":127,"q":232,"a":233},"lateral-join","What is a LATERAL join (or CROSS APPLY)?","A `LATERAL` join (Postgres) \u002F `CROSS APPLY` (SQL Server) lets a subquery in the\n`FROM` clause **reference columns from preceding tables** in the same `FROM` — so\nit runs per outer row. Ideal for \"top-N per group.\"\n\n```sql\n-- each user's 3 most recent orders\nSELECT u.name, o.id, o.created_at\nFROM users u\nCROSS JOIN LATERAL (\n  SELECT id, created_at FROM orders\n  WHERE user_id = u.id            -- references the outer u\n  ORDER BY created_at DESC LIMIT 3\n) o;\n```\n\nRegular subqueries can't see outer `FROM` columns; `LATERAL` can, making\nper-row\u002Fper-group derived tables possible.\n",{"id":235,"difficulty":127,"q":236,"a":237},"outer-join-chain","What happens when you chain outer joins across three tables?","Join order and type matter. 
A `LEFT JOIN` followed by an `INNER JOIN` on the\noptional table can **drop** the preserved rows, because the inner join requires a\nmatch the NULL-filled rows don't have.\n\n```sql\n-- the INNER JOIN re-filters out users with no orders\nSELECT u.name, oi.qty\nFROM users u\nLEFT JOIN orders o      ON o.user_id = u.id\nJOIN order_items oi     ON oi.order_id = o.id;  -- inner -> drops NULL o.id\n\n-- keep it LEFT all the way down\nLEFT JOIN order_items oi ON oi.order_id = o.id;\n```\n\nTo preserve outer rows through a chain, every downstream join on the optional path\nmust also be an outer join.\n",{"id":239,"difficulty":114,"q":240,"a":241},"join-vs-union","What is the difference between a JOIN and a UNION?","A **JOIN** combines tables **horizontally** — adding columns by matching rows. A\n**UNION** combines result sets **vertically** — stacking rows from queries that\nhave the **same columns**.\n\n```sql\n-- JOIN: wider rows (user + their order)\nSELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id;\n\n-- UNION: more rows (current + archived orders)\nSELECT id, amount FROM orders\nUNION ALL\nSELECT id, amount FROM archived_orders;\n```\n\n`UNION` dedupes; `UNION ALL` keeps duplicates (and is faster). Use JOIN to relate\ntables, UNION to append similar datasets.\n",{"id":243,"difficulty":127,"q":244,"a":245},"range-join","How do you join rows within a date or value range?","Use a non-equi join with `BETWEEN` or comparison operators in `ON` — matching each\nrow to all rows of the other table that fall within a range. 
Common for\ntime-windows, price tiers, and IP-range lookups.\n\n```sql\n-- attribute each event to the active campaign window it falls in\nSELECT e.id, c.name\nFROM events e\nJOIN campaigns c\n  ON e.occurred_at BETWEEN c.start_at AND c.end_at;\n```\n\nRange joins can match multiple rows and don't use simple hash joins, so they're\nheavier — index the range columns, and ensure ranges don't unintentionally overlap\n(which multiplies rows).\n",33,null,{"description":104},"SQL join interview questions — inner vs outer joins, left vs right, self joins and how NULLs behave, with examples.","sql\u002Fbasics\u002Fjoins","2026-06-17","hx7juK0-FDuZFdlHD8aTS-3dvNEZq9Ae8NHixKBMIUE",{"id":254,"title":255,"body":256,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":260,"navigation":109,"order":14,"path":261,"questions":262,"questionsCount":323,"related":247,"seo":324,"seoDescription":325,"stem":326,"subtopic":327,"topic":55,"topicSlug":57,"updated":328,"__hash__":329},"qa\u002Fsql\u002Fdml\u002Finsert-update-delete.md","Insert Update Delete",{"type":101,"value":257,"toc":258},[],{"title":104,"searchDepth":30,"depth":30,"links":259},[],{},"\u002Fsql\u002Fdml\u002Finsert-update-delete",[263,267,271,275,279,283,287,291,295,299,303,307,311,315,319],{"id":264,"difficulty":114,"q":265,"a":266},"basic-insert","What is the syntax for inserting a row into a table?","`INSERT INTO` adds one or more rows to a table. 
You list the target columns\nexplicitly so the statement stays correct if the table schema changes later.\n\n```sql\n-- Single row, explicit column list (preferred)\nINSERT INTO users (name, email, created_at)\nVALUES ('Alice', 'alice@example.com', now());\n\n-- Multiple rows in one statement (more efficient than many single inserts)\nINSERT INTO users (name, email, created_at)\nVALUES\n  ('Bob',   'bob@example.com',   now()),\n  ('Carol', 'carol@example.com', now());\n```\n\nOmitting the column list makes the `VALUES` order dependent on the physical\ncolumn order, which breaks silently when a column is added.\n\n**Rule of thumb:** always list target columns explicitly in every `INSERT`\nstatement, even when inserting into all columns.\n",{"id":268,"difficulty":114,"q":269,"a":270},"insert-select","How do you insert rows from another table?","`INSERT INTO … SELECT` copies rows produced by a `SELECT` directly into the\ntarget table without materializing them in the application.\n\n```sql\n-- Copy deactivated users to an archive table\nINSERT INTO users_archive (id, name, email, archived_at)\nSELECT id, name, email, now()\nFROM   users\nWHERE  deactivated_at IS NOT NULL;\n\n-- Populate an existing staging table from a production table\nINSERT INTO orders_staging\nSELECT * FROM orders WHERE created_at >= '2026-01-01';\n```\n\nThe `SELECT` can be as complex as needed — it can join tables, apply\nfunctions, filter, and aggregate. 
The column count and types must match\nthe target column list.\n\n**Rule of thumb:** prefer `INSERT … SELECT` over fetching rows in\napplication code and re-inserting them — it keeps the data movement inside\nthe database and avoids round-trip latency.\n",{"id":272,"difficulty":106,"q":273,"a":274},"upsert-on-conflict","What is an UPSERT and how do you write one in SQL?","An **UPSERT** (update-or-insert) inserts a row if it does not already exist,\nor updates it if it does — determined by a unique\u002Fprimary key conflict.\n\n```sql\n-- Postgres: ON CONFLICT DO UPDATE (INSERT ... ON CONFLICT)\nINSERT INTO page_views (page_id, date, views)\nVALUES (42, '2026-06-20', 1)\nON CONFLICT (page_id, date)\nDO UPDATE SET views = page_views.views + EXCLUDED.views;\n\n-- MySQL: INSERT ... ON DUPLICATE KEY UPDATE\nINSERT INTO page_views (page_id, date, views)\nVALUES (42, '2026-06-20', 1)\nON DUPLICATE KEY UPDATE views = views + VALUES(views);\n\n-- SQL Server: MERGE statement\nMERGE page_views AS target\nUSING (VALUES (42, '2026-06-20', 1)) AS src (page_id, date, views)\n  ON target.page_id = src.page_id AND target.date = src.date\nWHEN MATCHED     THEN UPDATE SET views = target.views + src.views\nWHEN NOT MATCHED THEN INSERT (page_id, date, views) VALUES (src.page_id, src.date, src.views);\n```\n\n**Rule of thumb:** use UPSERT for idempotent writes — counters, caches,\nconfiguration, event deduplication. 
It avoids the race condition of a\nseparate SELECT-then-INSERT in application code.\n",{"id":276,"difficulty":114,"q":277,"a":278},"on-conflict-do-nothing","How do you silently ignore a duplicate row on INSERT?","Use `ON CONFLICT DO NOTHING` (Postgres) or `INSERT IGNORE` (MySQL) to\nskip rows that would violate a unique constraint rather than raising an error.\n\n```sql\n-- Postgres\nINSERT INTO tags (name)\nVALUES ('sql'), ('database'), ('sql')   -- duplicate 'sql'\nON CONFLICT (name) DO NOTHING;\n-- On an empty table: inserts 'sql' and 'database'; skips the second 'sql' silently\n\n-- MySQL\nINSERT IGNORE INTO tags (name)\nVALUES ('sql'), ('database'), ('sql');\n```\n\nIn SQL Server, the closest equivalent is `MERGE … WHEN NOT MATCHED THEN INSERT`.\n\n**Rule of thumb:** use `DO NOTHING` for idempotent seed data or batch\nimports where duplicates are expected and harmless. Avoid it when you need\nto know how many rows were actually inserted.\n",{"id":280,"difficulty":114,"q":281,"a":282},"basic-update","What is the syntax for updating rows in SQL?","`UPDATE` modifies column values in rows that satisfy the `WHERE` condition.\n**Omitting `WHERE` updates every row in the table.**\n\n```sql\n-- Update a single row by PK\nUPDATE users\nSET    name = 'Alice Smith', updated_at = now()\nWHERE  id = 1;\n\n-- Update multiple rows matching a condition\nUPDATE orders\nSET    status = 'archived'\nWHERE  created_at \u003C now() - INTERVAL '2 years';\n\n-- Update using a value from the same row (relative update)\nUPDATE products\nSET    stock = stock - 1\nWHERE  id = 99 AND stock > 0;\n```\n\n**Rule of thumb:** always write and test the `WHERE` clause of an `UPDATE`\nas a `SELECT` first to confirm exactly which rows will be affected before\ncommitting the change.\n",{"id":284,"difficulty":106,"q":285,"a":286},"update-join","How do you update rows using values from another table?","Updating based on a related table requires different syntax per database.\n\n```sql\n-- Postgres: UPDATE ... 
FROM\nUPDATE orders o\nSET    discount = c.tier_discount\nFROM   customers c\nWHERE  c.id = o.customer_id\n  AND  c.tier = 'gold';\n\n-- MySQL: UPDATE with JOIN\nUPDATE orders o\nJOIN   customers c ON c.id = o.customer_id\nSET    o.discount = c.tier_discount\nWHERE  c.tier = 'gold';\n\n-- SQL Server: UPDATE with FROM\u002FJOIN\nUPDATE o\nSET    o.discount = c.tier_discount\nFROM   orders o\nJOIN   customers c ON c.id = o.customer_id\nWHERE  c.tier = 'gold';\n```\n\n**Rule of thumb:** always alias both tables and double-check the join\ncondition — a wrong `ON` clause can fan out rows and cause each target row\nto be updated multiple times (non-deterministically in Postgres).\n",{"id":288,"difficulty":114,"q":289,"a":290},"basic-delete","What is the DELETE statement and what happens if you omit WHERE?","`DELETE FROM` removes rows that match the `WHERE` predicate. Without\n`WHERE`, **all rows in the table are deleted** (the table structure remains).\n\n```sql\n-- Delete a single row\nDELETE FROM users WHERE id = 42;\n\n-- Delete with a condition\nDELETE FROM sessions WHERE expires_at \u003C now();\n\n-- Delete all rows (use TRUNCATE for large tables — it's faster)\nDELETE FROM staging_data;\n```\n\nUnlike `TRUNCATE`, `DELETE` fires `ON DELETE` triggers, respects FK\n`ON DELETE CASCADE` \u002F `RESTRICT` rules, and is logged row-by-row — making\nit slower but more controllable.\n\n**Rule of thumb:** run `SELECT * FROM … WHERE \u003Ccondition>` before any\n`DELETE` to preview what will be removed. Wrap destructive deletes in a\ntransaction and `ROLLBACK` first to verify the row count.\n",{"id":292,"difficulty":106,"q":293,"a":294},"delete-join","How do you delete rows based on a condition in a related table?","```sql\n-- Postgres: DELETE ... 
USING\nDELETE FROM order_items oi\nUSING  orders o\nWHERE  oi.order_id = o.id\n  AND  o.status = 'cancelled';\n\n-- MySQL: DELETE with JOIN\nDELETE oi\nFROM   order_items oi\nJOIN   orders o ON o.id = oi.order_id\nWHERE  o.status = 'cancelled';\n\n-- SQL Server: DELETE with FROM\u002FJOIN\nDELETE oi\nFROM   order_items oi\nJOIN   orders o ON o.id = oi.order_id\nWHERE  o.status = 'cancelled';\n\n-- ANSI alternative using a subquery (all databases)\nDELETE FROM order_items\nWHERE  order_id IN (\n  SELECT id FROM orders WHERE status = 'cancelled'\n);\n```\n\n**Rule of thumb:** the subquery form (`WHERE … IN (SELECT …)`) is the\nmost portable across databases; use the `JOIN`-based form when the subquery\nis slow (the optimizer may not push it down efficiently).\n",{"id":296,"difficulty":106,"q":297,"a":298},"returning-clause","What does the RETURNING clause do in Postgres?","`RETURNING` appends a `SELECT`-like clause to `INSERT`, `UPDATE`, or\n`DELETE` and returns the affected rows — without a separate query. This\neliminates a round-trip and avoids race conditions from a subsequent\n`SELECT`.\n\n```sql\n-- Get the generated ID after INSERT\nINSERT INTO orders (customer_id, total)\nVALUES (7, 149.99)\nRETURNING id, created_at;\n\n-- Return the new values written by the UPDATE (RETURNING sees the row after the change)\nUPDATE products\nSET    price = price * 0.9\nWHERE  category = 'clearance'\nRETURNING id, name, price AS new_price;\n\n-- Confirm which rows were deleted\nDELETE FROM sessions\nWHERE  expires_at \u003C now()\nRETURNING id, user_id;\n```\n\nMySQL uses `LAST_INSERT_ID()` for inserts only. 
SQL Server uses the\n`OUTPUT` clause (`OUTPUT INSERTED.id, DELETED.old_col`).\n\n**Rule of thumb:** use `RETURNING` (Postgres) or `OUTPUT` (SQL Server)\nto retrieve generated IDs and audit old\u002Fnew values atomically — never\nfollow a mutation with a separate `SELECT` when the database can return\nthe data in the same statement.\n",{"id":300,"difficulty":106,"q":301,"a":302},"soft-delete","What is soft delete and how is it implemented in SQL?","**Soft delete** marks rows as deleted without physically removing them,\npreserving history and enabling recovery. Instead of `DELETE`, you\n`UPDATE` a `deleted_at` timestamp (or a boolean flag).\n\n```sql\n-- Schema: nullable deleted_at column\nALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;\n\n-- Soft delete\nUPDATE users SET deleted_at = now() WHERE id = 42;\n\n-- Query active rows only\nSELECT * FROM users WHERE deleted_at IS NULL;\n\n-- Partial unique index so email stays unique among active users only\nCREATE UNIQUE INDEX uq_users_email_active\n  ON users (email)\n  WHERE deleted_at IS NULL;\n\n-- View to hide deleted rows from most queries\nCREATE VIEW active_users AS\n  SELECT * FROM users WHERE deleted_at IS NULL;\n```\n\n**Rule of thumb:** use soft delete when you need an audit trail, recoverability,\nor regulatory retention. Add a partial index on `(natural_key) WHERE deleted_at IS NULL`\nto keep uniqueness constraints working correctly.\n",{"id":304,"difficulty":127,"q":305,"a":306},"batch-delete-large-table","How do you safely delete millions of rows from a large table?","Deleting millions of rows in one statement locks large portions of the\ntable, fills the transaction log, and may time out. 
The solution is to\n**delete in small batches** inside a loop.\n\n```sql\n-- Postgres 11+: batch delete loop (run from application or a DO block)\n-- COMMIT inside DO only works when the block is not run inside an explicit transaction\nDO $$\nDECLARE deleted_count INT;\nBEGIN\n  LOOP\n    DELETE FROM events\n    WHERE  id IN (\n      SELECT id FROM events\n      WHERE  created_at \u003C now() - INTERVAL '1 year'\n      LIMIT  1000          -- batch size\n    );\n    GET DIAGNOSTICS deleted_count = ROW_COUNT;\n    COMMIT;                 -- end this batch's transaction and release its locks\n    EXIT WHEN deleted_count = 0;\n    PERFORM pg_sleep(0.05); -- brief pause between batches\n  END LOOP;\nEND $$;\n```\n\nBecause each iteration ends with `COMMIT`, each batch commits independently, so the table remains available for reads\nand writes between batches. The lock held per batch is small and short-lived.\n\n**Rule of thumb:** never delete more than ~5 000–10 000 rows per transaction\non a live table. Use batching with a pause between iterations to keep\nreplication lag and lock contention low.\n",{"id":308,"difficulty":114,"q":309,"a":310},"truncate-vs-delete","When should you use TRUNCATE instead of DELETE?","`TRUNCATE` removes all rows by deallocating data pages rather than\nlogging each deletion — it is orders of magnitude faster than\n`DELETE FROM table` with no `WHERE` clause.\n\n```sql\n-- Slow: logs every row deletion\nDELETE FROM staging_orders;\n\n-- Fast: drops and recreates the data pages\nTRUNCATE TABLE staging_orders;\n\n-- Postgres: RESTART IDENTITY also resets identity\u002Fsequence counters\nTRUNCATE TABLE events RESTART IDENTITY;\n\n-- TRUNCATE multiple tables atomically (Postgres)\nTRUNCATE TABLE staging_orders, staging_items RESTART IDENTITY CASCADE;\n```\n\nTrade-offs vs `DELETE`:\n- `TRUNCATE` does not fire row-level triggers.\n- `TRUNCATE` cannot have a `WHERE` clause — it always removes all rows.\n- In MySQL, `TRUNCATE` is DDL and auto-commits; in Postgres it is transactional.\n\n**Rule of thumb:** use `TRUNCATE` to reset staging, temp, or test-fixture\ntables between runs. 
Use `DELETE` when you need `WHERE` filtering, trigger\nfiring, or row-count reporting.\n",{"id":312,"difficulty":106,"q":313,"a":314},"update-vs-replace","What is the difference between UPDATE and REPLACE in MySQL?","In MySQL, `REPLACE INTO` works like an UPSERT but via a **delete-then-insert**\nstrategy rather than an in-place update. If a row with the same primary key\n(or unique key) exists, MySQL deletes it and inserts the new row. All columns\nnot listed in the `REPLACE` get their default values — not the existing values.\n\n```sql\n-- Suppose users(id PK, name, email, created_at DEFAULT now())\nREPLACE INTO users (id, name, email)\nVALUES (1, 'Alice Updated', 'alice@example.com');\n-- If id=1 existed: deletes old row, inserts new row.\n-- created_at will be reset to now(), NOT kept from the old row!\n\n-- Safer alternative that preserves untouched columns:\nINSERT INTO users (id, name, email)\nVALUES (1, 'Alice Updated', 'alice@example.com')\nON DUPLICATE KEY UPDATE\n  name  = VALUES(name),\n  email = VALUES(email);\n```\n\n**Rule of thumb:** avoid `REPLACE INTO` — it silently wipes columns you\ndidn't list and can cause unexpected data loss. Use `INSERT … ON DUPLICATE KEY UPDATE`\ninstead for in-place upserts.\n",{"id":316,"difficulty":106,"q":317,"a":318},"conditional-update","How do you update a column to different values based on a condition?","Use a `CASE` expression inside `SET` to apply different values depending\non each row's state — more efficient than multiple `UPDATE` statements.\n\n```sql\n-- Apply tiered discounts in a single pass\nUPDATE orders\nSET discount_pct = CASE\n  WHEN total >= 500  THEN 20\n  WHEN total >= 200  THEN 10\n  WHEN total >= 100  THEN 5\n  ELSE 0\nEND\nWHERE status = 'pending';\n\n-- Flip a boolean flag\nUPDATE tasks\nSET    is_done = CASE WHEN is_done THEN FALSE ELSE TRUE END\nWHERE  id = 7;\n```\n\n**Rule of thumb:** use `SET col = CASE … END` when multiple rows need\ndifferent values updated in one pass. 
It avoids multiple round-trips and\nis atomic — all rows are updated in the same transaction.\n",{"id":320,"difficulty":127,"q":321,"a":322},"cte-with-dml","How do you use a CTE with INSERT, UPDATE, or DELETE?","A **writeable CTE** (a Postgres feature; SQL Server accepts a CTE only as the source of a DML statement) lets you stage intermediate\nresults or chain mutations. The CTE body can be a `DELETE` or `UPDATE`\nwith `RETURNING`, whose output feeds the outer `INSERT`.\n\n```sql\n-- Postgres: move rows from one table to another atomically\n-- DELETE has no LIMIT in Postgres, so the batch is picked in a subquery\n-- (assumes job_queue has an id key)\nWITH deleted AS (\n  DELETE FROM job_queue\n  WHERE  id IN (\n    SELECT id FROM job_queue\n    WHERE  status = 'pending'\n      AND  locked_by IS NULL\n    LIMIT  10\n  )\n  RETURNING *\n)\nINSERT INTO job_archive\nSELECT *, now() AS archived_at\nFROM   deleted;\n\n-- CTE as a data source for UPDATE\nWITH price_hike AS (\n  SELECT id, price * 1.05 AS new_price\n  FROM   products\n  WHERE  category = 'premium'\n)\nUPDATE products p\nSET    price = ph.new_price\nFROM   price_hike ph\nWHERE  p.id = ph.id;\n```\n\n**Rule of thumb:** use writeable CTEs to express complex multi-step\nmutations as a single atomic statement. 
They are far safer than chaining\nseparate DML statements across multiple round-trips where partial failure\ncan leave data in an inconsistent state.\n",15,{"description":104},"SQL INSERT, UPDATE, DELETE interview questions — syntax, bulk inserts, UPSERT, conditional updates, cascading deletes, RETURNING, and safe mutation patterns across Postgres, MySQL, and SQL Server.","sql\u002Fdml\u002Finsert-update-delete","INSERT, UPDATE & DELETE","2026-06-20","hEz-MQNAYp16Bxz5JgoePPGZKXw_OaszDPRiwd6FnSY",{"id":331,"title":332,"body":333,"description":104,"difficulty":114,"extension":107,"framework":10,"frameworkSlug":12,"meta":337,"navigation":109,"order":14,"path":338,"questions":339,"questionsCount":323,"related":247,"seo":400,"seoDescription":401,"stem":402,"subtopic":403,"topic":82,"topicSlug":84,"updated":328,"__hash__":404},"qa\u002Fsql\u002Ffunctions\u002Fstring-numeric-functions.md","String Numeric Functions",{"type":101,"value":334,"toc":335},[],{"title":104,"searchDepth":30,"depth":30,"links":336},[],{},"\u002Fsql\u002Ffunctions\u002Fstring-numeric-functions",[340,344,348,352,356,360,364,368,372,376,380,384,388,392,396],{"id":341,"difficulty":114,"q":342,"a":343},"concat","How do you concatenate strings in SQL?","SQL offers both a standard operator and functions for string concatenation.\nThe `||` operator is ANSI standard; `CONCAT()` is available in Postgres, MySQL, and SQL Server 2012+ and\nhandles NULLs differently depending on the database.\n\n```sql\n-- ANSI standard: || operator (Postgres, Oracle, SQLite; in MySQL || means OR by default)\nSELECT first_name || ' ' || last_name AS full_name FROM users;\n\n-- NULL propagation: NULL || anything = NULL\nSELECT 'Hello' || NULL;  -- → NULL\n\n-- CONCAT() function: Postgres and SQL Server treat NULLs as empty strings; MySQL returns NULL\nSELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;\n-- If first_name IS NULL → ' Smith' in Postgres \u002F SQL Server (NULL coerced to '')\n\n-- MySQL also has CONCAT_WS (with separator — skips NULLs)\nSELECT CONCAT_WS(' ', first_name, middle_name, last_name) AS full_name FROM users;\n-- skips NULL middle_name without leaving 
a double space\n```\n\n**Rule of thumb:** use `CONCAT_WS` when joining fields that may be NULL\nand you do not want extra separators. Use `||` in Postgres for simple\nconcatenation; use `CONCAT()` in MySQL-compatible code.\n",{"id":345,"difficulty":114,"q":346,"a":347},"substring","How do you extract a part of a string?","`SUBSTRING` (or `SUBSTR`) extracts a portion of a string by position and\noptional length.\n\n```sql\n-- Standard SQL: SUBSTRING(string FROM start FOR length)\nSELECT SUBSTRING('Hello World' FROM 7 FOR 5);   -- → 'World'\n\n-- Shorthand (all databases):\nSELECT SUBSTRING('Hello World', 7, 5);           -- → 'World'\nSELECT SUBSTR('Hello World', 7, 5);              -- → 'World' (MySQL, Postgres)\n\n-- Extract from end: use negative position in MySQL\nSELECT SUBSTRING('Hello World', -5);             -- → 'World' (MySQL only)\n\n-- Postgres: RIGHT \u002F LEFT shortcuts\nSELECT LEFT('Hello World', 5);   -- → 'Hello'\nSELECT RIGHT('Hello World', 5);  -- → 'World'\n\n-- Regex-based extraction (Postgres)\nSELECT SUBSTRING('Order #12345' FROM '[0-9]+');  -- → '12345'\n```\n\n**Rule of thumb:** use `LEFT(str, n)` and `RIGHT(str, n)` for simple\nprefix\u002Fsuffix extraction — they are clearer than `SUBSTRING`. Use the\nregex form of `SUBSTRING` in Postgres for pattern-based extraction.\n",{"id":349,"difficulty":114,"q":350,"a":351},"upper-lower","How do you change string case in SQL?","`UPPER()` and `LOWER()` convert all characters in a string to upper or lower\ncase. 
`INITCAP()` (Postgres, Oracle) title-cases each word.\n\n```sql\nSELECT UPPER('hello world');    -- → 'HELLO WORLD'\nSELECT LOWER('HELLO WORLD');    -- → 'hello world'\nSELECT INITCAP('hello world');  -- → 'Hello World'  (Postgres\u002FOracle)\n\n-- Common use: case-insensitive comparison\nSELECT * FROM users WHERE LOWER(email) = 'alice@example.com';\n-- Better: create a functional index so this is fast:\n-- CREATE INDEX idx_users_email_lower ON users (LOWER(email));\n```\n\n**Rule of thumb:** for case-insensitive searches, use `LOWER()` with a\nmatching functional index rather than `ILIKE` (Postgres) on large tables,\nbecause `LOWER(col) = value` can be indexed but `col ILIKE value` typically\ncannot.\n",{"id":353,"difficulty":114,"q":354,"a":355},"trim","How do you remove whitespace or specific characters from a string?","`TRIM`, `LTRIM`, and `RTRIM` remove characters (default: spaces) from the\nstart, end, or both ends of a string.\n\n```sql\nSELECT TRIM('  hello  ');           -- → 'hello'\nSELECT LTRIM('  hello  ');          -- → 'hello  '\nSELECT RTRIM('  hello  ');          -- → '  hello'\n\n-- Remove specific characters (Postgres \u002F SQL Server)\nSELECT TRIM(BOTH '.' FROM '...hello...');   -- → 'hello'\nSELECT TRIM(LEADING '0' FROM '00042');       -- → '42'\n\n-- MySQL TRIM with specific characters\nSELECT TRIM(LEADING '0' FROM '00042');       -- → '42'\n```\n\n**Rule of thumb:** always `TRIM()` user-supplied input before storing it\nin the database. Trailing spaces cause subtle equality failures and bloat\n`CHAR(n)` columns.\n",{"id":357,"difficulty":114,"q":358,"a":359},"replace","How do you replace occurrences of a substring?","`REPLACE(source, from, to)` substitutes all occurrences of `from` with `to`\nin `source`. 
The replacement is case-sensitive in most databases.\n\n```sql\nSELECT REPLACE('Hello World', 'World', 'SQL');   -- → 'Hello SQL'\nSELECT REPLACE('aababab', 'ab', 'X');            -- → 'aXXX'\n\n-- Common use: sanitise data\nUPDATE products SET sku = REPLACE(sku, '-', '');  -- remove dashes from SKUs\n\n-- Postgres: regexp_replace for pattern-based replacement\nSELECT regexp_replace('Order #12345', '[0-9]+', 'XXXXXX');  -- → 'Order #XXXXXX'\nSELECT regexp_replace('a1b2c3', '[0-9]', '#', 'g');         -- → 'a#b#c#'\n-- 'g' flag = replace all occurrences (global)\n```\n\n**Rule of thumb:** use `REPLACE` for literal string swaps; use\n`regexp_replace` (Postgres) or `REGEXP_REPLACE` (MySQL 8+) when the\npattern to replace requires a regular expression.\n",{"id":361,"difficulty":114,"q":362,"a":363},"length","How do you get the length of a string in SQL?","`LENGTH` and `CHAR_LENGTH` return the number of characters in a string.\n`OCTET_LENGTH` returns the byte count (differs for multi-byte UTF-8).\n\n```sql\nSELECT LENGTH('Hello');          -- → 5\nSELECT CHAR_LENGTH('Hello');     -- → 5 (standard SQL; MySQL synonym)\nSELECT LENGTH('こんにちは');      -- Postgres: 5 chars; MySQL: 15 bytes!\n\n-- Postgres: char vs byte length\nSELECT char_length('こんにちは');  -- → 5 (characters)\nSELECT octet_length('こんにちは'); -- → 15 (bytes in UTF-8)\n\n-- Practical use: enforce a max length\nSELECT * FROM users WHERE LENGTH(username) > 50;\n```\n\n**Rule of thumb:** in MySQL, `LENGTH()` returns byte count for multibyte\nstrings — use `CHAR_LENGTH()` to count characters. 
In Postgres, `LENGTH()`\nreturns character count.\n",{"id":365,"difficulty":106,"q":366,"a":367},"string-search","How do you search for a substring within a string?","`POSITION`, `CHARINDEX`, `INSTR`, and `STRPOS` all find the starting\nposition of a substring (returning 0 or NULL if not found, depending on\nthe database).\n\n```sql\n-- ANSI standard\nSELECT POSITION('World' IN 'Hello World');    -- → 7\n\n-- Postgres\nSELECT STRPOS('Hello World', 'World');         -- → 7\n\n-- MySQL\nSELECT INSTR('Hello World', 'World');          -- → 7\nSELECT LOCATE('World', 'Hello World');         -- → 7\nSELECT LOCATE('World', 'Hello World', 8);      -- → 0 (start search at pos 8)\n\n-- SQL Server\nSELECT CHARINDEX('World', 'Hello World');      -- → 7\n\n-- Check existence: returns 0 if not found\nSELECT * FROM products WHERE POSITION('PRO' IN UPPER(sku)) > 0;\n-- Often better expressed as:\nSELECT * FROM products WHERE sku ILIKE '%PRO%';  -- Postgres\n```\n\n**Rule of thumb:** use `LIKE` or `ILIKE` for simple contains-checks — they\nare cleaner and the optimizer understands them. 
Use position functions when\nyou need the actual offset or want to split on a delimiter.\n",{"id":369,"difficulty":114,"q":370,"a":371},"round-ceil-floor","How do you round numeric values in SQL?","`ROUND`, `CEIL`\u002F`CEILING`, and `FLOOR` control numeric rounding.\n\n```sql\nSELECT ROUND(3.14159, 2);    -- → 3.14\nSELECT ROUND(3.145, 2);      -- → 3.15 (or 3.14 — depends on type\u002Frounding mode)\nSELECT ROUND(3.5);           -- → 4\nSELECT ROUND(-3.5);          -- → -4 (Postgres); -3 (some databases)\n\nSELECT CEIL(3.2);            -- → 4  (round up to nearest integer)\nSELECT CEILING(3.2);         -- → 4  (SQL Server alias)\nSELECT FLOOR(3.8);           -- → 3  (round down to nearest integer)\n\n-- Truncate without rounding (Postgres)\nSELECT TRUNC(3.999, 1);      -- → 3.9\nSELECT TRUNC(-3.999, 1);     -- → -3.9 (towards zero, not floor)\n\n-- Practical: round to 2 decimal places for currency display\nSELECT ROUND(SUM(total), 2) AS revenue FROM orders;\n```\n\n**Rule of thumb:** use `ROUND(x, 2)` for display formatting of monetary\nvalues. 
Use `TRUNC` (not `FLOOR`) when you need to drop fractional parts\nwithout rounding, especially for negative numbers.\n",{"id":373,"difficulty":114,"q":374,"a":375},"mod-abs-power","What are MOD, ABS, and POWER and how do you use them?","```sql\n-- MOD: remainder after integer division\nSELECT MOD(10, 3);      -- → 1\nSELECT 10 % 3;          -- → 1 (Postgres, SQL Server, MySQL)\n-- Common use: every Nth row, parity checks\nSELECT id FROM orders WHERE MOD(id, 2) = 0;  -- even IDs\n\n-- ABS: absolute value (removes sign)\nSELECT ABS(-42);        -- → 42\nSELECT ABS(-3.14);      -- → 3.14\n-- Use: distance calculations, deviation from target\nSELECT product_id, ABS(actual_stock - expected_stock) AS discrepancy\nFROM inventory_check;\n\n-- POWER: exponentiation\nSELECT POWER(2, 10);    -- → 1024\nSELECT 2 ^ 10;          -- → 1024 (Postgres)\n-- Use: compound interest, exponential growth models\nSELECT principal * POWER(1 + rate, years) AS future_value\nFROM loans;\n```\n\n**Rule of thumb:** prefer the operator form (`%`, `^`) in Postgres for\nbrevity; use `MOD()` and `POWER()` for cross-database portability.\n",{"id":377,"difficulty":106,"q":378,"a":379},"cast-convert","How do you convert a value from one data type to another?","Type conversion is done with `CAST` (standard), `::` (Postgres shorthand),\nor `CONVERT` (MySQL\u002FSQL Server).\n\n```sql\n-- Standard SQL CAST\nSELECT CAST('42' AS INTEGER);              -- string → integer\nSELECT CAST(3.14 AS TEXT);                 -- number → string\nSELECT CAST('2026-06-20' AS DATE);         -- string → date\n\n-- Postgres shorthand\nSELECT '42'::INTEGER;\nSELECT 3.14::TEXT;\nSELECT '2026-06-20'::DATE;\n\n-- MySQL CONVERT\nSELECT CONVERT('42', UNSIGNED);\nSELECT CONVERT(3.14, CHAR);\n\n-- SQL Server CONVERT (also supports format style codes)\nSELECT CONVERT(INT, '42');\nSELECT CONVERT(VARCHAR, GETDATE(), 103);   -- → '20\u002F06\u002F2026' (UK format)\n\n-- Safe cast (returns NULL instead of error on failure — Postgres 
16+)\n-- No built-in safe cast; guard with pg_input_is_valid (raw_value and import_rows are placeholder names)\nSELECT CASE WHEN pg_input_is_valid(raw_value, 'integer') THEN raw_value::integer END\nFROM   import_rows;  -- → NULL where raw_value is not a valid integer\n-- MySQL: CAST('abc' AS UNSIGNED) → 0 (silent coercion, not NULL)\n```\n\n**Rule of thumb:** use `CAST(x AS type)` for portability. In Postgres,\n`::type` is idiomatic. Always validate input before casting to avoid\nruntime errors; use `TRY_CAST` (SQL Server) or `TRY_CONVERT` to return\nNULL on failure instead of raising an exception.\n",{"id":381,"difficulty":114,"q":382,"a":383},"string-padding","How do you pad a string to a fixed length?","`LPAD` and `RPAD` pad a string to a target length with a specified fill\ncharacter (defaulting to spaces).\n\n```sql\nSELECT LPAD('42', 5, '0');     -- → '00042'  (zero-pad a number)\nSELECT RPAD('hello', 8, '.');  -- → 'hello...'\nSELECT LPAD('hello', 3, ' ');  -- → 'hel'  (truncates if longer!)\n\n-- Common use: format fixed-width codes\nSELECT LPAD(CAST(id AS TEXT), 8, '0') AS padded_id FROM orders;\n-- id=42 → '00000042'\n\n-- Postgres alternative for numeric zero-padding\nSELECT TO_CHAR(42, 'FM00000000');  -- → '00000042'\n```\n\n**Rule of thumb:** use `LPAD(value::text, width, '0')` for zero-padding\nintegers when formatting export files or codes. 
Note that `LPAD` truncates\nthe string on the right (keeping its leftmost characters) if it is already longer than the target — always\nconfirm the max width before using it.\n",{"id":385,"difficulty":106,"q":386,"a":387},"format-to-char","How do you format numbers and dates as strings?","`TO_CHAR` (Postgres, Oracle) and `FORMAT` (MySQL, SQL Server) convert\nnumbers and dates to formatted strings.\n\n```sql\n-- Postgres TO_CHAR for numbers\nSELECT TO_CHAR(1234567.89, 'FM$999,999,990.00');  -- → '$1,234,567.89'\n-- TO_CHAR does not scale fractions to percentages; multiply by 100 first\nSELECT TO_CHAR(0.153 * 100, 'FM990.0') || '%';     -- → '15.3%'\n\n-- Postgres TO_CHAR for dates\nSELECT TO_CHAR(now(), 'YYYY-MM-DD HH24:MI:SS');   -- → '2026-06-20 14:30:00'\n-- FM strips the blank padding that Day and Month add by default\nSELECT TO_CHAR(now(), 'FMDay, DD FMMonth YYYY');   -- → 'Saturday, 20 June 2026'\n\n-- MySQL FORMAT for numbers\nSELECT FORMAT(1234567.89, 2);   -- → '1,234,567.89' (locale-aware)\n\n-- SQL Server FORMAT\nSELECT FORMAT(1234567.89, 'N2');            -- → '1,234,567.89'\nSELECT FORMAT(GETDATE(), 'yyyy-MM-dd');     -- → '2026-06-20'\n```\n\n**Rule of thumb:** format values for display in the application layer when\npossible — SQL formatting functions are less testable and locale-handling\nvaries. 
Use `TO_CHAR` in SQL when formatting is needed in the query itself\n(e.g., CSV exports, stored reports).\n",{"id":389,"difficulty":106,"q":390,"a":391},"string-split","How do you split a delimited string in SQL?","Splitting a delimited string (e.g., `'a,b,c'`) into rows requires\ndatabase-specific functions.\n\n```sql\n-- Postgres: STRING_TO_ARRAY + UNNEST\nSELECT UNNEST(STRING_TO_ARRAY('a,b,c', ',')) AS item;\n-- → rows: 'a', 'b', 'c'\n\n-- Postgres: regexp_split_to_table\nSELECT regexp_split_to_table('one two three', '\\s+') AS word;\n\n-- MySQL 8+: JSON_TABLE workaround (no native split)\nSELECT jt.item\nFROM JSON_TABLE(\n  CONCAT('[\"', REPLACE('a,b,c', ',', '\",\"'), '\"]'),\n  '$[*]' COLUMNS (item VARCHAR(100) PATH '$')\n) AS jt;\n\n-- SQL Server: STRING_SPLIT (SQL Server 2016+)\nSELECT value AS item FROM STRING_SPLIT('a,b,c', ',');\n```\n\n**Rule of thumb:** storing comma-separated values in a single column is a\n1NF violation — use a child table instead. When you must parse a legacy\nstring, `UNNEST(STRING_TO_ARRAY(...))` in Postgres is the cleanest approach.\n",{"id":393,"difficulty":106,"q":394,"a":395},"aggregate-string","How do you aggregate multiple rows into a single string?","`STRING_AGG` (Postgres 9.0+, SQL Server 2017+) and `GROUP_CONCAT` (MySQL)\nconcatenate values from multiple rows into one string, with a separator.\n\n```sql\n-- Postgres (SQL Server orders with STRING_AGG(...) WITHIN GROUP (ORDER BY ...) instead)\nSELECT customer_id,\n       STRING_AGG(product_name, ', ' ORDER BY product_name) AS products\nFROM   order_items\nGROUP  BY customer_id;\n-- → customer_id=1: 'Pen, Pencil, Ruler'\n\n-- MySQL\nSELECT customer_id,\n       GROUP_CONCAT(product_name ORDER BY product_name SEPARATOR ', ') AS products\nFROM   order_items\nGROUP  BY customer_id;\n\n-- With DISTINCT to deduplicate (Postgres; SQL Server's STRING_AGG does not accept DISTINCT)\nSELECT STRING_AGG(DISTINCT tag, ', ') FROM article_tags WHERE article_id = 1;\n```\n\n**Rule of thumb:** use `STRING_AGG` to build comma-separated lists for\nreports or JSON responses without a second round-trip. 
Be mindful of the\nresult length — `GROUP_CONCAT` in MySQL defaults to a 1 024-byte limit\n(configurable via `group_concat_max_len`).\n",{"id":397,"difficulty":106,"q":398,"a":399},"regex-functions","How do you use regular expressions in SQL?","Most databases support regex-based filtering and extraction, though the\nsyntax differs.\n\n```sql\n-- Postgres: ~ (match), !~ (no match), ~* (case-insensitive)\nSELECT * FROM users WHERE email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$';\n-- Extract a substring matching a pattern\nSELECT REGEXP_MATCH(description, '\\d+') AS first_number FROM products;\n-- Replace with regex\nSELECT REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS digits_only FROM users;\n\n-- MySQL: REGEXP \u002F RLIKE (filter), REGEXP_REPLACE, REGEXP_SUBSTR (8.0+)\nSELECT * FROM users WHERE email REGEXP '^[a-z0-9._%+-]+@';\nSELECT REGEXP_REPLACE(phone, '[^0-9]', '') FROM users;\n\n-- SQL Server: no built-in regex; use LIKE for simple patterns\n-- or CLR functions \u002F JSON PATH for complex cases\nSELECT * FROM users WHERE email LIKE '%@%.%';\n```\n\n**Rule of thumb:** use `LIKE` for simple prefix\u002Fsuffix\u002Fcontains patterns —\nit is portable and index-friendly (leading wildcards aside). 
Use regex\nfunctions only when the pattern is too complex for `LIKE`, and only in\nPostgres or MySQL where support is solid.\n",{"description":104},"SQL string and numeric function interview questions — CONCAT, SUBSTRING, TRIM, REPLACE, UPPER\u002FLOWER, LENGTH, ROUND, CEIL, FLOOR, MOD, CAST, and dialect differences across Postgres, MySQL, and SQL Server.","sql\u002Ffunctions\u002Fstring-numeric-functions","String & Numeric Functions","jLD6OehmtGFFohDiH_eWEI5-sQoDyYQ_QrEhbvk3apg",{"id":406,"title":407,"body":408,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":412,"navigation":109,"order":14,"path":413,"questions":414,"questionsCount":323,"related":247,"seo":475,"seoDescription":476,"stem":477,"subtopic":407,"topic":73,"topicSlug":75,"updated":328,"__hash__":478},"qa\u002Fsql\u002Fperformance\u002Findexes.md","Indexes",{"type":101,"value":409,"toc":410},[],{"title":104,"searchDepth":30,"depth":30,"links":411},[],{},"\u002Fsql\u002Fperformance\u002Findexes",[415,419,423,427,431,435,439,443,447,451,455,459,463,467,471],{"id":416,"difficulty":114,"q":417,"a":418},"what-is-index","What is a database index and how does it speed up queries?","An **index** is a separate data structure (usually a **B-tree**) that the\ndatabase maintains alongside a table. 
It stores copies of one or more column\nvalues in sorted order together with pointers to the full row, allowing the\nengine to locate matching rows in **O(log n)** time instead of scanning\nevery row (O(n)).\n\n```sql\n-- Without an index: full table scan — reads every row\nSELECT * FROM orders WHERE customer_id = 42;\n\n-- After adding an index: index seek — reads only the matching branch\nCREATE INDEX idx_orders_customer ON orders (customer_id);\nSELECT * FROM orders WHERE customer_id = 42;\n-- Execution plan changes from Seq Scan to Index Scan\n```\n\nThe trade-off: indexes consume disk space and must be updated on every\n`INSERT`, `UPDATE`, or `DELETE` on the indexed columns — adding write\noverhead.\n\n**Rule of thumb:** add an index on any column that appears frequently in\n`WHERE`, `JOIN ON`, or `ORDER BY` clauses of slow queries. Verify with\n`EXPLAIN` before and after.\n",{"id":420,"difficulty":114,"q":421,"a":422},"btree-index","What is a B-tree index and what kinds of queries does it support?","A **B-tree** (Balanced-tree) index is the default index type in all major\ndatabases. 
It keeps values in sorted order across a balanced tree of pages,\nmaking it efficient for equality, range, and sort operations.\n\nSupported query types:\n- Equality: `WHERE col = value`\n- Range: `WHERE col > value`, `WHERE col BETWEEN a AND b`\n- Prefix: `WHERE col LIKE 'abc%'` (but not `'%abc'`)\n- Sorting: `ORDER BY col` (the optimizer may use the index to avoid a sort)\n- Prefix of a composite index: `WHERE (a, b)` when index is on `(a, b, c)`\n\n```sql\n-- B-tree index covers all of these:\nCREATE INDEX idx_users_email ON users (email);\n\nSELECT * FROM users WHERE email = 'alice@example.com';       -- equality\nSELECT * FROM users WHERE email > 'm@example.com';           -- range\nSELECT * FROM users WHERE email LIKE 'alice%';               -- prefix\nSELECT * FROM users ORDER BY email LIMIT 10;                 -- sort\n```\n\n**Rule of thumb:** always start with a B-tree index. Only reach for\nspecialised types (hash, GIN, GiST) when B-tree cannot satisfy the\naccess pattern (e.g., full-text search, containment on arrays\u002FJSON).\n",{"id":424,"difficulty":106,"q":425,"a":426},"composite-index","What is a composite index and what is the left-prefix rule?","A **composite index** covers two or more columns. The database sorts rows\nby the first column, then by the second within each first-column group, and\nso on. 
The optimizer can only use the index starting from the leftmost column\n— this is the **left-prefix rule**.\n\n```sql\nCREATE INDEX idx_orders_status_date ON orders (status, created_at);\n\n-- Uses the index (status is the leftmost column)\nSELECT * FROM orders WHERE status = 'pending';\n\n-- Uses the index (both columns used, in order)\nSELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';\n\n-- CANNOT use the index (skips the first column)\nSELECT * FROM orders WHERE created_at > '2026-01-01';\n-- → falls back to Seq Scan\n```\n\nColumn order in the index matters: put the column used in equality filters\nfirst, then range-filtered columns, then sort columns.\n\n**Rule of thumb:** design composite indexes as `(equality_cols, range_col,\nsort_col)`. The most selective equality column goes first. A query that\nskips the leftmost column cannot use the index.\n",{"id":428,"difficulty":106,"q":429,"a":430},"covering-index","What is a covering index and how does it eliminate table lookups?","A **covering index** contains all columns a query needs — the database can\nanswer the query entirely from the index without touching the main table\n(the \"heap\"). 
This eliminates the extra I\u002FO of the **table lookup** (also\ncalled a \"heap fetch\" or \"bookmark lookup\").\n\n```sql\n-- Query reads id, status, total; in Postgres all three must be in the index\nCREATE INDEX idx_orders_covering\n  ON orders (customer_id)\n  INCLUDE (id, status, total);   -- Postgres 11+ \u002F SQL Server: INCLUDE clause\n\nSELECT id, status, total\nFROM   orders\nWHERE  customer_id = 42;\n-- Execution plan: Index Only Scan (heap skipped for all-visible pages)\n```\n\nIn MySQL, covering indexes work without an `INCLUDE` clause — all columns\nin the `SELECT` list just need to be part of the index definition.\n\n**Rule of thumb:** when `EXPLAIN` shows an `Index Scan` on a high-traffic\nquery, check whether adding the `SELECT`ed columns to the index via\n`INCLUDE` can convert it to an `Index Only Scan`.\n",{"id":432,"difficulty":106,"q":433,"a":434},"partial-index","What is a partial index and when does it help?","A **partial index** is an index built on a subset of rows — those satisfying\na `WHERE` clause in the index definition. It is smaller, faster to update,\nand more selective than a full-column index.\n\n```sql\n-- Only index pending orders (the rows that are actually queried)\nCREATE INDEX idx_orders_pending\n  ON orders (created_at)\n  WHERE status = 'pending';\n\n-- This query uses the partial index (matches the WHERE condition)\nSELECT * FROM orders WHERE status = 'pending' AND created_at \u003C now() - INTERVAL '1 day';\n\n-- This query cannot use the partial index (status ≠ 'pending')\nSELECT * FROM orders WHERE status = 'shipped' AND created_at \u003C now() - INTERVAL '1 day';\n```\n\nA partial index can also be declared `UNIQUE` (see the constraints topic).\n\n**Rule of thumb:** use partial indexes when a large table has a small\n\"hot\" subset that most queries filter on (e.g., active records, unprocessed\njobs, non-deleted rows). 
The index shrinks dramatically and fits in cache\nmore easily.\n",{"id":436,"difficulty":106,"q":437,"a":438},"hash-index","When should you use a hash index instead of a B-tree?","A **hash index** maps each column value to a hash bucket, giving O(1)\naverage-case lookup for **equality-only** queries. It cannot support range\nqueries, sorting, or prefix matches.\n\n```sql\n-- Postgres: explicit hash index\nCREATE INDEX idx_sessions_token ON sessions USING HASH (token);\n\n-- Useful for: WHERE token = 'abc123'  (pure equality)\n-- Useless for: WHERE token > 'abc123'  (range)\n-- Useless for: ORDER BY token          (sort)\n```\n\nIn older Postgres versions (\u003C 10), hash indexes were not WAL-logged and\nwere lost on crash. Since Postgres 10, they are crash-safe. MySQL and SQL\nServer do not offer explicit hash indexes on disk (MySQL Memory engine does).\n\n**Rule of thumb:** prefer B-tree in almost all cases — it handles equality\ntoo and adds range\u002Fsort support for free. Only consider a hash index when\nprofiling shows that a very high-throughput equality-only lookup would\nmeasurably benefit from the marginal O(1) vs O(log n) difference.\n",{"id":440,"difficulty":127,"q":441,"a":442},"gin-index","What is a GIN index and when do you use it in Postgres?","**GIN** (Generalized Inverted Index) is optimised for columns that contain\nmultiple values per row — arrays, `JSONB`, `tsvector` (full-text search).\nIt maps each individual element (word, key, array item) to the set of rows\ncontaining it.\n\n```sql\n-- Full-text search index\nCREATE INDEX idx_articles_fts ON articles USING GIN (to_tsvector('english', body));\nSELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & index');\n\n-- JSONB containment index\nCREATE INDEX idx_events_payload ON events USING GIN (payload);\nSELECT * FROM events WHERE payload @> '{\"type\": \"click\"}';\n\n-- Array containment index\nCREATE INDEX idx_posts_tags ON posts USING GIN (tags);\nSELECT * FROM 
posts WHERE tags @> ARRAY['sql', 'performance'];\n```\n\nGIN indexes are large and slow to build but very fast for containment\nqueries (`@>`, `@@`, `?`).\n\n**Rule of thumb:** use GIN for full-text search and JSONB\u002Farray containment\nqueries. Use a regular B-tree for simple `=` or range filters on JSONB\nextracted scalar values (`(payload->>'user_id')::int`).\n",{"id":444,"difficulty":127,"q":445,"a":446},"index-bloat","What is index bloat and how do you fix it?","**Index bloat** occurs when an index grows much larger than the live data it\ncovers, usually because `DELETE` and `UPDATE` operations leave dead index\nentries that accumulate faster than `VACUUM` can reclaim them.\n\nSymptoms: index scans slow down, the index file on disk is much larger than\nexpected, and `VACUUM VERBOSE` output shows many dead tuples.\n\n```sql\n-- Postgres: check B-tree index bloat (pgstatindex, from the pgstattuple extension)\nCREATE EXTENSION IF NOT EXISTS pgstattuple;\nSELECT pg_size_pretty(index_size)            AS index_size,\n       avg_leaf_density                      AS leaf_density_pct,\n       round(leaf_fragmentation::numeric, 2) AS fragmentation_pct\nFROM   pgstatindex('idx_orders_customer');\n\n-- Fix: rebuild the index (blocks writes to the table while it runs)\nREINDEX INDEX idx_orders_customer;\n\n-- Non-blocking rebuild (Postgres 12+)\nREINDEX INDEX CONCURRENTLY idx_orders_customer;\n```\n\n**Rule of thumb:** monitor index sizes relative to table sizes. If an index\nis more than 2–3× the expected size, run `REINDEX INDEX CONCURRENTLY`. Tune\n`autovacuum` to run more aggressively on high-churn tables.\n",{"id":448,"difficulty":106,"q":449,"a":450},"when-not-to-index","When should you NOT add an index?","Indexes are not free — they slow down writes and consume disk space. Avoid\nadding an index when:\n\n1. **The table is tiny** — a full scan of a 500-row table is faster than\n   an index lookup because the whole table fits in one or two pages.\n2. 
**The column has very low selectivity** — an index on a `status` column\n   with only two values (`active`\u002F`inactive`) where 90 % of rows are\n   `active` gives the optimizer no benefit for `WHERE status = 'active'`.\n3. **The table is write-heavy** — a table with hundreds of inserts\u002Fsecond\n   pays a high cost to maintain indexes. Batch-load tables often drop\n   indexes before the load and rebuild them after.\n4. **The column is rarely queried** — unused indexes consume space and slow\n   every write without ever benefiting a read.\n\n```sql\n-- Postgres: find unused indexes\n-- (pg_stat_user_indexes names its columns relname and indexrelname)\nSELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\n  AND  indexrelname NOT LIKE '%pkey%'   -- skip PKs\nORDER  BY pg_relation_size(indexrelid) DESC;\n```\n\n**Rule of thumb:** only add an index when you can show — via `EXPLAIN\nANALYZE` — that it is used and that it reduces query time. Drop unused\nindexes; they are pure overhead.\n",{"id":452,"difficulty":106,"q":453,"a":454},"index-scan-vs-seq-scan","When does the optimizer choose a sequential scan over an index scan?","The query optimizer uses **cost-based planning** to choose between a\nsequential scan and an index scan. It chooses sequential scan when:\n\n1. **Many rows match** — if a large share of the table's rows match the\n   `WHERE` clause (30 % is a commonly quoted threshold; the real cutoff\n   depends on the cost model), reading the table sequentially in disk order\n   is cheaper than the random I\u002FO of following index pointers row-by-row.\n2. **Table statistics are stale** — if `ANALYZE` has not run recently, the\n   planner may underestimate or overestimate selectivity.\n3. **Small table** — the whole table fits in a few pages; sequential I\u002FO\n   is faster.\n4. **Cost configuration** — `random_page_cost` (Postgres) affects the\n   relative cost of index vs sequential I\u002FO. Lowering it (e.g. 
to 1.1 for\n   SSDs) makes index scans more attractive.\n\n```sql\n-- Check the plan and actual vs estimated rows\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT * FROM orders WHERE status = 'pending';\n-- If 'Rows Removed by Filter' is small next to the rows returned → most rows match, Seq Scan is correct\n-- If 'Rows Removed by Filter' is huge → few rows match; look for a missing index or stale stats\n\n-- Update statistics\nANALYZE orders;\n```\n\n**Rule of thumb:** trust the optimizer; a sequential scan on 40 % of rows\nis usually faster than an index scan. If a plan looks wrong, refresh statistics with\n`ANALYZE` before forcing an index with a hint.\n",{"id":456,"difficulty":127,"q":457,"a":458},"expression-index","What is a functional (expression) index?","A **functional index** (expression index) indexes the *result* of a\nfunction or expression applied to a column rather than the raw column value.\nThis allows the optimizer to use the index when the same expression appears\nin a `WHERE` clause.\n\n```sql\n-- Without expression index: full scan (function applied to every row)\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Create an index on the expression\nCREATE INDEX idx_users_email_lower ON users (lower(email));\n\n-- Now this uses the index\nSELECT * FROM users WHERE lower(email) = 'alice@example.com';\n\n-- Also useful for JSON extraction\nCREATE INDEX idx_events_user ON events ((payload->>'user_id'));\nSELECT * FROM events WHERE payload->>'user_id' = '42';\n```\n\nThe expression in the `WHERE` clause must match the expression in the index\nexactly for the planner to use it.\n\n**Rule of thumb:** create a functional index whenever a `WHERE` clause\napplies a deterministic function (`IMMUTABLE` in Postgres) to a column (case-insensitive email,\nJSON field extraction, date truncation). 
Run `ANALYZE` after creating it\nso the planner sees up-to-date statistics.\n",{"id":460,"difficulty":106,"q":461,"a":462},"index-maintenance","How do you find and remove duplicate or unused indexes?","```sql\n-- Postgres: find unused indexes (not scanned since last stats reset)\n-- (pg_stat_user_indexes names its columns relname and indexrelname)\nSELECT schemaname, relname AS tablename, indexrelname AS indexname,\n       pg_size_pretty(pg_relation_size(indexrelid)) AS size,\n       idx_scan\nFROM   pg_stat_user_indexes\nWHERE  idx_scan = 0\nORDER  BY pg_relation_size(indexrelid) DESC;\n\n-- Postgres: find duplicate indexes (same columns, same table)\nSELECT indrelid::regclass AS table_name,\n       array_agg(indexrelid::regclass) AS duplicate_indexes\nFROM   pg_index\nGROUP  BY indrelid, indkey::text\nHAVING COUNT(*) > 1;\n\n-- Drop a redundant index (non-blocking in Postgres)\nDROP INDEX CONCURRENTLY idx_orders_old_customer;\n```\n\nAn index is redundant when another index on the same table starts with the\nsame column(s). For example, an index on `(customer_id)` is made redundant\nby a composite index on `(customer_id, created_at)` for equality lookups.\n\n**Rule of thumb:** audit indexes quarterly. Drop unused ones — they slow\nwrites and mislead developers into thinking a column is important for\nlookups. Keep an index removal in a migration script so it can be re-added\nif monitoring reveals it is needed.\n",{"id":464,"difficulty":106,"q":465,"a":466},"clustered-vs-nonclustered","What is the difference between a clustered and a non-clustered index?","- **Clustered index**: the table data is physically stored in the order of\n  the index. There can be only **one** per table. In MySQL InnoDB, the\n  primary key is always the clustered index; in SQL Server, the primary key\n  is clustered by default, but a different index can be chosen. In Postgres, there\n  is no automatic clustering, but `CLUSTER table USING index` physically\n  reorders the table once (not maintained dynamically).\n- **Non-clustered index**: a separate structure that stores the indexed\n  values and pointers (row IDs \u002F PKs) back to the heap. 
Multiple non-\n  clustered indexes can exist per table.\n\n```sql\n-- SQL Server: clustered index (the PK is clustered by default)\nCREATE TABLE orders (\n  id INT PRIMARY KEY CLUSTERED,    -- data pages sorted by id\n  customer_id INT NOT NULL\n);\n\n-- Non-clustered index\nCREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);\n\n-- Postgres: one-time physical sort (does NOT stay sorted after future writes)\nCLUSTER orders USING idx_orders_customer_date;\n```\n\n**Rule of thumb:** in SQL Server and MySQL, choose the clustered index\n(usually the PK) carefully — sequential integer PKs cause minimal page\nsplits. Random UUIDs as clustered keys cause fragmentation and slow inserts.\n",{"id":468,"difficulty":114,"q":469,"a":470},"index-on-foreign-key","Should you always index a foreign key column?","**Yes, in almost all cases.** Foreign key columns are used in `JOIN ON`\nconditions and in cascade operations (`ON DELETE CASCADE`). Without an index,\nboth joins and FK enforcement scans become full table scans.\n\n```sql\n-- Child table without an index on the FK column:\n-- DELETE FROM customers WHERE id = 1\n-- → DB must scan ALL order rows to find children — O(n)\n\n-- With an index on the FK column:\nCREATE INDEX idx_orders_customer_id ON orders (customer_id);\n-- DELETE FROM customers WHERE id = 1\n-- → Index lookup to find children — O(log n)\n```\n\nPostgres does NOT automatically create an index on FK columns (unlike the\nPK side). MySQL InnoDB DOES create one automatically. 
SQL Server does not.\n\n**Rule of thumb:** after declaring a `REFERENCES` constraint in Postgres or\nSQL Server, immediately add a `CREATE INDEX` on the child's FK column unless\nthe child table is very small or the FK is never used in queries.\n",{"id":472,"difficulty":114,"q":473,"a":474},"index-types-summary","What index types are available in Postgres and when do you use each?","| Type | Best for | Notes |\n|---|---|---|\n| `BTREE` | equality, range, sort, prefix LIKE | Default; handles 95 % of use cases |\n| `HASH` | equality only | Marginally faster than B-tree for pure equality; no range |\n| `GIN` | arrays, JSONB containment, full-text | Large index; slow build; fast `@>` \u002F `@@` |\n| `GiST` | geometry, ranges, nearest-neighbour | PostGIS spatial; `&&` \u002F `\u003C->` operators |\n| `BRIN` | huge tables with natural physical ordering | Very small index; only useful for append-only tables (time-series) |\n| `SP-GiST` | non-balanced partitioned structures | Quad-trees, radix trees; niche use |\n\n```sql\n-- BRIN: tiny index on a massive append-only events table\n-- (works because newer rows have higher timestamps and are stored later on disk)\nCREATE INDEX idx_events_brin ON events USING BRIN (created_at);\n```\n\n**Rule of thumb:** use `BTREE` by default. 
Use `GIN` for arrays\u002FJSONB\u002FFTS.\nUse `BRIN` only on truly append-only tables (logs, IoT data) where the\nindexed column correlates with physical storage order — otherwise it will\nnot be used.\n",{"description":104},"SQL indexes interview questions — B-tree, hash, GIN, GiST, partial and composite indexes, covering indexes, index bloat, VACUUM, and when not to index across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Findexes","IyWMTZsQsuPFpcSDdisIDYupSCT7mg2HwdOjRH8cwNw",{"id":480,"title":481,"body":482,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":486,"navigation":109,"order":14,"path":487,"questions":488,"questionsCount":553,"related":247,"seo":554,"seoDescription":555,"stem":556,"subtopic":481,"topic":47,"topicSlug":48,"updated":328,"__hash__":557},"qa\u002Fsql\u002Fschema\u002Fdata-types.md","Data Types",{"type":101,"value":483,"toc":484},[],{"title":104,"searchDepth":30,"depth":30,"links":485},[],{},"\u002Fsql\u002Fschema\u002Fdata-types",[489,493,497,501,505,509,513,517,521,525,529,533,537,541,545,549],{"id":490,"difficulty":114,"q":491,"a":492},"why-types-matter","Why does picking the right data type matter?","Choosing the right type affects **storage size**, **query performance**, and\n**data integrity**. 
A correct type rejects bad data at insert time (the\ndatabase enforces the constraint for free) and lets the engine use internal\noptimizations (integer comparisons are faster than string comparisons; a\n`DATE` column can use date arithmetic natively).\n\n```sql\n-- BAD: storing a price as VARCHAR lets \"abc\" in and breaks SUM()\nprice VARCHAR(20)\n\n-- GOOD: exact numeric, 2 decimal places, never negative\nprice NUMERIC(10, 2) NOT NULL CHECK (price >= 0)\n```\n\n**Rule of thumb:** choose the *narrowest* type that correctly represents\nevery valid value — it saves space, speeds up indexes, and keeps invalid\ndata out automatically.\n",{"id":494,"difficulty":114,"q":495,"a":496},"integer-types","What are the main integer types and when do you choose each?","All major databases offer a family of fixed-size integers:\n\n| Type | Bytes | Range (~) | Use when… |\n|---|---|---|---|\n| `SMALLINT` | 2 | ±32 k | small lookup codes, status flags |\n| `INT` \u002F `INTEGER` | 4 | ±2.1 B | most surrogate keys, counters |\n| `BIGINT` | 8 | ±9.2 × 10¹⁸ | high-volume tables, distributed IDs |\n\n```sql\n-- Postgres auto-increment shorthand\nid SERIAL PRIMARY KEY          -- alias for INT + sequence\nid BIGSERIAL PRIMARY KEY       -- alias for BIGINT + sequence\n\n-- Standard SQL identity column (Postgres 10+; MySQL uses AUTO_INCREMENT, SQL Server uses IDENTITY(1,1))\nid INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY\n```\n\n**Rule of thumb:** default to `INT` for PKs; switch to `BIGINT` if you\nexpect more than ~1 billion rows or use globally distributed IDs (such as\nSnowflake-style 64-bit IDs).\n",{"id":498,"difficulty":106,"q":499,"a":500},"numeric-vs-float","What is the difference between NUMERIC\u002FDECIMAL and FLOAT\u002FREAL?","**`NUMERIC(p, s)` \u002F `DECIMAL(p, s)`** store exact values using base-10\ndecimal arithmetic. 
They never introduce rounding errors and are required for\nmoney, tax rates, or any value where \"0.10 + 0.20 = 0.30\" must hold exactly.\n\n**`FLOAT` \u002F `REAL` \u002F `DOUBLE PRECISION`** are IEEE-754 floating-point types.\nThey are faster and more compact but introduce tiny rounding errors, making\nthem unsuitable for financial calculations.\n\n```sql\n-- exact: total will always equal the sum of its parts\nprice NUMERIC(12, 4)\n\n-- approximate: fine for sensor readings, ML feature vectors\nlatitude DOUBLE PRECISION\n```\n\n**Rule of thumb:** use `NUMERIC` for money and anything that will be summed\nor compared for equality; use `FLOAT`\u002F`DOUBLE` for scientific measurements\nwhere small rounding is acceptable.\n",{"id":502,"difficulty":114,"q":503,"a":504},"char-vs-varchar-vs-text","What is the difference between CHAR, VARCHAR, and TEXT?","- **`CHAR(n)`** — fixed-length, always `n` characters, right-padded with\n  spaces. Trailing spaces are ignored in comparisons in most databases.\n  Useful only for truly fixed-width codes (country codes, ISO currency codes).\n- **`VARCHAR(n)`** — variable-length up to `n` characters. The limit is a\n  *declaration*; values shorter than `n` use less storage.\n- **`TEXT`** — unlimited-length character string (no declared max). Postgres\n  treats `TEXT` and `VARCHAR` identically at the storage level. 
MySQL and\n  SQL Server have different performance trade-offs for very large TEXT values.\n\n```sql\ncountry_code CHAR(2)       -- 'US', 'GB' — always exactly 2\nemail        VARCHAR(255)  -- typical cap; protects against runaway input\nbody         TEXT          -- blog post, unlimited\n```\n\n**Rule of thumb:** use `CHAR` only for fixed-width codes, `VARCHAR(n)` for\nfields with a meaningful business-length cap (email, username), and `TEXT`\nfor free-form content.\n",{"id":506,"difficulty":106,"q":507,"a":508},"date-time-types","What date and time types does SQL offer and how do they differ?","| Type | Stores | Timezone-aware? |\n|---|---|---|\n| `DATE` | year-month-day | No |\n| `TIME` | hour-min-sec | No (`TIMETZ` in Postgres) |\n| `TIMESTAMP` | date + time | No |\n| `TIMESTAMPTZ` (Postgres) \u002F `DATETIMEOFFSET` (SQL Server) | date + time | Yes — stored as UTC, displayed in session tz |\n\n```sql\n-- Postgres\ncreated_at  TIMESTAMPTZ NOT NULL DEFAULT now()\nbirth_date  DATE\n\n-- MySQL (no TIMESTAMPTZ; DATETIME is naive, TIMESTAMP is UTC-stored)\ncreated_at  DATETIME(6)  -- 6-digit microsecond precision\n```\n\n**Rule of thumb:** always store timestamps with time-zone awareness\n(`TIMESTAMPTZ` in Postgres, `DATETIMEOFFSET` in SQL Server). 
Store `DATE`\nalone only when the time component is meaningless (birthdays, holidays).\n",{"id":510,"difficulty":114,"q":511,"a":512},"boolean-type","How do databases handle boolean values?","**Postgres** has a native `BOOLEAN` type that accepts `TRUE`\u002F`FALSE` (and\naliases like `'yes'`\u002F`'no'`, `1`\u002F`0`).\n\n**MySQL** lacks a native boolean — `BOOL`\u002F`BOOLEAN` is an alias for\n`TINYINT(1)`, where `0` = false and any non-zero = true.\n\n**SQL Server** uses `BIT` (0 or 1, no native `TRUE`\u002F`FALSE` literal).\n\n```sql\n-- Postgres\nis_active BOOLEAN NOT NULL DEFAULT TRUE\n\n-- MySQL\nis_active TINYINT(1) NOT NULL DEFAULT 1\n\n-- SQL Server\nis_active BIT NOT NULL DEFAULT 1\n```\n\n**Rule of thumb:** in Postgres use `BOOLEAN`; in MySQL use `TINYINT(1)`;\nin SQL Server use `BIT`. In all cases, enforce `NOT NULL DEFAULT` to keep\nthe flag unambiguous.\n",{"id":514,"difficulty":106,"q":515,"a":516},"null-semantics","What is NULL and how does three-valued logic work in SQL?","`NULL` means **unknown \u002F missing \u002F not applicable** — it is not zero, not\nempty string, not `FALSE`. SQL uses **three-valued logic**: a comparison\ninvolving `NULL` evaluates to `UNKNOWN`, which acts like `FALSE` in `WHERE`\nclauses (the row is excluded).\n\n```sql\n-- These all return UNKNOWN, not TRUE or FALSE:\nNULL = NULL       -- UNKNOWN\nNULL \u003C> 1         -- UNKNOWN\nNULL IS NULL      -- TRUE  ← the correct test\nNULL IS NOT NULL  -- FALSE\n\n-- Practical pitfall: rows where discount IS NULL are excluded\nSELECT * FROM orders WHERE discount \u003C> 0;\n-- Fix:\nSELECT * FROM orders WHERE discount \u003C> 0 OR discount IS NULL;\n```\n\n**Rule of thumb:** never compare with `= NULL`; always use `IS NULL` \u002F\n`IS NOT NULL`. 
Use `COALESCE(col, default)` to substitute a default before\ncomparison.\n",{"id":518,"difficulty":106,"q":519,"a":520},"uuid-type","What are UUIDs and when should you use them as primary keys?","A **UUID** (Universally Unique Identifier) is a 128-bit value, usually\nwritten as `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`. It is collision-resistant\nwithout a central coordinator, making it ideal for **distributed inserts**\nand **exposing IDs in APIs** (not predictable like an integer sequence).\n\n```sql\n-- Postgres: native UUID type (stored as 16 bytes)\nid UUID PRIMARY KEY DEFAULT gen_random_uuid()\n\n-- MySQL: no native UUID; store as CHAR(36) or BINARY(16)\nid CHAR(36) PRIMARY KEY DEFAULT (UUID())\n```\n\nDownsides: random UUIDs (v4) cause **index fragmentation** because inserts\nscatter across the B-tree. **UUIDv7** (time-ordered) mitigates this.\n\n**Rule of thumb:** prefer integer PKs for internal tables; use UUIDs when\nrows are created across multiple nodes or when IDs are exposed externally and\nmust not be guessable.\n",{"id":522,"difficulty":106,"q":523,"a":524},"json-type","When should you store data as JSON in a relational column?","JSON columns let you persist **semi-structured, schema-flexible** data\n(event payloads, third-party API responses) alongside relational data.\nPostgres's `JSONB` stores a parsed binary representation — indexable with\nGIN, fast to query. MySQL 8+ and SQL Server 2016+ also support JSON but\nstore it as text with helper functions.\n\n```sql\n-- Postgres JSONB with a GIN index for fast containment queries\nCREATE TABLE events (\n  id      BIGSERIAL PRIMARY KEY,\n  payload JSONB NOT NULL\n);\nCREATE INDEX idx_events_payload ON events USING GIN (payload);\n\n-- Query a nested key\nSELECT * FROM events WHERE payload @> '{\"type\": \"click\"}';\n```\n\n**Rule of thumb:** use JSON columns for truly variable structures that would\notherwise require dozens of nullable columns or a separate EAV table. 
If\nyou find yourself querying the same JSON key in every `WHERE` clause,\nextract it into a proper column.\n",{"id":526,"difficulty":106,"q":527,"a":528},"enum-type","What are ENUM types and what are their trade-offs?","An **`ENUM`** restricts a column to a predefined list of string labels,\nenforcing a domain constraint at the type level. Postgres stores `ENUM` as a\nuser-defined type; MySQL stores it internally as an integer but displays the\nlabel.\n\n```sql\n-- Postgres\nCREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered', 'cancelled');\nALTER TABLE orders ADD COLUMN status order_status NOT NULL DEFAULT 'pending';\n\n-- MySQL\nstatus ENUM('pending', 'shipped', 'delivered', 'cancelled') NOT NULL DEFAULT 'pending'\n```\n\nTrade-offs:\n- ✅ Compact storage, enforced domain, readable values.\n- ❌ Adding a new label requires `ALTER TYPE` (Postgres) or `ALTER TABLE` (MySQL), which can lock the table.\n- ❌ Harder to manage via migrations; lookup tables are more flexible.\n\n**Rule of thumb:** use `ENUM` for short, stable lists (\u003C 10 values, rarely\nchanging); use a lookup\u002Freference table when the list is large or frequently\nupdated.\n",{"id":530,"difficulty":106,"q":531,"a":532},"choosing-numeric-precision","How do you choose precision and scale for NUMERIC(p, s)?","`p` = **total significant digits**, `s` = **digits to the right of the\ndecimal point**.\n\n| Value | Type | p | s |\n|---|---|---|---|\n| `12345.67` | price | 7 | 2 |\n| `0.000001` | rate | 7 | 6 |\n| `-9999999.9999` | balance | 11 | 4 |\n\n```sql\n-- Store prices up to $9,999,999.99 with cent precision\nprice NUMERIC(9, 2)\n\n-- Interest rate: 0.0000 – 1.0000 with 4 decimal places\nrate  NUMERIC(5, 4)\n```\n\nPostgres and SQL Server will raise an error if a value exceeds the declared\nprecision. 
MySQL raises an error in strict SQL mode (the default since 5.7) and clamps the value with a warning otherwise.\n\n**Rule of thumb:** set `s` to the number of decimal places your business\nlogic requires; set `p` to `s` plus the number of digits you expect to the\nleft of the decimal, then add a few digits of headroom.\n",{"id":534,"difficulty":106,"q":535,"a":536},"serial-vs-identity","What is the difference between SERIAL and GENERATED AS IDENTITY?","Both auto-generate ascending integer PKs, but they differ in SQL standard\ncompliance and control:\n\n- **`SERIAL`** (Postgres-specific) creates a sequence and sets a `DEFAULT\n  nextval(...)` on the column. It is an alias, not a type — the column's real\n  type is `INTEGER`. Users can still `INSERT` an explicit value, bypassing\n  the sequence.\n- **`GENERATED ALWAYS AS IDENTITY`** (SQL:2003 standard; Postgres 10+,\n  Oracle 12c+, Db2) formally declares the column as identity-generated.\n  `GENERATED ALWAYS` rejects explicit values unless the statement uses\n  `OVERRIDING SYSTEM VALUE`; `GENERATED BY DEFAULT` allows them. MySQL\n  (`AUTO_INCREMENT`) and SQL Server (`IDENTITY(1,1)`) use their own syntax\n  instead.\n\n```sql\n-- Old Postgres style\nid SERIAL PRIMARY KEY\n\n-- Standard SQL (preferred)\nid INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY\n```\n\n**Rule of thumb:** prefer `GENERATED ALWAYS AS IDENTITY` for new schemas —\nit follows the SQL standard and stops manual inserts from drifting out of sync with the sequence.\n",{"id":538,"difficulty":106,"q":539,"a":540},"storing-money","What is the best way to store monetary values in SQL?","Use **`NUMERIC(p, 2)`** (or a higher scale for currencies with sub-cent\nprecision). 
Never use `FLOAT` — floating-point arithmetic makes `0.10 + 0.20`\nequal `0.30000000000000004`, which causes reconciliation errors.\n\nSome teams store money as a `BIGINT` of the smallest unit (cents, pence) and\nconvert to a decimal only in the application layer — this avoids any numeric\ntype ambiguity.\n\n```sql\n-- Option 1: NUMERIC column in dollars\namount NUMERIC(12, 2) NOT NULL\n\n-- Option 2: BIGINT in cents (no decimal at all)\namount_cents BIGINT NOT NULL  -- 1099 = $10.99\n```\n\n**Rule of thumb:** use `NUMERIC` in SQL; if you also need speed at very\nhigh throughput, use `BIGINT` cents and divide by 100 in the application.\nDocument which approach you use in the column comment.\n",{"id":542,"difficulty":114,"q":543,"a":544},"binary-types","When would you store binary data in a SQL column?","Binary columns (`BYTEA` in Postgres, `VARBINARY`\u002F`BLOB` in MySQL\u002FSQL Server)\nstore raw byte sequences — images, PDFs, encrypted values, hashes.\n\nIn practice, storing **large blobs directly in the database** bloats the\ntable, slows backups, and is rarely optimal compared to object storage (S3,\nGCS) with only a URL or key in the DB.\n\n```sql\n-- Appropriate: fixed-size hash or digest (e.g. 32 bytes for SHA-256)\npassword_hash BYTEA NOT NULL   -- raw bytes; bcrypt\u002Fargon2 libraries often emit an encoded string instead\n\n-- Appropriate: small thumbnail (\u003C 64 KB)\navatar_thumb  BYTEA\n\n-- Avoid: storing full-resolution images in the DB\n-- Store the S3 key instead:\navatar_s3_key VARCHAR(500)\n```\n\n**Rule of thumb:** store binary data in the database only when it is small\n(\u003C 1 MB), must be transactionally consistent with other columns, or access\npatterns demand it. 
Otherwise, use object storage and keep a reference key.\n",{"id":546,"difficulty":106,"q":547,"a":548},"implicit-vs-explicit-casting","What is implicit type casting and why can it be dangerous?","**Implicit casting** (coercion) happens when the database silently converts\na value from one type to another to satisfy a comparison or expression. This\ncan cause **index scans to degrade into full-table scans** if the cast\nprevents the engine from using the index on the original column.\n\n```sql\n-- Table: users(id INT, phone VARCHAR(20))\n-- phone has an index.\n\n-- BAD: implicit cast of the integer literal to VARCHAR\n-- Some databases may cast the column instead, killing the index\nSELECT * FROM users WHERE phone = 12345;\n\n-- GOOD: explicit cast or string literal\nSELECT * FROM users WHERE phone = '12345';\n-- or\nSELECT * FROM users WHERE phone = CAST(12345 AS VARCHAR);\n```\n\n**Rule of thumb:** always compare like types. Mismatched types in `WHERE`\npredicates are a common source of unexpected full-table scans — check with\n`EXPLAIN` when in doubt.\n",{"id":550,"difficulty":127,"q":551,"a":552},"array-types","When should you use array columns (Postgres) instead of a child table?","Postgres supports **`ARRAY` columns** that hold a list of any base type\n(`INTEGER[]`, `TEXT[]`, `UUID[]`). 
They can save a join for read-heavy\ndenormalized patterns but lose referential integrity and are harder to\nindex and update partially.\n\n```sql\n-- Array column: fast read, no join, but no FK enforcement\nCREATE TABLE articles (\n  id   SERIAL PRIMARY KEY,\n  tags TEXT[] NOT NULL DEFAULT '{}'\n);\nCREATE INDEX idx_articles_tags ON articles USING GIN (tags);\n\nSELECT * FROM articles WHERE tags @> ARRAY['sql', 'indexing'];\n\n-- Child table: full relational integrity, flexible queries\nCREATE TABLE article_tags (\n  article_id INT REFERENCES articles(id),\n  tag        TEXT NOT NULL,\n  PRIMARY KEY (article_id, tag)\n);\n```\n\n**Rule of thumb:** use arrays when the list is small, ordered, read far more\nthan written, and does not need referential integrity or per-element queries.\nUse a child table when you need FK constraints, ordering, or per-row metadata.\n",16,{"description":104},"SQL data types interview questions — numeric, character, date\u002Ftime, boolean, JSON, NULL semantics, and how to choose the right type across Postgres, MySQL, and SQL Server.","sql\u002Fschema\u002Fdata-types","JffWumXQ_4Kaqd6EvuOyCKiWfy7w6VhFSYtqDFgQngk",{"id":559,"title":560,"body":561,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":565,"navigation":109,"order":14,"path":566,"questions":567,"questionsCount":323,"related":247,"seo":628,"seoDescription":629,"stem":630,"subtopic":631,"topic":91,"topicSlug":93,"updated":328,"__hash__":632},"qa\u002Fsql\u002Fsecurity\u002Fpermissions.md","Permissions",{"type":101,"value":562,"toc":563},[],{"title":104,"searchDepth":30,"depth":30,"links":564},[],{},"\u002Fsql\u002Fsecurity\u002Fpermissions",[568,572,576,580,584,588,592,596,600,604,608,612,616,620,624],{"id":569,"difficulty":114,"q":570,"a":571},"grant-revoke","What do GRANT and REVOKE do?","`GRANT` gives a database user or role permission to perform an operation.\n`REVOKE` removes a previously granted permission.\n\n```sql\n-- Grant SELECT 
on one table\nGRANT SELECT ON orders TO analyst_user;\n\n-- Grant multiple privileges at once\nGRANT SELECT, INSERT, UPDATE ON products TO app_user;\n\n-- Grant on all tables in a schema (Postgres)\nGRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_role;\n\n-- Grant to a role (not a user directly)\nGRANT SELECT ON customers TO reporting_role;\n\n-- Revoke a specific privilege\nREVOKE INSERT ON products FROM app_user;\n\n-- Revoke all privileges on a table\nREVOKE ALL PRIVILEGES ON orders FROM analyst_user;\n```\n\n**Rule of thumb:** always grant to **roles**, not individual users. Assign\nusers to roles. This makes permission management scalable — add a new\nemployee by assigning them to the correct role, not by running a dozen\n`GRANT` statements.\n",{"id":573,"difficulty":114,"q":574,"a":575},"least-privilege","What is the principle of least privilege and how does it apply to SQL?","**Least privilege** means giving each user or service account only the\nminimum database permissions needed to perform its function — no more.\n\n```sql\n-- Application database user: only needs to read\u002Fwrite its own tables\nCREATE ROLE app_role;\nGRANT SELECT, INSERT, UPDATE, DELETE ON orders    TO app_role;\nGRANT SELECT, INSERT, UPDATE, DELETE ON customers TO app_role;\nGRANT SELECT ON products TO app_role;  -- read-only on the product catalogue\n-- NOT granted: DROP TABLE, CREATE TABLE, TRUNCATE, ALTER\n\n-- Reporting user: read-only access\nCREATE ROLE reporting_role;\nGRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_role;\n\n-- Migrations user: only runs during deploys\nCREATE ROLE migration_role;\nGRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO migration_role;\nGRANT CREATE ON SCHEMA public TO migration_role;\n```\n\n**Rule of thumb:** the application's runtime database user should never\nhave `DROP TABLE`, `TRUNCATE`, or `ALTER TABLE` privileges. 
Use a separate\nmigration user for schema changes, and revoke it after deploys.\n",{"id":577,"difficulty":106,"q":578,"a":579},"roles","What are roles and how do they differ from users?","A **role** is a named collection of privileges. A **user** is a role that\ncan log in. In Postgres, users and roles are unified — `CREATE USER` is\nsyntactic sugar for `CREATE ROLE … LOGIN`.\n\n```sql\n-- Postgres: create roles\nCREATE ROLE readonly_role;\nCREATE ROLE readwrite_role;\nCREATE ROLE admin_role;\n\n-- Grant privileges to roles\nGRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_role;\nGRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO readwrite_role;\n\n-- Create users (roles that can log in)\nCREATE ROLE alice LOGIN PASSWORD 'secret';\nCREATE ROLE bob   LOGIN PASSWORD 'secret';\n\n-- Assign users to roles\nGRANT readonly_role  TO alice;\nGRANT readwrite_role TO bob;\n\n-- Alice now inherits all privileges of readonly_role\n```\n\nSQL Server uses `CREATE LOGIN` (server-level) + `CREATE USER` (database-level)\n+ `CREATE ROLE` as separate concepts.\n\n**Rule of thumb:** define a small set of roles that match your access\npatterns (read-only, read-write, admin, migration). 
Add users to roles\nrather than granting privileges to individual users — it is far easier to\naudit and maintain.\n",{"id":581,"difficulty":106,"q":582,"a":583},"grant-with-grant-option","What does WITH GRANT OPTION do?","`WITH GRANT OPTION` allows the grantee to re-grant the same privilege\nto other users or roles.\n\n```sql\n-- Alice can SELECT on orders AND can grant that to others\nGRANT SELECT ON orders TO alice WITH GRANT OPTION;\n\n-- Alice can now do:\nGRANT SELECT ON orders TO bob;  -- valid because alice has GRANT OPTION\n\n-- Revoke cascades to anyone alice granted to\nREVOKE SELECT ON orders FROM alice CASCADE;\n-- Bob loses SELECT too, because it came from alice\n```\n\n`WITH GRANT OPTION` creates a chain of trust that is hard to audit — Bob's\naccess depends on Alice's access, and revoking Alice's access removes Bob's.\n\n**Rule of thumb:** avoid `WITH GRANT OPTION` in most cases — it makes\npermission chains hard to audit and revocations unpredictable. Only use it\nfor schema owners or DBA roles that are explicitly responsible for managing\naccess.\n",{"id":585,"difficulty":127,"q":586,"a":587},"row-level-security","What is Row-Level Security (RLS) in Postgres?","**Row-Level Security (RLS)** enforces access policies at the table level —\nthe database automatically filters rows based on the current user, hiding\nrows that the policy says the user cannot see.\n\n```sql\n-- Enable RLS on the table\nALTER TABLE orders ENABLE ROW LEVEL SECURITY;\n\n-- Policy: users can only see their own orders\nCREATE POLICY own_orders ON orders\n  FOR ALL\n  TO app_role\n  USING (customer_id = current_setting('app.current_user_id')::INT);\n\n-- Application sets the config before every query\nSET app.current_user_id = '42';\nSELECT * FROM orders;  -- automatically filtered to customer_id = 42\n\n-- Bypass RLS (table owner and superuser bypass by default)\nALTER TABLE orders FORCE ROW LEVEL SECURITY;  -- forces even for table owner\n```\n\n**Rule of thumb:** use RLS for 
multi-tenant applications where every table\nquery must be tenant-scoped. It enforces the isolation at the database level\n— a bug in application code cannot accidentally expose another tenant's data.\n",{"id":589,"difficulty":106,"q":590,"a":591},"schema-permissions","How do schema-level permissions work?","In Postgres, a user must have `USAGE` on a **schema** before they can access\nany objects within it, even if they have `SELECT` on individual tables.\n\n```sql\n-- Step 1: grant schema access\nGRANT USAGE ON SCHEMA reporting TO analyst_role;\n\n-- Step 2: grant table access within the schema\nGRANT SELECT ON ALL TABLES IN SCHEMA reporting TO analyst_role;\n\n-- Ensure future tables are also covered (Postgres)\nALTER DEFAULT PRIVILEGES IN SCHEMA reporting\n  GRANT SELECT ON TABLES TO analyst_role;\n\n-- Deny schema access entirely\nREVOKE USAGE ON SCHEMA private_data FROM analyst_role;\n-- Now analyst_role cannot see any tables in private_data schema\n```\n\n**Rule of thumb:** use separate schemas (`app`, `reporting`, `audit`,\n`staging`) and grant schema `USAGE` per role. 
This lets you grant broad\naccess to an entire schema with two `GRANT` statements instead of one per\ntable.\n",{"id":593,"difficulty":106,"q":594,"a":595},"column-level-grants","Can you grant permissions on specific columns rather than whole tables?","Yes — SQL supports column-level `SELECT`, `INSERT`, and `UPDATE` grants.\nThis lets you expose some columns of a sensitive table while hiding others\n(e.g., show names but not salaries).\n\n```sql\n-- Grant SELECT on specific columns only\nGRANT SELECT (id, name, department) ON employees TO hr_report_role;\n-- hr_report_role can NOT read salary or ssn columns\n\n-- Grant UPDATE on specific columns\nGRANT UPDATE (email, phone) ON users TO support_role;\n-- support_role can update contact info but not password_hash\n\n-- Revoke column-level grant\nREVOKE SELECT (salary) ON employees FROM payroll_role;\n```\n\nColumn-level grants work in Postgres, SQL Server, and MySQL but are\ncomplex to manage. An alternative is to create a **view** that exposes\nonly the allowed columns and grant `SELECT` on the view instead.\n\n**Rule of thumb:** prefer a view over column-level grants for hiding\nsensitive columns — views are easier to discover, test, and document.\nUse column-level grants when a view is not practical (e.g., you need\nwrite permissions on specific columns).\n",{"id":597,"difficulty":106,"q":598,"a":599},"superuser-privileges","What are superuser privileges and why should you avoid using them for applications?","A **superuser** (Postgres) or **sysadmin** (SQL Server) bypasses all\npermission checks and can do anything in the database — create\u002Fdrop\ndatabases, bypass RLS, read any table, impersonate other users.\n\n```sql\n-- Check if current user is a superuser (Postgres)\nSELECT current_user, usesuper FROM pg_user WHERE usename = current_user;\n\n-- Create a non-superuser admin for routine work\nCREATE ROLE dba_role CREATEDB CREATEROLE;  -- can manage DBs and roles, not superuser\n\n-- Application connection 
string should NEVER use a superuser\n-- BAD:  postgresql:\u002F\u002Fpostgres:password@host\u002Fdb\n-- GOOD: postgresql:\u002F\u002Fapp_user:password@host\u002Fdb (limited privileges)\n```\n\n**Rule of thumb:** the application's database connection string must never\nuse a superuser account. Use superuser credentials only for database\nadministration tasks, run from a secured bastion host or local machine —\nnever from application servers.\n",{"id":601,"difficulty":127,"q":602,"a":603},"audit-logging","How do you audit who changed what in a database?","Audit logging records which user performed which action and when. Common\napproaches:\n\n1. **Application-level**: record the actor and action in an audit table\n   from application code.\n2. **Trigger-based**: a database trigger automatically writes to an audit\n   table on every `INSERT`\u002F`UPDATE`\u002F`DELETE`.\n3. **Extension\u002Ffeature**: `pgaudit` (Postgres), SQL Server Audit, MySQL\n   General Log.\n\n```sql\n-- Postgres: trigger-based audit log\nCREATE TABLE audit_log (\n  id         BIGSERIAL PRIMARY KEY,\n  table_name TEXT NOT NULL,\n  operation  TEXT NOT NULL,  -- INSERT \u002F UPDATE \u002F DELETE\n  old_data   JSONB,\n  new_data   JSONB,\n  changed_by TEXT NOT NULL DEFAULT current_user,\n  changed_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n\nCREATE OR REPLACE FUNCTION audit_trigger() RETURNS TRIGGER AS $$\nBEGIN\n  INSERT INTO audit_log (table_name, operation, old_data, new_data)\n  VALUES (TG_TABLE_NAME, TG_OP,\n          -- old row exists for UPDATE and DELETE\n          CASE WHEN TG_OP \u003C> 'INSERT' THEN row_to_json(OLD)::jsonb END,\n          -- new row exists for INSERT and UPDATE\n          CASE WHEN TG_OP \u003C> 'DELETE' THEN row_to_json(NEW)::jsonb END);\n  RETURN NULL;  -- return value is ignored for AFTER row triggers\nEND;\n$$ LANGUAGE plpgsql;\n\nCREATE TRIGGER orders_audit\n  AFTER INSERT OR UPDATE OR DELETE ON orders\n  FOR EACH ROW EXECUTE FUNCTION audit_trigger();\n```\n\n**Rule of thumb:** trigger-based auditing is comprehensive but adds write\nlatency. 
For compliance-grade auditing, use a dedicated extension\n(`pgaudit`) or the database's native audit feature — they capture more\nevents (reads, DDL) and cannot be bypassed by application code that\ncircumvents triggers.\n",{"id":605,"difficulty":127,"q":606,"a":607},"revoking-public-schema","Why is the PUBLIC schema dangerous in Postgres and how do you secure it?","In Postgres, every user has `CREATE` and `USAGE` on the `public` schema by\ndefault (before Postgres 15). Any user can create tables there, potentially\n**shadowing** system functions or other users' objects via the search path.\n\n```sql\n-- Check who has what on the public schema (raw ACL from the catalog)\nSELECT nspname, nspacl\nFROM   pg_namespace\nWHERE  nspname = 'public';\n\n-- Revoke CREATE from all non-superusers (Postgres \u003C 15)\nREVOKE CREATE ON SCHEMA public FROM PUBLIC;\n\n-- Postgres 15+: CREATE is revoked from PUBLIC by default\n-- but USAGE is still granted — revoke if needed\nREVOKE USAGE ON SCHEMA public FROM PUBLIC;\n\n-- Application code should use an explicit schema, not rely on search_path\nSET search_path = app, public;\n-- Or set it per user:\nALTER ROLE app_user SET search_path = app;\n```\n\n**Rule of thumb:** on a shared database, immediately revoke `CREATE ON\nSCHEMA public FROM PUBLIC` (Postgres \u003C 15) and put application objects in\na dedicated schema. 
Set `search_path` explicitly for each role to prevent\nsearch-path hijacking attacks.\n",{"id":609,"difficulty":106,"q":610,"a":611},"password-policies","How do you manage database user passwords securely?","```sql\n-- Postgres: create a user with a password\nCREATE ROLE app_user LOGIN PASSWORD 'str0ng-p@ssw0rd!';\n\n-- Set password expiry (force rotation)\nALTER ROLE app_user VALID UNTIL '2026-12-31';\n\n-- Use SCRAM-SHA-256 authentication (more secure than md5)\n-- In pg_hba.conf: host all all 0.0.0.0\u002F0 scram-sha-256\n-- Then:\nSET password_encryption = 'scram-sha-256';\nALTER ROLE app_user PASSWORD 'new-password';\n\n-- MySQL: create user with strong auth plugin\nCREATE USER 'app_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'str0ng-p@ss';\n\n-- SQL Server: enforce password policy (Windows policy integration)\nCREATE LOGIN app_login WITH PASSWORD = 'str0ng-p@ss!',\n  CHECK_POLICY = ON, CHECK_EXPIRATION = ON;\n```\n\n**Rule of thumb:** use randomly generated, long passwords (32+ characters)\nfor service accounts and store them in a secrets manager (Vault, AWS Secrets\nManager). Rotate passwords automatically. 
Never hardcode credentials in\napplication source code.\n",{"id":613,"difficulty":106,"q":614,"a":615},"connection-security","How do you restrict which hosts can connect to the database?","Database-level host restrictions add a network layer of access control —\neven if credentials are compromised, connections from unauthorised IPs are\nrejected.\n\n```sql\n-- Postgres: pg_hba.conf (host-based authentication file)\n-- Each line: TYPE  DATABASE  USER  ADDRESS  METHOD\n-- Allow the app server IP only:\nhost   myapp    app_user   10.0.1.5\u002F32    scram-sha-256\n-- Reject everything else:\nhost   myapp    all        0.0.0.0\u002F0      reject\n\n-- MySQL: user accounts include the host\nCREATE USER 'app_user'@'10.0.1.5' IDENTIFIED BY 'password';\n-- This account can ONLY connect from 10.0.1.5\nGRANT ALL ON myapp.* TO 'app_user'@'10.0.1.5';\n\n-- SQL Server: use firewall rules (Azure Portal \u002F Windows Firewall)\n-- plus Windows authentication or IP restrictions in network config\n```\n\n**Rule of thumb:** never expose the database port to the public internet.\nAllow connections only from application servers and VPN\u002Fbastion hosts,\nusing IP allowlists at both the database (`pg_hba.conf`) and network\n(firewall) levels.\n",{"id":617,"difficulty":106,"q":618,"a":619},"view-as-security","How can views be used to implement access control?","By granting access to a **view** instead of the base table, you restrict\nwhat data a user can see — without row-level security policies or column\ngrants.\n\n```sql\n-- Base table: all employees including salary and SSN\n-- View: only expose name, department, and hire date\nCREATE VIEW employee_directory AS\n  SELECT id, full_name, department, hire_date\n  FROM   employees\n  WHERE  terminated_at IS NULL;\n\n-- Grant SELECT on the view only\nGRANT SELECT ON employee_directory TO hr_partner_role;\nREVOKE ALL ON employees FROM hr_partner_role;  -- no direct table access\n\n-- HR partner can now run:\nSELECT * FROM 
employee_directory WHERE department = 'Engineering';\n-- Cannot see salary, SSN, or terminated employees\n```\n\n**Rule of thumb:** use views as a security layer for read-only access to\nsensitive tables when you want the constraint to be declarative and\nself-documenting. For write-access scenarios, combine views with `INSTEAD\nOF` triggers or handle mutations directly on the restricted columns.\n",{"id":621,"difficulty":106,"q":622,"a":623},"default-privileges","What are default privileges and why do they matter?","**Default privileges** define the permissions automatically applied to\nfuture objects (tables, sequences, functions) created in a schema. Without\nthem, a role that has `SELECT` on all current tables loses access as soon as\na new table is created.\n\n```sql\n-- Postgres: set default privileges for future tables in a schema\n-- Run this as the schema owner:\nALTER DEFAULT PRIVILEGES IN SCHEMA app\n  GRANT SELECT ON TABLES TO readonly_role;\n\nALTER DEFAULT PRIVILEGES IN SCHEMA app\n  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite_role;\n\nALTER DEFAULT PRIVILEGES IN SCHEMA app\n  GRANT USAGE, SELECT ON SEQUENCES TO readwrite_role;\n\n-- Verify current default privileges\nSELECT * FROM pg_default_acl;\n\n-- MySQL: no native default privileges; use a provisioning script or IAM\n```\n\n**Rule of thumb:** set `ALTER DEFAULT PRIVILEGES` for every role when you\nfirst create the schema. 
Without it, each new table deployed in a migration\nrequires a separate `GRANT`, which is easy to forget and surfaces as\n`permission denied` errors in production.\n",{"id":625,"difficulty":106,"q":626,"a":627},"privilege-inspection","How do you inspect what permissions a user or role currently has?","```sql\n-- Postgres: check table-level privileges\nSELECT grantee, table_name, privilege_type\nFROM   information_schema.role_table_grants\nWHERE  grantee = 'analyst_role'\nORDER  BY table_name;\n\n-- Postgres: psql shorthand\n-- \\dp orders           → show ACL for the orders table\n-- \\du analyst_role     → show role attributes and memberships\n-- \\z                   → show ACLs for all tables\n\n-- Postgres: check schema privileges\nSELECT nspname AS schema_name,\n       has_schema_privilege('analyst_role', nspname, 'USAGE')  AS has_usage,\n       has_schema_privilege('analyst_role', nspname, 'CREATE') AS has_create\nFROM   pg_namespace\nWHERE  nspname !~ '^pg_';\n\n-- MySQL\nSHOW GRANTS FOR 'app_user'@'%';\n\n-- SQL Server\nSELECT pr.name AS principal_name,\n       OBJECT_NAME(dp.major_id) AS object_name,\n       dp.permission_name, dp.state_desc\nFROM   sys.database_permissions dp\nJOIN   sys.database_principals  pr ON dp.grantee_principal_id = pr.principal_id\nWHERE  pr.name = 'analyst_role';\n```\n\n**Rule of thumb:** before revoking or changing permissions, always inspect\nthe current state first. 
In Postgres, `\\dp \u003Ctablename>` is the fastest\nway to spot unexpected `PUBLIC` grants on sensitive tables.\n",{"description":104},"SQL permissions interview questions — GRANT, REVOKE, roles, least-privilege, row-level security, column-level grants, schema ownership, and access control patterns across Postgres, MySQL, and SQL Server.","sql\u002Fsecurity\u002Fpermissions","Permissions & Roles","HEpF1IbUqXS2me_x56LuGaV5k4-5e2wf_XpPYrAOBv8",{"id":634,"title":635,"body":636,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":672,"navigation":109,"order":14,"path":673,"questions":674,"questionsCount":751,"related":247,"seo":752,"seoDescription":753,"stem":754,"subtopic":635,"topic":29,"topicSlug":31,"updated":328,"__hash__":755},"qa\u002Fsql\u002Fsubqueries\u002Fsubqueries.md","Subqueries",{"type":101,"value":637,"toc":669},[638,643],[639,640,642],"h2",{"id":641},"about-sql-subqueries","About SQL Subqueries",[644,645,646,647,651,652,656,657,660,661,664,665,668],"p",{},"Subqueries let you compose a query out of the results of other queries — filtering on\nan aggregate, testing membership, or computing a per-row value inline. Mastering the\ndifference between ",[648,649,650],"strong",{},"scalar, correlated, and multi-row"," subqueries, and knowing when\n",[653,654,655],"code",{},"EXISTS"," beats ",[653,658,659],{},"IN"," (and why ",[653,662,663],{},"NOT IN"," is dangerous with ",[653,666,667],{},"NULL","s), is a staple of SQL\ninterviews.",{"title":104,"searchDepth":30,"depth":30,"links":670},[671],{"id":641,"depth":30,"text":642},{},"\u002Fsql\u002Fsubqueries\u002Fsubqueries",[675,679,683,687,691,695,699,703,707,711,715,719,723,727,731,735,739,743,747],{"id":676,"difficulty":114,"q":677,"a":678},"what-is-a-subquery","What is a subquery in SQL?","A **subquery** (or inner query) is a `SELECT` statement nested inside another\nSQL statement. The database runs the inner query and feeds its result to the\n**outer query**. 
Subqueries can appear in the `SELECT`, `FROM`, `WHERE`, or\n`HAVING` clause.\n\n```sql\n-- find employees earning more than the company average\nSELECT name, salary\nFROM   employees\nWHERE  salary > (SELECT AVG(salary) FROM employees);  -- inner query first\n```\n\nThe parentheses are required, and a subquery is always **fully enclosed** in\nthem. The inner query here returns a single value the outer `WHERE` compares\nagainst.\n\nRule of thumb: a subquery lets you use the result of one query as an input to\nanother, without a temporary table.\n",{"id":680,"difficulty":114,"q":681,"a":682},"types-of-subqueries","What are the main types of subqueries?","Subqueries are classified by **what they return** and **whether they depend on\nthe outer query**:\n\n- **Scalar subquery** — returns a single value (one row, one column).\n- **Row subquery** — returns a single row of multiple columns.\n- **Table \u002F multi-row subquery** — returns many rows (used with `IN`, `ANY`, `EXISTS`).\n- **Correlated subquery** — references a column from the outer query, so it\n  re-runs per outer row.\n- **Non-correlated subquery** — self-contained; runs once independently.\n\n```sql\nSELECT (SELECT MAX(price) FROM products) AS top_price;  -- scalar\n```\n\nRule of thumb: classify a subquery by its cardinality (scalar \u002F row \u002F table)\nand its dependency (correlated \u002F not) — both drive which operators you can use\nand how it performs.\n",{"id":684,"difficulty":114,"q":685,"a":686},"scalar-subquery","What is a scalar subquery and where can you use it?","A **scalar subquery** returns **exactly one row and one column** — a single\nvalue. 
Because it resolves to a value, you can use it almost anywhere an\nexpression is allowed: `SELECT`, `WHERE`, `HAVING`, even inside arithmetic.\n\n```sql\nSELECT name,\n       salary,\n       salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg\nFROM   employees;\n```\n\nIf a scalar subquery returns **more than one row**, the query errors at runtime\n(`more than one row returned by a subquery used as an expression`). If it\nreturns **no rows**, it yields `NULL`.\n\nRule of thumb: use a scalar subquery wherever you'd write a single value —\nguarantee it returns at most one row, often with an aggregate or `LIMIT 1`.\n",{"id":688,"difficulty":106,"q":689,"a":690},"correlated-subquery","What is a correlated subquery?","A **correlated subquery** references a column from the **outer query**, so it\ncannot run on its own — it is **re-evaluated once per outer row**. Conceptually\nit behaves like a loop.\n\n```sql\n-- employees who earn more than their own department's average\nSELECT e.name, e.salary, e.dept_id\nFROM   employees e\nWHERE  e.salary > (SELECT AVG(s.salary)\n                   FROM   employees s\n                   WHERE  s.dept_id = e.dept_id);  -- e.dept_id = outer ref\n```\n\nThe inner query depends on `e.dept_id`, which changes per outer row. This is\npowerful but can be **slow** on large tables because of the repeated execution\n(though modern optimizers often rewrite it as a join).\n\nRule of thumb: if the inner query mentions an outer table's column, it's\ncorrelated and runs per row — watch its cost.\n",{"id":692,"difficulty":106,"q":693,"a":694},"correlated-vs-non-correlated","What is the difference between a correlated and a non-correlated subquery?","A **non-correlated** subquery is independent — it runs **once**, and its result\nis reused by the outer query. 
A **correlated** subquery references the outer\nquery and runs **once per outer row**.\n\n```sql\n-- non-correlated: AVG computed a single time\nSELECT name FROM employees\nWHERE salary > (SELECT AVG(salary) FROM employees);\n\n-- correlated: inner query re-runs for each employee row\nSELECT name FROM employees e\nWHERE salary > (SELECT AVG(salary) FROM employees s\n                WHERE s.dept_id = e.dept_id);\n```\n\nYou can test the difference: a non-correlated subquery runs successfully on its\nown; a correlated one errors because the outer column is unknown.\n\nRule of thumb: non-correlated = compute once; correlated = compute per row.\nPrefer non-correlated (or a join) when you can.\n",{"id":696,"difficulty":106,"q":697,"a":698},"in-vs-exists","What is the difference between IN and EXISTS?","Both test membership, but differently. `IN` compares a column against the **list\nof values** a subquery returns. `EXISTS` checks whether the subquery returns\n**any row at all** (a boolean), and is typically **correlated**.\n\n```sql\n-- IN: builds a value list, then matches\nSELECT name FROM customers\nWHERE id IN (SELECT customer_id FROM orders);\n\n-- EXISTS: stops at the first matching row\nSELECT name FROM customers c\nWHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);\n```\n\n`EXISTS` can be faster on large subquery results because it **short-circuits**\nat the first match. `IN` is fine and readable for small, static lists. The big\ngotcha is `NOT IN` with `NULL`s (see the next question).\n\nRule of thumb: use `EXISTS` for correlated existence checks on big tables;\n`IN` for small value lists.\n",{"id":700,"difficulty":127,"q":701,"a":702},"not-in-null-trap","Why can NOT IN behave unexpectedly with NULL values?","If the subquery (or list) behind `NOT IN` contains a single `NULL`, the **whole\n`NOT IN` returns no rows**. 
That's because `x NOT IN (a, b, NULL)` expands to\n`x \u003C> a AND x \u003C> b AND x \u003C> NULL`, and `x \u003C> NULL` is `UNKNOWN`, which makes the\nentire `AND` never `TRUE`.\n\n```sql\n-- if any customer_id is NULL, this returns ZERO rows unexpectedly\nSELECT name FROM customers\nWHERE id NOT IN (SELECT customer_id FROM orders);\n\n-- safe alternatives:\nSELECT name FROM customers c\nWHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);\n-- or filter the NULLs: ... WHERE customer_id IS NOT NULL\n```\n\n`NOT EXISTS` is **NULL-safe** and usually the right tool.\n\nRule of thumb: avoid `NOT IN` over a nullable column — use `NOT EXISTS` instead.\n",{"id":704,"difficulty":106,"q":705,"a":706},"any-all-operators","What do the ANY and ALL operators do with subqueries?","`ANY` (synonym `SOME`) and `ALL` compare a value against a **set** returned by a\nsubquery, combined with a comparison operator.\n\n- `> ANY (...)` → greater than **at least one** value (i.e. greater than the min).\n- `> ALL (...)` → greater than **every** value (i.e. greater than the max).\n\n```sql\n-- products pricier than the cheapest product in category 5\nSELECT name FROM products\nWHERE price > ANY (SELECT price FROM products WHERE category_id = 5);\n\n-- products pricier than ALL products in category 5\nSELECT name FROM products\nWHERE price > ALL (SELECT price FROM products WHERE category_id = 5);\n```\n\nNote `= ANY` is equivalent to `IN`, and `\u003C> ALL` is equivalent to `NOT IN`.\n\nRule of thumb: `ANY` = matches some; `ALL` = matches every — often clearer when\nrewritten with `MIN`\u002F`MAX`.\n",{"id":708,"difficulty":114,"q":709,"a":710},"subquery-in-where","How are subqueries used in the WHERE clause?","A `WHERE`-clause subquery **filters** outer rows using a value or set computed by\nthe inner query. 
The operator must match the subquery's cardinality: use `=`,\n`\u003C`, `>` with **scalar** subqueries, and `IN`\u002F`EXISTS`\u002F`ANY`\u002F`ALL` with\n**multi-row** subqueries.\n\n```sql\n-- scalar comparison\nSELECT name FROM products\nWHERE price > (SELECT AVG(price) FROM products);\n\n-- multi-row membership\nSELECT name FROM products\nWHERE category_id IN (SELECT id FROM categories WHERE active = true);\n```\n\nUsing `=` with a subquery that returns multiple rows is a runtime error.\n\nRule of thumb: scalar subquery → comparison operator; multi-row subquery →\n`IN`\u002F`EXISTS`\u002F`ANY`\u002F`ALL`.\n",{"id":712,"difficulty":106,"q":713,"a":714},"subquery-in-select","How do you use a subquery in the SELECT list?","A subquery in the `SELECT` list must be **scalar** — it produces one extra\ncomputed column per output row, and is usually **correlated** to the outer row.\n\n```sql\nSELECT c.name,\n       (SELECT COUNT(*) FROM orders o\n        WHERE o.customer_id = c.id) AS order_count\nFROM   customers c;\n```\n\nThis runs the inner `COUNT` for each customer. It's readable but can be slow at\nscale; a `LEFT JOIN ... GROUP BY` or window function is often faster.\n\n```sql\n-- usually faster equivalent\n-- (group by the key so customers who share a name stay separate)\nSELECT c.name, COUNT(o.id) AS order_count\nFROM   customers c\nLEFT JOIN orders o ON o.customer_id = c.id\nGROUP BY c.id, c.name;\n```\n\nRule of thumb: a `SELECT`-list subquery is a convenient per-row lookup — switch\nto a join when performance matters.\n",{"id":716,"difficulty":106,"q":717,"a":718},"derived-table","What is a derived table (subquery in the FROM clause)?","A **derived table** is a subquery in the `FROM` clause that acts as a temporary,\ninline table for the outer query. 
It **must be given an alias**, and you can join\nto it or filter it like any table.\n\n```sql\n-- average order value per customer, then filter the big spenders\nSELECT customer_id, avg_value\nFROM (\n    SELECT customer_id, AVG(total) AS avg_value\n    FROM   orders\n    GROUP BY customer_id\n) AS customer_avgs            -- alias is mandatory\nWHERE avg_value > 500;\n```\n\nDerived tables let you **filter on aggregates** (which `WHERE` can't do directly)\nand break complex logic into stages. CTEs do the same thing with cleaner syntax.\n\nRule of thumb: use a derived table to compute an intermediate result set you then\nquery — always alias it.\n",{"id":720,"difficulty":106,"q":721,"a":722},"subquery-vs-join","When should you use a subquery versus a join?","They often produce the same result; choose by **readability and performance**.\n\n- **Join** — best when you need columns from **both** tables in the output, and\n  usually optimizes well. Can multiply rows if the relationship is one-to-many.\n- **Subquery** — best for **existence\u002Fmembership tests** (`EXISTS`, `IN`) or a\n  single computed value, where you don't want the other table's columns. Won't\n  accidentally duplicate rows.\n\n```sql\n-- subquery: just \"do they have orders?\" — no duplication\nSELECT name FROM customers c\nWHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);\n\n-- join: need order details too\nSELECT c.name, o.total\nFROM customers c JOIN orders o ON o.customer_id = c.id;\n```\n\nMost optimizers rewrite `IN`\u002F`EXISTS` into semi-joins anyway, so they perform\nsimilarly.\n\nRule of thumb: need the other table's columns → join; just testing existence →\nsubquery.\n",{"id":724,"difficulty":106,"q":725,"a":726},"nested-subqueries","What are nested subqueries?","A **nested subquery** is a subquery that itself contains another subquery —\nqueries nested several levels deep. 
The innermost runs first, feeding its result\noutward.\n\n```sql\n-- customers in the region with the highest total sales\nSELECT name FROM customers\nWHERE region_id = (\n    SELECT region_id FROM sales\n    GROUP BY region_id\n    ORDER BY SUM(amount) DESC\n    LIMIT 1\n);\n```\n\nMost databases allow many levels of nesting, but deep nesting hurts readability\nand can hurt performance. **CTEs** are the standard cure — they flatten nested\nlogic into named, sequential steps.\n\nRule of thumb: a couple of nesting levels is fine; beyond that, refactor into\nCTEs for clarity.\n",{"id":728,"difficulty":114,"q":729,"a":730},"subquery-returns-multiple-rows-error","Why does \"subquery returns more than one row\" error occur?","This error means you used a subquery in a **scalar context** (with `=`, `\u003C`, `>`,\nor in the `SELECT` list), but it returned **more than one row**. A scalar context\nexpects a single value.\n\n```sql\n-- ERROR whenever employees has more than one row (no aggregate or LIMIT)\nSELECT name FROM employees\nWHERE salary = (SELECT salary FROM employees ORDER BY salary DESC);\n\n-- fixes: aggregate, LIMIT, or switch operator\nWHERE salary = (SELECT MAX(salary) FROM employees);   -- scalar\nWHERE salary IN (SELECT salary FROM employees ORDER BY salary DESC LIMIT 5);\n```\n\nEither guarantee one row (`MAX`, `LIMIT 1`) or use a multi-row operator (`IN`, `ANY`).\n\nRule of thumb: scalar operators need a one-row subquery — add an aggregate or\n`LIMIT`, or switch to `IN`.\n",{"id":732,"difficulty":106,"q":733,"a":734},"subquery-execution-order","In what order are subqueries executed?","Conceptually, a **non-correlated** subquery runs **first**, once, and its result\nis substituted into the outer query. 
A **correlated** subquery runs **per outer\nrow**, after the outer row is available.\n\n```sql\n-- non-correlated inner query evaluated once, then outer filters\nSELECT name FROM products\nWHERE price > (SELECT AVG(price) FROM products);\n```\n\nIn practice the **query optimizer** decides the real execution plan — it may\nrewrite a subquery as a join, cache a correlated result, or reorder operations.\nThe logical \"inner first\" model is for reasoning, not a literal guarantee.\n\nRule of thumb: reason as \"inner first (or per-row if correlated),\" but trust the\noptimizer for the actual plan — check `EXPLAIN`.\n",{"id":736,"difficulty":127,"q":737,"a":738},"correlated-subquery-performance","How can you improve the performance of a correlated subquery?","Correlated subqueries re-run per outer row, so the fixes reduce that repetition:\n\n- **Rewrite as a join** (often a semi-join) or a derived table aggregated once.\n- **Index** the correlated column the inner query filters on.\n- Use **window functions** to compute per-group values in a single pass.\n\n```sql\n-- correlated (per-row AVG):\nSELECT name FROM employees e\nWHERE salary > (SELECT AVG(salary) FROM employees s WHERE s.dept_id = e.dept_id);\n\n-- window-function rewrite (single pass):\nSELECT name FROM (\n    SELECT name, salary, AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg\n    FROM employees\n) t\nWHERE salary > dept_avg;\n```\n\nRule of thumb: if a correlated subquery is hot, rewrite it as a join or window\nfunction and index the join key.\n",{"id":740,"difficulty":106,"q":741,"a":742},"subquery-in-having","Can you use a subquery in the HAVING clause?","Yes. 
A `HAVING` subquery filters **groups** by comparing an aggregate against a\nvalue the subquery computes — useful when the threshold itself comes from a query.\n\n```sql\n-- departments whose average salary beats the company-wide average\nSELECT dept_id, AVG(salary) AS dept_avg\nFROM   employees\nGROUP  BY dept_id\nHAVING AVG(salary) > (SELECT AVG(salary) FROM employees);\n```\n\nThe subquery here is scalar and non-correlated, computed once. `HAVING` runs\nafter grouping, so it can compare group aggregates to the subquery's value.\n\nRule of thumb: use a `HAVING` subquery when you filter groups against a value\ncomputed elsewhere.\n",{"id":744,"difficulty":106,"q":745,"a":746},"subquery-with-insert-update-delete","Can subqueries be used in INSERT, UPDATE, and DELETE statements?","Yes — subqueries work in DML, not just `SELECT`.\n\n```sql\n-- INSERT ... SELECT\nINSERT INTO archived_orders\nSELECT * FROM orders WHERE created_at \u003C '2024-01-01';\n\n-- UPDATE with a correlated subquery\nUPDATE products p\nSET    avg_category_price = (SELECT AVG(price) FROM products x\n                             WHERE x.category_id = p.category_id);\n\n-- DELETE using a subquery in WHERE\nDELETE FROM customers\nWHERE id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL);\n```\n\nThis lets you drive modifications off other tables' data. Watch the `NOT IN`\u002FNULL\ntrap in `DELETE`.\n\nRule of thumb: subqueries make DML data-driven — `INSERT ... SELECT`, correlated\n`UPDATE`, and `WHERE` subqueries in `DELETE`.\n",{"id":748,"difficulty":106,"q":749,"a":750},"subquery-vs-cte","What is the difference between a subquery and a CTE?","A **CTE** (Common Table Expression, `WITH` clause) is a named, reusable result set\ndefined before the main query. A **subquery** is inline and anonymous. 
They're\nlogically similar — a CTE is often just a more readable derived table.\n\n```sql\n-- subquery (derived table)\nSELECT * FROM (SELECT dept_id, AVG(salary) a FROM employees GROUP BY dept_id) t\nWHERE a > 50000;\n\n-- CTE equivalent\nWITH dept_avg AS (\n    SELECT dept_id, AVG(salary) AS a FROM employees GROUP BY dept_id\n)\nSELECT * FROM dept_avg WHERE a > 50000;\n```\n\nA CTE can be **referenced multiple times** in the same query, supports\n**recursion**, and reads top-to-bottom — advantages a derived table lacks.\n\nRule of thumb: reach for a CTE when the logic is reused, recursive, or complex\nenough that names aid readability; a subquery for quick one-off nesting.\n",19,{"description":104},"SQL subquery interview questions — scalar vs correlated subqueries, IN vs EXISTS, derived tables, subqueries in SELECT\u002FFROM\u002FWHERE, and performance.","sql\u002Fsubqueries\u002Fsubqueries","orQNx0a4_px7--wFgVCQ4cieuTolAE9lZKNbU6YrfuE",{"id":757,"title":64,"body":758,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":762,"navigation":109,"order":14,"path":763,"questions":764,"questionsCount":323,"related":247,"seo":825,"seoDescription":826,"stem":827,"subtopic":828,"topic":64,"topicSlug":66,"updated":328,"__hash__":829},"qa\u002Fsql\u002Ftransactions\u002Ftransactions.md",{"type":101,"value":759,"toc":760},[],{"title":104,"searchDepth":30,"depth":30,"links":761},[],{},"\u002Fsql\u002Ftransactions\u002Ftransactions",[765,769,773,777,781,785,789,793,797,801,805,809,813,817,821],{"id":766,"difficulty":114,"q":767,"a":768},"what-is-transaction","What is a database transaction?","A **transaction** is a sequence of one or more SQL statements that the\ndatabase treats as a single unit of work — either **all succeed** or\n**none take effect**. 
Transactions ensure that partial failures never leave\nthe database in an inconsistent state.\n\n```sql\n-- Transfer $100 from account 1 to account 2 atomically\nBEGIN;\n  UPDATE accounts SET balance = balance - 100 WHERE id = 1;\n  UPDATE accounts SET balance = balance + 100 WHERE id = 2;\nCOMMIT;\n-- If either UPDATE fails, issue ROLLBACK instead of COMMIT so neither change persists.\n```\n\nWithout a transaction, a crash between the two UPDATEs would subtract $100\nfrom account 1 but never add it to account 2 — money disappears.\n\n**Rule of thumb:** wrap any sequence of writes that must succeed or fail\ntogether in a single transaction. A transaction that touches only one row\nis still valid — it gives you the crash-recovery guarantee.\n",{"id":770,"difficulty":114,"q":771,"a":772},"acid-properties","What are the ACID properties?","**ACID** is the set of guarantees a database must provide for transactions\nto be reliable:\n\n- **Atomicity** — the transaction is all-or-nothing. A failure at any point\n  rolls back every change made so far.\n- **Consistency** — a transaction can only bring the database from one valid\n  state to another. All constraints, triggers, and rules still hold after\n  the commit.\n- **Isolation** — concurrent transactions do not see each other's uncommitted\n  changes. The degree of isolation is configurable (see isolation levels).\n- **Durability** — once committed, the changes survive crashes. The database\n  writes them to persistent storage (WAL \u002F redo log) before acknowledging\n  the commit.\n\n```sql\n-- Atomicity: if the INSERT fails and the transaction rolls back, the UPDATE is undone too\nBEGIN;\n  UPDATE orders SET status = 'shipped' WHERE id = 42;\n  INSERT INTO shipments (order_id, shipped_at) VALUES (42, now()); -- fails?\nCOMMIT; -- only reaches here if both statements succeed\n```\n\n**Rule of thumb:** when someone asks \"how does your database prevent X?\" the\nanswer maps to one of the four ACID letters. 
Know which letter covers which\nclass of problem.\n",{"id":774,"difficulty":114,"q":775,"a":776},"begin-commit-rollback","What do BEGIN, COMMIT, and ROLLBACK do?","- **`BEGIN`** (or `START TRANSACTION`) opens a new transaction. All\n  subsequent statements are part of this transaction until it is ended.\n- **`COMMIT`** permanently saves all changes made in the transaction and\n  releases any locks held.\n- **`ROLLBACK`** discards all changes made since `BEGIN` and releases locks.\n  The database returns to the state it was in before the transaction started.\n\n```sql\nBEGIN;\n  INSERT INTO invoices (customer_id, total) VALUES (7, 299.99);\n  UPDATE accounts SET balance = balance - 299.99 WHERE customer_id = 7;\nCOMMIT;   -- both rows persist\n\n-- Error path\nBEGIN;\n  DELETE FROM orders WHERE id = 99;\nROLLBACK; -- deletion is undone\n```\n\nIn Postgres, a failed statement inside a transaction automatically puts the\ntransaction into an **error state** — further statements are rejected until\nyou issue `ROLLBACK` (or `ROLLBACK TO SAVEPOINT`).\n\n**Rule of thumb:** always pair every `BEGIN` with either a `COMMIT` or a\n`ROLLBACK`. 
An open transaction that is never closed holds locks and blocks\nother sessions.\n",{"id":778,"difficulty":114,"q":779,"a":780},"autocommit","What is autocommit and how does it affect transactions?","In **autocommit mode** (the default in most databases), every SQL statement\nthat is not inside an explicit `BEGIN` block is automatically wrapped in its\nown single-statement transaction and committed immediately.\n\n```sql\n-- Autocommit ON (default): each statement is its own transaction\nUPDATE users SET email = 'new@example.com' WHERE id = 1;\n-- ^ committed immediately, cannot be rolled back\n\n-- Explicit transaction: autocommit suspended until COMMIT\u002FROLLBACK\nBEGIN;\n  UPDATE users SET email = 'new@example.com' WHERE id = 1;\n  -- still uncommitted, can still ROLLBACK\nCOMMIT;\n```\n\n- **Postgres**: autocommit is on by default; `BEGIN` suspends it.\n- **MySQL**: autocommit is on by default; `SET autocommit = 0` or `BEGIN` turns it off.\n- **SQL Server**: autocommit is on by default; `BEGIN TRANSACTION` starts an explicit one.\n\n**Rule of thumb:** never rely on autocommit for multi-statement operations.\nAlways open an explicit transaction when two or more writes must be atomic.\n",{"id":782,"difficulty":106,"q":783,"a":784},"savepoint","What is a SAVEPOINT and when would you use one?","A **SAVEPOINT** marks a point within a transaction that you can roll back\nto without aborting the entire transaction. This lets you recover from a\npartial failure while keeping the work done before the savepoint.\n\n```sql\nBEGIN;\n  INSERT INTO orders (customer_id, total) VALUES (5, 100.00);\n\n  SAVEPOINT after_order;\n\n  INSERT INTO order_items (order_id, product_id, qty) VALUES (99, 1, 2);\n  -- ^ suppose this fails (e.g. 
FK violation)\n\n  ROLLBACK TO SAVEPOINT after_order;\n  -- the orders INSERT is still intact; only order_items is undone\n\n  -- Try a different fix or log the error, then commit what we have\nCOMMIT;\n```\n\nSavepoints are especially useful in ORMs and application frameworks that\nwrap nested operations in sub-transactions.\n\n**Rule of thumb:** use savepoints for \"nested transaction\" patterns — when\na library or service call may fail but you want to keep the outer transaction\nalive. Don't overuse them; a simpler design often avoids the need.\n",{"id":786,"difficulty":106,"q":787,"a":788},"release-savepoint","What does RELEASE SAVEPOINT do?","`RELEASE SAVEPOINT name` destroys the savepoint but **does not commit or\nroll back** the work done since it. The changes remain part of the enclosing\ntransaction and will be committed or rolled back with it.\n\n```sql\nBEGIN;\n  INSERT INTO audit_log (event) VALUES ('start');\n\n  SAVEPOINT sp1;\n  UPDATE config SET value = 'new' WHERE key = 'theme';\n  RELEASE SAVEPOINT sp1;   -- sp1 is gone; UPDATE is still pending\n\n  -- Cannot ROLLBACK TO SAVEPOINT sp1 anymore\nCOMMIT; -- both the INSERT and UPDATE persist\n```\n\nAfter `RELEASE`, you can no longer roll back to that savepoint name.\n`RELEASE` is useful to free memory when you are certain you will not need\nto roll back to a particular point.\n\n**Rule of thumb:** release savepoints once you are confident the work they\ncover is correct. This is a minor housekeeping step; most applications omit\nit since savepoints are released automatically on `COMMIT` or `ROLLBACK`.\n",{"id":790,"difficulty":106,"q":791,"a":792},"implicit-transaction","What is an implicit transaction in SQL Server?","SQL Server supports an **implicit transaction** mode (`SET IMPLICIT_TRANSACTIONS ON`)\nwhere the database automatically begins a transaction before each DML or DDL\nstatement without an explicit `BEGIN TRANSACTION`. 
The user must still\nissue `COMMIT` or `ROLLBACK` to end it.\n\n```sql\n-- SQL Server with implicit transactions enabled\nSET IMPLICIT_TRANSACTIONS ON;\n\nUPDATE products SET price = price * 1.1; -- transaction auto-started\n-- still uncommitted! Must explicitly end it:\nCOMMIT;\n```\n\nThis differs from autocommit (where each statement commits automatically)\nand from explicit transactions (where you write `BEGIN TRANSACTION`).\nImplicit transactions are an easy source of long-running uncommitted\ntransactions and should be used carefully.\n\n**Rule of thumb:** avoid `IMPLICIT_TRANSACTIONS ON` in SQL Server — it\nsurprises developers who expect autocommit behavior. Prefer explicit\n`BEGIN TRANSACTION … COMMIT` for clarity.\n",{"id":794,"difficulty":106,"q":795,"a":796},"transaction-log","What is the transaction log (WAL) and why does it matter?","The **transaction log** (called **WAL** — Write-Ahead Log — in Postgres)\nrecords every change before it is written to the main data files. On a\ncrash, the database replays the log to bring data back to a consistent\nstate (redo) and removes uncommitted changes (undo).\n\n```\nTimeline of a COMMIT:\n1. Changes are written to the WAL on disk  ← durability guaranteed here\n2. Database acknowledges COMMIT to the client\n3. 
Changes are eventually flushed from buffer pool to data files\n(crash between steps 2 and 3 is safe — WAL replay restores the data)\n```\n\nThe WAL also powers **replication** (streaming the log to replicas) and\n**point-in-time recovery** (replaying the log up to a specific timestamp).\n\n**Rule of thumb:** understand that `COMMIT` does NOT mean \"data is in the\ntable file\" — it means \"data is in the WAL and therefore durable.\" The\nactual table file update is asynchronous.\n",{"id":798,"difficulty":106,"q":799,"a":800},"long-running-transactions","Why are long-running transactions harmful?","A long-running transaction holds **row\u002Fpage locks** that block other writers,\naccumulates **undo\u002Frollback data** that inflates the database's version store\nor transaction log, and in Postgres prevents **VACUUM** from reclaiming dead\nrow versions (causing table bloat).\n\n```sql\n-- Postgres: find long-running transactions\nSELECT pid,\n       now() - pg_stat_activity.xact_start AS duration,\n       query,\n       state\nFROM   pg_stat_activity\nWHERE  xact_start IS NOT NULL\n  AND  now() - xact_start > INTERVAL '5 minutes'\nORDER  BY duration DESC;\n\n-- Terminate if necessary\nSELECT pg_terminate_backend(pid) FROM pg_stat_activity\nWHERE  now() - xact_start > INTERVAL '1 hour';\n```\n\n**Rule of thumb:** keep transactions as short as possible. Do not hold an\nopen transaction while waiting for user input, making HTTP calls, or doing\nslow computation. 
Open the transaction, write, commit — then do anything\nslow.\n",{"id":802,"difficulty":106,"q":803,"a":804},"deadlock","What is a deadlock and how does the database resolve it?","A **deadlock** occurs when two (or more) transactions each hold a lock that\nthe other needs, so neither can proceed.\n\n```\nSession A: locks row 1, waits for row 2\nSession B: locks row 2, waits for row 1\n→ circular wait → deadlock\n```\n\nDatabases automatically detect deadlocks via a cycle-detection algorithm\nand resolve them by choosing a **victim** (typically the transaction with\nthe least work done) and rolling it back with an error.\n\n```sql\n-- Avoid deadlocks by always locking rows in the same order\n-- BAD: Session A locks user 1 then order 5; Session B locks order 5 then user 1\n-- GOOD: both sessions always lock by (user_id, order_id) order\n\n-- Postgres: SELECT FOR UPDATE to acquire locks explicitly in order\nSELECT * FROM users WHERE id = 1 FOR UPDATE;\nSELECT * FROM orders WHERE id = 5 FOR UPDATE;\n```\n\n**Rule of thumb:** prevent deadlocks by always acquiring locks in a\n**consistent order** across all transactions. Keep transactions short.\nHandle deadlock errors in application code with a retry loop.\n",{"id":806,"difficulty":127,"q":807,"a":808},"optimistic-vs-pessimistic","What is the difference between optimistic and pessimistic locking?","**Pessimistic locking** acquires a lock *before* reading the data and holds\nit until the transaction commits — preventing any other writer from touching\nthe row during that window.\n\n**Optimistic locking** reads the data without a lock, does work, then\nchecks at write time whether another writer has changed the data since it\nwas read. 
If yes, it retries rather than committing stale data.\n\n```sql\n-- Pessimistic: lock the row immediately on read\nBEGIN;\n  SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE;\n  -- no other transaction can UPDATE this row until we COMMIT\n  UPDATE inventory SET stock = stock - 1 WHERE product_id = 42;\nCOMMIT;\n\n-- Optimistic: use a version column to detect conflicts\n-- Read phase (no lock):\nSELECT stock, version FROM inventory WHERE product_id = 42;\n-- → stock=10, version=7\n\n-- Write phase: only update if version hasn't changed\nUPDATE inventory\nSET    stock = 9, version = 8\nWHERE  product_id = 42 AND version = 7;\n-- If 0 rows affected → conflict detected → retry\n```\n\n**Rule of thumb:** use pessimistic locking for high-contention resources\n(inventory, seat reservations) where conflicts are frequent. Use optimistic\nlocking for low-contention resources where conflicts are rare — it scales\nbetter by avoiding lock waits.\n",{"id":810,"difficulty":106,"q":811,"a":812},"select-for-update","What does SELECT FOR UPDATE do?","`SELECT FOR UPDATE` reads rows and immediately acquires an **exclusive row\nlock** on each row returned, preventing other transactions from updating or\nlocking those rows until the current transaction commits or rolls back.\n\n```sql\nBEGIN;\n  -- Lock the row so no other session can change it before we write\n  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;\n  -- → balance = 500\n\n  UPDATE accounts SET balance = balance - 100 WHERE id = 1;\nCOMMIT;\n```\n\nVariants:\n- `FOR SHARE` — shared lock; others can read but not write.\n- `FOR UPDATE SKIP LOCKED` (Postgres, MySQL 8+) — skip rows already locked;\n  useful for job queues where workers should not compete for the same row.\n- `FOR UPDATE NOWAIT` — fail immediately if the row is already locked.\n\n**Rule of thumb:** use `SELECT FOR UPDATE` when you read a value and\nimmediately use it to compute an update — it closes the time-of-check \u002F\ntime-of-use race 
condition that would exist if the read and write were\nunprotected.\n",{"id":814,"difficulty":127,"q":815,"a":816},"skip-locked","How do you implement a concurrent job queue with SKIP LOCKED?","`SKIP LOCKED` lets multiple workers pull jobs from a queue table without\ncompeting for the same row — each worker skips rows already locked by\nanother worker.\n\n```sql\n-- Schema\nCREATE TABLE job_queue (\n  id         BIGSERIAL PRIMARY KEY,\n  payload    JSONB NOT NULL,\n  status     TEXT NOT NULL DEFAULT 'pending',\n  created_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n\n-- Worker: claim one job atomically\nBEGIN;\n  SELECT id, payload\n  FROM   job_queue\n  WHERE  status = 'pending'\n  ORDER  BY created_at\n  LIMIT  1\n  FOR UPDATE SKIP LOCKED;   -- skip rows locked by other workers\n\n  UPDATE job_queue SET status = 'processing' WHERE id = \u003Cclaimed_id>;\nCOMMIT;\n```\n\nEach worker gets a different row. If a worker crashes before `COMMIT`, the transaction\nrolls back and the row stays `pending` for another worker to claim. A crash after\n`COMMIT` leaves the row in `processing`, so a timeout or reaper job is needed to\nrequeue it.\n\n**Rule of thumb:** `SELECT … FOR UPDATE SKIP LOCKED` is the correct pattern\nfor building a reliable job queue in SQL. The claim step is atomic and\nscales to many concurrent workers without external coordination.\n",{"id":818,"difficulty":127,"q":819,"a":820},"two-phase-commit","What is two-phase commit (2PC) and when is it used?","**Two-phase commit** is a distributed coordination protocol that ensures\na transaction spanning **multiple independent databases** either commits\non all of them or rolls back on all.\n\n- **Phase 1 (Prepare)**: the coordinator asks each participant to prepare\n  (write to their WAL, acquire locks, but do not commit). Each replies\n  \"yes\" or \"no\".\n- **Phase 2 (Commit\u002FAbort)**: if all said \"yes\", the coordinator tells all\n  to commit. 
If any said \"no\", the coordinator tells all to abort.\n\n```sql\n-- Postgres prepared transactions (the participant side of 2PC)\nBEGIN;\n  UPDATE accounts SET balance = balance - 100 WHERE id = 1;\nPREPARE TRANSACTION 'txn-xyz-001'; -- phase 1: prepared, not committed\n\n-- Later, coordinator decides to commit or roll back:\nCOMMIT PREPARED 'txn-xyz-001';\n-- or\nROLLBACK PREPARED 'txn-xyz-001';\n```\n\n**Rule of thumb:** 2PC solves cross-database atomicity but adds latency\nand coordinator failure risk. In modern systems, the **Saga pattern**\n(compensating transactions) is often preferred over 2PC for microservice\narchitectures.\n",{"id":822,"difficulty":106,"q":823,"a":824},"transaction-best-practices","What are the most important best practices for writing transactions?","1. **Keep transactions short** — open, write, commit. Do not hold a\n   transaction open while making network calls, waiting for user input, or\n   running slow computations.\n2. **Access tables in a consistent order** — prevents deadlocks across\n   concurrent transactions that touch the same set of tables.\n3. **Handle errors explicitly** — always `ROLLBACK` on any exception; never\n   swallow errors and then commit.\n4. **Do not use transactions for reads alone** — unless you need a consistent\n   snapshot across multiple queries, a plain `SELECT` outside a transaction\n   is cheaper.\n5. **Retry on transient failures** — deadlocks and serialization failures\n   are expected; build retry logic with exponential back-off.\n\n```python\n# Pattern: commit on success, always roll back on error (Python psycopg2).\n# With autocommit off (the default), psycopg2 opens the transaction on the\n# first execute(), so no explicit BEGIN is needed.\ntry:\n    cur.execute(\"UPDATE ...\")\n    cur.execute(\"INSERT ...\")\n    conn.commit()\nexcept Exception:\n    conn.rollback()\n    raise\n```\n\n**Rule of thumb:** a transaction should be as wide as necessary (all writes\nthat must be atomic) and no wider. 
Every extra statement inside a transaction\nis a longer lock hold and a bigger rollback payload.\n",{"description":104},"SQL transactions interview questions — ACID properties, BEGIN\u002FCOMMIT\u002FROLLBACK, SAVEPOINT, autocommit, implicit vs explicit transactions, and best practices across Postgres, MySQL, and SQL Server.","sql\u002Ftransactions\u002Ftransactions","Transactions & ACID","OBbo4RNRAA_7U1YkzIh4CFLgUM6Qp7vn03xJXnQssNU",{"id":831,"title":832,"body":833,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":871,"navigation":109,"order":14,"path":872,"questions":873,"questionsCount":946,"related":247,"seo":947,"seoDescription":948,"stem":949,"subtopic":950,"topic":38,"topicSlug":40,"updated":328,"__hash__":951},"qa\u002Fsql\u002Fwindow-functions\u002Fwindow-basics.md","Window Basics",{"type":101,"value":834,"toc":868},[835,839],[639,836,838],{"id":837},"about-window-function-basics","About Window Function Basics",[644,840,841,842,845,846,849,850,853,854,857,858,861,862,864,865,867],{},"Window functions are SQL's analytics workhorse: they compute aggregates, rankings, and\nrow-to-row comparisons ",[648,843,844],{},"without collapsing rows",", via the ",[653,847,848],{},"OVER"," clause and its\n",[653,851,852],{},"PARTITION BY","\u002F",[653,855,856],{},"ORDER BY"," parts. 
Interviews probe the core distinction from ",[653,859,860],{},"GROUP BY",",\nwhere windows are (and aren't) allowed, how ",[653,863,856],{}," inside ",[653,866,848],{}," turns an aggregate\ninto a running total, and the CTE pattern for filtering on a window result.",{"title":104,"searchDepth":30,"depth":30,"links":869},[870],{"id":837,"depth":30,"text":838},{},"\u002Fsql\u002Fwindow-functions\u002Fwindow-basics",[874,878,882,886,890,894,898,902,906,910,914,918,922,926,930,934,938,942],{"id":875,"difficulty":114,"q":876,"a":877},"what-is-a-window-function","What is a window function in SQL?","A **window function** performs a calculation across a set of rows — the\n**window** — that are related to the current row, **without collapsing them into\none row**. Unlike a `GROUP BY` aggregate, every input row stays in the output, but\neach gets an extra computed value.\n\n```sql\n-- each employee row keeps its detail AND gets the dept average alongside it\nSELECT name, dept_id, salary,\n       AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg\nFROM   employees;\n```\n\nThe `OVER` clause is what makes a function a window function. Window functions are\nideal for **running totals, rankings, moving averages, and row comparisons**.\n\nRule of thumb: a window function adds a per-row analytic value while keeping every\nrow — think \"aggregate without losing the detail.\"\n",{"id":879,"difficulty":114,"q":880,"a":881},"over-clause","What does the OVER clause do?","The `OVER` clause **defines the window** — the set of rows a window function\noperates on for each row. 
It can contain three optional parts: `PARTITION BY`\n(split rows into groups), `ORDER BY` (order rows within the window), and a **frame\nclause** (`ROWS`\u002F`RANGE`, narrowing the window further).\n\n```sql\nSELECT name, salary,\n       SUM(salary) OVER (PARTITION BY dept_id ORDER BY hire_date) AS running_total\nFROM   employees;\n```\n\nAn **empty** `OVER ()` makes the window the **entire result set** — every row sees\nall rows.\n\nRule of thumb: `OVER` turns an ordinary function into a window function and\nspecifies which rows it looks at.\n",{"id":883,"difficulty":106,"q":884,"a":885},"window-vs-group-by","What is the difference between a window function and GROUP BY?","`GROUP BY` **collapses** rows — one output row per group. A window function\n**preserves** every row and attaches the aggregate alongside each one.\n\n```sql\n-- GROUP BY: one row per department\nSELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;\n\n-- window: every employee row, plus their department's average\nSELECT name, dept_id, salary,\n       AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg\nFROM employees;\n```\n\nThis is why you can compare a row to its group (e.g. `salary` vs `dept_avg`) with\na window function — impossible with a bare `GROUP BY` because the detail rows are\ngone.\n\nRule of thumb: `GROUP BY` summarizes into fewer rows; a window function annotates\neach row without reducing the count.\n",{"id":887,"difficulty":114,"q":888,"a":889},"partition-by","What does PARTITION BY do in a window function?","`PARTITION BY` **divides** the rows into independent groups (partitions); the\nwindow function restarts for each partition. 
It's the windowing analog of\n`GROUP BY`, but the rows aren't collapsed.\n\n```sql\n-- numbering restarts at 1 within each department\nSELECT name, dept_id,\n       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn\nFROM   employees;\n```\n\nWithout `PARTITION BY`, the whole result set is one single partition.\n\nRule of thumb: `PARTITION BY` says \"compute this window function separately within\neach group.\"\n",{"id":891,"difficulty":106,"q":892,"a":893},"partition-by-vs-group-by","How does PARTITION BY differ from GROUP BY?","Both split rows into groups, but the **output shape** differs:\n\n- `GROUP BY` produces **one row per group** — it reduces the result.\n- `PARTITION BY` keeps **all rows**, computing the window function within each\n  group while leaving the detail intact.\n\n```sql\n-- 1 row per dept\nSELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id;\n\n-- every row, with its dept's count attached\nSELECT name, dept_id, COUNT(*) OVER (PARTITION BY dept_id) AS dept_count\nFROM employees;\n```\n\nRule of thumb: same grouping idea, different result — `GROUP BY` collapses,\n`PARTITION BY` annotates.\n",{"id":895,"difficulty":106,"q":896,"a":897},"order-by-in-over","What is the role of ORDER BY inside the OVER clause?","`ORDER BY` inside `OVER` **orders the rows within each partition**, which matters\nfor two reasons:\n\n1. **Ranking and offset functions** (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`) need an order to be\n   meaningful.\n2. 
For aggregates, adding `ORDER BY` switches the default frame to a **running**\n   (cumulative) calculation up to the current row.\n\n```sql\n-- without ORDER BY: same total for whole partition\nSUM(amount) OVER (PARTITION BY user_id)\n-- with ORDER BY: a running total up to each row\nSUM(amount) OVER (PARTITION BY user_id ORDER BY order_date)\n```\n\nIt is **independent** of the query's outer `ORDER BY`, which only sorts the final\noutput.\n\nRule of thumb: `ORDER BY` in `OVER` orders rows for the window calc (and triggers\nrunning aggregates); the outer `ORDER BY` sorts the result.\n",{"id":899,"difficulty":106,"q":900,"a":901},"running-total","How do you compute a running total with a window function?","Use an aggregate (`SUM`) with `OVER (... ORDER BY ...)`. The `ORDER BY` makes the\ndefault frame cumulative — rows from the start of the partition up to the current\nrow.\n\n```sql\nSELECT order_date, amount,\n       SUM(amount) OVER (ORDER BY order_date) AS running_total\nFROM   orders;\n```\n\nAdd `PARTITION BY` to reset the running total per group (e.g. per customer):\n\n```sql\nSUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date)\n```\n\nRule of thumb: `SUM(x) OVER (ORDER BY ...)` is the canonical running total; add\n`PARTITION BY` to restart it per group.\n",{"id":903,"difficulty":114,"q":904,"a":905},"window-aggregate-functions","Can regular aggregate functions be used as window functions?","Yes — `SUM`, `AVG`, `COUNT`, `MIN`, and `MAX` all work as window functions simply\nby adding an `OVER` clause. 
They then compute over the window instead of\ncollapsing rows.\n\n```sql\nSELECT name, salary, dept_id,\n       COUNT(*)   OVER (PARTITION BY dept_id) AS dept_headcount,\n       MAX(salary) OVER (PARTITION BY dept_id) AS dept_max,\n       salary - AVG(salary) OVER (PARTITION BY dept_id) AS diff_from_avg\nFROM   employees;\n```\n\nThe same function name behaves as a group aggregate (with `GROUP BY`) or a window\naggregate (with `OVER`) depending on context.\n\nRule of thumb: any aggregate becomes a window function just by adding `OVER` — no\n`GROUP BY` required.\n",{"id":907,"difficulty":106,"q":908,"a":909},"ranking-vs-aggregate-windows","What are the main categories of window functions?","Window functions fall into three families:\n\n- **Aggregate windows** — `SUM`, `AVG`, `COUNT`, `MIN`, `MAX` over a window.\n- **Ranking functions** — `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`,\n  `PERCENT_RANK`, `CUME_DIST` — position a row within its partition.\n- **Value \u002F offset functions** — `LAG`, `LEAD`, `FIRST_VALUE`, `LAST_VALUE`,\n  `NTH_VALUE` — pull a value from another row in the window.\n\n```sql\nROW_NUMBER() OVER (ORDER BY score DESC)        -- ranking\nLAG(price)   OVER (ORDER BY day)               -- offset\nAVG(price)   OVER (PARTITION BY product_id)    -- aggregate\n```\n\nRule of thumb: aggregate windows summarize, ranking functions order, offset\nfunctions reach to neighboring rows.\n",{"id":911,"difficulty":127,"q":912,"a":913},"where-window-functions-allowed","In which clauses can window functions be used?","Window functions are only allowed in the **`SELECT` list** and the **`ORDER BY`**\nclause. 
They are **not allowed** in `WHERE`, `GROUP BY`, or `HAVING`, because\nwindows are evaluated **after** those clauses (after grouping and filtering).\n\n```sql\n-- ILLEGAL: window function in WHERE\nSELECT name FROM employees\nWHERE ROW_NUMBER() OVER (ORDER BY salary) \u003C= 5;   -- error\n\n-- LEGAL: compute in a subquery\u002FCTE, then filter\nSELECT name FROM (\n    SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn\n    FROM employees\n) t WHERE rn \u003C= 5;\n```\n\nRule of thumb: you can't filter on a window function directly — wrap it in a CTE\nor subquery and filter the alias.\n",{"id":915,"difficulty":127,"q":916,"a":917},"logical-processing-order","At what point are window functions evaluated in query processing?","Window functions run **late** in logical query processing — after `FROM`, `WHERE`,\n`GROUP BY`, and `HAVING`, but **before** the final `ORDER BY`, `DISTINCT`, and\n`LIMIT`. That ordering explains two facts:\n\n1. You can't reference a window function in `WHERE`\u002F`HAVING` (they run earlier).\n2. A window function operates on the rows that **survived** `WHERE`\u002F`GROUP BY`.\n\n```sql\n-- the SUM window only sees rows passing the WHERE filter\nSELECT name, SUM(salary) OVER ()\nFROM   employees\nWHERE  active = true;     -- filter applied BEFORE the window\n```\n\nRule of thumb: filtering happens first, windows next, final sort\u002Flimit last — so\nwindows see filtered rows but can't be filtered themselves.\n",{"id":919,"difficulty":106,"q":920,"a":921},"named-window","What is a named window (the WINDOW clause)?","The `WINDOW` clause lets you **define a window once** and reuse it by name across\nmultiple functions, avoiding repetition. It sits after `HAVING` and before\n`ORDER BY`.\n\n```sql\nSELECT name, dept_id, salary,\n       RANK()      OVER w AS rnk,\n       AVG(salary) OVER w AS dept_avg\nFROM   employees\nWINDOW w AS (PARTITION BY dept_id ORDER BY salary DESC);\n```\n\nBoth functions share window `w`. 
This is cleaner than repeating the same\n`PARTITION BY ... ORDER BY ...` in every function. (Supported in PostgreSQL and MySQL\n8+; SQL Server supports it from SQL Server 2022.)\n\nRule of thumb: define a window once in the `WINDOW` clause when several functions\nshare it.\n",{"id":923,"difficulty":106,"q":924,"a":925},"distinct-with-window","Can you use DISTINCT inside a window function?","Generally **no** — most databases do not support `COUNT(DISTINCT ...) OVER (...)`.\n`DISTINCT` aggregation isn't allowed with the `OVER` clause in standard SQL,\nPostgreSQL, and SQL Server.\n\n```sql\n-- typically ERRORS\nCOUNT(DISTINCT customer_id) OVER (PARTITION BY region)\n\n-- workaround: DENSE_RANK over the values in a subquery, then take the max\n-- (window functions cannot be nested directly inside one another)\nSELECT region,\n       MAX(dr) OVER (PARTITION BY region) AS distinct_customers\nFROM (\n    SELECT region,\n           DENSE_RANK() OVER (PARTITION BY region ORDER BY customer_id) AS dr\n    FROM orders\n) t;\n```\n\nCommon workarounds are a `DENSE_RANK` trick, or pre-aggregating distinct values in\na CTE.\n\nRule of thumb: `COUNT(DISTINCT)` as a window function usually isn't allowed —\npre-aggregate or use a `DENSE_RANK` workaround.\n",{"id":927,"difficulty":127,"q":928,"a":929},"window-function-performance","What are the performance considerations of window functions?","Window functions require the engine to **sort or hash** rows by the\n`PARTITION BY`\u002F`ORDER BY` keys, which is the main cost. 
Tips:\n\n- **Index** the partition\u002Forder columns so the engine can avoid a separate sort.\n- Each distinct window definition may add its own sort — **share windows** (named\n  `WINDOW` clause) where possible.\n- Filter rows **before** the window (`WHERE`) to shrink the input.\n- Beware huge partitions and unbounded frames (`RANGE` with peers) — they scan\n  more rows per output row.\n\nRule of thumb: window functions trade a sort for analytics — index the\npartition\u002Forder keys and minimize the number of distinct windows.\n",{"id":931,"difficulty":106,"q":932,"a":933},"window-vs-self-join","Why are window functions preferred over self-joins for analytics?","Before window functions, computing running totals, rankings, or row-to-row\ncomparisons required **correlated subqueries or self-joins**, which are verbose and\noften O(n²). Window functions express the same logic in **one pass**, more\nreadably and usually faster.\n\n```sql\n-- old self-join running total (slow, O(n^2))\nSELECT a.day, SUM(b.amount) AS rt\nFROM sales a JOIN sales b ON b.day \u003C= a.day\nGROUP BY a.day;\n\n-- window function (one pass)\nSELECT day, SUM(amount) OVER (ORDER BY day) AS rt FROM sales;\n```\n\nRule of thumb: replace self-joins\u002Fcorrelated subqueries for running totals and\nrankings with window functions — clearer and faster.\n",{"id":935,"difficulty":127,"q":936,"a":937},"filter-clause-window","How can you conditionally aggregate within a window?","Use the `FILTER (WHERE ...)` clause (PostgreSQL\u002FSQLite) or a `CASE` expression\ninside the aggregate (portable) to aggregate only rows meeting a condition within\nthe window.\n\n```sql\n-- PostgreSQL FILTER\nSELECT region,\n       COUNT(*) FILTER (WHERE status = 'paid') OVER (PARTITION BY region) AS paid\nFROM orders;\n\n-- portable CASE equivalent\nSELECT region,\n       SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)\n           OVER (PARTITION BY region) AS paid\nFROM orders;\n```\n\nRule of thumb: use `FILTER (WHERE 
...)` where supported, or `SUM(CASE WHEN ...)`\nfor a portable conditional window aggregate.\n",{"id":939,"difficulty":114,"q":940,"a":941},"empty-over-clause","What does an empty OVER() clause mean?","An empty `OVER ()` makes the window the **entire result set** — every row sees all\nthe rows. It's used to attach a grand total or overall aggregate to each row\nwithout grouping.\n\n```sql\nSELECT name, salary,\n       salary * 100.0 \u002F SUM(salary) OVER () AS pct_of_total_payroll\nFROM   employees;\n```\n\nHere `SUM(salary) OVER ()` is the company-wide payroll, repeated on every row, so\nyou can compute each person's share.\n\nRule of thumb: `OVER ()` with no partition or order = the whole result set —\nperfect for \"share of total\" calculations.\n",{"id":943,"difficulty":106,"q":944,"a":945},"combining-window-with-where","How do you filter rows based on a window function's result?","Since window functions aren't allowed in `WHERE`, compute the window value in a\n**CTE or subquery**, then filter the alias in the outer query. 
This is the standard\npattern for \"top-N per group\" and deduplication.\n\n```sql\n-- keep only the highest-paid employee per department\nWITH ranked AS (\n    SELECT name, dept_id, salary,\n           ROW_NUMBER() OVER (PARTITION BY dept_id\n                              ORDER BY salary DESC) AS rn\n    FROM employees\n)\nSELECT name, dept_id, salary\nFROM   ranked\nWHERE  rn = 1;\n```\n\nRule of thumb: wrap the window function in a CTE, then filter its output column —\nyou can't put it in `WHERE` directly.\n",18,{"description":104},"SQL window function interview questions — OVER, PARTITION BY, window vs GROUP BY aggregates, running totals, named windows, and where windows can be used.","sql\u002Fwindow-functions\u002Fwindow-basics","Window Function Basics","lU-Pdyj0JfKs5j1DsgcOY_HWoWio2OyoTkAgClSQdgI",{"id":953,"title":954,"body":955,"description":104,"difficulty":114,"extension":107,"framework":10,"frameworkSlug":12,"meta":959,"navigation":109,"order":30,"path":960,"questions":961,"questionsCount":1030,"related":247,"seo":1031,"seoDescription":1032,"stem":1033,"subtopic":1034,"topic":21,"topicSlug":22,"updated":328,"__hash__":1035},"qa\u002Fsql\u002Fbasics\u002Fselect-where.md","Select Where",{"type":101,"value":956,"toc":957},[],{"title":104,"searchDepth":30,"depth":30,"links":958},[],{},"\u002Fsql\u002Fbasics\u002Fselect-where",[962,966,970,974,978,982,986,990,994,998,1002,1006,1010,1014,1018,1022,1026],{"id":963,"difficulty":114,"q":964,"a":965},"what-is-select","What does a SELECT statement do?","`SELECT` **reads rows** from one or more tables and returns a result set. It has\ntwo core jobs: **projection** (which columns to return, listed after `SELECT`)\nand **source** (which table, named after `FROM`). Everything else — filtering,\ngrouping, ordering — refines that result.\n\n```sql\n-- project two columns from every row of users\nSELECT id, name\nFROM users;\n```\n\n`SELECT` is **read-only**: it never changes the data. 
`SELECT *` returns every\ncolumn, convenient for exploring but discouraged in production code because the\nresult silently changes when the table's columns do.\n\nRule of thumb: list the columns you actually need instead of `SELECT *`.\n",{"id":967,"difficulty":114,"q":968,"a":969},"where-clause","What does the WHERE clause do?","`WHERE` **filters rows** before they're returned, keeping only those for which the\ncondition evaluates to **`true`**. Rows that evaluate to `false` *or* `unknown`\n(NULL comparisons) are dropped.\n\n```sql\nSELECT name, age\nFROM users\nWHERE age >= 18;        -- only adults\n```\n\n`WHERE` runs **after** `FROM` (the rows exist) but **before** `GROUP BY`,\n`SELECT`, and `ORDER BY`. Because it runs before `SELECT`, you generally **can't\nreference a column alias** defined in the `SELECT` list inside `WHERE`.\n\nRule of thumb: `WHERE` filters individual rows; use `HAVING` to filter groups.\n",{"id":971,"difficulty":114,"q":972,"a":973},"comparison-operators","What comparison operators can you use in WHERE?","The standard set: `=`, `\u003C>` (or `!=`) for not-equal, `\u003C`, `>`, `\u003C=`, `>=`. They\nwork on numbers, strings (lexicographic order), and dates. Combine them with the\nlogical operators `AND`, `OR`, and `NOT`.\n\n```sql\nSELECT *\nFROM orders\nWHERE total > 100 AND status \u003C> 'cancelled';\n```\n\nComparisons against `NULL` are special — `total > 100` is `unknown` when `total`\nis `NULL`, so that row is excluded. Use `IS NULL` \u002F `IS NOT NULL` for NULL tests.\n\nRule of thumb: `\u003C>` is the SQL-standard not-equal; `!=` works in most dialects but\nisn't standard.\n",{"id":975,"difficulty":106,"q":976,"a":977},"and-or-precedence","How do AND and OR precedence interact?","`AND` binds **tighter** than `OR` (like `*` vs `+` in arithmetic), so `OR` groups\nare evaluated last. 
Mixing them without parentheses is a classic bug that quietly\nreturns the wrong rows.\n\n```sql\n-- WRONG: parsed as  status='active' OR (status='trial' AND age > 18)\nSELECT * FROM users WHERE status = 'active' OR status = 'trial' AND age > 18;\n\n-- RIGHT: parenthesize the OR group\nSELECT * FROM users WHERE (status = 'active' OR status = 'trial') AND age > 18;\n```\n\nRule of thumb: when a `WHERE` clause mixes `AND` and `OR`, always add explicit\nparentheses — don't rely on precedence.\n",{"id":979,"difficulty":114,"q":980,"a":981},"distinct","What does SELECT DISTINCT do?","`DISTINCT` **removes duplicate rows** from the result, based on **all selected\ncolumns** together — not just the first one. Two rows are duplicates only if every\nprojected column matches.\n\n```sql\n-- unique cities\nSELECT DISTINCT city FROM users;\n\n-- unique (city, country) PAIRS, not unique cities\nSELECT DISTINCT city, country FROM users;\n```\n\n`DISTINCT` requires sorting or hashing the result to find duplicates, so it has a\ncost on large sets. If you're really testing for existence, `EXISTS` is often\ncheaper than `SELECT DISTINCT`.\n\nRule of thumb: `DISTINCT` applies to the whole row, not to one column.\n",{"id":983,"difficulty":114,"q":984,"a":985},"like-operator","How does the LIKE operator work?","`LIKE` does **pattern matching** on strings with two wildcards: `%` matches **any\nsequence** of characters (including none), and `_` matches **exactly one**\ncharacter.\n\n```sql\nSELECT * FROM users\nWHERE email LIKE '%@gmail.com'   -- ends with @gmail.com\n  AND name  LIKE 'A%'            -- starts with A\n  AND code  LIKE 'A_C';          -- A, any one char, C\n```\n\nMatching is case-sensitive in some databases (Postgres) and insensitive in others\n(MySQL default); Postgres offers `ILIKE` for case-insensitive matching. 
A leading\n`%` (`'%term'`) usually **can't use an index**, making it slow on big tables.\n\nRule of thumb: avoid a leading wildcard if you need the query to use an index.\n",{"id":987,"difficulty":114,"q":988,"a":989},"in-operator","What does the IN operator do?","`IN` tests whether a value **matches any item** in a list (or subquery result). It's\nshorthand for a chain of `OR` equality checks.\n\n```sql\n-- these are equivalent\nSELECT * FROM orders WHERE status IN ('shipped', 'delivered');\nSELECT * FROM orders WHERE status = 'shipped' OR status = 'delivered';\n```\n\n`IN` also accepts a subquery: `WHERE user_id IN (SELECT id FROM vips)`. Beware the\n**`NOT IN` NULL trap** — if the list\u002Fsubquery contains a `NULL`, `NOT IN` returns\nno rows at all. Prefer `NOT EXISTS` when NULLs are possible.\n\nRule of thumb: `IN` for a known list; `NOT EXISTS` instead of `NOT IN` when NULLs\nmight appear.\n",{"id":991,"difficulty":114,"q":992,"a":993},"between-operator","How does BETWEEN work, and is it inclusive?","`BETWEEN a AND b` tests whether a value falls in a range, and it is **inclusive of\nboth endpoints** — equivalent to `>= a AND \u003C= b`.\n\n```sql\nSELECT * FROM orders\nWHERE total BETWEEN 100 AND 200;   -- includes exactly 100 and 200\n```\n\nThe inclusive upper bound is a trap with **dates\u002Ftimestamps**: `BETWEEN '2026-01-01'\nAND '2026-01-31'` misses times on Jan 31 after midnight. For timestamps, prefer a\nhalf-open range: `>= '2026-01-01' AND \u003C '2026-02-01'`.\n\nRule of thumb: use `BETWEEN` for inclusive integer ranges; use `>= ... \u003C ...` for\ndate\u002Ftime ranges.\n",{"id":995,"difficulty":106,"q":996,"a":997},"is-null","How do you test for NULL values?","Use **`IS NULL`** and **`IS NOT NULL`** — never `= NULL`. 
In SQL, `NULL` means\n\"unknown,\" and any comparison *with* `NULL` (including `NULL = NULL`) evaluates to\n`unknown`, which `WHERE` treats as not-true.\n\n```sql\nSELECT * FROM users WHERE deleted_at IS NULL;       -- active users\nSELECT * FROM users WHERE phone = NULL;             -- BUG: returns nothing\n```\n\nThis three-valued logic (`true` \u002F `false` \u002F `unknown`) is why `NULL` rows slip\nthrough filters unexpectedly. Functions like `COALESCE(col, default)` let you treat\nNULLs as a concrete value when comparing.\n\nRule of thumb: NULL tests always use `IS NULL` \u002F `IS NOT NULL`.\n",{"id":999,"difficulty":114,"q":1000,"a":1001},"column-aliases","What are column and table aliases?","An **alias** renames a column or table for the duration of the query, using `AS`\n(optional for columns, conventional to omit for tables). Column aliases clean up\noutput and name computed expressions; table aliases shorten qualified references.\n\n```sql\nSELECT u.name AS customer,\n       o.total * 1.1 AS total_with_tax\nFROM users u\nJOIN orders o ON o.user_id = u.id;\n```\n\nBecause aliases are assigned in the `SELECT` step (which runs after `WHERE`), you\nusually **can't use a column alias in `WHERE`** — but you **can** in `ORDER BY`,\nwhich runs last.\n\nRule of thumb: alias every computed column so the result has meaningful names.\n",{"id":1003,"difficulty":127,"q":1004,"a":1005},"order-of-execution","In what logical order are the clauses of a SELECT evaluated?","Though you *write* `SELECT` first, the database evaluates clauses in this **logical\norder**: `FROM`\u002F`JOIN` → `WHERE` → `GROUP BY` → `HAVING` → `SELECT` → `DISTINCT` →\n`ORDER BY` → `LIMIT`.\n\n```sql\nSELECT user_id, COUNT(*) AS n   -- 5: projection + alias\nFROM orders                     -- 1: source rows\nWHERE total > 0                 -- 2: filter rows\nGROUP BY user_id                -- 3: group\nHAVING COUNT(*) > 5             -- 4: filter groups\nORDER BY n DESC                 -- 6: alias 
visible here\nLIMIT 10;                       -- 7\n```\n\nThis order explains the rules: `WHERE` can't see `SELECT` aliases (it runs first),\nbut `ORDER BY` can (it runs last), and `HAVING` can reference aggregates.\n\nRule of thumb: remember `FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT`.\n",{"id":1007,"difficulty":114,"q":1008,"a":1009},"arithmetic-expressions","Can you compute expressions in a SELECT list?","Yes — the `SELECT` list can contain **arithmetic, function calls, and concatenation**,\nnot just bare columns. The expression is computed per row.\n\n```sql\nSELECT name,\n       price * quantity        AS line_total,\n       ROUND(price * 0.2, 2)   AS tax\nFROM order_items;\n```\n\nWatch **integer division** in some databases: `5 \u002F 2` returns `2`, not `2.5`,\nunless an operand is a decimal\u002Ffloat (`5.0 \u002F 2`). Cast when you need fractional\nresults.\n\nRule of thumb: cast to a decimal type before dividing integers if you want a\nfractional answer.\n",{"id":1011,"difficulty":106,"q":1012,"a":1013},"not-operator","How does the NOT operator behave with NULLs?","`NOT` negates a condition, but under three-valued logic `NOT unknown` is still\n**`unknown`** — so negating a comparison that involves `NULL` doesn't \"flip in\" the\nNULL rows you might expect.\n\n```sql\n-- rows where status IS NULL are excluded by BOTH of these\nSELECT * FROM tasks WHERE status = 'done';\nSELECT * FROM tasks WHERE NOT (status = 'done');\n```\n\nTo include the NULLs, handle them explicitly: `WHERE status \u003C> 'done' OR status IS\nNULL`. This surprises people expecting a query and its `NOT` to partition the table.\n\nRule of thumb: when negating a condition, decide explicitly what should happen to\nNULL rows.\n",{"id":1015,"difficulty":106,"q":1016,"a":1017},"case-in-select","How do you return a conditional value in SELECT?","Use a **`CASE` expression** — SQL's if\u002Felse inside a query. 
It evaluates `WHEN`\nconditions top to bottom and returns the first match's value, or the `ELSE` (or\n`NULL` if no `ELSE`).\n\n```sql\nSELECT name,\n       CASE\n         WHEN age \u003C 13 THEN 'child'\n         WHEN age \u003C 20 THEN 'teen'\n         ELSE 'adult'\n       END AS age_group\nFROM users;\n```\n\n`CASE` works anywhere an expression is allowed — `SELECT`, `WHERE`, `ORDER BY`, even\ninside aggregates like `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` for conditional counts.\n\nRule of thumb: reach for `CASE` whenever you need branching logic inside a query.\n",{"id":1019,"difficulty":106,"q":1020,"a":1021},"filtering-strings-case","How do you do case-insensitive string filtering?","Either normalize both sides with `LOWER()`\u002F`UPPER()`, or use a dialect feature.\nForcing the case on both sides works everywhere but can defeat a plain index.\n\n```sql\n-- portable: compare in a single case\nSELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');\n\n-- Postgres shortcut for patterns\nSELECT * FROM users WHERE email ILIKE 'alice@%';\n```\n\nFor performance on large tables, store a normalized column or build a **functional\nindex** on `LOWER(email)` so the comparison stays index-friendly.\n\nRule of thumb: normalize with `LOWER()` for portability; add a functional index if\nit's a hot path.\n",{"id":1023,"difficulty":114,"q":1024,"a":1025},"limit-preview","How do you return just a few rows to preview a table?","Limit the row count: `LIMIT n` in Postgres\u002FMySQL\u002FSQLite, `FETCH FIRST n ROWS ONLY`\nin the SQL standard \u002F Oracle, and `SELECT TOP n` in SQL Server.\n\n```sql\nSELECT * FROM users LIMIT 10;                 -- Postgres\u002FMySQL\nSELECT * FROM users FETCH FIRST 10 ROWS ONLY; -- standard\nSELECT TOP 10 * FROM users;                   -- SQL Server\n```\n\nWithout an `ORDER BY`, the rows returned are **arbitrary** — the database may return\nany 10. 
Add `ORDER BY` for a deterministic preview.\n\nRule of thumb: `LIMIT` without `ORDER BY` gives unpredictable rows.\n",{"id":1027,"difficulty":106,"q":1028,"a":1029},"where-vs-having","What is the difference between WHERE and HAVING?","`WHERE` filters **individual rows before grouping**; `HAVING` filters **groups after\naggregation**. Because `WHERE` runs first, it **can't** reference aggregate functions\nlike `COUNT()`; `HAVING` can.\n\n```sql\nSELECT user_id, COUNT(*) AS orders\nFROM orders\nWHERE total > 0          -- drop junk rows first (per-row)\nGROUP BY user_id\nHAVING COUNT(*) > 5;     -- keep only busy users (per-group)\n```\n\nFiltering early in `WHERE` is also cheaper — fewer rows reach the grouping step.\n\nRule of thumb: filter raw rows in `WHERE`, filter aggregates in `HAVING`.\n",17,{"description":104},"SQL SELECT and WHERE interview questions — projecting columns, filtering rows, DISTINCT, LIKE, IN, BETWEEN, NULL handling and operator precedence.","sql\u002Fbasics\u002Fselect-where","SELECT & WHERE","jXdsYzvmdzmQPaNl6AcPW6mbKYqG-S5HAYiy7OjV_IY",{"id":1037,"title":1038,"body":1039,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1043,"navigation":109,"order":30,"path":1044,"questions":1045,"questionsCount":323,"related":247,"seo":1106,"seoDescription":1107,"stem":1108,"subtopic":1038,"topic":55,"topicSlug":57,"updated":328,"__hash__":1109},"qa\u002Fsql\u002Fdml\u002Fviews.md","Views",{"type":101,"value":1040,"toc":1041},[],{"title":104,"searchDepth":30,"depth":30,"links":1042},[],{},"\u002Fsql\u002Fdml\u002Fviews",[1046,1050,1054,1058,1062,1066,1070,1074,1078,1082,1086,1090,1094,1098,1102],{"id":1047,"difficulty":114,"q":1048,"a":1049},"what-is-view","What is a view in SQL?","A **view** is a named, stored `SELECT` statement. Querying a view runs the\nunderlying query at that moment — the result is not stored (contrast with a\nmaterialized view). 
Views look and act like tables from the caller's\nperspective but contain no data of their own.\n\n```sql\n-- Define the view\nCREATE VIEW active_users AS\n  SELECT id, name, email\n  FROM   users\n  WHERE  deleted_at IS NULL;\n\n-- Query it exactly like a table\nSELECT * FROM active_users WHERE name ILIKE 'alice%';\n\n-- The database executes the underlying query at query time:\n-- SELECT id, name, email FROM users\n-- WHERE deleted_at IS NULL AND name ILIKE 'alice%'\n```\n\n**Rule of thumb:** use views to give a stable, simple name to a complex\nor frequently repeated query — and to hide columns or rows that callers\nshould not access.\n",{"id":1051,"difficulty":114,"q":1052,"a":1053},"create-replace-view","How do you create, replace, and drop a view?","```sql\n-- Create (fails if the view already exists)\nCREATE VIEW recent_orders AS\n  SELECT id, customer_id, total, created_at\n  FROM   orders\n  WHERE  created_at >= now() - INTERVAL '30 days';\n\n-- Replace (redefines in place — dependent GRANTs are preserved in Postgres)\nCREATE OR REPLACE VIEW recent_orders AS\n  SELECT id, customer_id, total, created_at, status\n  FROM   orders\n  WHERE  created_at >= now() - INTERVAL '30 days';\n\n-- Drop\nDROP VIEW recent_orders;\n\n-- Drop even if other views depend on it (Postgres)\nDROP VIEW recent_orders CASCADE;\n```\n\n`CREATE OR REPLACE` in Postgres requires the new definition to have the\nsame columns in the same order (you can add columns at the end, but not\nremove or reorder existing ones). SQL Server has no `OR REPLACE`; use\n`ALTER VIEW` instead.\n\n**Rule of thumb:** use `CREATE OR REPLACE VIEW` in migrations to avoid\ndropping dependent objects. Only use `DROP VIEW` when removing the view\nentirely.\n",{"id":1055,"difficulty":114,"q":1056,"a":1057},"view-use-cases","What are the main use cases for views?","1. **Simplify complex queries** — encapsulate multi-table joins so callers\n   write `SELECT * FROM order_summary` instead of a 10-line join.\n2. 
**Row and column security** — expose only the columns and rows a role\n   should see, without granting access to the base tables.\n3. **Stable interface over a changing schema** — rename or restructure tables\n   while keeping the view's column names unchanged for backward compatibility.\n4. **Soft-delete \u002F active-record filter** — `WHERE deleted_at IS NULL`\n   applied once in the view, not in every query.\n\n```sql\n-- Security: analysts can see revenue but not PII\nCREATE VIEW sales_summary AS\n  SELECT date_trunc('day', created_at) AS day,\n         SUM(total)                   AS revenue,\n         COUNT(*)                     AS order_count\n  FROM   orders\n  GROUP  BY 1;\n\nGRANT SELECT ON sales_summary TO analyst_role;\n-- analysts cannot SELECT from the orders table directly\n```\n\n**Rule of thumb:** a view is the right tool when multiple queries share\nthe same complex join or filter logic — it is a DRY principle applied to SQL.\n",{"id":1059,"difficulty":106,"q":1060,"a":1061},"updatable-views","Can you INSERT, UPDATE, or DELETE through a view?","Yes — a view is **updatable** if it satisfies all of these conditions:\n- Maps to exactly **one base table**.\n- Does not use `DISTINCT`, `GROUP BY`, `HAVING`, aggregates, window\n  functions, `UNION`, or set operations.\n- Does not use subqueries in the `SELECT` list.\n\n```sql\nCREATE VIEW active_products AS\n  SELECT id, name, price\n  FROM   products\n  WHERE  archived = FALSE;\n\n-- This INSERT goes through to the products table\nINSERT INTO active_products (name, price) VALUES ('Widget', 9.99);\n\n-- This UPDATE modifies the underlying products row\nUPDATE active_products SET price = 8.99 WHERE id = 1;\n\n-- Caveat: you could insert a row that then disappears from the view\n-- if archived is not set to FALSE by default. Use WITH CHECK OPTION to prevent this.\n```\n\n**Rule of thumb:** rely on updatable views only for simple single-table\nviews with a clear filter. 
For complex views, use `INSTEAD OF` triggers\nor handle mutations in application code on the base table.\n",{"id":1063,"difficulty":106,"q":1064,"a":1065},"with-check-option","What does WITH CHECK OPTION do on a view?","`WITH CHECK OPTION` prevents `INSERT` and `UPDATE` through the view from\nproducing rows that would no longer be visible through that view. Without it,\nyou can insert a row that immediately \"disappears\" from the view.\n\n```sql\nCREATE VIEW active_products AS\n  SELECT id, name, price, archived\n  FROM   products\n  WHERE  archived = FALSE\nWITH CHECK OPTION;\n\n-- This fails: the new row would have archived = TRUE,\n-- so it would not be visible through the view\nUPDATE active_products\nSET    archived = TRUE\nWHERE  id = 1;\n-- ERROR: new row violates check option for view \"active_products\"\n\n-- Without WITH CHECK OPTION the UPDATE would succeed and the row\n-- would silently vanish from the view's result set.\n```\n\n**Rule of thumb:** add `WITH CHECK OPTION` to every updatable view that\nfilters rows — it prevents mutations that produce invisible rows and makes\nthe view behave as a consistent, self-contained interface.\n",{"id":1067,"difficulty":106,"q":1068,"a":1069},"view-vs-cte","When should you use a view vs a CTE?","| | View | CTE |\n|---|---|---|\n| **Scope** | Persistent, reusable across sessions | Single-query, local |\n| **Access control** | Can `GRANT SELECT` on it | Cannot grant independently |\n| **Performance** | Optimizer can inline; no extra cost in Postgres | Same — inlined by default |\n| **Parameterization** | Cannot accept parameters | Cannot either (use functions) |\n| **Discoverability** | Visible in `information_schema` | Invisible outside the query |\n\n```sql\n-- Use a VIEW: reused in many places, needs access control\nCREATE VIEW daily_revenue AS\n  SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue\n  FROM orders GROUP BY 1;\n\n-- Use a CTE: decompose a single complex query for readability\nWITH 
base AS (\n  SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS rn\n  FROM orders\n)\nSELECT * FROM base WHERE rn = 1;\n```\n\n**Rule of thumb:** use a view when you need to share, reuse, or restrict\naccess to a query across the codebase. Use a CTE when you need to\ndecompose a single query for readability — it disappears after the query runs.\n",{"id":1071,"difficulty":106,"q":1072,"a":1073},"materialized-view-vs-view","What is the difference between a view and a materialized view?","A **regular view** executes its underlying query every time it is queried —\nthe data is always current but the query cost is paid on every access.\n\nA **materialized view** stores the query result on disk like a table. Reads\nare instant (no recomputation), but the data is **stale** until explicitly\nrefreshed.\n\n```sql\n-- Regular view: always fresh, always recomputes\nCREATE VIEW monthly_sales AS\n  SELECT date_trunc('month', created_at) AS month, SUM(total) AS revenue\n  FROM orders GROUP BY 1;\n\n-- Materialized view (Postgres): fast reads, manual refresh\nCREATE MATERIALIZED VIEW monthly_sales_mv AS\n  SELECT date_trunc('month', created_at) AS month, SUM(total) AS revenue\n  FROM orders GROUP BY 1;\n\n-- Refresh (blocks reads unless CONCURRENTLY is used)\nREFRESH MATERIALIZED VIEW monthly_sales_mv;\n\n-- Non-blocking refresh (requires a unique index)\nCREATE UNIQUE INDEX ON monthly_sales_mv (month);\nREFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_mv;\n```\n\n**Rule of thumb:** use a regular view when data must be current; use a\nmaterialized view for expensive aggregations where slightly stale data is\nacceptable (dashboards, reports). 
Schedule the refresh as a cron job.\n",{"id":1075,"difficulty":106,"q":1076,"a":1077},"view-performance","Do views have a performance cost compared to writing the query inline?","In **Postgres and SQL Server**, views are **inlined** by the optimizer —\nthe query planner substitutes the view definition into the outer query\nand optimizes the whole thing as a single query. There is typically no\nperformance difference between querying through a view and writing the\nequivalent SQL directly.\n\n```sql\n-- These two are optimized identically in Postgres:\nSELECT * FROM active_users WHERE name = 'Alice';\n\nSELECT * FROM (\n  SELECT id, name, email FROM users WHERE deleted_at IS NULL\n) sub WHERE name = 'Alice';\n-- Both produce the same plan: index scan on (name) with deleted_at IS NULL pushed down.\n```\n\n**Exception**: views with `DISTINCT`, `LIMIT`, subqueries in `SELECT`, or\nwindow functions can prevent full predicate pushdown — check with `EXPLAIN`.\n\n**Rule of thumb:** view indirection is free in most cases. Always use\n`EXPLAIN ANALYZE` to confirm that predicates from the outer query are\npushed inside the view's definition; if they are not, the view may force\na full scan.\n",{"id":1079,"difficulty":106,"q":1080,"a":1081},"security-view","How do views implement row-level security?","By granting `SELECT` on a view (not the base table), you restrict what\ndata each role can see. 
The view acts as a **security boundary**: callers\nonly see rows the view exposes.\n\n```sql\n-- Base table: orders (all customers)\n-- View: each salesperson sees only their own customer's orders\n\nCREATE VIEW my_orders AS\n  SELECT o.*\n  FROM   orders o\n  JOIN   salespeople s ON s.customer_id = o.customer_id\n  WHERE  s.username = current_user;   -- session-level user\n\n-- Grant only the view, not the table\nGRANT SELECT ON my_orders TO salesperson_role;\nREVOKE ALL ON orders FROM salesperson_role;\n```\n\nPostgres also offers **Row-Level Security (RLS)** as a more flexible\nalternative that enforces policies at the table level.\n\n**Rule of thumb:** views-as-security-boundaries work well for simple,\nrole-based access patterns. For fine-grained, data-driven access control\n(multi-tenant apps), prefer Postgres RLS — it applies even when queries\nbypass the view and hit the table directly.\n",{"id":1083,"difficulty":127,"q":1084,"a":1085},"schema-changes-view","What happens to a view when the underlying table schema changes?","Views store the query text, not the resolved columns. The behavior on\nschema change differs by database:\n\n- **Postgres**: if you `SELECT *` in the view and add a column to the\n  base table, the view does **not** automatically include the new column —\n  the `*` is expanded at view creation time. To pick up the new column,\n  you must `CREATE OR REPLACE VIEW`. 
If you drop a column referenced in\n  the view, Postgres rejects the `DROP COLUMN` with a dependency error unless you\n  add `CASCADE`, which drops the view as well.\n- **MySQL**: views are re-parsed on each access, so dropping a referenced\n  column makes the view fail at query time (not at drop time).\n- **SQL Server**: views can be refreshed with `sp_refreshview` to\n  re-resolve `*` expansions after schema changes.\n\n```sql\n-- Postgres: list views that reference no table at all (a rough sanity check)\nSELECT schemaname, viewname\nFROM   pg_views v\nWHERE  NOT EXISTS (\n  SELECT 1 FROM information_schema.view_table_usage\n  WHERE  view_name = v.viewname\n);\n\n-- Or simply try:\nSELECT * FROM problematic_view LIMIT 0;\n-- Errors immediately if the view no longer resolves (e.g. in MySQL).\n```\n\n**Rule of thumb:** avoid `SELECT *` in view definitions — always list\ncolumns explicitly. Run a migration smoke-test against all views after any\ntable schema change.\n",{"id":1087,"difficulty":127,"q":1088,"a":1089},"recursive-view","Can a view be recursive?","Yes — in Postgres and SQL Server you can create a view that wraps a\nrecursive CTE, enabling callers to query hierarchical data (trees, graphs)\nwithout writing the recursion each time.\n\n```sql\n-- Postgres: recursive view for an employee org chart\nCREATE RECURSIVE VIEW org_chart (id, name, manager_id, depth, path) AS (\n  -- Anchor: top-level employees\n  SELECT id, name, manager_id, 0, ARRAY[id]\n  FROM   employees\n  WHERE  manager_id IS NULL\n\n  UNION ALL\n\n  -- Recursive: employees whose manager is already in the result\n  SELECT e.id, e.name, e.manager_id, oc.depth + 1, oc.path || e.id\n  FROM   employees e\n  JOIN   org_chart oc ON oc.id = e.manager_id\n);\n\n-- Callers query it without knowing it's recursive\nSELECT * FROM org_chart WHERE depth \u003C= 3 ORDER BY path;\n```\n\n**Rule of thumb:** wrap recursive CTEs in a view when the hierarchy query\nis reused across multiple callers. 
Set a depth limit inside the CTE to\nguard against infinite loops from cyclic data.\n",{"id":1091,"difficulty":127,"q":1092,"a":1093},"indexed-view-sql-server","What is an indexed view in SQL Server?","An **indexed view** (SQL Server's term for a materialized view) is a view\nwith a **clustered unique index** created on it. SQL Server physically stores\nthe result set and updates it as the underlying data changes (unlike Postgres,\nwhich requires a manual `REFRESH`).\n\n```sql\n-- Must use SCHEMABINDING to prevent base table changes from breaking the view\nCREATE VIEW dbo.daily_revenue\nWITH SCHEMABINDING AS\n  SELECT\n    CONVERT(DATE, created_at) AS day,\n    SUM(total)                AS revenue,\n    COUNT_BIG(*)              AS order_count   -- COUNT_BIG required for indexed views\n  FROM dbo.orders\n  GROUP BY CONVERT(DATE, created_at);\nGO\n\n-- Create the clustered index to materialize it\nCREATE UNIQUE CLUSTERED INDEX ix_daily_revenue_day\n  ON dbo.daily_revenue (day);\n```\n\nThe optimizer can automatically use the indexed view to satisfy matching\nqueries against the base table — even if the query doesn't mention the view.\n\n**Rule of thumb:** use indexed views in SQL Server for expensive, frequently\nqueried aggregations that can tolerate the write overhead of keeping the\nmaterialized result up to date automatically.\n",{"id":1095,"difficulty":127,"q":1096,"a":1097},"view-vs-table-function","When would you use a table-valued function instead of a view?","A **view** is a fixed query — it cannot accept parameters. 
A\n**table-valued function (TVF)** looks like a table to callers but accepts\narguments, making it a parameterizable view.\n\n```sql\n-- Postgres: table-valued function\nCREATE FUNCTION orders_for_customer(p_customer_id INT)\nRETURNS TABLE (id INT, total NUMERIC, created_at TIMESTAMPTZ)\nLANGUAGE SQL STABLE AS $$\n  SELECT id, total, created_at\n  FROM   orders\n  WHERE  customer_id = p_customer_id;\n$$;\n\n-- Call it like a table\nSELECT * FROM orders_for_customer(42) WHERE total > 100;\n\n-- SQL Server equivalent (inline TVF)\nCREATE FUNCTION dbo.OrdersForCustomer(@CustomerId INT)\nRETURNS TABLE AS\nRETURN (\n  SELECT id, total, created_at FROM orders WHERE customer_id = @CustomerId\n);\nSELECT * FROM dbo.OrdersForCustomer(42) WHERE total > 100;\n```\n\nThe optimizer inlines simple TVFs just like views, so they are as fast\nas writing the query directly.\n\n**Rule of thumb:** use a view when the query is the same for all callers;\nuse a table-valued function when callers need to pass a filter parameter\nthat determines which rows are returned.\n",{"id":1099,"difficulty":106,"q":1100,"a":1101},"view-dependencies","How do you find which tables and columns a view depends on?","```sql\n-- Postgres: view dependencies via information_schema\nSELECT vtu.view_name,\n       vtu.table_name,\n       vtu.column_name\nFROM   information_schema.view_column_usage vtu\nWHERE  vtu.view_name = 'active_users'\nORDER  BY vtu.table_name, vtu.column_name;\n\n-- Postgres: all objects that depend on a table (to check before DROP)\nSELECT dependent_ns.nspname AS schema,\n       dependent_view.relname AS dependent_view\nFROM   pg_depend\nJOIN   pg_rewrite ON pg_depend.objid = pg_rewrite.oid\nJOIN   pg_class dependent_view ON pg_rewrite.ev_class = dependent_view.oid\nJOIN   pg_class source_table   ON pg_depend.refobjid  = source_table.oid\nJOIN   pg_namespace dependent_ns ON dependent_ns.oid = dependent_view.relnamespace\nWHERE  source_table.relname = 'users'\n  AND  dependent_view.relname 
\u003C> 'users';\n\n-- SQL Server\nSELECT referencing_entity_name\nFROM   sys.dm_sql_referencing_entities('dbo.users', 'OBJECT');\n```\n\n**Rule of thumb:** always check view dependencies before dropping or\naltering a table. In Postgres, `DROP TABLE … CASCADE` will silently drop\nall dependent views — use it only when that is intentional.\n",{"id":1103,"difficulty":106,"q":1104,"a":1105},"view-schemabinding","What does SCHEMABINDING do on a view in SQL Server?","`WITH SCHEMABINDING` binds the view to the schema of the referenced base\ntables. While bound, the database prevents any change to the base tables\nthat would break the view — you cannot `DROP` a referenced column or the\ntable itself without first dropping or unbinding the view.\n\n```sql\n-- SQL Server: create a schema-bound view\nCREATE VIEW dbo.product_summary\nWITH SCHEMABINDING AS\n  SELECT p.id,\n         p.name,\n         c.name AS category\n  FROM   dbo.products p          -- must use two-part names (schema.table)\n  JOIN   dbo.categories c ON c.id = p.category_id;\n\n-- Attempt to drop a referenced column will now fail:\n-- ALTER TABLE dbo.products DROP COLUMN name;\n-- ERROR: cannot drop column because it is referenced by object 'product_summary'\n```\n\n`SCHEMABINDING` is also a **prerequisite** for creating an indexed view\n(materialized view) in SQL Server — without it, the clustered index\ncreation will be rejected.\n\n**Rule of thumb:** use `WITH SCHEMABINDING` on production views in SQL\nServer to prevent accidental schema changes from silently breaking them,\nand always use it when you plan to add a clustered index to the view.\n",{"description":104},"SQL views interview questions — creating and replacing views, updatable views, materialized views, security uses, WITH CHECK OPTION, and views vs CTEs across Postgres, MySQL, and SQL 
Server.","sql\u002Fdml\u002Fviews","4EK1oP4CMIjb70HVC-sAiNvcBsBI4ZONVgWiuz2oQ-k",{"id":1111,"title":1112,"body":1113,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1117,"navigation":109,"order":30,"path":1118,"questions":1119,"questionsCount":323,"related":247,"seo":1180,"seoDescription":1181,"stem":1182,"subtopic":1183,"topic":82,"topicSlug":84,"updated":328,"__hash__":1184},"qa\u002Fsql\u002Ffunctions\u002Fdate-functions.md","Date Functions",{"type":101,"value":1114,"toc":1115},[],{"title":104,"searchDepth":30,"depth":30,"links":1116},[],{},"\u002Fsql\u002Ffunctions\u002Fdate-functions",[1120,1124,1128,1132,1136,1140,1144,1148,1152,1156,1160,1164,1168,1172,1176],{"id":1121,"difficulty":114,"q":1122,"a":1123},"current-date-time","How do you get the current date and time in SQL?","Each database provides standard and proprietary functions for the current\ntimestamp.\n\n```sql\n-- ANSI standard (all databases)\nSELECT CURRENT_DATE;        -- date only: '2026-06-20'\nSELECT CURRENT_TIME;        -- time only: '14:30:00.000000+00'\nSELECT CURRENT_TIMESTAMP;   -- date + time: '2026-06-20 14:30:00.000000+00'\n\n-- Postgres: preferred shorthands\nSELECT now();               -- same as CURRENT_TIMESTAMP (tz-aware)\nSELECT CURRENT_DATE;\nSELECT CURRENT_TIME;\n\n-- MySQL\nSELECT NOW();               -- '2026-06-20 14:30:00'\nSELECT CURDATE();           -- '2026-06-20'\nSELECT CURTIME();           -- '14:30:00'\nSELECT UTC_TIMESTAMP();     -- always UTC\n\n-- SQL Server\nSELECT GETDATE();           -- local server time\nSELECT GETUTCDATE();        -- UTC\nSELECT SYSDATETIMEOFFSET(); -- with timezone offset\n```\n\n**Rule of thumb:** always store and compare timestamps in UTC. 
In Postgres,\nuse `now()` (returns `TIMESTAMPTZ` in UTC) rather than `LOCALTIMESTAMP`\n(returns naive timestamp in the server timezone).\n",{"id":1125,"difficulty":114,"q":1126,"a":1127},"date-arithmetic","How do you add or subtract time from a date?","Date arithmetic is done with intervals or database-specific functions.\n\n```sql\n-- Postgres: interval arithmetic\nSELECT now() + INTERVAL '7 days';             -- 7 days from now\nSELECT now() - INTERVAL '1 month';            -- 1 month ago\nSELECT '2026-06-20'::DATE + INTERVAL '1 year 3 months';\nSELECT '2026-06-20'::DATE - '2026-01-01'::DATE;  -- → 170 (days as integer)\n\n-- MySQL: DATE_ADD \u002F DATE_SUB\nSELECT DATE_ADD(NOW(), INTERVAL 7 DAY);\nSELECT DATE_SUB(NOW(), INTERVAL 1 MONTH);\nSELECT DATEDIFF('2026-06-20', '2026-01-01');  -- → 170\n\n-- SQL Server: DATEADD \u002F DATEDIFF\nSELECT DATEADD(day, 7, GETDATE());\nSELECT DATEADD(month, -1, GETDATE());\nSELECT DATEDIFF(day, '2026-01-01', '2026-06-20');  -- → 170\n```\n\n**Rule of thumb:** in Postgres, prefer the `+`\u002F`-` operator with `INTERVAL`\nliterals — it is readable and handles month-end edge cases correctly\n(`'2026-01-31'::DATE + INTERVAL '1 month'` → `'2026-02-28'`).\n",{"id":1129,"difficulty":106,"q":1130,"a":1131},"date-trunc","What does DATE_TRUNC do and how is it used for grouping?","`DATE_TRUNC(unit, timestamp)` truncates a timestamp to the specified\nprecision — zeroing out all smaller units. 
This is the standard way to\ngroup time-series data by day, week, month, etc.\n\n```sql\n-- Postgres DATE_TRUNC\nSELECT DATE_TRUNC('month', now());           -- → '2026-06-01 00:00:00+00'\nSELECT DATE_TRUNC('week',  now());           -- → '2026-06-15 00:00:00+00' (Monday)\nSELECT DATE_TRUNC('hour',  now());           -- → '2026-06-20 14:00:00+00'\n\n-- Group orders by month\nSELECT DATE_TRUNC('month', created_at) AS month,\n       SUM(total)                      AS revenue,\n       COUNT(*)                        AS orders\nFROM   orders\nGROUP  BY 1\nORDER  BY 1;\n\n-- MySQL equivalent: DATE_FORMAT\nSELECT DATE_FORMAT(created_at, '%Y-%m-01') AS month,\n       SUM(total) AS revenue\nFROM   orders\nGROUP  BY 1;\n\n-- SQL Server: DATETRUNC (SQL Server 2022+) or DATEFROMPARTS workaround\nSELECT DATEFROMPARTS(YEAR(created_at), MONTH(created_at), 1) AS month,\n       SUM(total) AS revenue\nFROM   orders\nGROUP  BY DATEFROMPARTS(YEAR(created_at), MONTH(created_at), 1);\n```\n\n**Rule of thumb:** `DATE_TRUNC` is the idiomatic Postgres way to bucket\ntime-series data. 
Pair it with an index on the timestamp column for\nefficient range scans per bucket.\n",{"id":1133,"difficulty":114,"q":1134,"a":1135},"extract","How do you extract a specific part of a date (year, month, day)?","`EXTRACT` (standard SQL) and database-specific functions pull individual\ncomponents out of a date or timestamp.\n\n```sql\n-- ANSI EXTRACT (all databases)\nSELECT EXTRACT(YEAR  FROM now());   -- → 2026\nSELECT EXTRACT(MONTH FROM now());   -- → 6\nSELECT EXTRACT(DAY   FROM now());   -- → 20\nSELECT EXTRACT(DOW   FROM now());   -- → 6 (0=Sunday … 6=Saturday, Postgres)\nSELECT EXTRACT(WEEK  FROM now());   -- → ISO week number\n\n-- Postgres shorthand\nSELECT DATE_PART('year', now());    -- equivalent to EXTRACT\n\n-- MySQL\nSELECT YEAR(now());    -- → 2026\nSELECT MONTH(now());   -- → 6\nSELECT DAY(now());     -- → 20\nSELECT DAYOFWEEK(now()); -- 1=Sunday … 7=Saturday\n\n-- SQL Server\nSELECT YEAR(GETDATE());\nSELECT DATEPART(WEEKDAY, GETDATE());\n```\n\n**Rule of thumb:** use `EXTRACT` for portable code. 
In Postgres, `DATE_PART`\nis equivalent but returns `DOUBLE PRECISION` — use `EXTRACT` when you need\nan integer result for arithmetic.\n",{"id":1137,"difficulty":106,"q":1138,"a":1139},"age-datediff","How do you calculate the difference between two dates?","```sql\n-- Postgres: AGE() returns a human-readable interval\nSELECT AGE('2026-06-20', '1993-03-15');\n-- → '33 years 3 mons 5 days'\n\nSELECT AGE(now(), birth_date) FROM users;\n-- → age as interval; use EXTRACT to get the years component\nSELECT EXTRACT(YEAR FROM AGE(now(), birth_date)) AS age_years FROM users;\n\n-- Postgres: subtracting two DATE values → integer number of days\nSELECT '2026-06-20'::DATE - '2026-01-01'::DATE;  -- → 170 (integer days)\n\n-- MySQL\nSELECT DATEDIFF('2026-06-20', '2026-01-01');      -- → 170 (days)\nSELECT TIMESTAMPDIFF(YEAR, birth_date, NOW())  AS age FROM users;\nSELECT TIMESTAMPDIFF(MONTH, '2026-01-01', '2026-06-20');  -- → 5 months\n\n-- SQL Server\nSELECT DATEDIFF(day,  '2026-01-01', '2026-06-20');  -- → 170\nSELECT DATEDIFF(year, birth_date, GETDATE()) AS age FROM users;\n```\n\n**Rule of thumb:** use `TIMESTAMPDIFF` (MySQL) or `DATEDIFF` (SQL Server)\nfor simple numeric differences. 
In Postgres, subtract dates directly for\ninteger days; use `AGE()` when you need a human-readable breakdown.\n",{"id":1141,"difficulty":106,"q":1142,"a":1143},"timezone-handling","How do you handle timezones when storing and querying timestamps?","The fundamental rule: **store in UTC, display in the user's timezone**.\nUse timezone-aware column types so conversions are unambiguous.\n\n```sql\n-- Postgres: TIMESTAMPTZ stores in UTC; AT TIME ZONE converts for display\nCREATE TABLE events (\n  id         BIGSERIAL PRIMARY KEY,\n  occurred   TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n\n-- Convert to a specific timezone for display\nSELECT occurred AT TIME ZONE 'America\u002FNew_York' AS local_time FROM events;\nSELECT occurred AT TIME ZONE 'Asia\u002FKolkata'     AS ist_time   FROM events;\n\n-- Set the session timezone (affects display, not storage)\nSET TIME ZONE 'America\u002FNew_York';\nSELECT now();  -- displayed in ET, still stored as UTC internally\n\n-- MySQL: CONVERT_TZ\nSELECT CONVERT_TZ(occurred, 'UTC', 'America\u002FNew_York') FROM events;\n\n-- SQL Server: AT TIME ZONE (SQL Server 2016+)\nSELECT occurred AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' FROM events;\n```\n\n**Rule of thumb:** always use `TIMESTAMPTZ` (Postgres) or an equivalent\nUTC-storing type. 
Never store local times without an explicit timezone\noffset — daylight-saving transitions will corrupt historical data.\n",{"id":1145,"difficulty":114,"q":1146,"a":1147},"date-format","How do you format a date as a string?","```sql\n-- Postgres: TO_CHAR\nSELECT TO_CHAR(now(), 'YYYY-MM-DD');            -- → '2026-06-20'\nSELECT TO_CHAR(now(), 'DD\u002FMM\u002FYYYY');            -- → '20\u002F06\u002F2026'\nSELECT TO_CHAR(now(), 'Month DD, YYYY');        -- → 'June     20, 2026'\nSELECT TO_CHAR(now(), 'FMMonth DD, YYYY');      -- → 'June 20, 2026' (FM removes padding)\nSELECT TO_CHAR(now(), 'YYYY-MM-DD HH24:MI:SS'); -- → '2026-06-20 14:30:00'\n\n-- MySQL: DATE_FORMAT\nSELECT DATE_FORMAT(NOW(), '%Y-%m-%d');          -- → '2026-06-20'\nSELECT DATE_FORMAT(NOW(), '%d\u002F%m\u002F%Y %H:%i:%s'); -- → '20\u002F06\u002F2026 14:30:00'\n\n-- SQL Server: FORMAT \u002F CONVERT\nSELECT FORMAT(GETDATE(), 'yyyy-MM-dd');         -- → '2026-06-20'\nSELECT CONVERT(VARCHAR, GETDATE(), 23);         -- → '2026-06-20' (style 23)\n```\n\n**Rule of thumb:** format dates as strings in the application layer when\npossible, since formatting functions are not portable and locale handling\nvaries. 
Use `TO_CHAR` in SQL only for exports or reports generated entirely\nin the database.\n",{"id":1149,"difficulty":127,"q":1150,"a":1151},"generate-series","How do you generate a date series to fill gaps in time-series data?","`GENERATE_SERIES` (Postgres) produces a set of dates or timestamps, making\nit easy to build a complete date spine and `LEFT JOIN` to fill in zero\nvalues for missing days.\n\n```sql\n-- Postgres: daily date spine for June 2026\nSELECT generate_series(\n  '2026-06-01'::DATE,\n  '2026-06-30'::DATE,\n  INTERVAL '1 day'\n) AS day;\n\n-- Fill gaps in daily revenue (days with no orders show 0)\nSELECT day::DATE,\n       COALESCE(SUM(o.total), 0) AS revenue\nFROM generate_series('2026-06-01'::DATE, '2026-06-30'::DATE, '1 day') AS day\nLEFT JOIN orders o ON o.created_at::DATE = day::DATE\nGROUP BY 1\nORDER BY 1;\n```\n\nMySQL and SQL Server do not have `GENERATE_SERIES` natively; the common\nworkaround is a **numbers\u002Fcalendar table** or a recursive CTE.\n\n**Rule of thumb:** always use a date spine (`GENERATE_SERIES` or a calendar\ntable) when charting time-series data — direct `GROUP BY` on timestamps\nsilently omits days with no records, creating misleading gaps in charts.\n",{"id":1153,"difficulty":106,"q":1154,"a":1155},"interval-type","What is the INTERVAL type and how do you use it?","An **interval** represents a duration (not a point in time). 
Postgres has\na rich `INTERVAL` type; MySQL and SQL Server use numeric + unit keywords\ninstead.\n\n```sql\n-- Postgres INTERVAL literals\nSELECT INTERVAL '1 year 2 months 3 days';\nSELECT INTERVAL '90 minutes';\nSELECT INTERVAL '2 hours 30 minutes';\n\n-- Arithmetic with intervals\nSELECT now() - INTERVAL '1 week';\nSELECT now() + INTERVAL '90 days';\n\n-- Storing intervals (Postgres)\nCREATE TABLE subscriptions (\n  id         SERIAL PRIMARY KEY,\n  started_at TIMESTAMPTZ NOT NULL,\n  duration   INTERVAL NOT NULL DEFAULT '1 month'\n);\nSELECT started_at + duration AS expires_at FROM subscriptions;\n\n-- Extract numeric parts from an interval\nSELECT EXTRACT(DAYS FROM INTERVAL '1 year 3 days');    -- → 3 (only the days part)\nSELECT EXTRACT(EPOCH FROM INTERVAL '2 hours');         -- → 7200 (total seconds)\n```\n\n**Rule of thumb:** use `EXTRACT(EPOCH FROM interval_col)` to convert an\ninterval to a total number of seconds for arithmetic or comparison — it is\nthe most reliable way to compare durations of mixed units.\n",{"id":1157,"difficulty":127,"q":1158,"a":1159},"now-vs-clock-timestamp","What is the difference between now() and clock_timestamp() in Postgres?","In Postgres, `now()` (and `CURRENT_TIMESTAMP`) returns the timestamp at\nthe **start of the current transaction** — it does not change within the\nsame transaction. `clock_timestamp()` returns the **actual wall-clock time**\nat the moment it is called.\n\n```sql\nBEGIN;\n  SELECT now();             -- → '2026-06-20 14:30:00.000'\n  SELECT pg_sleep(2);\n  SELECT now();             -- → '2026-06-20 14:30:00.000'  (same! 
start of txn)\n  SELECT clock_timestamp(); -- → '2026-06-20 14:30:02.005'  (actual time now)\nCOMMIT;\n```\n\nThis matters for:\n- **Audit columns**: `DEFAULT now()` is fine — all rows in the same\n  transaction get the same \"created_at\", which is expected.\n- **Benchmarking \u002F profiling**: use `clock_timestamp()` to measure elapsed\n  time within a transaction.\n- **Rate limiting**: if checking \"last request time\" within a long transaction,\n  use `clock_timestamp()` to get the real current time.\n\n**Rule of thumb:** use `now()` for audit timestamps (consistent within a\ntransaction is a feature, not a bug). Use `clock_timestamp()` when you need\nthe actual wall time, such as for elapsed-time calculations or timeouts\nchecked inside a transaction.\n",{"id":1161,"difficulty":127,"q":1162,"a":1163},"date-trunc-index","How do you efficiently query by time period without slowing down writes?","Range queries on timestamp columns are common (`WHERE created_at >= X AND\ncreated_at \u003C Y`). The key to making them fast is a **B-tree index on the\nraw timestamp**, combined with using range predicates rather than wrapping\nthe column in a function.\n\n```sql\n-- GOOD: range predicate — index on created_at is used\nCREATE INDEX idx_orders_created ON orders (created_at);\n\nSELECT * FROM orders\nWHERE created_at >= '2026-06-01'\n  AND created_at \u003C  '2026-07-01';\n\n-- BAD: function applied to the column — index NOT used\nSELECT * FROM orders WHERE DATE_TRUNC('month', created_at) = '2026-06-01';\n\n-- Fix: use a range instead of DATE_TRUNC in the WHERE clause\nSELECT * FROM orders\nWHERE created_at >= DATE_TRUNC('month', now())\n  AND created_at \u003C  DATE_TRUNC('month', now()) + INTERVAL '1 month';\n-- OR: functional index on DATE_TRUNC\nCREATE INDEX idx_orders_month ON orders (DATE_TRUNC('month', created_at));\n```\n\n**Rule of thumb:** never wrap a timestamp column in a function in a\n`WHERE` clause if an index on the raw column exists. 
Rewrite the predicate\nas a range (`col >= start AND col \u003C end`) so the index is used.\n",{"id":1165,"difficulty":127,"q":1166,"a":1167},"lag-lead-dates","How do you calculate time between consecutive events per user?","Use the `LAG` window function to access the previous row's timestamp within\na partition, then subtract to get the gap.\n\n```sql\n-- Time between consecutive logins per user\nSELECT user_id,\n       logged_in_at,\n       LAG(logged_in_at) OVER (\n         PARTITION BY user_id\n         ORDER BY logged_in_at\n       ) AS prev_login,\n       logged_in_at - LAG(logged_in_at) OVER (\n         PARTITION BY user_id\n         ORDER BY logged_in_at\n       ) AS gap\nFROM   user_logins\nORDER  BY user_id, logged_in_at;\n\n-- Average gap between sessions per user\nSELECT user_id,\n       AVG(gap) AS avg_session_gap\nFROM (\n  SELECT user_id,\n         logged_in_at - LAG(logged_in_at) OVER (\n           PARTITION BY user_id ORDER BY logged_in_at\n         ) AS gap\n  FROM user_logins\n) sub\nWHERE gap IS NOT NULL  -- first row per user has no previous\nGROUP BY user_id;\n```\n\n**Rule of thumb:** `LAG` with a date subtraction is the standard SQL\npattern for inter-event time calculations. 
The first event per partition\nreturns NULL for `LAG` — always filter or handle that case explicitly.\n",{"id":1169,"difficulty":114,"q":1170,"a":1171},"make-date","How do you construct a date from year, month, and day components?","```sql\n-- Postgres: MAKE_DATE \u002F MAKE_TIMESTAMP\nSELECT MAKE_DATE(2026, 6, 20);                        -- → '2026-06-20'\nSELECT MAKE_TIMESTAMP(2026, 6, 20, 14, 30, 0);        -- → '2026-06-20 14:30:00'\nSELECT MAKE_TIMESTAMPTZ(2026, 6, 20, 14, 30, 0, 'UTC'); -- tz-aware\n\n-- MySQL: MAKEDATE \u002F MAKETIME \u002F STR_TO_DATE\nSELECT MAKEDATE(2026, 171);            -- → '2026-06-20' (day 171 of 2026)\nSELECT STR_TO_DATE('20\u002F06\u002F2026', '%d\u002F%m\u002F%Y');  -- → '2026-06-20'\n\n-- SQL Server: DATEFROMPARTS \u002F DATETIMEFROMPARTS\nSELECT DATEFROMPARTS(2026, 6, 20);    -- → '2026-06-20'\nSELECT DATETIMEFROMPARTS(2026, 6, 20, 14, 30, 0, 0);\n\n-- Practical: reconstruct a date from separate year\u002Fmonth columns\nSELECT MAKE_DATE(order_year, order_month, 1) AS period_start\nFROM   monthly_summaries;\n```\n\n**Rule of thumb:** use `MAKE_DATE` (Postgres) or `DATEFROMPARTS` (SQL\nServer) rather than string concatenation + casting — they are cleaner and\ncorrectly reject invalid dates (e.g., February 30) instead of silently\ncoercing them.\n",{"id":1173,"difficulty":114,"q":1174,"a":1175},"age-function","How do you calculate a person's age or the time elapsed between two dates?","```sql\n-- Postgres: AGE() returns an interval\nSELECT AGE(birthdate)                   AS age       FROM persons;  -- age from today\nSELECT AGE(NOW(), birthdate)            AS age       FROM persons;  -- explicit end date\nSELECT EXTRACT(YEAR FROM AGE(birthdate)) AS age_years FROM persons;\n\n-- MySQL: TIMESTAMPDIFF\nSELECT TIMESTAMPDIFF(YEAR, birthdate, CURDATE()) AS age_years FROM persons;\nSELECT TIMESTAMPDIFF(MONTH, start_date, end_date) AS months_elapsed FROM projects;\n\n-- SQL Server: DATEDIFF\nSELECT DATEDIFF(YEAR,  birthdate, GETDATE()) 
AS age_years   FROM persons;\nSELECT DATEDIFF(DAY,   start_dt,  end_dt)    AS days_open   FROM tickets;\nSELECT DATEDIFF(MONTH, start_dt,  end_dt)    AS months_open FROM projects;\n```\n\nNote: `DATEDIFF(YEAR, …)` in SQL Server counts year boundaries crossed,\nnot full years lived — a birthday on December 31 would add 1 year on\nJanuary 1 of the next year. Use `TIMESTAMPDIFF` (MySQL) or `AGE` (Postgres)\nfor more accurate age calculation.\n\n**Rule of thumb:** for precise age-in-years, use `TIMESTAMPDIFF(YEAR, …)`\n(MySQL) or `EXTRACT(YEAR FROM AGE(…))` (Postgres). Never subtract year\nnumbers directly — they don't account for the month\u002Fday portion.\n",{"id":1177,"difficulty":127,"q":1178,"a":1179},"date-range-overlap","How do you find rows where two date ranges overlap?","Two date ranges `[A_start, A_end]` and `[B_start, B_end]` overlap when\n`A_start \u003C B_end AND A_end > B_start`. This is the standard overlap\npredicate — all seven overlap cases satisfy it.\n\n```sql\n-- Find all bookings that overlap with a requested range\nSELECT *\nFROM   bookings\nWHERE  start_date \u003C '2026-07-01'   -- booking starts before requested end\n  AND  end_date   > '2026-06-15';  -- booking ends after requested start\n\n-- Find overlapping pairs within the same table (self-join)\nSELECT a.id AS booking_a, b.id AS booking_b\nFROM   bookings a\nJOIN   bookings b ON a.id \u003C b.id          -- avoid duplicate pairs\n       AND a.start_date \u003C b.end_date\n       AND a.end_date   > b.start_date;\n\n-- Postgres: use the built-in OVERLAPS operator (half-open: start \u003C= t \u003C end)\nSELECT *\nFROM   bookings\nWHERE  (start_date, end_date) OVERLAPS ('2026-06-15'::DATE, '2026-07-01'::DATE);\n```\n\n**Rule of thumb:** the overlap condition is `A.start \u003C B.end AND A.end >\nB.start`. 
Make sure to use `\u003C` \u002F `>` (exclusive) vs `\u003C=` \u002F `>=` (inclusive)\nconsistently with your data model — half-open intervals `[start, end)` are\ngenerally easiest to reason about.\n",{"description":104},"SQL date and time function interview questions — NOW, CURRENT_DATE, DATE_TRUNC, EXTRACT, date arithmetic, interval types, timezone handling, and formatting across Postgres, MySQL, and SQL Server.","sql\u002Ffunctions\u002Fdate-functions","Date & Time Functions","W1E2_wYmKlkz2SVIgqmzxIqqc-UnOTEwm6BLBI9gDlU",{"id":1186,"title":1187,"body":1188,"description":104,"difficulty":127,"extension":107,"framework":10,"frameworkSlug":12,"meta":1192,"navigation":109,"order":30,"path":1193,"questions":1194,"questionsCount":323,"related":247,"seo":1255,"seoDescription":1256,"stem":1257,"subtopic":1187,"topic":73,"topicSlug":75,"updated":328,"__hash__":1258},"qa\u002Fsql\u002Fperformance\u002Fquery-optimization.md","Query Optimization",{"type":101,"value":1189,"toc":1190},[],{"title":104,"searchDepth":30,"depth":30,"links":1191},[],{},"\u002Fsql\u002Fperformance\u002Fquery-optimization",[1195,1199,1203,1207,1211,1215,1219,1223,1227,1231,1235,1239,1243,1247,1251],{"id":1196,"difficulty":114,"q":1197,"a":1198},"what-is-explain","What does EXPLAIN do and how do you read its output?","`EXPLAIN` shows the **query execution plan** the database chose — which\nindexes are used, what join strategies are applied, and the estimated cost\nand row counts at each step. 
`EXPLAIN ANALYZE` actually runs the query and\nadds real timings and row counts alongside the estimates.\n\n```sql\n-- Postgres: estimated plan only (does not run the query)\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\n\n-- Postgres: run the query, show actual vs estimated\nEXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)\n  SELECT * FROM orders WHERE customer_id = 42;\n\n-- MySQL\nEXPLAIN SELECT * FROM orders WHERE customer_id = 42;\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- MySQL 8.0.18+\n\n-- SQL Server\nSET STATISTICS IO ON;\nSET SHOWPLAN_TEXT ON;\nSELECT * FROM orders WHERE customer_id = 42;\n```\n\nKey fields to read in Postgres output:\n- `Seq Scan` \u002F `Index Scan` \u002F `Index Only Scan` — access method\n- `rows=N` — estimated rows; compare to `actual rows=N` in ANALYZE\n- `cost=start..total` — planner's cost units (not wall-clock ms)\n- `Buffers: shared hit=N read=N` — cache hits vs disk reads\n\n**Rule of thumb:** always use `EXPLAIN ANALYZE` (not just `EXPLAIN`) on\nslow queries — large discrepancies between estimated and actual rows reveal\nstale statistics, which is the root cause of most bad query plans.\n",{"id":1200,"difficulty":106,"q":1201,"a":1202},"seq-scan-optimization","You see a Seq Scan on a large table — what do you check first?","A sequential scan on a large table is the most common source of slow queries.\nWork through this checklist:\n\n1. **Is there an index on the WHERE column?** If not, create one.\n2. **Is the index being used?** Check `EXPLAIN` — if not, the planner may\n   think a seq scan is cheaper (see next steps).\n3. **Are statistics fresh?** Run `ANALYZE table_name` and re-check the plan.\n4. **Is selectivity high?** An index on a low-cardinality column (e.g. a\n   boolean) won't help if 80 % of rows match.\n5. **Is there a function in the WHERE clause?** `WHERE lower(email) = '...'`\n   won't use an index on `email` — create a functional index.\n6. 
**Is `random_page_cost` tuned for SSDs?** Default is 4.0 (HDD); set to\n   1.1 for SSD storage to make index scans more attractive.\n\n```sql\n-- Run ANALYZE to refresh statistics\nANALYZE orders;\n\n-- Tune cost parameters for SSD (session level)\nSET random_page_cost = 1.1;\nSET effective_cache_size = '4GB';\n\nEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;\n```\n\n**Rule of thumb:** stale statistics cause the planner to choose bad plans\nmore often than missing indexes. Always `ANALYZE` before concluding that an\nindex isn't working.\n",{"id":1204,"difficulty":106,"q":1205,"a":1206},"n-plus-one","What is the N+1 query problem and how do you fix it in SQL?","The **N+1 problem** occurs when code fetches a list of N parent records and\nthen issues one additional query per record to load its children — totalling\nN+1 round-trips instead of 1.\n\n```sql\n-- N+1: 1 query for customers + N queries for orders (one per customer)\nSELECT id, name FROM customers WHERE active = TRUE;  -- → N rows\n-- then in a loop:\nSELECT * FROM orders WHERE customer_id = ?;  -- N times\n\n-- Fix: JOIN or subquery — 1 round-trip total\nSELECT c.id, c.name, o.id AS order_id, o.total\nFROM   customers c\nLEFT JOIN orders o ON o.customer_id = c.id\nWHERE  c.active = TRUE;\n\n-- Or use a lateral join to get the latest order per customer\nSELECT c.id, c.name, latest.total\nFROM   customers c\nLEFT JOIN LATERAL (\n  SELECT total FROM orders WHERE customer_id = c.id\n  ORDER BY created_at DESC LIMIT 1\n) latest ON TRUE\nWHERE  c.active = TRUE;\n```\n\n**Rule of thumb:** if application logs show many nearly-identical queries\ndiffering only by a primary key value, you have an N+1 problem. Fix it with\na `JOIN`, an `IN (...)` batch fetch, or a lateral join.\n",{"id":1208,"difficulty":127,"q":1209,"a":1210},"join-strategies","What are the three join strategies and when does the optimizer choose each?","The query planner chooses from three physical join algorithms:\n\n1. 
**Nested Loop Join** — for each row in the outer table, scan the inner\n   table (optionally using an index). O(N × M) worst case. Best when the\n   outer set is very small or an index on the inner table makes the inner\n   scan cheap.\n2. **Hash Join** — build a hash table from the smaller side, then probe it\n   with each row from the larger side. O(N + M). Best for large unsorted\n   inputs with no useful index.\n3. **Merge Join** — sort both sides by the join key, then merge in O(N + M).\n   Best when both sides are already sorted (e.g., both have B-tree index\n   scans in join-key order).\n\n```sql\n-- Force a specific strategy (Postgres — for testing only)\nSET enable_hashjoin = off;\nSET enable_mergejoin = off;\nEXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;\n-- Now forced to Nested Loop\n```\n\n**Rule of thumb:** trust the optimizer to pick the right join strategy.\nAdd an index on the join column of the larger table to make nested-loop\njoins efficient, and ensure statistics are current so the planner estimates\nset sizes correctly.\n",{"id":1212,"difficulty":106,"q":1213,"a":1214},"statistics-analyze","What are table statistics and how do you update them?","**Statistics** are metadata the query planner uses to estimate how many\nrows a filter will return — column histograms, most common values, null\nfractions, and row counts. 
Stale statistics lead to bad execution plans.\n\n```sql\n-- Postgres: update statistics for a table (fast, non-blocking)\nANALYZE orders;\n\n-- Update all tables in the database\nANALYZE;\n\n-- Check when statistics were last collected\nSELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup\nFROM   pg_stat_user_tables\nWHERE  relname = 'orders';\n\n-- Increase statistics target for a column with non-uniform distribution\nALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;\n-- Default is 100 (100 histogram buckets); raise for skewed data.\nANALYZE orders;\n```\n\nPostgres runs `autovacuum` which calls `ANALYZE` automatically, but it may\nlag behind on high-churn tables.\n\n**Rule of thumb:** if a query plan suddenly gets worse after a large data\nload or bulk delete, run `ANALYZE table_name` immediately. For columns with\nvery skewed distributions (e.g., `status` with 99 % of rows as `active`),\nraise the statistics target.\n",{"id":1216,"difficulty":106,"q":1217,"a":1218},"pagination-offset","Why is OFFSET-based pagination slow and what is the alternative?","`LIMIT N OFFSET M` forces the database to scan and discard the first M rows\nbefore returning N. As M grows, the query gets slower — page 1 000 of 50\nresults means scanning 50 000 rows.\n\n```sql\n-- Slow: O(offset) scan for every page\nSELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 10000;\n\n-- Fast: keyset (cursor) pagination — O(log n) via index\n-- Page 1:\nSELECT id, title, created_at FROM posts\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- → last row: created_at = '2026-06-01 12:00:00', id = 9834\n\n-- Next page: use the last row's values as the cursor\nSELECT id, title, created_at FROM posts\nWHERE (created_at, id) \u003C ('2026-06-01 12:00:00', 9834)\nORDER BY created_at DESC, id DESC\nLIMIT 50;\n-- Index on (created_at DESC, id DESC) makes this O(log n)\n```\n\n**Rule of thumb:** use keyset (cursor) pagination for any list that can\ngrow large. 
Reserve `OFFSET` only for small datasets (\u003C 10 000 rows) or\nwhere jumping to arbitrary page numbers is a hard requirement.\n",{"id":1220,"difficulty":106,"q":1221,"a":1222},"subquery-vs-join-perf","When does a subquery perform worse than a JOIN and how do you fix it?","In most modern databases (Postgres, MySQL 8+, SQL Server), the optimizer\ncan rewrite correlated subqueries as joins automatically. However,\n**correlated subqueries** that reference the outer query run once per outer\nrow — O(N) inner scans — and may not be rewritten:\n\n```sql\n-- Correlated subquery: potentially O(N) inner scans\nSELECT id, total,\n  (SELECT name FROM customers WHERE id = o.customer_id) AS customer_name\nFROM orders o;\n\n-- Equivalent JOIN: one pass, one lookup\nSELECT o.id, o.total, c.name AS customer_name\nFROM   orders o\nJOIN   customers c ON c.id = o.customer_id;\n\n-- Check with EXPLAIN: if you see \"SubPlan\" or \"Nested Loop\" with inner\n-- rows = outer rows, the subquery is not being optimised away.\n```\n\nNon-correlated subqueries in `IN (SELECT …)` are usually optimised to a\nsemi-join and are equivalent in performance to a `JOIN`.\n\n**Rule of thumb:** if `EXPLAIN` shows a `SubPlan` node repeating for every\nouter row, rewrite it as a `JOIN` or a lateral join. 
For `EXISTS` \u002F `IN`\nchecks, the optimizer almost always handles them correctly on its own.\n",{"id":1224,"difficulty":106,"q":1225,"a":1226},"query-rewrite-or","How can OR in a WHERE clause hurt performance and how do you fix it?","An `OR` across different columns often prevents the optimizer from using a\nsingle index efficiently because no single index covers both branches.\n\n```sql\n-- This may cause a Seq Scan even if both columns are indexed separately\nSELECT * FROM users WHERE email = 'alice@example.com' OR phone = '555-1234';\n\n-- Fix 1: UNION ALL (each branch uses its own index)\nSELECT * FROM users WHERE email = 'alice@example.com'\nUNION ALL\n-- Skip rows the first branch already returned; keep rows whose email is NULL\nSELECT * FROM users WHERE phone = '555-1234'\n  AND (email \u003C> 'alice@example.com' OR email IS NULL);\n\n-- Fix 2: Postgres bitmap index scan (auto-handles OR over different indexes)\n-- Postgres may already do this — check EXPLAIN for \"BitmapAnd\"\u002F\"BitmapOr\"\n\n-- Fix 3: denormalise into a single search column \u002F use full-text search\n```\n\n**Rule of thumb:** use `EXPLAIN` to check whether an `OR` query is doing\na full scan. If so, split it into a `UNION ALL` so each branch can exploit\nits own index independently.\n",{"id":1228,"difficulty":114,"q":1229,"a":1230},"avoid-select-star","Why should you avoid SELECT * in production queries?","`SELECT *` fetches every column from the table, including large `TEXT`,\n`BYTEA`, or `JSONB` columns that the query may not need. 
This increases\nnetwork transfer, memory use, and makes covering-index optimisations\nimpossible.\n\n```sql\n-- BAD: fetches all 40 columns including a 1 MB blob column\nSELECT * FROM products WHERE category_id = 5;\n\n-- GOOD: only the columns the caller actually needs\nSELECT id, name, price, stock FROM products WHERE category_id = 5;\n```\n\nAdditional reasons to avoid `SELECT *`:\n- Adding a column to the table silently changes what the query returns,\n  breaking application code that expects a fixed schema.\n- Prevents the planner from choosing an index-only scan.\n- Makes query intent unclear to future readers.\n\n**Rule of thumb:** always list columns explicitly in production queries.\n`SELECT *` is fine for ad-hoc exploration but should never appear in\napplication code or stored procedures.\n",{"id":1232,"difficulty":127,"q":1233,"a":1234},"cte-performance-opt","Can CTEs hurt query performance and how?","In **Postgres pre-12**, CTEs were **optimisation fences** — the planner\nmaterialised (executed and stored) the CTE result before running the outer\nquery, preventing predicates from being pushed inside. This could cause full\nscans on the CTE that a plain subquery would have avoided.\n\n```sql\n-- Postgres \u003C 12: this CTE is materialised; the WHERE id = 42 is applied\n-- AFTER the full scan of orders inside the CTE\nWITH recent AS (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n\n-- Postgres 12+: CTEs are inlined by default (no longer an optimisation fence)\n-- Force materialisation when you WANT the fence (e.g., to prevent repeated execution):\nWITH recent AS MATERIALIZED (\n  SELECT * FROM orders WHERE created_at > now() - INTERVAL '30 days'\n)\nSELECT * FROM recent WHERE id = 42;\n```\n\nMySQL and SQL Server have always inlined non-recursive CTEs.\n\n**Rule of thumb:** on Postgres 12+, CTEs behave like subqueries and are not\na performance concern. 
On older Postgres, replace CTEs with subqueries in\nthe `FROM` clause if `EXPLAIN` shows the CTE is preventing index use.\n",{"id":1236,"difficulty":127,"q":1237,"a":1238},"vacuum-and-bloat","What is table bloat and how does VACUUM address it?","In Postgres's MVCC model, `UPDATE` and `DELETE` do not overwrite rows —\nthey mark old row versions as dead and write new versions. **Dead tuples**\naccumulate until `VACUUM` reclaims their space. Without regular vacuuming,\nthe table grows (bloat), sequential scans slow down, and indexes carry dead\nentries.\n\n```sql\n-- Check dead tuple accumulation\nSELECT relname, n_live_tup, n_dead_tup,\n       round(n_dead_tup::numeric \u002F NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)\n         AS dead_pct,\n       last_vacuum, last_autovacuum\nFROM   pg_stat_user_tables\nORDER  BY n_dead_tup DESC\nLIMIT  10;\n\n-- Manual vacuum (reclaims space for reuse; does not shrink the file)\nVACUUM orders;\n\n-- Full vacuum (reclaims and shrinks the file; locks the table)\nVACUUM FULL orders;\n\n-- Vacuum and refresh planner statistics in one pass, with progress output\n-- (this does not rebuild indexes; use REINDEX for that)\nVACUUM (ANALYZE, VERBOSE) orders;\n```\n\n**Rule of thumb:** rely on `autovacuum` for routine maintenance. Run\n`VACUUM ANALYZE` manually after large bulk deletes or updates. Only use\n`VACUUM FULL` on heavily bloated tables during maintenance windows — it\nacquires an exclusive lock.\n",{"id":1240,"difficulty":106,"q":1241,"a":1242},"query-timeout","How do you set and enforce query timeouts in SQL?","Long-running queries can exhaust connection pools, hold locks, and degrade\nthe whole database. 
Most databases allow a maximum query duration:\n\n```sql\n-- Postgres: statement timeout (raises error if exceeded)\nSET statement_timeout = '5s';        -- session level\nSET LOCAL statement_timeout = '2s';  -- transaction level only\n\n-- Postgres: lock wait timeout (fail fast rather than queue behind a blocker)\nSET lock_timeout = '500ms';\n\n-- MySQL: per-query timeout (optimizer hint)\nSELECT \u002F*+ MAX_EXECUTION_TIME(3000) *\u002F * FROM orders WHERE customer_id = 42;\n\n-- SQL Server: per-connection timeout (set by client driver)\n-- In T-SQL:\nSET QUERY_GOVERNOR_COST_LIMIT 1000;  -- abort if estimated cost > 1000\n```\n\nIn application code, always set a reasonable timeout at the connection\nor query level — never leave it at infinity (the default).\n\n**Rule of thumb:** set `statement_timeout` to a value appropriate for the\ncontext: 5–30 s for OLTP API queries; longer for batch jobs. Use\n`lock_timeout` separately to fail fast on lock contention rather than\nqueuing indefinitely.\n",{"id":1244,"difficulty":106,"q":1245,"a":1246},"missing-index-detection","How do you proactively find slow queries and missing indexes in production?","```sql\n-- Postgres: pg_stat_statements (requires extension) — top slow queries\nCREATE EXTENSION IF NOT EXISTS pg_stat_statements;\n\nSELECT query,\n       calls,\n       round(total_exec_time::numeric \u002F calls, 2) AS avg_ms,\n       round(total_exec_time::numeric, 0)         AS total_ms,\n       rows \u002F calls                               AS avg_rows\nFROM   pg_stat_statements\nORDER  BY total_exec_time DESC\nLIMIT  20;\n\n-- Postgres: pg_stat_user_tables — tables with heavy sequential scans\nSELECT relname, seq_scan, seq_tup_read,\n       idx_scan,\n       round(seq_scan::numeric \u002F NULLIF(seq_scan + idx_scan, 0) * 100, 1) AS seq_pct\nFROM   pg_stat_user_tables\nWHERE  seq_scan > 0\nORDER  BY seq_tup_read DESC\nLIMIT  10;\n\n-- MySQL: slow query log\nSET GLOBAL slow_query_log = 'ON';\nSET GLOBAL long_query_time 
= 1;  -- log queries > 1 second\n```\n\n**Rule of thumb:** enable `pg_stat_statements` (Postgres) or the slow\nquery log (MySQL\u002FSQL Server) in production from day one. Review the top-10\nqueries by total time weekly — optimising one heavily-called query often\nhas more impact than tuning ten rarely-run ones.\n",{"id":1248,"difficulty":127,"q":1249,"a":1250},"partition-pruning","What is partition pruning and how does it improve query performance?","**Partition pruning** is the optimizer's ability to skip entire table\npartitions that cannot contain rows matching the query's `WHERE` clause.\nInstead of scanning all partitions, it reads only the ones that could\nhave relevant data.\n\n```sql\n-- Partitioned table (Postgres)\nCREATE TABLE events (\n  id         BIGINT NOT NULL,\n  created_at DATE   NOT NULL,\n  payload    JSONB\n) PARTITION BY RANGE (created_at);\n\nCREATE TABLE events_2026_q1 PARTITION OF events\n  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');\nCREATE TABLE events_2026_q2 PARTITION OF events\n  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');\n\n-- Query: optimizer prunes events_2026_q1 — only scans events_2026_q2\nEXPLAIN SELECT * FROM events WHERE created_at >= '2026-04-01';\n-- Plan shows: Seq Scan on events_2026_q2 (events_2026_q1 not mentioned)\n```\n\nPruning only works when the `WHERE` clause filters on the partition key\nwith a constant (not a function call or a join column).\n\n**Rule of thumb:** for pruning to work, the `WHERE` predicate on the\npartition key must be a literal or a parameter — not a function like\n`DATE_TRUNC(...)`. Check `EXPLAIN` to confirm partitions are being pruned.\n",{"id":1252,"difficulty":106,"q":1253,"a":1254},"index-only-scan","What is an Index Only Scan and how do you enable it?","An **Index Only Scan** reads all needed data directly from the index without\ntouching the main table (heap). It is the fastest read path — no random\nheap I\u002FO at all.\n\nFor an Index Only Scan to be chosen:\n1. 
All columns in `SELECT`, `WHERE`, and `ORDER BY` must be in the index.\n2. The table's **visibility map** must show that pages are all-visible\n   (recently vacuumed). If many pages are not all-visible, Postgres falls\n   back to heap fetches.\n\n```sql\n-- Query: SELECT email FROM users WHERE created_at > '2026-01-01'\n-- Index needed: (created_at) INCLUDE (email)\nCREATE INDEX idx_users_created_email\n  ON users (created_at DESC)\n  INCLUDE (email);\n\nEXPLAIN (ANALYZE, BUFFERS)\n  SELECT email FROM users WHERE created_at > '2026-01-01';\n-- → Index Only Scan using idx_users_created_email (Heap Fetches: 0)\n\n-- If Heap Fetches > 0, run VACUUM to update the visibility map:\nVACUUM users;\n```\n\n**Rule of thumb:** convert an `Index Scan` to an `Index Only Scan` by\nadding the `SELECT`ed columns to the index via `INCLUDE`. Then ensure the\ntable is regularly vacuumed so the visibility map stays current.\n",{"description":104},"SQL query optimization interview questions — EXPLAIN ANALYZE, execution plans, statistics, join strategies, N+1 problem, pagination patterns, and tuning tips across Postgres, MySQL, and SQL Server.","sql\u002Fperformance\u002Fquery-optimization","avdAD0y5v9vjkEh0IgAcWaowCwAcxFPScWt0qrM-2gw",{"id":1260,"title":1261,"body":1262,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1266,"navigation":109,"order":30,"path":1267,"questions":1268,"questionsCount":323,"related":247,"seo":1329,"seoDescription":1330,"stem":1331,"subtopic":1332,"topic":47,"topicSlug":48,"updated":328,"__hash__":1333},"qa\u002Fsql\u002Fschema\u002Fddl.md","Ddl",{"type":101,"value":1263,"toc":1264},[],{"title":104,"searchDepth":30,"depth":30,"links":1265},[],{},"\u002Fsql\u002Fschema\u002Fddl",[1269,1273,1277,1281,1285,1289,1293,1297,1301,1305,1309,1313,1317,1321,1325],{"id":1270,"difficulty":114,"q":1271,"a":1272},"what-is-ddl","What is DDL and how does it differ from DML?","**DDL** (Data Definition Language) defines and modifies 
the *structure* of\ndatabase objects — tables, indexes, views, sequences, schemas. The core\nstatements are `CREATE`, `ALTER`, `DROP`, and `TRUNCATE`.\n\n**DML** (Data Manipulation Language) operates on the *data inside* those\nobjects — `SELECT`, `INSERT`, `UPDATE`, `DELETE`.\n\n```sql\n-- DDL: define structure\nCREATE TABLE products (id SERIAL PRIMARY KEY, name TEXT NOT NULL);\nALTER TABLE products ADD COLUMN price NUMERIC(10,2);\nDROP TABLE products;\n\n-- DML: manipulate data\nINSERT INTO products (name, price) VALUES ('Widget', 9.99);\nUPDATE products SET price = 8.99 WHERE id = 1;\nDELETE FROM products WHERE id = 1;\n```\n\n**Rule of thumb:** DDL changes persist after a transaction commits (and in\nmost databases auto-commit immediately). DML changes can be rolled back\nwithin a transaction.\n",{"id":1274,"difficulty":114,"q":1275,"a":1276},"create-table-basics","What are the essential parts of a CREATE TABLE statement?","A `CREATE TABLE` statement names the table and declares each **column** with\nits type and optional constraints. 
Common additions: a primary key, `NOT NULL`\nmarkers, defaults, and foreign keys.\n\n```sql\nCREATE TABLE orders (\n  id          INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  customer_id INT  NOT NULL REFERENCES customers(id),\n  status      TEXT NOT NULL DEFAULT 'pending',\n  total       NUMERIC(10, 2) NOT NULL CHECK (total >= 0),\n  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\nKey parts:\n- **Column name + type** — required for every column.\n- **`NOT NULL`** — rejects `NULL` inserts; always add it unless `NULL` is\n  semantically meaningful.\n- **`DEFAULT`** — value used when the column is omitted in `INSERT`.\n- **`PRIMARY KEY`** — unique non-null identifier; creates an index\n  automatically.\n- **`REFERENCES`** — foreign-key constraint linking to a parent table.\n\n**Rule of thumb:** be explicit about `NOT NULL` and `DEFAULT` on every\ncolumn — relying on implicit NULLability makes the schema ambiguous.\n",{"id":1278,"difficulty":106,"q":1279,"a":1280},"alter-table","What can you do with ALTER TABLE and what are the risks?","`ALTER TABLE` modifies an existing table: add\u002Fdrop\u002Frename columns, change\ntypes, add\u002Fdrop constraints, rename the table itself.\n\n```sql\n-- Add a new column with a default (safe — no full rewrite in Postgres 11+)\nALTER TABLE orders ADD COLUMN shipped_at TIMESTAMPTZ;\n\n-- Drop a column (removes data permanently)\nALTER TABLE orders DROP COLUMN legacy_field;\n\n-- Rename a column\nALTER TABLE orders RENAME COLUMN status TO order_status;\n\n-- Change a type (may rewrite the whole table and lock it)\nALTER TABLE orders ALTER COLUMN total TYPE NUMERIC(14, 4);\n\n-- Add a NOT NULL constraint (safe only if the column has no NULLs)\nALTER TABLE orders ALTER COLUMN shipped_at SET NOT NULL;\n```\n\n**Risks:**\n- Type changes and adding `NOT NULL` to an existing column may **lock the\n  table** and run a full rewrite on large tables.\n- Dropping a column is irreversible without a backup.\n\n**Rule of thumb:** 
for large production tables, test `ALTER TABLE` on a copy\nfirst; use `CONCURRENTLY` options or online schema change tools (pt-online-schema-change,\ngh-ost) where supported.\n",{"id":1282,"difficulty":114,"q":1283,"a":1284},"drop-vs-truncate","What is the difference between DROP TABLE and TRUNCATE?","- **`DROP TABLE`** removes the table *definition and all its data* permanently.\n  The table no longer exists.\n- **`TRUNCATE`** removes **all rows** from a table but leaves the table\n  structure intact. It resets identity columns\u002Fsequences in MySQL and SQL\n  Server (Postgres only does so with `RESTART IDENTITY`) and is much\n  faster than `DELETE FROM table` because it deallocates data pages\n  rather than deleting row by row.\n\n```sql\n-- Remove the table entirely\nDROP TABLE staging_data;\n\n-- Empty the table but keep its structure\nTRUNCATE TABLE staging_data;\n\n-- TRUNCATE is faster than DELETE for clearing a whole table\n-- DELETE FROM staging_data;  -- slow: generates undo\u002Fredo for every row\n```\n\nIn Postgres, `TRUNCATE` can be inside a transaction and rolled back; in\nMySQL, `TRUNCATE` is DDL and implicitly commits.\n\n**Rule of thumb:** use `TRUNCATE` to reset a staging\u002Ftemp table between\nruns; use `DROP TABLE` only when you no longer need the schema.\n",{"id":1286,"difficulty":106,"q":1287,"a":1288},"schemas-namespacing","What is a SQL schema (namespace) and why use one?","A **schema** is a named namespace inside a database that groups related\ntables, views, functions, and other objects. 
In Postgres the default schema\nis `public`; SQL Server uses `dbo`.\n\n```sql\n-- Create a schema\nCREATE SCHEMA reporting;\n\n-- Create a table inside it\nCREATE TABLE reporting.monthly_revenue (\n  month  DATE PRIMARY KEY,\n  total  NUMERIC(14, 2) NOT NULL\n);\n\n-- Search path (Postgres): sets which schemas to look in without qualifying\nSET search_path TO reporting, public;\nSELECT * FROM monthly_revenue;  -- resolves to reporting.monthly_revenue\n```\n\nBenefits:\n- Logical grouping (`app`, `reporting`, `audit`).\n- Separate permissions per schema.\n- Avoids name collisions between different subsystems.\n\n**Rule of thumb:** use schemas to separate concerns within a single\ndatabase (e.g., `app` for application tables, `etl` for staging tables,\n`audit` for history).\n",{"id":1290,"difficulty":106,"q":1291,"a":1292},"sequences","What is a sequence and how does it relate to auto-increment columns?","A **sequence** is a database object that generates a monotonically\nincreasing series of integers. 
Auto-increment columns (`SERIAL`, `IDENTITY`)\nare backed by a sequence internally.\n\n```sql\n-- Explicit sequence (Postgres)\nCREATE SEQUENCE order_id_seq START 1000 INCREMENT 1;\n\n-- Use it as a default\nCREATE TABLE orders (\n  id INT DEFAULT nextval('order_id_seq') PRIMARY KEY\n);\n\n-- Advance and read the current value\nSELECT nextval('order_id_seq');   -- 1000 (first call)\nSELECT currval('order_id_seq');   -- 1000 (same session, same call)\nSELECT lastval();                 -- most recent nextval in this session\n```\n\nSequences are **non-transactional by design** — a rolled-back transaction\nstill consumes a number, so gaps in IDs are normal and expected.\n\n**Rule of thumb:** never rely on sequence values being gap-free; use them\nonly as unique opaque identifiers, not as row-count proxies.\n",{"id":1294,"difficulty":106,"q":1295,"a":1296},"create-index-ddl","What is the DDL to create and drop an index, and when does CREATE INDEX CONCURRENTLY matter?","```sql\n-- Standard index (locks the table for writes during build)\nCREATE INDEX idx_orders_customer ON orders (customer_id);\n\n-- Unique index\nCREATE UNIQUE INDEX idx_users_email ON users (email);\n\n-- Composite index\nCREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);\n\n-- Postgres: build without blocking writes (slower, but safe in production)\nCREATE INDEX CONCURRENTLY idx_orders_total ON orders (total);\n\n-- Drop\nDROP INDEX idx_orders_customer;\nDROP INDEX CONCURRENTLY idx_orders_total;  -- Postgres\n```\n\n`CREATE INDEX CONCURRENTLY` builds the index in multiple passes while the\ntable stays writable, avoiding the write lock. 
The trade-off: it takes longer\nand cannot be run inside a transaction block.\n\n**Rule of thumb:** always use `CONCURRENTLY` when adding indexes to large\nlive production tables to avoid blocking writes.\n",{"id":1298,"difficulty":106,"q":1299,"a":1300},"temp-tables","What are temporary tables and when should you use them?","A **temporary table** exists only for the duration of a session (or\ntransaction, depending on the database and declaration). It is invisible to\nother sessions and is automatically dropped when the session ends.\n\n```sql\n-- Postgres (SQL Server instead uses SELECT ... INTO #staging_orders)\nCREATE TEMP TABLE staging_orders AS\n  SELECT * FROM orders WHERE status = 'pending';\n\n-- Manipulate without touching the real table\nUPDATE staging_orders SET status = 'processed';\n\n-- MySQL uses a slightly different syntax\nCREATE TEMPORARY TABLE staging_orders SELECT * FROM orders WHERE status = 'pending';\n```\n\nUse cases:\n- Breaking a complex multi-step ETL into readable steps.\n- Storing an intermediate result that is referenced multiple times (avoids\n  rerunning a slow subquery).\n- Isolating work from other sessions in long-running scripts.\n\n**Rule of thumb:** prefer CTEs for single-query decomposition; use temp\ntables when the intermediate result must be indexed, updated, or reused\nacross multiple queries.\n",{"id":1302,"difficulty":127,"q":1303,"a":1304},"ddl-in-transactions","Can DDL statements be run inside a transaction and rolled back?","It depends on the database:\n\n- **Postgres**: DDL is fully transactional. `CREATE TABLE`, `ALTER TABLE`,\n  `DROP TABLE` inside a `BEGIN` block can be rolled back if the transaction\n  aborts.\n- **MySQL**: DDL auto-commits. 
Any active transaction is committed before\n  the DDL executes; there is no way to roll back a `CREATE TABLE` in MySQL.\n- **SQL Server**: DDL is transactional (like Postgres).\n\n```sql\n-- Postgres: safe rollback of schema change\nBEGIN;\n  ALTER TABLE orders ADD COLUMN notes TEXT;\n  -- something fails...\nROLLBACK;\n-- The column was never added\n```\n\n**Rule of thumb:** in Postgres and SQL Server, wrap schema migrations in\ntransactions to get atomic, all-or-nothing deploys. In MySQL, deploy each\nDDL statement separately and use idempotent migration scripts (`IF NOT EXISTS`).\n",{"id":1306,"difficulty":114,"q":1307,"a":1308},"if-not-exists","What does CREATE TABLE IF NOT EXISTS do and why is it useful in migrations?","`CREATE TABLE IF NOT EXISTS` creates the table only if no table with that\nname already exists in the current schema. If the table already exists, the\nstatement succeeds silently (no error, no data changed).\n\n```sql\n-- Safe to run multiple times — won't fail on re-run\nCREATE TABLE IF NOT EXISTS audit_log (\n  id         BIGSERIAL PRIMARY KEY,\n  event      TEXT NOT NULL,\n  created_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\nSimilarly: `DROP TABLE IF EXISTS`, `CREATE INDEX IF NOT EXISTS` (Postgres 9.5+).\n\n**Rule of thumb:** use `IF NOT EXISTS` \u002F `IF EXISTS` in every migration\nscript to make migrations **idempotent** — safe to re-run after a partial\nfailure without manual cleanup.\n",{"id":1310,"difficulty":114,"q":1311,"a":1312},"rename-table","How do you rename a table in SQL?","```sql\n-- Postgres \u002F MySQL\nALTER TABLE old_name RENAME TO new_name;\n\n-- SQL Server (stored procedure)\nEXEC sp_rename 'old_name', 'new_name';\n```\n\nRenaming a table does **not** automatically update views, stored procedures,\nor application code that reference the old name. 
The exception is views in\nPostgres, which track the table by internal object ID and keep working after\nthe rename; in SQL Server, views that are not schema-bound refer to the table\nby name and fail the next time they are queried.\n\n**Rule of thumb:** after renaming a table, search for all references to the\nold name in views, functions, application code, and ORM models and update\nthem in the same migration.\n",{"id":1314,"difficulty":127,"q":1315,"a":1316},"generated-columns","What are generated (computed) columns?","A **generated column** is a column whose value is automatically computed\nfrom other columns. The expression is evaluated by the database, not the\napplication. There are two variants:\n- **`STORED`** — the computed value is physically stored and updated on\n  each write (can be indexed).\n- **`VIRTUAL`** — computed on read, not stored (MySQL and SQL Server support this).\n\n```sql\n-- Postgres (STORED only)\nCREATE TABLE rectangles (\n  width  NUMERIC NOT NULL,\n  height NUMERIC NOT NULL,\n  area   NUMERIC GENERATED ALWAYS AS (width * height) STORED\n);\n\n-- MySQL (VIRTUAL — no storage cost)\nCREATE TABLE rectangles (\n  width  DECIMAL(10,2) NOT NULL,\n  height DECIMAL(10,2) NOT NULL,\n  area   DECIMAL(10,2) AS (width * height) VIRTUAL\n);\n```\n\n**Rule of thumb:** use generated columns for values always derivable from\nother columns (area, full name, tax amount) to keep the value consistent and\navoid application-level bugs from forgetting to update the derived field.\n",{"id":1318,"difficulty":127,"q":1319,"a":1320},"partitioning-ddl","How do you create a partitioned table in Postgres?","**Table partitioning** splits a logically single table into multiple\nphysical storage chunks (partitions) based on a column value. 
The database\nroutes rows transparently and can prune irrelevant partitions from queries.\n\n```sql\n-- Declarative partitioning (Postgres 10+)\nCREATE TABLE events (\n  id         BIGINT NOT NULL,\n  created_at DATE   NOT NULL,\n  payload    JSONB\n) PARTITION BY RANGE (created_at);\n\n-- Create child partitions (one per quarter)\nCREATE TABLE events_2026_q1\n  PARTITION OF events\n  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');\n\nCREATE TABLE events_2026_q2\n  PARTITION OF events\n  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');\n\n-- Queries automatically exclude irrelevant partitions\nSELECT * FROM events WHERE created_at >= '2026-04-01';\n-- Only scans events_2026_q2\n```\n\n**Rule of thumb:** partition very large tables (100 M+ rows) on the column\nmost commonly used in range filters (usually a timestamp). Add indexes on\neach partition individually.\n",{"id":1322,"difficulty":106,"q":1323,"a":1324},"view-ddl","How do you create and replace a view?","A **view** is a named, stored `SELECT` statement. Querying a view executes\nthe underlying query at runtime. Views simplify complex joins, restrict\ncolumn access, or present a stable interface over a changing schema.\n\n```sql\n-- Create a view\nCREATE VIEW active_customers AS\n  SELECT id, name, email\n  FROM customers\n  WHERE deleted_at IS NULL;\n\n-- Query it like a table\nSELECT * FROM active_customers WHERE name ILIKE 'smith%';\n\n-- Replace (redefine) without dropping\nCREATE OR REPLACE VIEW active_customers AS\n  SELECT id, name, email, tier\n  FROM customers\n  WHERE deleted_at IS NULL;\n\n-- Remove\nDROP VIEW active_customers;\n```\n\n**Rule of thumb:** use `CREATE OR REPLACE VIEW` in migrations so\ndependent objects (other views, grants) are preserved. 
Only `DROP VIEW` when\nyou are removing the view entirely.\n",{"id":1326,"difficulty":127,"q":1327,"a":1328},"materialized-view","What is a materialized view and how does it differ from a regular view?","A **materialized view** (Postgres, Oracle, SQL Server as \"indexed view\") is\na view whose results are **stored on disk** like a table. This makes reads\nfast but the data is stale until explicitly refreshed.\n\n```sql\n-- Create\nCREATE MATERIALIZED VIEW monthly_sales AS\n  SELECT date_trunc('month', created_at) AS month,\n         SUM(total)                      AS revenue\n  FROM orders\n  GROUP BY 1;\n\n-- Refresh (blocks reads in Postgres by default)\nREFRESH MATERIALIZED VIEW monthly_sales;\n\n-- Non-blocking refresh (requires a unique index)\nCREATE UNIQUE INDEX ON monthly_sales (month);\nREFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;\n```\n\n**Rule of thumb:** use a materialized view for expensive aggregations or\nreports that can tolerate slightly stale data. Schedule `REFRESH` in a\nbackground job; use `CONCURRENTLY` on large views to avoid read downtime.\n",{"description":104},"SQL DDL interview questions — CREATE TABLE, ALTER TABLE, DROP, TRUNCATE, sequences, schemas, and safe migration patterns across Postgres, MySQL, and SQL Server.","sql\u002Fschema\u002Fddl","DDL — Creating & Altering Tables","oV0M-AEXqYY_Ji5SQb-iVOo4R4fkWqua1zr1W0KhBnk",{"id":1335,"title":1336,"body":1337,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1341,"navigation":109,"order":30,"path":1342,"questions":1343,"questionsCount":323,"related":247,"seo":1404,"seoDescription":1405,"stem":1406,"subtopic":1407,"topic":91,"topicSlug":93,"updated":328,"__hash__":1408},"qa\u002Fsql\u002Fsecurity\u002Fsql-injection.md","Sql 
Injection",{"type":101,"value":1338,"toc":1339},[],{"title":104,"searchDepth":30,"depth":30,"links":1340},[],{},"\u002Fsql\u002Fsecurity\u002Fsql-injection",[1344,1348,1352,1356,1360,1364,1368,1372,1376,1380,1384,1388,1392,1396,1400],{"id":1345,"difficulty":114,"q":1346,"a":1347},"what-is-sql-injection","What is SQL injection?","**SQL injection** (SQLi) is an attack where an adversary inserts malicious\nSQL syntax into user-supplied input that is concatenated directly into a\nquery. The database executes the injected SQL as if it were written by the\ndeveloper.\n\n```python\n# Vulnerable: string concatenation with user input\nusername = request.get(\"username\")   # attacker provides: ' OR '1'='1\nquery = \"SELECT * FROM users WHERE username = '\" + username + \"'\"\n# Resulting SQL: SELECT * FROM users WHERE username = '' OR '1'='1'\n# → returns ALL users! The attacker is logged in without credentials.\n```\n\nConsequences range from data theft (reading all rows), authentication\nbypass, data destruction (`DROP TABLE`), to remote code execution via\ndatabase functions (`xp_cmdshell` in SQL Server).\n\n**Rule of thumb:** SQL injection is consistently in OWASP's Top 10 list\nof critical web application vulnerabilities and is 100 % preventable with\nparameterised queries. Never concatenate user input into SQL strings.\n",{"id":1349,"difficulty":114,"q":1350,"a":1351},"parameterised-queries","What are parameterised queries (prepared statements) and why do they prevent injection?","A **parameterised query** separates the SQL structure from the data values.\nThe query is sent to the database with **placeholders**, and the driver\nsends the data values separately. 
The database always treats the values as\ndata — never as SQL syntax.\n\n```python\n# Python + psycopg2 (Postgres) — SAFE\ncur.execute(\n    \"SELECT * FROM users WHERE username = %s AND password_hash = %s\",\n    (username, password_hash)   # values sent separately, never interpolated\n)\n# Even if username = \"' OR '1'='1\", it is treated as a literal string,\n# not SQL. The query finds no user named \"' OR '1'='1\".\n\n# Node.js + pg — SAFE\nconst result = await client.query(\n    'SELECT * FROM users WHERE id = $1',\n    [userId]\n);\n\n# Java + JDBC — SAFE\nPreparedStatement stmt = conn.prepareStatement(\n    \"SELECT * FROM orders WHERE customer_id = ?\");\nstmt.setInt(1, customerId);\n```\n\n**Rule of thumb:** **always use parameterised queries** (also called\nprepared statements) for any query that includes user-supplied data. This\nis the single most effective prevention against SQL injection.\n",{"id":1353,"difficulty":106,"q":1354,"a":1355},"orm-safety","Are ORM queries safe from SQL injection by default?","Most ORM frameworks (SQLAlchemy, Django ORM, ActiveRecord, Hibernate)\nuse parameterised queries by default, making their standard query API\ninjection-safe. 
However, raw SQL escape hatches in ORMs can re-introduce\nthe vulnerability.\n\n```python\n# Django ORM — SAFE (uses parameterised queries internally)\nUser.objects.filter(username=username)\n\n# Django raw() — UNSAFE if you concatenate input\nUser.objects.raw(f\"SELECT * FROM users WHERE username = '{username}'\")\n\n# Django raw() — SAFE with params\nUser.objects.raw(\"SELECT * FROM users WHERE username = %s\", [username])\n\n# SQLAlchemy — SAFE\nsession.execute(select(User).where(User.username == username))\n\n# SQLAlchemy text() — UNSAFE if you concatenate\nsession.execute(text(f\"SELECT * FROM users WHERE username = '{username}'\"))\n\n# SQLAlchemy text() — SAFE with bindparam\nsession.execute(text(\"SELECT * FROM users WHERE username = :u\"), {\"u\": username})\n```\n\n**Rule of thumb:** use the ORM's type-safe query API wherever possible.\nWhen you must write raw SQL, always use parameterised bindings — never\nf-strings or string concatenation.\n",{"id":1357,"difficulty":106,"q":1358,"a":1359},"injection-types","What are the main types of SQL injection attacks?","1. **In-band SQLi** — data is extracted through the same channel as the\n   attack (most common). Includes error-based (reading error messages) and\n   union-based (appending `UNION SELECT` to leak data).\n2. **Blind SQLi** — the application does not return data but the attacker\n   infers information from behaviour:\n   - **Boolean-based**: send a true vs false condition; observe response\n     differences.\n   - **Time-based**: use `pg_sleep()` or `SLEEP()` to cause a delay if a\n     condition is true.\n3. 
**Out-of-band SQLi** — data is exfiltrated via a different channel\n   (DNS lookup, HTTP request) using database features like `UTL_HTTP`\n   (Oracle) or `xp_cmdshell` (SQL Server).\n\n```sql\n-- Union-based example (attacker appends):\n-- Original: SELECT name FROM products WHERE id = 1\n-- Injected: id = 1 UNION SELECT password FROM users--\n-- Result: returns product name AND user passwords\n\n-- Time-based blind example:\n-- id = 1; IF (SELECT COUNT(*) FROM users WHERE username='admin') > 0\n--           BEGIN WAITFOR DELAY '0:0:5' END--\n-- If the response is delayed 5 s, an 'admin' user exists.\n```\n\n**Rule of thumb:** parameterised queries prevent all forms of in-band and\nmost out-of-band injection. Separately, disable dangerous stored procedures\n(`xp_cmdshell`, `UTL_HTTP`) unless explicitly required.\n",{"id":1361,"difficulty":127,"q":1362,"a":1363},"stored-procedure-injection","Can stored procedures be vulnerable to SQL injection?","Yes — stored procedures that build dynamic SQL internally via string\nconcatenation are still vulnerable. 
The parameter is safe from injection\nat the call site, but the SQL built inside the procedure is not.\n\n```sql\n-- SQL Server stored procedure — UNSAFE (dynamic SQL with concatenation)\nCREATE PROCEDURE SearchProducts @SearchTerm NVARCHAR(100)\nAS\nBEGIN\n  EXEC('SELECT * FROM products WHERE name LIKE ''%' + @SearchTerm + '%''')\n  -- Attacker passes: '; DROP TABLE products; --\n  -- Becomes: SELECT * FROM products WHERE name LIKE '%'; DROP TABLE products; --%'\nEND;\n\n-- SAFE: use sp_executesql with parameters\nCREATE PROCEDURE SearchProducts @SearchTerm NVARCHAR(100)\nAS\nBEGIN\n  DECLARE @sql NVARCHAR(500) = N'SELECT * FROM products WHERE name LIKE @term';\n  -- EXEC arguments must be variables or constants, so build the pattern first\n  DECLARE @pattern NVARCHAR(102) = N'%' + @SearchTerm + N'%';\n  EXEC sp_executesql @sql, N'@term NVARCHAR(102)', @term = @pattern;\nEND;\n```\n\n**Rule of thumb:** dynamic SQL inside stored procedures must use\n`sp_executesql` with bound parameters (SQL Server), `EXECUTE ... USING` with\n`$1` placeholders (Postgres PL\u002FpgSQL), or `PREPARE`\u002F`EXECUTE` equivalents.\nNever concatenate user input into a dynamic SQL string, even inside a\nstored procedure.\n",{"id":1365,"difficulty":127,"q":1366,"a":1367},"second-order-injection","What is second-order (stored) SQL injection?","**Second-order injection** occurs in two steps:\n1. Malicious input is stored safely in the database (the initial insertion\n   is parameterised and appears safe).\n2. 
Later, that stored value is retrieved and concatenated into a new SQL\n   query without parameterisation — causing injection on the second use.\n\n```python\n# Step 1: user registers with username = \"admin'--\"\n# This INSERT is parameterised — safe at registration:\ncur.execute(\"INSERT INTO users (username) VALUES (%s)\", (username,))\n# username = \"admin'--\" is stored harmlessly.\n\n# Step 2: admin panel retrieves the username and uses it unsafely:\nadmin_username = fetch_user(user_id)[\"username\"]  # → \"admin'--\"\ncur.execute(f\"SELECT * FROM audit_log WHERE actor = '{admin_username}'\")\n# → SELECT * FROM audit_log WHERE actor = 'admin'--'\n# The -- comments out the trailing quote, so the query matches actor 'admin':\n# the attacker reads the real admin's audit rows instead of their own\n```\n\n**Rule of thumb:** data from the database must be treated as untrusted\nwhen used in a new query — even if you stored it safely. Always use\nparameterised queries for every database query, including queries that\nuse data retrieved from the database itself.\n",{"id":1369,"difficulty":106,"q":1370,"a":1371},"input-validation","Is input validation sufficient to prevent SQL injection?","Input validation is a **useful defence-in-depth measure** but is **not\nsufficient on its own** to prevent SQL injection. Allowlists (accepting\nonly known-good patterns) are more reliable than denylists (rejecting\nknown-bad strings), but both can be bypassed by clever encoding or\nunexpected input formats.\n\n```python\n# Denylist — INSUFFICIENT (easily bypassed with encoding)\nif \"'\" in user_input or \";\" in user_input:\n    raise ValueError(\"Invalid input\")\n# Attacker uses URL encoding, Unicode lookalikes, or multi-byte tricks to bypass\n\n# Allowlist — better but still not sufficient alone\nimport re\n# fullmatch avoids the '$' anchor, which would also accept a trailing newline\nif not re.fullmatch(r'[a-zA-Z0-9_]+', username):\n    raise ValueError(\"Invalid username\")\n# Better — but parameterised queries are STILL required as the primary defence\n```\n\nThe correct stack:\n1. **Parameterised queries** — primary defence (mandatory)\n2. 
**Input validation\u002Fallowlists** — secondary, reduces attack surface\n3. **Least privilege** — limits blast radius if injection occurs\n4. **WAF** — tertiary, may block some automated scans\n\n**Rule of thumb:** validate inputs AND use parameterised queries. Input\nvalidation is not a substitute for parameterisation — it is an additional\nlayer. If you have to choose one, choose parameterised queries.\n",{"id":1373,"difficulty":106,"q":1374,"a":1375},"error-messages","How do database error messages contribute to SQL injection risk?","**Verbose database error messages** expose the query structure, table names,\ncolumn names, and database version to an attacker — information used to\nrefine an injection attack (error-based SQLi).\n\n```python\n# BAD: returning the raw database error to the client\nexcept Exception as e:\n    return {\"error\": str(e)}\n# Attacker sees: 'column \"passwrd\" does not exist' → typo in column name revealed\n# Or: 'relation \"users\" does not exist' → table name confirmed\n\n# GOOD: log the full error server-side; return a generic message to the client\nimport logging\nexcept Exception as e:\n    logging.error(\"Database error: %s\", e, exc_info=True)\n    return {\"error\": \"An internal error occurred. Please try again.\"}\n```\n\n**Rule of thumb:** never expose raw database error messages to end users.\nLog them server-side with full stack traces and return a generic \"internal\nerror\" to the client. Use different log levels (DEBUG in development, ERROR\nin production) to ensure errors are visible to developers but not attackers.\n",{"id":1377,"difficulty":106,"q":1378,"a":1379},"waf-and-defence-in-depth","What role does a Web Application Firewall (WAF) play in preventing SQL injection?","A **WAF** inspects HTTP requests and blocks patterns that look like SQL\ninjection attempts (quotes, SQL keywords in unusual positions, encoded\npayloads). 
It provides a useful additional layer but should not be the\nprimary defence.\n\nLimitations of WAFs:\n- They can be bypassed with obfuscation (encoding, case variation, comments).\n- They may produce false positives, blocking legitimate requests.\n- They do nothing for second-order injection (the attack comes from the\n  database, not HTTP).\n- They are a perimeter control — if bypassed, no protection remains.\n\n```\nDefence-in-depth layers (innermost = most important):\n4. WAF             — blocks automated scans, buys time\n3. Least privilege  — limits blast radius\n2. Input validation — reduces attack surface\n1. Parameterised queries — PRIMARY DEFENCE (bound values are never parsed as SQL)\n```\n\n**Rule of thumb:** deploy a WAF as a defence-in-depth measure, not as a\nsubstitute for parameterised queries. A WAF buys you protection against\nautomated tools and script kiddies; a determined attacker will bypass it.\n",{"id":1381,"difficulty":127,"q":1382,"a":1383},"nosql-injection","Does SQL injection also apply to NoSQL databases?","SQL injection is specific to SQL databases, but analogous **NoSQL injection**\nattacks exist. MongoDB, for example, is vulnerable to operator injection when\nuser input is used directly in a query object.\n\n```javascript\n\u002F\u002F MongoDB — UNSAFE: user controls the query operator\nconst username = req.body.username;  \u002F\u002F attacker sends: { \"$ne\": null }\nconst user = await User.findOne({ username: username });\n\u002F\u002F Becomes: db.users.findOne({ username: { $ne: null } })\n\u002F\u002F → returns the FIRST user in the collection, bypassing login!\n\n\u002F\u002F SAFE: validate that username is a string before using it\nif (typeof username !== 'string') throw new Error('Invalid input');\nconst user = await User.findOne({ username: username });\n```\n\nThe prevention principle is the same: **never allow untrusted input to\ncontrol query structure**. 
In MongoDB, validate types strictly; in Redis,\nnever concatenate user input into Lua scripts.\n\n**Rule of thumb:** the injection principle extends beyond SQL — any query\nlanguage that mixes structure and data is potentially vulnerable when user\ninput influences the structure. Validate types and use library-provided\nsafe query builders for every database technology.\n",{"id":1385,"difficulty":106,"q":1386,"a":1387},"orm-mass-assignment","What is mass assignment and how does it relate to database security?","**Mass assignment** is not SQL injection, but is an ORM-related vulnerability\nwhere an attacker sets database columns they should not control by sending\nextra fields in an HTTP request body.\n\n```python\n# Django — VULNERABLE to mass assignment\n# Attacker sends POST: { \"username\": \"alice\", \"is_admin\": true }\nuser = User(**request.POST.dict())  # copies ALL fields including is_admin!\nuser.save()\n\n# SAFE: use an explicit allowlist of fields\nuser = User(\n    username=request.POST['username'],\n    email=request.POST['email'],\n    # is_admin NOT included — cannot be set by the user\n)\n\n# Django Forms provide this automatically:\nform = UserRegistrationForm(request.POST)  # only processes declared fields\nif form.is_valid():\n    form.save()\n```\n\n**Rule of thumb:** never pass raw request data directly to ORM constructors\nor `update()` calls. Always explicitly allowlist the fields that users are\npermitted to set, and never expose internal fields like `is_admin`,\n`role`, or `account_balance` to user-controlled input.\n",{"id":1389,"difficulty":106,"q":1390,"a":1391},"detecting-sqli","How do you detect SQL injection vulnerabilities in an existing codebase?","```python\n# 1. Code review: grep for string interpolation into SQL\n# Dangerous patterns in Python:\n# f\"SELECT ... {user_input}\"\n# \"SELECT ... \" + variable\n# \"SELECT ... %s\" % variable   ← % formatting bypasses parameterisation!\n# cursor.execute(\"... \" + x)\n\n# 2. 
Automated scanning tools:\n# - sqlmap (black-box: tests a live endpoint for injection)\n# - Bandit (Python SAST: flags unsafe DB calls)\n# - Semgrep rules for SQL injection patterns\n# - OWASP ZAP (web app scanner)\n\n# 3. Database query logs: look for queries with unusual quoting\n# Postgres: log_min_duration_statement = 0 + pg_stat_statements\n\n# 4. Unit tests for injection payloads\ndef test_no_injection():\n    result = search_products(\"' OR '1'='1\")\n    assert len(result) == 0  # should find nothing, not all products\n```\n\n**Rule of thumb:** include injection payload tests in your test suite for\nevery query that accepts user input. Run `sqlmap` or a similar scanner\nagainst staging environments before release. Add a Semgrep or Bandit check\nto CI to catch string concatenation into SQL at code-review time.\n",{"id":1393,"difficulty":127,"q":1394,"a":1395},"dynamic-order-by-injection","How do you safely handle a dynamic ORDER BY clause?","`ORDER BY` column names and directions cannot be passed as bind parameters\n— only values can. 
This means dynamic sorting is a common injection vector\nwhen developers concatenate the sort column directly from user input.\n\n```python\n# UNSAFE: attacker controls column name\ncolumn = request.args.get(\"sort\")  # attacker sends: \"1; DROP TABLE orders; --\"\ncur.execute(f\"SELECT * FROM orders ORDER BY {column}\")  # injection!\n\n# SAFE: allowlist of permitted column names\nALLOWED_SORT_COLUMNS = {\"id\", \"created_at\", \"total\", \"status\"}\nALLOWED_DIRECTIONS  = {\"ASC\", \"DESC\"}\n\ncolumn    = request.args.get(\"sort\",      \"created_at\")\ndirection = request.args.get(\"direction\", \"DESC\").upper()\n\nif column not in ALLOWED_SORT_COLUMNS:\n    column = \"created_at\"   # fall back to safe default\nif direction not in ALLOWED_DIRECTIONS:\n    direction = \"DESC\"\n\n# Now safe to interpolate — values are from a known-good set\ncur.execute(f\"SELECT * FROM orders ORDER BY {column} {direction}\")\n```\n\n**Rule of thumb:** for any dynamic SQL identifier (column name, table name,\nschema name), use an explicit allowlist — never accept the raw user value.\nBind parameters cannot protect identifiers, only values.\n",{"id":1397,"difficulty":106,"q":1398,"a":1399},"escape-vs-parameterise","What is the difference between escaping and parameterising?","**Escaping** modifies the input string to neutralise special characters\n(e.g., replacing `'` with `''`) before interpolating it into SQL.\n**Parameterising** sends the SQL structure and data values to the database\nas separate payloads — the driver handles quoting internally.\n\n```python\n# Escaping — FRAGILE (easily bypassed with multi-byte character tricks)\nname = user_input.replace(\"'\", \"''\")\ncur.execute(f\"SELECT * FROM users WHERE name = '{name}'\")\n\n# Parameterising — CORRECT\ncur.execute(\"SELECT * FROM users WHERE name = %s\", (user_input,))\n```\n\nWhy escaping is unreliable:\n- Requires knowing all dangerous characters for the current charset.\n- Multi-byte encodings (GBK, BIG5) can 
hide a `'` byte inside a two-byte\n  sequence, making `replace(\"'\", \"''\")` ineffective.\n- A single missed escape anywhere in a large codebase is a vulnerability.\n\n**Rule of thumb:** never escape and interpolate — always parameterise.\nEscaping is a last resort when a driver or ORM provides no parameterisation\noption and you must write raw SQL. Even then, use the driver's official\nescaping function (`psycopg2.extensions.adapt`, `mysqli_real_escape_string`)\n— never roll your own.\n",{"id":1401,"difficulty":106,"q":1402,"a":1403},"least-privilege-sqli","How does least-privilege database access reduce SQL injection impact?","If injection does occur despite parameterisation (e.g., via a legacy code\npath), **least privilege** limits the blast radius — the attacker can only\ndo what the compromised database account can do.\n\n```sql\n-- BAD: application uses a superuser or DBA account\n-- Attacker can: DROP TABLE, read pg_shadow (password hashes), run COPY TO,\n--               call xp_cmdshell (SQL Server), access all schemas\n\n-- GOOD: application uses a restricted role\nCREATE ROLE app_runtime NOINHERIT;\nGRANT SELECT, INSERT, UPDATE, DELETE ON orders    TO app_runtime;\nGRANT SELECT, INSERT, UPDATE, DELETE ON customers TO app_runtime;\nGRANT SELECT ON products TO app_runtime;\n-- NOT granted: DROP TABLE, ALTER, TRUNCATE, CREATE, COPY TO\u002FFROM\n-- NOT granted: pg_read_file, pg_execute_server_program\n-- NOT granted: access to other schemas\n\n-- Even with injection, attacker can only DML on those three tables\n```\n\n**Rule of thumb:** least privilege is not a substitute for parameterised\nqueries, but it is an essential backstop. A successful injection against\na read-only reporting account is far less damaging than one against a DBA\naccount. 
Always pair both controls.\n",{"description":104},"SQL injection interview questions — how attacks work, parameterised queries, ORM safety, stored procedure risks, second-order injection, blind injection, WAFs, and prevention best practices.","sql\u002Fsecurity\u002Fsql-injection","SQL Injection","-Q1GM9d1hBs2zfZExPTxntUlmwQfL0ra-HAaWbczZIE",{"id":1410,"title":1411,"body":1412,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1432,"navigation":109,"order":30,"path":1433,"questions":1434,"questionsCount":751,"related":247,"seo":1511,"seoDescription":1512,"stem":1513,"subtopic":1514,"topic":29,"topicSlug":31,"updated":328,"__hash__":1515},"qa\u002Fsql\u002Fsubqueries\u002Fctes.md","Ctes",{"type":101,"value":1413,"toc":1429},[1414,1418],[639,1415,1417],{"id":1416},"about-common-table-expressions","About Common Table Expressions",[644,1419,1420,1421,1424,1425,1428],{},"CTEs (the ",[653,1422,1423],{},"WITH"," clause) turn deeply nested subqueries into readable, top-to-bottom\npipelines, and — uniquely — enable ",[648,1426,1427],{},"recursion"," for hierarchical data like org charts\nand category trees. Interviewers probe whether you understand a CTE's single-statement\nscope, how it differs from views and temp tables, the anchor\u002Frecursive structure of a\nrecursive CTE, and the reality that CTEs are mainly a readability tool, not a magic\nperformance boost.",{"title":104,"searchDepth":30,"depth":30,"links":1430},[1431],{"id":1416,"depth":30,"text":1417},{},"\u002Fsql\u002Fsubqueries\u002Fctes",[1435,1439,1443,1447,1451,1455,1459,1463,1467,1471,1475,1479,1483,1487,1491,1495,1499,1503,1507],{"id":1436,"difficulty":114,"q":1437,"a":1438},"what-is-a-cte","What is a Common Table Expression (CTE)?","A **CTE** is a named temporary result set, defined with the `WITH` clause, that\nexists only for the duration of a single statement. 
You define it once, then\nreference it by name in the query that follows — like a disposable, inline view.\n\n```sql\nWITH recent_orders AS (\n    SELECT * FROM orders WHERE created_at > '2026-01-01'\n)\nSELECT customer_id, COUNT(*)\nFROM   recent_orders          -- reference the CTE by name\nGROUP  BY customer_id;\n```\n\nCTEs make complex queries **readable** by breaking them into named, top-to-bottom\nsteps instead of deeply nested subqueries.\n\nRule of thumb: a CTE is a named subquery you define up front with `WITH` to make a\nquery read like sequential steps.\n",{"id":1440,"difficulty":114,"q":1441,"a":1442},"cte-syntax","What is the syntax of a CTE?","A CTE starts with `WITH`, names the result set, and defines it in parentheses;\nthe main query follows and references the name.\n\n```sql\nWITH cte_name (optional, col, list) AS (\n    SELECT ...               -- the CTE body\n)\nSELECT * FROM cte_name;      -- the main query MUST follow immediately\n```\n\nKey rules: the main statement must come **right after** the CTE definition; the\noptional column list renames the output columns; and the CTE is only visible to\nthe statement it's attached to.\n\nRule of thumb: `WITH name AS (...)` then the query — the CTE and its consumer are\none statement.\n",{"id":1444,"difficulty":106,"q":1445,"a":1446},"cte-vs-subquery","What is the difference between a CTE and a subquery?","Functionally a CTE is similar to a derived-table subquery, but it offers things a\nplain subquery can't:\n\n- **Reusability** — reference the same CTE multiple times in one query; a derived\n  table must be repeated.\n- **Readability** — named, top-to-bottom steps instead of inside-out nesting.\n- **Recursion** — only CTEs can be recursive.\n\n```sql\n-- a subquery repeated twice...\nSELECT * FROM (SELECT ...) a JOIN (SELECT ...) 
b ON ...;\n\n-- ...vs a CTE defined once, used twice\nWITH t AS (SELECT ...)\nSELECT * FROM t a JOIN t b ON ...;\n```\n\nPerformance is usually comparable — many engines treat a CTE as syntactic sugar\nfor a subquery.\n\nRule of thumb: prefer a CTE when logic is reused, recursive, or complex enough\nthat naming the steps helps.\n",{"id":1448,"difficulty":106,"q":1449,"a":1450},"cte-vs-view","What is the difference between a CTE and a view?","Both name a query, but their **scope and persistence** differ:\n\n- A **view** is a permanent schema object, stored in the catalog and reusable by\n  **any** query and user, until dropped.\n- A **CTE** is temporary and **scoped to a single statement** — it vanishes when\n  the query finishes and isn't stored anywhere.\n\n```sql\n-- view: created once, reused forever\nCREATE VIEW active_customers AS SELECT * FROM customers WHERE active = true;\n\n-- CTE: lives only inside this one statement\nWITH active_customers AS (SELECT * FROM customers WHERE active = true)\nSELECT * FROM active_customers;\n```\n\nRule of thumb: reuse across many queries → create a view; one-off, query-local\nlogic → use a CTE.\n",{"id":1452,"difficulty":106,"q":1453,"a":1454},"multiple-ctes","Can you define multiple CTEs in one query?","Yes. 
Write a single `WITH`, then separate each CTE definition with a **comma**.\nThey're listed top-to-bottom and any CTE can reference the ones **defined before\nit**.\n\n```sql\nWITH dept_avg AS (\n    SELECT dept_id, AVG(salary) AS avg_sal\n    FROM   employees GROUP BY dept_id\n),\nhigh_paying AS (\n    SELECT dept_id FROM dept_avg WHERE avg_sal > 80000  -- uses dept_avg\n)\nSELECT e.name\nFROM   employees e\nJOIN   high_paying h ON e.dept_id = h.dept_id;\n```\n\nOnly one `WITH` keyword is needed regardless of how many CTEs follow.\n\nRule of thumb: chain CTEs with commas to build a pipeline of named steps, each\nable to use the ones above it.\n",{"id":1456,"difficulty":106,"q":1457,"a":1458},"cte-chaining","How do you chain CTEs to build a multi-step pipeline?","Because a CTE can reference earlier CTEs, you can express a **data pipeline** as a\nsequence of transformations — each step reads cleanly off the previous one.\n\n```sql\nWITH raw AS (\n    SELECT customer_id, total FROM orders WHERE status = 'paid'\n),\nper_customer AS (\n    SELECT customer_id, SUM(total) AS spend FROM raw GROUP BY customer_id\n),\nranked AS (\n    SELECT customer_id, spend,\n           RANK() OVER (ORDER BY spend DESC) AS rnk\n    FROM per_customer\n)\nSELECT * FROM ranked WHERE rnk \u003C= 10;   -- top 10 spenders\n```\n\nThis is far more readable than three levels of nested subqueries.\n\nRule of thumb: model ETL-style logic as chained CTEs — filter, aggregate, rank,\nthen select.\n",{"id":1460,"difficulty":127,"q":1461,"a":1462},"recursive-cte","What is a recursive CTE?","A **recursive CTE** references **itself** to process hierarchical or iterative\ndata — org charts, category trees, graph traversal, or number\u002Fdate series. 
It's\ndeclared with `WITH RECURSIVE` (SQL Server does not accept the `RECURSIVE`\nkeyword; a plain `WITH` is used there).\n\n```sql\nWITH RECURSIVE subordinates AS (\n    SELECT id, name, manager_id          -- anchor: the starting row(s)\n    FROM   employees WHERE id = 1\n    UNION ALL\n    SELECT e.id, e.name, e.manager_id    -- recursive: joins back to the CTE\n    FROM   employees e\n    JOIN   subordinates s ON e.manager_id = s.id\n)\nSELECT * FROM subordinates;\n```\n\nIt runs the anchor once, then repeatedly runs the recursive part against the\nprevious result until no new rows appear.\n\nRule of thumb: use a recursive CTE to walk hierarchies and trees that a flat\n`JOIN` can't express.\n",{"id":1464,"difficulty":127,"q":1465,"a":1466},"recursive-cte-parts","What are the two parts of a recursive CTE?","A recursive CTE has two members joined by `UNION ALL`:\n\n1. **Anchor member** — the non-recursive base case; runs **once** to seed the\n   result (e.g. the root of a tree).\n2. **Recursive member** — references the CTE itself; runs **repeatedly**, each\n   iteration operating on the rows the previous iteration produced.\n\n```sql\nWITH RECURSIVE nums AS (\n    SELECT 1 AS n                     -- anchor\n    UNION ALL\n    SELECT n + 1 FROM nums WHERE n \u003C 5 -- recursive, with a stop condition\n)\nSELECT n FROM nums;   -- 1,2,3,4,5\n```\n\nThe recursive member **must** have a terminating condition or the query loops\nuntil it hits the recursion limit.\n\nRule of thumb: anchor seeds, recursive member iterates, `UNION ALL` glues them —\nalways include a stopping condition.\n",{"id":1468,"difficulty":127,"q":1469,"a":1470},"recursive-cte-infinite-loop","How do you prevent an infinite loop in a recursive CTE?","A recursive CTE loops forever if the recursive member never stops producing new\nrows — common with **cyclic data** (A reports to B, B reports to A).\n\nDefenses:\n- A **terminating predicate** in the recursive member (`WHERE n \u003C 100`,\n  `WHERE level \u003C 10`).\n- Track a **path \u002F 
visited set** and exclude already-seen nodes to break cycles.\n- Rely on the engine's **recursion limit** as a safety net (Postgres has no hard\n  default but you can `LIMIT`; SQL Server defaults to `MAXRECURSION 100`).\n\n```sql\n-- SQL Server: cap iterations explicitly\nSELECT * FROM subordinates OPTION (MAXRECURSION 50);\n```\n\nRule of thumb: every recursive CTE needs a stop condition; for graphs that may\ncontain cycles, also track visited nodes.\n",{"id":1472,"difficulty":106,"q":1473,"a":1474},"cte-multiple-references","Can a CTE be referenced multiple times in the same query?","Yes — a key advantage over a derived table. Define the CTE once and use its name\nas many times as needed, including **self-joins**.\n\n```sql\n-- compare each employee's salary to their manager's, using one CTE twice\nWITH emp AS (\n    SELECT id, name, salary, manager_id FROM employees\n)\nSELECT e.name, e.salary, m.salary AS manager_salary\nFROM   emp e\nJOIN   emp m ON e.manager_id = m.id;\n```\n\nWhether the engine computes the CTE once and caches it or re-evaluates per\nreference depends on the database and whether it's materialized.\n\nRule of thumb: reuse a CTE by name instead of copy-pasting the same subquery —\ndefine once, reference freely.\n",{"id":1476,"difficulty":127,"q":1477,"a":1478},"cte-materialization","Are CTEs materialized or inlined?","It depends on the database. **Materialized** means the CTE is computed once into a\ntemporary work table; **inlined** means it's folded into the main query and\noptimized together (often faster, since predicates can push down).\n\n- **PostgreSQL** — inlined since v12 when referenced once and not recursive; before\n  v12, CTEs were an **optimization fence** (always materialized). 
You can force\n  either with `MATERIALIZED` \u002F `NOT MATERIALIZED`.\n- **SQL Server \u002F MySQL 8+** — generally inline CTEs into the plan.\n\n```sql\n-- Postgres: force materialization (an optimization barrier)\nWITH t AS MATERIALIZED (SELECT * FROM big_table WHERE active)\nSELECT * FROM t WHERE id = 5;\n```\n\nRule of thumb: don't assume a CTE is materialized — check `EXPLAIN`; in modern\nPostgres use `MATERIALIZED`\u002F`NOT MATERIALIZED` to control it.\n",{"id":1480,"difficulty":127,"q":1481,"a":1482},"cte-performance","Do CTEs improve query performance?","Not inherently — CTEs are primarily a **readability** tool. In most engines a CTE\nproduces the same plan as the equivalent subquery or derived table.\n\nCaveats that can hurt or help:\n- An **older Postgres** materializing a CTE could **block predicate pushdown**,\n  making it slower than an inline subquery.\n- A CTE **referenced many times** that the engine recomputes each time can be\n  slower than expected — materializing it once may help.\n- Recursive CTEs over deep hierarchies can be expensive regardless.\n\nRule of thumb: choose CTEs for clarity, not speed; verify with `EXPLAIN` and\nconsider explicit materialization only when the plan shows a problem.\n",{"id":1484,"difficulty":106,"q":1485,"a":1486},"cte-in-update-delete","Can CTEs be used with INSERT, UPDATE, and DELETE?","Yes. Put the `WITH` clause in front of the DML statement and reference the CTE in\nit — handy for staging the rows to modify.\n\n```sql\n-- delete all but the latest order per customer\nWITH ranked AS (\n    SELECT id, ROW_NUMBER() OVER (PARTITION BY customer_id\n                                  ORDER BY created_at DESC) AS rn\n    FROM orders\n)\nDELETE FROM orders\nWHERE id IN (SELECT id FROM ranked WHERE rn > 1);\n```\n\nPostgres even allows **data-modifying CTEs** (`INSERT\u002FUPDATE\u002FDELETE ... 
RETURNING`\ninside a `WITH`), letting one statement write to several tables.\n\nRule of thumb: use a CTE to compute exactly which rows a DML statement should\ntouch — especially with window functions for \"keep the latest\" logic.\n",{"id":1488,"difficulty":106,"q":1489,"a":1490},"cte-window-functions","Why are CTEs often paired with window functions?","Window functions like `ROW_NUMBER()` and `RANK()` **can't be used directly in\n`WHERE`** (they're computed after filtering). A CTE (or derived table) lets you\ncompute the window value first, then filter on it in the outer query.\n\n```sql\n-- top 3 earners per department\nWITH ranked AS (\n    SELECT name, dept_id, salary,\n           ROW_NUMBER() OVER (PARTITION BY dept_id\n                              ORDER BY salary DESC) AS rn\n    FROM employees\n)\nSELECT name, dept_id, salary\nFROM   ranked\nWHERE  rn \u003C= 3;       -- filter on the window result\n```\n\nRule of thumb: compute the window function in a CTE, then filter its alias in the\nouter query — the standard \"top-N-per-group\" pattern.\n",{"id":1492,"difficulty":114,"q":1493,"a":1494},"cte-scope","What is the scope of a CTE?","A CTE is visible **only within the single statement** it's attached to. Once that\nstatement finishes, the CTE is gone — you can't reference it from a later\nstatement, and it isn't stored in the schema.\n\n```sql\nWITH t AS (SELECT 1 AS x)\nSELECT * FROM t;     -- works\n\nSELECT * FROM t;     -- ERROR: t no longer exists\n```\n\nAlso, later CTEs in the same `WITH` can see earlier ones, but not vice versa\n(except a recursive CTE referencing itself).\n\nRule of thumb: a CTE lives and dies with one statement — for cross-statement reuse\nuse a view or temp table.\n",{"id":1496,"difficulty":114,"q":1497,"a":1498},"cte-column-aliasing","How do you rename a CTE's output columns?","Provide an explicit **column list** in parentheses after the CTE name. 
The CTE\nbody's columns are mapped positionally to those names — useful when the body uses\nexpressions or you want clearer names.\n\n```sql\nWITH sales (region, total) AS (\n    SELECT region_id, SUM(amount)\n    FROM   orders\n    GROUP  BY region_id\n)\nSELECT region, total FROM sales ORDER BY total DESC;\n```\n\nThe number of names must match the number of output columns. This is also the\nstandard way to name the columns of a **recursive** CTE.\n\nRule of thumb: add `(col1, col2, ...)` after the CTE name to give its columns\nexplicit names — required when the body's columns are unnamed expressions.\n",{"id":1500,"difficulty":106,"q":1501,"a":1502},"cte-vs-temp-table","What is the difference between a CTE and a temporary table?","Both hold an intermediate result, but their lifetime and storage differ:\n\n- A **CTE** is logical and scoped to one statement; it isn't physically stored\n  (unless the engine materializes it internally) and can't be indexed.\n- A **temp table** is a real table in `tempdb`\u002Fsession scope, persists for the\n  **session\u002Ftransaction**, can be **indexed**, and is reusable across multiple\n  statements.\n\n```sql\n-- temp table: reusable across statements, can index\nCREATE TEMP TABLE big_spenders AS\n    SELECT customer_id, SUM(total) s FROM orders GROUP BY customer_id;\nCREATE INDEX ON big_spenders (s);\nSELECT * FROM big_spenders WHERE s > 1000;\n```\n\nRule of thumb: one-statement, no-index logic → CTE; reuse across statements or\nneed an index on the intermediate → temp table.\n",{"id":1504,"difficulty":106,"q":1505,"a":1506},"cte-recursive-use-cases","What are common use cases for recursive CTEs?","Recursive CTEs shine wherever data is **hierarchical or generated iteratively**:\n\n- **Org charts \u002F management chains** — all reports under a manager.\n- **Category \u002F comment trees** — nested parent-child structures.\n- **Bill of materials** — parts made of sub-parts.\n- **Graph traversal** — shortest path, connected 
nodes.\n- **Generating series** — number ranges or contiguous date sequences.\n\n```sql\n-- generate every date in a month\nWITH RECURSIVE d AS (\n    SELECT DATE '2026-06-01' AS day\n    UNION ALL\n    SELECT day + 1 FROM d WHERE day \u003C DATE '2026-06-30'\n)\nSELECT day FROM d;\n```\n\nRule of thumb: reach for a recursive CTE whenever you'd otherwise need a loop to\nwalk a tree, graph, or generated series.\n",{"id":1508,"difficulty":106,"q":1509,"a":1510},"when-not-to-use-cte","When should you avoid using a CTE?","CTEs are great for clarity, but they aren't always the right tool:\n\n- When you need the intermediate **reused across multiple statements** — a view or\n  temp table fits better.\n- When you need an **index** on the intermediate result — use a temp table.\n- On **older Postgres**, when materialization would block predicate pushdown and\n  slow the query — an inline subquery may be faster.\n- For a trivial one-liner where a simple subquery is just as clear.\n\nRule of thumb: use CTEs for readable, single-statement logic; switch to views or\ntemp tables when you need persistence, reuse, or indexing.\n",{"description":104},"SQL CTE interview questions — the WITH clause, recursive CTEs, CTE vs subquery vs view, materialization, multiple CTEs and chaining, and performance.","sql\u002Fsubqueries\u002Fctes","Common Table Expressions (CTEs)","b36Ff8SVYX5Lj0B-Q5_So2hKhCvupG7vokR_m7ylLp4",{"id":1517,"title":1518,"body":1519,"description":104,"difficulty":127,"extension":107,"framework":10,"frameworkSlug":12,"meta":1523,"navigation":109,"order":30,"path":1524,"questions":1525,"questionsCount":323,"related":247,"seo":1586,"seoDescription":1587,"stem":1588,"subtopic":1589,"topic":64,"topicSlug":66,"updated":328,"__hash__":1590},"qa\u002Fsql\u002Ftransactions\u002Fisolation-concurrency.md","Isolation 
Concurrency",{"type":101,"value":1520,"toc":1521},[],{"title":104,"searchDepth":30,"depth":30,"links":1522},[],{},"\u002Fsql\u002Ftransactions\u002Fisolation-concurrency",[1526,1530,1534,1538,1542,1546,1550,1554,1558,1562,1566,1570,1574,1578,1582],{"id":1527,"difficulty":114,"q":1528,"a":1529},"what-is-isolation","What is transaction isolation?","**Transaction isolation** controls how and when the changes made by one\ntransaction become visible to other concurrent transactions. Higher isolation\nprevents more anomalies but increases contention; lower isolation is faster\nbut allows more concurrency bugs.\n\nSQL defines four standard isolation levels ranked from weakest to strongest:\n`READ UNCOMMITTED` → `READ COMMITTED` → `REPEATABLE READ` → `SERIALIZABLE`.\n\n```sql\n-- Set isolation level for the current transaction (Postgres)\nBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n\n-- Or set a session default (MySQL)\nSET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;\n```\n\n**Rule of thumb:** most applications are fine with `READ COMMITTED` (the\nPostgres default). Upgrade to `REPEATABLE READ` or `SERIALIZABLE` only\nwhen your application logic requires a consistent snapshot across multiple\nreads in the same transaction.\n",{"id":1531,"difficulty":106,"q":1532,"a":1533},"read-phenomena","What are the three read phenomena isolation levels protect against?","The SQL standard defines three anomalies that can occur when transactions\nrun concurrently:\n\n1. **Dirty read** — reading uncommitted changes from another transaction.\n   If that transaction rolls back, you read data that never existed.\n2. **Non-repeatable read** — reading the same row twice in the same\n   transaction and getting different values because another transaction\n   committed an update between the two reads.\n3. 
**Phantom read** — running the same range query twice and getting\n   different *sets of rows* because another transaction inserted or deleted\n   rows between the two reads.\n\n```\n-- Dirty read example (requires READ UNCOMMITTED)\nTx A: UPDATE products SET price = 999 WHERE id = 1  (not committed)\nTx B: SELECT price FROM products WHERE id = 1  → 999  (dirty!)\nTx A: ROLLBACK\n-- Tx B acted on a price of 999 that never permanently existed.\n```\n\n**Rule of thumb:** map each anomaly to the isolation level that prevents\nit: `READ COMMITTED` prevents dirty reads; `REPEATABLE READ` also prevents\nnon-repeatable reads; `SERIALIZABLE` also prevents phantoms.\n",{"id":1535,"difficulty":114,"q":1536,"a":1537},"read-uncommitted","What is READ UNCOMMITTED and when (if ever) is it useful?","**`READ UNCOMMITTED`** is the lowest isolation level. Transactions can read\n**uncommitted (\"dirty\") changes** from other transactions. This means you can\nread data that another transaction later rolls back — data that was never\npermanently committed.\n\n```sql\n-- SQL Server: allow dirty reads\nSET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;\nSELECT * FROM orders WHERE status = 'pending';\n-- May return rows that are being modified by another transaction\n-- and might disappear if that transaction rolls back.\n```\n\nIt is rarely appropriate. 
One accepted use case: very approximate counts or\nestimates on a large table where absolute accuracy is not required and locking\noverhead matters more than precision.\n\nIn **Postgres**, `READ UNCOMMITTED` is mapped to `READ COMMITTED` internally —\nPostgres's MVCC architecture never allows dirty reads regardless of the\nisolation level set.\n\n**Rule of thumb:** never use `READ UNCOMMITTED` for any business-logic query.\nIf you need a rough count on a large table, use `pg_class.reltuples` in\nPostgres instead.\n",{"id":1539,"difficulty":114,"q":1540,"a":1541},"read-committed","What does READ COMMITTED guarantee and what can still go wrong?","**`READ COMMITTED`** is the default in Postgres and Oracle. Each statement\nwithin a transaction sees only rows that were committed *before that\nstatement started*. This prevents dirty reads.\n\nWhat can still go wrong:\n- **Non-repeatable reads** — two `SELECT`s in the same transaction can\n  return different values for the same row if another transaction commits\n  between them.\n- **Phantom reads** — a range query can return different row counts if\n  another transaction inserts\u002Fdeletes rows between queries.\n\n```sql\n-- Non-repeatable read under READ COMMITTED:\n-- Tx A (READ COMMITTED):\nBEGIN;\n  SELECT balance FROM accounts WHERE id = 1;  -- → 500\n  -- Tx B commits: UPDATE accounts SET balance = 0 WHERE id = 1;\n  SELECT balance FROM accounts WHERE id = 1;  -- → 0 (different!)\nCOMMIT;\n```\n\n**Rule of thumb:** `READ COMMITTED` is correct for most OLTP workloads\nwhere each query is self-contained. 
If your transaction reads a value and\nthen uses it in a later write, consider `REPEATABLE READ` to prevent the\nvalue from changing between the read and the write.\n",{"id":1543,"difficulty":106,"q":1544,"a":1545},"repeatable-read","What does REPEATABLE READ guarantee?","**`REPEATABLE READ`** ensures that if a transaction reads a row, it will\nsee the same values for that row on every subsequent read within the same\ntransaction — even if another transaction commits updates to that row in\nbetween. This prevents both dirty reads and non-repeatable reads.\n\nIn **Postgres**, `REPEATABLE READ` uses a **snapshot** taken at the start\nof the transaction, so the transaction sees a consistent view of all data\nas it was when it began. Postgres also prevents phantom reads under this\nlevel (stronger than the SQL standard requires).\n\nIn **MySQL (InnoDB)**, `REPEATABLE READ` is the default and uses a\nconsistent read snapshot for `SELECT`s, but phantom rows can still appear\nin locking reads (`SELECT FOR UPDATE`).\n\n```sql\nBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n  SELECT balance FROM accounts WHERE id = 1;  -- → 500\n  -- Another transaction commits: UPDATE accounts SET balance = 0 WHERE id = 1\n  SELECT balance FROM accounts WHERE id = 1;  -- → still 500 (snapshot)\nCOMMIT;\n```\n\n**Rule of thumb:** use `REPEATABLE READ` when a transaction reads the same\ndata multiple times and the business logic requires it to be consistent\nacross those reads (e.g., computing a report in multiple steps).\n",{"id":1547,"difficulty":127,"q":1548,"a":1549},"serializable","What does SERIALIZABLE isolation guarantee?","**`SERIALIZABLE`** is the strongest isolation level. 
It guarantees that the\nresult of concurrent transactions is equivalent to running them one after\nanother in some serial order — as if there were no concurrency at all.\n\nPostgres implements this via **Serializable Snapshot Isolation (SSI)**, which\ntracks read\u002Fwrite dependencies and aborts transactions that would create a\ncycle (a non-serializable schedule). It avoids broad locking but can abort\ntransactions that need to be retried.\n\n```sql\n-- Classic serialization anomaly (write skew) — prevented by SERIALIZABLE:\n-- Two doctors both check \"at least one doctor is on call\" and both\n-- decide to go off call — ending with zero doctors on call.\n\n-- Under SERIALIZABLE, one of the two transactions is aborted and must retry.\nBEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;\n  SELECT COUNT(*) FROM doctors WHERE on_call = TRUE;  -- → 2\n  UPDATE doctors SET on_call = FALSE WHERE id = 42;\nCOMMIT; -- may fail with serialization failure → retry\n```\n\n**Rule of thumb:** use `SERIALIZABLE` for financial ledgers, inventory\nmanagement, or any domain where write skew would produce incorrect results.\nBuild retry logic for `SQLSTATE 40001` (serialization failure) into the\napplication.\n",{"id":1551,"difficulty":127,"q":1552,"a":1553},"mvcc","What is MVCC and how does it enable concurrency without locking reads?","**MVCC** (Multi-Version Concurrency Control) allows readers and writers to\noperate concurrently without blocking each other by keeping **multiple\nversions of each row** (old and new) in the storage engine.\n\n- A **reader** sees the version of each row that was committed before its\n  transaction (or query) started — it never waits for a writer.\n- A **writer** creates a new row version alongside the old one. 
Other\n  readers still see the old version until the new one is committed.\n\n```\n-- Timeline (Postgres MVCC):\nt=1: INSERT INTO t VALUES (1, 'old')  -- row version v1\nt=2: Tx A begins (snapshot = v1)\nt=3: Tx B: UPDATE t SET val='new' WHERE id=1  -- creates v2\nt=4: Tx B: COMMIT\nt=5: Tx A: SELECT * FROM t  -- still sees v1 (its snapshot)\nt=6: Tx A: COMMIT\nt=7: VACUUM reclaims v1 (no transaction needs it anymore)\n```\n\nThe downside of MVCC is **dead row accumulation** — old versions must be\ncleaned up by `VACUUM` in Postgres. A long-running transaction prevents\nVACUUM from reclaiming row versions that became dead after its snapshot was\ntaken, because that snapshot may still need to read them.\n\n**Rule of thumb:** understand that Postgres reads never block writes and\nwrites never block reads — this is MVCC in action. Long-running transactions\nare the enemy of MVCC health because they pin old row versions in storage.\n",{"id":1555,"difficulty":127,"q":1556,"a":1557},"write-skew","What is write skew and how does SERIALIZABLE prevent it?","**Write skew** is a concurrency anomaly where two transactions each read an\noverlapping set of rows, make a decision based on what they read, and then\neach write to a *different* row — producing a state that neither transaction\nwould have allowed if it had run alone.\n\n```sql\n-- Invariant: at least one doctor must be on call at all times.\n-- Both Tx A and Tx B read: 2 doctors on call → each decides to go off call.\n\n-- Tx A:\nBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n  SELECT COUNT(*) FROM on_call WHERE shift_id = 1;  -- → 2, safe to go off\n  UPDATE on_call SET doctor_id = NULL WHERE doctor_id = 101 AND shift_id = 1;\nCOMMIT;\n\n-- Tx B (concurrent):\nBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n  SELECT COUNT(*) FROM on_call WHERE shift_id = 1;  -- → 2, safe to go off\n  UPDATE on_call SET doctor_id = NULL WHERE doctor_id = 202 AND shift_id = 1;\nCOMMIT;\n\n-- Result: 0 doctors on call — invariant violated.\n-- Under SERIALIZABLE, one transaction is aborted and 
must retry.\n```\n\n`REPEATABLE READ` does NOT prevent write skew because each transaction\nwrites to a *different* row. Only `SERIALIZABLE` (SSI) detects the\nrw-dependency cycle and prevents it.\n\n**Rule of thumb:** write skew is subtle and hard to spot in code reviews.\nAudit any transaction that reads a set of rows and then writes based on an\naggregate of that set — it is a write-skew candidate.\n",{"id":1559,"difficulty":106,"q":1560,"a":1561},"lost-update","What is a lost update and how do you prevent it?","A **lost update** occurs when two transactions both read a value, compute a\nnew value based on it, and then both write back — the second write overwrites\nthe first writer's change, effectively losing it.\n\n```sql\n-- Both sessions read stock = 10\n-- Session A: stock = 10 - 1 = 9  → UPDATE inventory SET stock = 9 ...\n-- Session B: stock = 10 - 1 = 9  → UPDATE inventory SET stock = 9 ...\n-- Result: stock = 9 instead of 8. One sale is lost.\n\n-- Fix 1: atomic UPDATE (no read-then-write race)\nUPDATE inventory SET stock = stock - 1 WHERE product_id = 42 AND stock > 0;\n\n-- Fix 2: SELECT FOR UPDATE (pessimistic lock)\nBEGIN;\n  SELECT stock FROM inventory WHERE product_id = 42 FOR UPDATE;\n  UPDATE inventory SET stock = stock - 1 WHERE product_id = 42;\nCOMMIT;\n\n-- Fix 3: optimistic locking with a version column\nUPDATE inventory SET stock = stock - 1, version = version + 1\nWHERE product_id = 42 AND version = 7;\n-- 0 rows affected → conflict → retry\n```\n\n**Rule of thumb:** the safest fix is an atomic `UPDATE col = col - delta`\n(the database computes the new value from the current one, no race window).\nUse `SELECT FOR UPDATE` when the read-compute-write logic is too complex to\nexpress in a single `UPDATE`.\n",{"id":1563,"difficulty":106,"q":1564,"a":1565},"phantom-read","What is a phantom read and what isolation level prevents it?","A **phantom read** occurs when a transaction executes the same range query\ntwice and gets different rows 
because another transaction inserted or deleted\nqualifying rows between the two reads.\n\n```sql\n-- Tx A (REPEATABLE READ in standard SQL, not Postgres):\nBEGIN;\n  SELECT COUNT(*) FROM bookings WHERE room_id = 5 AND date = '2026-07-01';\n  -- → 0 (room is free)\n  -- Tx B inserts a booking for room 5, date 2026-07-01 and commits\n  SELECT COUNT(*) FROM bookings WHERE room_id = 5 AND date = '2026-07-01';\n  -- → 1 (phantom row appeared!)\nCOMMIT;\n```\n\n- `READ COMMITTED`: phantoms possible.\n- `REPEATABLE READ` (standard SQL): phantoms still possible for inserts;\n  prevented for updates on existing rows. Postgres's MVCC snapshot prevents\n  phantoms completely at this level.\n- `SERIALIZABLE`: prevents all phantoms.\n\n**Rule of thumb:** if your application logic checks \"does row X exist before\ninserting it,\" use `SERIALIZABLE` or an explicit lock (`SELECT FOR UPDATE \u002F\nFOR SHARE`) to prevent phantom inserts from racing with your check.\n",{"id":1567,"difficulty":127,"q":1568,"a":1569},"gap-locks","What are gap locks and next-key locks in MySQL InnoDB?","**Gap locks** and **next-key locks** are MySQL InnoDB mechanisms that prevent\nphantom reads under `REPEATABLE READ` by locking *ranges* of index space,\nnot just existing rows.\n\n- **Gap lock**: locks the gap between two index values — prevents inserts\n  into that range by other transactions.\n- **Next-key lock**: a gap lock plus the index record at the upper boundary.\n  InnoDB uses next-key locks by default under `REPEATABLE READ`.\n\n```sql\n-- Under REPEATABLE READ in MySQL, this SELECT FOR UPDATE\n-- locks not just the rows where age BETWEEN 20 AND 30,\n-- but also the gaps so no new rows in that range can be inserted.\nSELECT * FROM users WHERE age BETWEEN 20 AND 30 FOR UPDATE;\n```\n\nGap locks can cause unexpected lock contention: inserting a value in a\nrange scanned by another transaction will block even if the inserted row\ndoes not match the other transaction's WHERE clause 
exactly.\n\n**Rule of thumb:** if you see unexpected INSERT waits in MySQL, check\nwhether a concurrent transaction's range scan holds a gap lock covering\nyour insert position. Lowering the isolation level to `READ COMMITTED` disables\ngap locking for ordinary searches and index scans, if you don't need phantom\nprevention.\n",{"id":1571,"difficulty":114,"q":1572,"a":1573},"default-isolation-levels","What is the default isolation level in Postgres, MySQL, and SQL Server?","| Database | Default isolation level | MVCC? |\n|---|---|---|\n| **Postgres** | `READ COMMITTED` | Yes — reads never block writes |\n| **MySQL InnoDB** | `REPEATABLE READ` | Yes — consistent read snapshots |\n| **SQL Server** | `READ COMMITTED` | Optional (`READ_COMMITTED_SNAPSHOT`) |\n| **Oracle** | `READ COMMITTED` | Yes |\n\n```sql\n-- Check current isolation level\n-- Postgres:\nSHOW transaction_isolation;\n\n-- MySQL:\nSELECT @@transaction_isolation;\n\n-- SQL Server:\nSELECT transaction_isolation_level\nFROM   sys.dm_exec_sessions\nWHERE  session_id = @@SPID;\n```\n\nSQL Server has `READ_COMMITTED_SNAPSHOT` (RCSI) — an opt-in mode that\ngives `READ COMMITTED` MVCC-style behaviour (readers do not block writers),\nsimilar to Postgres's default.\n\n**Rule of thumb:** know your database's default before assuming behaviour.\nCode written for Postgres (`READ COMMITTED`) may behave differently when\nported to MySQL (`REPEATABLE READ`) and vice versa.\n",{"id":1575,"difficulty":127,"q":1576,"a":1577},"serialization-failure-retry","How should application code handle a serialization failure?","When the database aborts a `SERIALIZABLE` transaction (or a `REPEATABLE READ`\ntransaction in Postgres, or a deadlock victim in MySQL) due to a conflict, it\nraises **SQLSTATE `40001`** (serialization failure). 
The correct response is to **retry the entire transaction** from\nthe beginning — not just the failed statement.\n\n```python\n# Python + psycopg2 example\nimport psycopg2\nfrom psycopg2 import errors\nimport time\n\nMAX_RETRIES = 5\n\ndef transfer(conn, from_id, to_id, amount):\n    for attempt in range(MAX_RETRIES):\n        try:\n            with conn.cursor() as cur:\n                conn.autocommit = False\n                cur.execute(\"SET TRANSACTION ISOLATION LEVEL SERIALIZABLE\")\n                cur.execute(\"UPDATE accounts SET balance = balance - %s WHERE id = %s\",\n                            (amount, from_id))\n                cur.execute(\"UPDATE accounts SET balance = balance + %s WHERE id = %s\",\n                            (amount, to_id))\n                conn.commit()\n                return  # success\n        except errors.SerializationFailure:\n            conn.rollback()\n            time.sleep(0.05 * (2 ** attempt))  # exponential back-off\n    raise RuntimeError(\"Transaction failed after max retries\")\n```\n\n**Rule of thumb:** serialization failures are expected and normal under\n`SERIALIZABLE` — design every transaction that uses this level with a retry\nloop and exponential back-off. Never surface the raw database error to the\nend user.\n",{"id":1579,"difficulty":127,"q":1580,"a":1581},"snapshot-isolation","What is snapshot isolation and how does it differ from SERIALIZABLE?","**Snapshot isolation (SI)** gives each transaction a consistent snapshot of\nthe database as it was at transaction start. Reads always see committed data\nfrom that snapshot, and writes conflict only if two transactions write to the\n*same row* (first-committer-wins). 
This prevents dirty reads, non-repeatable\nreads, and most phantom reads.\n\nThe key difference from `SERIALIZABLE`: SI still allows **write skew** —\ntwo transactions can each read a set of rows and write to different rows\nbased on that read, producing a state that neither transaction would have\npermitted alone (see write-skew question).\n\n```\nIsolation guarantees comparison:\n┌─────────────────────┬──────┬──────┬─────────┬──────────────┐\n│                     │ R.U. │ R.C. │ Rep.Rd. │ Serializable │\n├─────────────────────┼──────┼──────┼─────────┼──────────────┤\n│ Dirty read          │  ✗   │  ✓   │   ✓     │      ✓       │\n│ Non-repeatable read │  ✗   │  ✗   │   ✓     │      ✓       │\n│ Phantom read        │  ✗   │  ✗   │  ✓*     │      ✓       │\n│ Write skew          │  ✗   │  ✗   │   ✗     │      ✓       │\n└─────────────────────┴──────┴──────┴─────────┴──────────────┘\n(* Postgres REPEATABLE READ prevents phantoms via MVCC snapshot)\n```\n\n**Rule of thumb:** \"snapshot isolation\" is what most databases actually\nimplement when you ask for `REPEATABLE READ`. It is safe for the vast\nmajority of workloads. 
Upgrade to `SERIALIZABLE` only when write skew is\na real risk in your domain.\n",{"id":1583,"difficulty":127,"q":1584,"a":1585},"lock-modes","What lock modes do databases use and how do they interact?","Databases use a hierarchy of locks with compatibility rules that determine\nwhich locks can be held simultaneously by different transactions.\n\nCommon lock modes (Postgres naming):\n\n| Mode | Abbr | Conflicts with |\n|---|---|---|\n| Access Share | AS | Access Exclusive only |\n| Row Share | RS | Exclusive, Access Exclusive |\n| Row Exclusive | RX | Share, Share Row Exclusive, Exclusive, Access Exclusive |\n| Share | S | Row Exclusive, Share Row Exclusive, Exclusive, Access Exclusive |\n| Exclusive | X | Everything except Access Share |\n| Access Exclusive | AX | Everything |\n\n```sql\n-- SELECT acquires Access Share (compatible with almost everything)\nSELECT * FROM orders;\n\n-- INSERT\u002FUPDATE\u002FDELETE acquire Row Exclusive\nUPDATE orders SET status = 'shipped' WHERE id = 1;\n\n-- ALTER TABLE requires Access Exclusive — blocks ALL other operations\nALTER TABLE orders ADD COLUMN notes TEXT;\n\n-- Check current locks\nSELECT relation::regclass, mode, granted\nFROM   pg_locks\nWHERE  relation IS NOT NULL;\n```\n\n**Rule of thumb:** `ALTER TABLE` takes an `Access Exclusive` lock and blocks\nevery read and write on the table for its duration. 
On large tables, use\n`CREATE INDEX CONCURRENTLY` and multi-step migrations to minimise lock time.\n",{"description":104},"SQL isolation levels interview questions — READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE, dirty reads, phantom reads, lost updates, MVCC, and locking behaviour across Postgres, MySQL, and SQL Server.","sql\u002Ftransactions\u002Fisolation-concurrency","Isolation Levels & Concurrency","egB7GrHYIzUcvVVoWUXEw7Qtp0A6GWgHZzO7X2jjX2Q",{"id":1592,"title":1593,"body":1594,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1645,"navigation":109,"order":30,"path":1646,"questions":1647,"questionsCount":946,"related":247,"seo":1720,"seoDescription":1721,"stem":1722,"subtopic":1593,"topic":38,"topicSlug":40,"updated":328,"__hash__":1723},"qa\u002Fsql\u002Fwindow-functions\u002Franking-functions.md","Ranking Functions",{"type":101,"value":1595,"toc":1642},[1596,1600],[639,1597,1599],{"id":1598},"about-ranking-functions","About Ranking Functions",[644,1601,1602,1603,1606,1607,1606,1610,1606,1613,1606,1616,1619,1620,1623,1624,1627,1628,1606,1631,1606,1634,1637,1638,1641],{},"Ranking functions — ",[653,1604,1605],{},"ROW_NUMBER",", ",[653,1608,1609],{},"RANK",[653,1611,1612],{},"DENSE_RANK",[653,1614,1615],{},"NTILE",[653,1617,1618],{},"PERCENT_RANK",",\n",[653,1621,1622],{},"CUME_DIST"," — assign a position to each row within its partition. 
The most-tested\ndistinction is how each handles ",[648,1625,1626],{},"ties"," (unique vs gaps vs no-gaps), which drives the\nright choice for ",[648,1629,1630],{},"top-N-per-group",[648,1632,1633],{},"deduplication",[648,1635,1636],{},"pagination",", and ",[648,1639,1640],{},"Nth\nhighest value"," problems — all built on the rank-in-a-CTE-then-filter pattern.",{"title":104,"searchDepth":30,"depth":30,"links":1643},[1644],{"id":1598,"depth":30,"text":1599},{},"\u002Fsql\u002Fwindow-functions\u002Franking-functions",[1648,1652,1656,1660,1664,1668,1672,1676,1680,1684,1688,1692,1696,1700,1704,1708,1712,1716],{"id":1649,"difficulty":114,"q":1650,"a":1651},"what-are-ranking-functions","What are ranking window functions?","**Ranking functions** assign a position number to each row **within its\npartition**, based on the window's `ORDER BY`. The main ones are `ROW_NUMBER`,\n`RANK`, `DENSE_RANK`, and `NTILE`, plus the distribution functions `PERCENT_RANK`\nand `CUME_DIST`.\n\n```sql\nSELECT name, salary,\n       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,\n       RANK()       OVER (ORDER BY salary DESC) AS rnk,\n       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk\nFROM   employees;\n```\n\nAll ranking functions **require** an `ORDER BY` in the `OVER` clause — without an\norder, \"rank\" has no meaning.\n\nRule of thumb: ranking functions number rows by an ordering; they always need\n`ORDER BY` in `OVER`.\n",{"id":1653,"difficulty":114,"q":1654,"a":1655},"row-number","What does ROW_NUMBER() do?","`ROW_NUMBER()` assigns a **unique, sequential integer** to each row within the\npartition, in the window's `ORDER BY` order — `1, 2, 3, ...` with **no ties and no\ngaps**. 
Even rows with equal ordering values get distinct numbers (the tie-break is\narbitrary unless you add more `ORDER BY` columns).\n\n```sql\nSELECT name, dept_id,\n       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn\nFROM   employees;\n```\n\nIt's the go-to for **pagination**, **deduplication**, and **top-N-per-group**.\n\nRule of thumb: `ROW_NUMBER()` = a unique 1,2,3 sequence per partition, no ties —\nuse it when you need exactly one row per position.\n",{"id":1657,"difficulty":106,"q":1658,"a":1659},"rank-vs-dense-rank","What is the difference between RANK() and DENSE_RANK()?","Both give **tied rows the same rank**, but they differ in what comes next:\n\n- `RANK()` **leaves gaps** after ties — if two rows tie at 1, the next is 3.\n- `DENSE_RANK()` **leaves no gaps** — after a tie at 1, the next is 2.\n\n```sql\n-- salaries: 100, 100, 90\n-- RANK():       1,   1,   3\n-- DENSE_RANK(): 1,   1,   2\nSELECT name, salary,\n       RANK()       OVER (ORDER BY salary DESC) AS rnk,\n       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk\nFROM employees;\n```\n\nRule of thumb: `RANK` skips numbers after ties (like Olympic ranking);\n`DENSE_RANK` keeps them consecutive.\n",{"id":1661,"difficulty":106,"q":1662,"a":1663},"row-number-vs-rank","How does ROW_NUMBER() differ from RANK() and DENSE_RANK()?","The key difference is **how ties are handled**:\n\n- `ROW_NUMBER()` — always unique; tied rows get **different** numbers (arbitrary\n  order among the tie).\n- `RANK()` — tied rows get the **same** rank, then a **gap**.\n- `DENSE_RANK()` — tied rows get the **same** rank, **no gap**.\n\n```sql\n-- values 100, 100, 90:\n-- ROW_NUMBER: 1, 2, 3\n-- RANK:       1, 1, 3\n-- DENSE_RANK: 1, 1, 2\n```\n\nRule of thumb: choose `ROW_NUMBER` for unique positions, `RANK`\u002F`DENSE_RANK` when\nties should share a position (gaps vs no gaps).\n",{"id":1665,"difficulty":106,"q":1666,"a":1667},"ntile","What does NTILE() do?","`NTILE(n)` divides the ordered rows of 
a partition into **n roughly equal buckets**\nand labels each row with its bucket number `1..n`. It's used for **quartiles,\ndeciles, percentile bands**, and bucketing.\n\n```sql\n-- split employees into 4 salary quartiles\nSELECT name, salary,\n       NTILE(4) OVER (ORDER BY salary DESC) AS quartile\nFROM   employees;\n```\n\nIf the row count doesn't divide evenly, the **earlier buckets get one extra row**.\n\nRule of thumb: `NTILE(n)` splits ordered rows into n balanced groups — use it for\nquartiles\u002Fdeciles and even distribution.\n",{"id":1669,"difficulty":127,"q":1670,"a":1671},"percent-rank","What does PERCENT_RANK() compute?","`PERCENT_RANK()` returns the **relative rank** of a row as a value between `0` and\n`1`: `(rank - 1) \u002F (total_rows - 1)`. The first row is always `0`; the last is `1`.\nIt tells you what fraction of rows rank **below** the current one.\n\n```sql\nSELECT name, salary,\n       PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank\nFROM   employees;\n```\n\nIt's useful for **percentile-style comparisons** (e.g. \"this salary is higher than\n80% of others\").\n\nRule of thumb: `PERCENT_RANK()` = where a row sits on a 0–1 scale relative to the\nrest; first row 0, last row 1.\n",{"id":1673,"difficulty":127,"q":1674,"a":1675},"cume-dist","What does CUME_DIST() compute and how does it differ from PERCENT_RANK()?","`CUME_DIST()` (cumulative distribution) returns the **fraction of rows with a value\nless than or equal to** the current row: `rows_\u003C=_current \u002F total_rows`. 
It ranges\nin `(0, 1]`.\n\nThe difference from `PERCENT_RANK()`:\n- `CUME_DIST` = count of rows **≤ current** \u002F total (includes the current row).\n- `PERCENT_RANK` = `(rank - 1) \u002F (n - 1)` (excludes current; first row is 0).\n\n```sql\nSELECT name, salary,\n       CUME_DIST()    OVER (ORDER BY salary) AS cume,\n       PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank\nFROM employees;\n```\n\nRule of thumb: `CUME_DIST` answers \"what proportion are at or below me?\";\n`PERCENT_RANK` answers \"what's my relative rank position 0–1?\".\n",{"id":1677,"difficulty":114,"q":1678,"a":1679},"ranking-requires-order-by","Why do ranking functions require an ORDER BY in the OVER clause?","Ranking is meaningless without a defined order — the function needs to know **by\nwhat** to rank. So `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, etc. all **need**\n`ORDER BY` inside `OVER`; omitting it is an error in some databases (SQL Server,\nOracle), while others (Postgres, MySQL) accept it and return arbitrary or\nmeaningless ranks.\n\n```sql\n-- rejected by SQL Server and Oracle; arbitrary numbering in Postgres and MySQL\nROW_NUMBER() OVER (PARTITION BY dept_id)\n\n-- correct\nROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC)\n```\n\n(Aggregate windows like `SUM(...) OVER ()` don't need `ORDER BY`, but ranking\nfunctions do.)\n\nRule of thumb: every ranking function needs `ORDER BY` in `OVER` to define the\nranking criterion.\n",{"id":1681,"difficulty":106,"q":1682,"a":1683},"top-n-per-group","How do you select the top N rows per group?","The classic pattern: number rows within each partition with a ranking function in a\n**CTE\u002Fsubquery**, then filter on that number in the outer query (window functions\ncan't go in `WHERE`).\n\n```sql\n-- employees in the top 3 salary ranks per department (ties included)\nWITH ranked AS (\n    SELECT name, dept_id, salary,\n           DENSE_RANK() OVER (PARTITION BY dept_id\n                              ORDER BY salary DESC) AS rnk\n    FROM employees\n)\nSELECT name, dept_id, salary\nFROM   ranked\nWHERE  rnk \u003C= 3;\n```\n\nUse `ROW_NUMBER` for \"exactly N rows\" or `RANK`\u002F`DENSE_RANK` to 
**include ties** at\nthe cutoff.\n\nRule of thumb: rank in a CTE, filter `rnk \u003C= N` — pick `ROW_NUMBER` for an exact N,\n`DENSE_RANK` to keep ties.\n",{"id":1685,"difficulty":106,"q":1686,"a":1687},"nth-highest-value","How do you find the Nth highest value using ranking functions?","Rank the rows descending, then filter for the Nth rank. Use `DENSE_RANK` when you\nwant the Nth **distinct** value (so duplicate values count once).\n\n```sql\n-- the 3rd highest distinct salary\nWITH ranked AS (\n    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk\n    FROM employees\n)\nSELECT DISTINCT salary FROM ranked WHERE rnk = 3;\n```\n\nWith `ROW_NUMBER` you'd get the 3rd row, not the 3rd distinct value; with `RANK`\nyou'd risk gaps. `DENSE_RANK` is the safe choice for \"Nth highest distinct.\"\n\nRule of thumb: for the Nth highest distinct value, `DENSE_RANK() ... = N`.\n",{"id":1689,"difficulty":106,"q":1690,"a":1691},"deduplication-with-row-number","How do you remove duplicate rows using ROW_NUMBER()?","Partition by the columns that define a duplicate, number the rows, and keep only\n`rn = 1` (deleting or excluding the rest).\n\n```sql\n-- keep the most recent record per email, drop older duplicates\nWITH ranked AS (\n    SELECT id, ROW_NUMBER() OVER (PARTITION BY email\n                                  ORDER BY created_at DESC) AS rn\n    FROM users\n)\nDELETE FROM users\nWHERE id IN (SELECT id FROM ranked WHERE rn > 1);\n```\n\nThe `ORDER BY` decides **which** duplicate is the \"keeper\" (e.g. newest).\n\nRule of thumb: `ROW_NUMBER()` partitioned by the dup key, keep `rn = 1`, remove\n`rn > 1` — the standard dedup pattern.\n",{"id":1693,"difficulty":106,"q":1694,"a":1695},"pagination-with-row-number","How can ROW_NUMBER() be used for pagination?","Number the ordered rows, then select the slice for a page. 
This was the classic\npagination method before `OFFSET`\u002F`FETCH` and is still used in SQL Server pre-2012.\n\n```sql\n-- page 2, 10 rows per page (rows 11–20)\nWITH ordered AS (\n    SELECT *, ROW_NUMBER() OVER (ORDER BY created_at DESC) AS rn\n    FROM products\n)\nSELECT * FROM ordered WHERE rn BETWEEN 11 AND 20;\n```\n\nModern engines often prefer `ORDER BY ... LIMIT 10 OFFSET 10`, but `ROW_NUMBER`\npagination works everywhere and pairs well with deterministic ordering. Note both\nget slow at deep offsets — keyset pagination scales better.\n\nRule of thumb: `ROW_NUMBER` + `BETWEEN` slices pages; for large datasets prefer\nkeyset pagination over deep offsets.\n",{"id":1697,"difficulty":106,"q":1698,"a":1699},"ranking-tie-breaking","How do you make ROW_NUMBER() deterministic when ordering values tie?","`ROW_NUMBER()` always produces unique numbers, but when the `ORDER BY` values tie,\nthe assignment **among tied rows is arbitrary** and can change between runs. Add a\n**tie-breaker** column (ideally a unique key) to the `ORDER BY` to make it\ndeterministic.\n\n```sql\nROW_NUMBER() OVER (ORDER BY salary DESC, id ASC)  -- id breaks ties stably\n```\n\nWithout the tie-break, paginating or deduplicating can return inconsistent results\nacross executions.\n\nRule of thumb: append a unique column to the window `ORDER BY` so tied rows get a\nstable, repeatable order.\n",{"id":1701,"difficulty":114,"q":1702,"a":1703},"rank-with-partition","How does PARTITION BY affect ranking functions?","`PARTITION BY` makes the rank **restart at 1 for each group**. 
Without it, ranking\nruns across the entire result set as one partition.\n\n```sql\n-- rank salaries WITHIN each department (resets per dept)\nSELECT name, dept_id, salary,\n       RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank\nFROM   employees;\n```\n\nSo an employee can be rank 1 in their department even if they're not the\nhighest-paid company-wide.\n\nRule of thumb: add `PARTITION BY` to rank within groups (rank resets per group);\nomit it to rank globally.\n",{"id":1705,"difficulty":106,"q":1706,"a":1707},"filtering-on-rank","Why can't you filter directly on a ranking function in WHERE?","Window functions — including ranking functions — are evaluated **after** `WHERE`,\nso the rank doesn't exist yet when `WHERE` runs. Referencing it there is an error.\n\n```sql\n-- ILLEGAL\nSELECT name FROM employees\nWHERE RANK() OVER (ORDER BY salary DESC) \u003C= 5;\n\n-- LEGAL: compute in a CTE, filter outside\nWITH r AS (\n    SELECT name, RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees\n)\nSELECT name FROM r WHERE rnk \u003C= 5;\n```\n\nRule of thumb: ranks are computed after filtering — always wrap them in a CTE or\nsubquery before filtering.\n",{"id":1709,"difficulty":127,"q":1710,"a":1711},"ntile-uneven-buckets","What happens with NTILE() when rows don't divide evenly?","When the row count isn't divisible by `n`, `NTILE` makes the **first buckets one\nrow larger** than the later ones. 
For example, 10 rows into `NTILE(3)` gives\nbuckets of sizes **4, 3, 3**.\n\n```sql\n-- 10 rows, NTILE(3): bucket 1 has 4 rows, buckets 2 and 3 have 3 each\nSELECT val, NTILE(3) OVER (ORDER BY val) AS bucket FROM nums;\n```\n\nThis guarantees buckets differ in size by at most one, with the extras front-loaded.\n\nRule of thumb: `NTILE` front-loads the remainder — earlier buckets get the extra\nrows when the count doesn't divide evenly.\n",{"id":1713,"difficulty":106,"q":1714,"a":1715},"choosing-ranking-function","How do you choose between ROW_NUMBER, RANK, and DENSE_RANK?","Pick based on **how you want ties handled**:\n\n- Need **exactly one row** per position (pagination, dedup, \"the single latest\")\n  → `ROW_NUMBER()`.\n- Ties should **share a rank with gaps** (standings where 2 golds means no silver)\n  → `RANK()`.\n- Ties should **share a rank without gaps** (Nth distinct value, dense tiers)\n  → `DENSE_RANK()`.\n\n```sql\nROW_NUMBER() -- 1,2,3,4   (unique)\nRANK()       -- 1,1,3,4   (gap after tie)\nDENSE_RANK() -- 1,1,2,3   (no gap)\n```\n\nRule of thumb: unique → `ROW_NUMBER`; ties-with-gaps → `RANK`; ties-no-gaps →\n`DENSE_RANK`.\n",{"id":1717,"difficulty":127,"q":1718,"a":1719},"median-with-percentile","How can ranking\u002Fdistribution functions help compute a median?","A median is the value at the 50th percentile. 
You can approximate it with\n`CUME_DIST`\u002F`PERCENT_RANK`, but several databases offer the dedicated **ordered-set\naggregate** `PERCENTILE_CONT`\u002F`PERCENTILE_DISC` (a `WITHIN GROUP` function, related\nto window analytics).\n\n```sql\n-- exact continuous median per department (PostgreSQL \u002F Oracle;\n-- SQL Server requires an OVER (PARTITION BY ...) clause instead of GROUP BY)\nSELECT dept_id,\n       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary\nFROM   employees\nGROUP  BY dept_id;\n```\n\n`PERCENTILE_CONT` interpolates between rows; `PERCENTILE_DISC` returns an actual\ndata value.\n\nRule of thumb: for a median, prefer `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY\n...)` over hand-rolling it from rank functions.\n",{"description":104},"SQL ranking function interview questions: ROW_NUMBER vs RANK vs DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, top-N-per-group, and deduplication patterns.","sql\u002Fwindow-functions\u002Franking-functions","XP3lJpDpG6os_ZieA-3c5KBo1W1xOy7nVJY4ZYgYt8g",{"id":1725,"title":1726,"body":1727,"description":104,"difficulty":114,"extension":107,"framework":10,"frameworkSlug":12,"meta":1731,"navigation":109,"order":39,"path":1732,"questions":1733,"questionsCount":553,"related":247,"seo":1798,"seoDescription":1799,"stem":1800,"subtopic":1801,"topic":21,"topicSlug":22,"updated":328,"__hash__":1802},"qa\u002Fsql\u002Fbasics\u002Fsorting-limiting.md","Sorting Limiting",{"type":101,"value":1728,"toc":1729},[],{"title":104,"searchDepth":30,"depth":30,"links":1730},[],{},"\u002Fsql\u002Fbasics\u002Fsorting-limiting",[1734,1738,1742,1746,1750,1754,1758,1762,1766,1770,1774,1778,1782,1786,1790,1794],{"id":1735,"difficulty":114,"q":1736,"a":1737},"order-by-basics","What does ORDER BY do?","`ORDER BY` **sorts the result set** by one or more columns or expressions. It runs\n**last** in the logical pipeline (just before `LIMIT`), so it can reference `SELECT`\naliases. 
Without it, row order is **not guaranteed** — the database may return rows\nin any order.\n\n```sql\nSELECT name, created_at\nFROM users\nORDER BY created_at DESC;   -- newest first\n```\n\nSorting happens after filtering and grouping, on the final rows. Large unindexed\nsorts can be expensive because the whole result must be ordered.\n\nRule of thumb: if you care about row order, you must say so with `ORDER BY` — never\nrely on insertion order.\n",{"id":1739,"difficulty":114,"q":1740,"a":1741},"asc-desc","How do you control sort direction?","Append `ASC` (ascending, the default) or `DESC` (descending) to each sort key.\nDirection is **per column**, so you can mix them.\n\n```sql\nSELECT name, age\nFROM users\nORDER BY age DESC, name ASC;   -- oldest first; ties broken alphabetically\n```\n\n`ASC` is implicit, so `ORDER BY age` sorts ascending. Each key after the first only\nbreaks ties left by the preceding keys.\n\nRule of thumb: list sort keys from most significant to least; each one is a\ntie-breaker for the ones before it.\n",{"id":1743,"difficulty":114,"q":1744,"a":1745},"order-by-multiple","How do you sort by multiple columns?","List the columns comma-separated; the database sorts by the **first**, then breaks\nties with the **second**, and so on. Order of the keys matters.\n\n```sql\nSELECT department, salary, name\nFROM employees\nORDER BY department ASC, salary DESC;  -- group by dept, then highest paid first\n```\n\nThis is how you get \"grouped\" looking output without aggregation — rows for the same\ndepartment sit together, ordered within the department.\n\nRule of thumb: put the column you want grouped together first, the tie-breaker\nsecond.\n",{"id":1747,"difficulty":106,"q":1748,"a":1749},"order-by-expression","Can you sort by an expression or alias?","Yes. 
Because `ORDER BY` runs after `SELECT`, you can sort by a **computed\nexpression**, a **column alias**, or even a **column position number** (`ORDER BY 2`).\n\n```sql\nSELECT name, price * quantity AS total\nFROM order_items\nORDER BY total DESC;        -- alias works here (unlike in WHERE)\n```\n\nSorting by position (`ORDER BY 2`) is terse but fragile — reordering the `SELECT`\nlist silently changes the sort. Prefer aliases for clarity.\n\nRule of thumb: sort by alias for readability; avoid positional `ORDER BY` numbers\nin production code.\n",{"id":1751,"difficulty":106,"q":1752,"a":1753},"null-ordering","Where do NULLs sort in ORDER BY?","It's **dialect-dependent**. Postgres\u002FOracle sort `NULL`s **last** in `ASC` (first in\n`DESC`); MySQL\u002FSQL Server sort them **first** in `ASC`. The SQL standard lets you\ncontrol it with `NULLS FIRST` \u002F `NULLS LAST`.\n\n```sql\n-- Postgres\u002FOracle: force NULLs to the bottom regardless of direction\nSELECT name, last_login\nFROM users\nORDER BY last_login DESC NULLS LAST;\n```\n\nMySQL lacks `NULLS LAST`, so you emulate it: `ORDER BY last_login IS NULL,\nlast_login DESC`.\n\nRule of thumb: if NULLs matter in your sort, state `NULLS FIRST\u002FLAST` explicitly\nrather than trusting the default.\n",{"id":1755,"difficulty":114,"q":1756,"a":1757},"limit-clause","How do you limit the number of rows returned?","Use `LIMIT n` (Postgres\u002FMySQL\u002FSQLite), `FETCH FIRST n ROWS ONLY` (SQL standard \u002F\nOracle \u002F DB2), or `SELECT TOP n` (SQL Server). It caps the result to the first `n`\nrows **after** ordering.\n\n```sql\nSELECT name, score\nFROM players\nORDER BY score DESC\nLIMIT 10;                  -- top 10 scorers\n```\n\n`LIMIT` is applied last, so it works on the sorted result. 
A `LIMIT` with no\n`ORDER BY` returns an arbitrary subset.\n\nRule of thumb: always pair `LIMIT` with `ORDER BY` for a deterministic \"top N\".\n",{"id":1759,"difficulty":106,"q":1760,"a":1761},"offset-pagination","How do you paginate results with OFFSET?","`LIMIT n OFFSET m` skips the first `m` rows and returns the next `n` — the classic\npage query. Page `p` (1-based, page size `s`) uses `OFFSET (p - 1) * s`.\n\n```sql\n-- page 3, 20 rows per page  ->  skip 40, take 20\nSELECT id, title\nFROM articles\nORDER BY published_at DESC, id DESC\nLIMIT 20 OFFSET 40;\n```\n\nAlways order by something **unique** (add `id` as a tie-breaker) so rows don't shift\nbetween pages. `OFFSET` still **scans and discards** the skipped rows, so deep pages\nget slow.\n\nRule of thumb: `OFFSET` is fine for early pages; for deep pagination use keyset\npagination.\n",{"id":1763,"difficulty":127,"q":1764,"a":1765},"keyset-pagination","What is keyset (cursor) pagination and why is it faster?","Keyset pagination fetches the next page by **filtering past the last row seen**\ninstead of counting an offset. The database jumps straight to the position via an\nindex, so cost stays constant no matter how deep you page.\n\n```sql\n-- next page after the last (published_at, id) you saw\nSELECT id, title, published_at\nFROM articles\nWHERE (published_at, id) \u003C ('2026-06-01 10:00', 5000)\nORDER BY published_at DESC, id DESC\nLIMIT 20;\n```\n\nUnlike `OFFSET`, it doesn't scan-and-throw-away earlier rows, and it's stable when\nrows are inserted. The trade-off: you can't jump to an arbitrary page number.\n\nRule of thumb: use keyset pagination for infinite scroll \u002F deep pages; `OFFSET`\nonly for shallow, numbered pages.\n",{"id":1767,"difficulty":127,"q":1768,"a":1769},"offset-performance","Why does OFFSET get slow for deep pages?","`OFFSET m` doesn't skip rows for free — the database must **generate and discard**\nall `m` preceding rows to reach the page. 
At `OFFSET 1000000`, it produces a million\nrows just to throw them away.\n\n```sql\n-- reads ~100020 rows, returns 20 — the 100000 are wasted work\nSELECT * FROM events ORDER BY created_at LIMIT 20 OFFSET 100000;\n```\n\nCost grows linearly with the offset. Keyset pagination avoids this by seeking\ndirectly to the boundary with an indexed `WHERE`.\n\nRule of thumb: if page numbers reach the thousands, switch from `OFFSET` to keyset\npagination.\n",{"id":1771,"difficulty":127,"q":1772,"a":1773},"top-n-per-group-offset","How do you get the top N rows per group?","A plain `LIMIT` caps the **whole** result, not per group. To get the top N **within\neach group**, rank rows inside each partition with a window function and filter.\n\n```sql\n-- top 3 highest-paid employees per department\nSELECT *\nFROM (\n  SELECT name, department, salary,\n         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn\n  FROM employees\n) ranked\nWHERE rn \u003C= 3;\n```\n\nUse `ROW_NUMBER` for an exact N, `RANK`\u002F`DENSE_RANK` to include ties. Postgres also\noffers `LATERAL` joins for the same job.\n\nRule of thumb: \"top N per group\" means a window function, not `LIMIT`.\n",{"id":1775,"difficulty":106,"q":1776,"a":1777},"order-by-case","How do you sort by a custom order?","Use a `CASE` expression (or a lookup) in `ORDER BY` to map values to a sort rank —\nhandy for non-alphabetical orderings like priority levels.\n\n```sql\nSELECT title, priority\nFROM tickets\nORDER BY CASE priority\n           WHEN 'high'   THEN 1\n           WHEN 'medium' THEN 2\n           WHEN 'low'    THEN 3\n         END;\n```\n\nWithout this, `ORDER BY priority` would sort alphabetically (`high`, `low`,\n`medium`) — rarely what you want. 
MySQL also has `FIELD()` as a shortcut.\n\nRule of thumb: encode custom sort orders with a `CASE` rank in `ORDER BY`.\n",{"id":1779,"difficulty":106,"q":1780,"a":1781},"stable-sort","Why should you add a unique tie-breaker to ORDER BY?","When the sort keys aren't unique, rows with equal keys can come back in **any\norder**, and that order may differ between runs or pages. Adding a unique\ntie-breaker (like the primary key) makes the sort **deterministic**.\n\n```sql\n-- created_at ties resolved consistently by id\nSELECT * FROM orders\nORDER BY created_at DESC, id DESC\nLIMIT 20;\n```\n\nThis matters most for pagination: without a stable order, the same row can appear on\ntwo pages or be skipped.\n\nRule of thumb: end every paginated `ORDER BY` with a unique column.\n",{"id":1783,"difficulty":127,"q":1784,"a":1785},"limit-with-ties","How do you include rows tied with the Nth row?","A plain `LIMIT` cuts off at exactly N rows, even if the (N+1)th ties the Nth. The\nstandard `FETCH FIRST n ROWS WITH TIES` (Postgres 13+, Oracle 12c+) keeps all\ntied rows; SQL Server spells it `SELECT TOP (n) WITH TIES`.\n\n```sql\n-- the top 3 scores, plus anyone tied with 3rd place\nSELECT name, score\nFROM players\nORDER BY score DESC\nFETCH FIRST 3 ROWS WITH TIES;\n```\n\nIt requires an `ORDER BY` (ties are defined by it). Without `WITH TIES`, you'd\narbitrarily drop one of the tied players.\n\nRule of thumb: use `WITH TIES` when \"top N\" should never split a tie.\n",{"id":1787,"difficulty":127,"q":1788,"a":1789},"distinct-on","How do you get one row per group (Postgres DISTINCT ON)?","Postgres's `DISTINCT ON (cols)` keeps the **first row per group** as defined by\n`ORDER BY`. It's a concise way to grab the latest\u002Flargest row per key without a\nwindow function.\n\n```sql\n-- the most recent order per user\nSELECT DISTINCT ON (user_id) user_id, id, created_at\nFROM orders\nORDER BY user_id, created_at DESC;\n```\n\nThe leading `ORDER BY` columns **must** start with the `DISTINCT ON` columns. 
Other\ndatabases achieve this with `ROW_NUMBER() = 1`.\n\nRule of thumb: `DISTINCT ON` is Postgres shorthand for \"latest row per group.\"\n",{"id":1791,"difficulty":127,"q":1792,"a":1793},"collation","What is collation and how does it affect sorting?","A **collation** defines the rules for comparing and ordering text — case sensitivity,\naccent handling, and locale-specific alphabet order. Two databases with different\ncollations can sort the same strings differently.\n\n```sql\n-- force a specific collation for this sort (Postgres)\nSELECT name FROM users ORDER BY name COLLATE \"de-DE-x-icu\";\n```\n\nCollation affects `ORDER BY`, `=`\u002F`LIKE` comparisons, and unique constraints. A\nmismatch between two columns being compared can even cause errors or prevent index\nuse.\n\nRule of thumb: set collation deliberately when sorting human-readable text across\nlocales.\n",{"id":1795,"difficulty":106,"q":1796,"a":1797},"random-order","How do you return rows in random order?","Order by a random function: `ORDER BY RANDOM()` (Postgres\u002FSQLite) or `ORDER BY\nRAND()` (MySQL). Combined with `LIMIT`, it samples random rows.\n\n```sql\nSELECT * FROM questions\nORDER BY RANDOM()\nLIMIT 5;                 -- 5 random questions\n```\n\nThis sorts the **entire table** by a random value first, so it's slow on large\ntables. 
For big tables, sample with a `WHERE random() \u003C 0.01` pre-filter or use\n`TABLESAMPLE`.\n\nRule of thumb: `ORDER BY RANDOM()` is fine for small tables; sample first on large\nones.\n",{"description":104},"SQL ORDER BY and LIMIT interview questions — sorting direction, NULL ordering, pagination with OFFSET, keyset pagination and dialect differences.","sql\u002Fbasics\u002Fsorting-limiting","Sorting & Limiting","IlsiDok4lcx1ZfIwq03O5MLmxe-f0biY8gYoImgk4wI",{"id":1804,"title":1805,"body":1806,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1810,"navigation":109,"order":39,"path":1811,"questions":1812,"questionsCount":323,"related":247,"seo":1873,"seoDescription":1874,"stem":1875,"subtopic":1876,"topic":82,"topicSlug":84,"updated":328,"__hash__":1877},"qa\u002Fsql\u002Ffunctions\u002Fconditional-null-functions.md","Conditional Null Functions",{"type":101,"value":1807,"toc":1808},[],{"title":104,"searchDepth":30,"depth":30,"links":1809},[],{},"\u002Fsql\u002Ffunctions\u002Fconditional-null-functions",[1813,1817,1821,1825,1829,1833,1837,1841,1845,1849,1853,1857,1861,1865,1869],{"id":1814,"difficulty":114,"q":1815,"a":1816},"case-expression","What is the CASE expression and what are its two forms?","`CASE` is SQL's conditional expression — an inline if\u002Felse that returns a\nvalue. 
It has two forms:\n\n**Simple CASE** — compares one expression to several values:\n\n```sql\nSELECT order_id,\n       CASE status\n         WHEN 'pending'   THEN 'Awaiting payment'\n         WHEN 'shipped'   THEN 'On its way'\n         WHEN 'delivered' THEN 'Complete'\n         ELSE 'Unknown'\n       END AS status_label\nFROM orders;\n```\n\n**Searched CASE** — evaluates independent boolean conditions:\n\n```sql\nSELECT order_id, total,\n       CASE\n         WHEN total >= 500 THEN 'Large'\n         WHEN total >= 100 THEN 'Medium'\n         ELSE 'Small'\n       END AS size_category\nFROM orders;\n```\n\n`CASE` can appear anywhere an expression is valid: `SELECT`, `WHERE`,\n`ORDER BY`, `GROUP BY`, inside aggregate functions.\n\n**Rule of thumb:** use the simple form for equality checks on a single\ncolumn; use the searched form when conditions involve different columns,\ncomparisons, or `IS NULL` checks.\n",{"id":1818,"difficulty":114,"q":1819,"a":1820},"coalesce","What does COALESCE do?","`COALESCE(expr1, expr2, …)` returns the **first non-NULL value** from its\nargument list. It is the standard way to substitute a default for a NULL.\n\n```sql\n-- Use a fallback when a column might be NULL\nSELECT COALESCE(phone, 'N\u002FA')          AS phone    FROM users;\nSELECT COALESCE(discount, 0)           AS discount FROM orders;\nSELECT COALESCE(nickname, first_name)  AS display  FROM users;\n\n-- Chain multiple fallbacks\nSELECT COALESCE(preferred_email, work_email, personal_email, 'no-email@unknown.com')\nFROM contacts;\n\n-- Avoid NULL in arithmetic (NULL + anything = NULL)\nSELECT price * COALESCE(quantity, 0) AS line_total FROM cart_items;\n```\n\n`COALESCE` short-circuits: it stops evaluating arguments as soon as it\nfinds a non-NULL value. 
All arguments must be type-compatible.\n\n**Rule of thumb:** use `COALESCE` to provide defaults for nullable columns.\nIt is cleaner and more portable than `CASE WHEN col IS NULL THEN default\nELSE col END` — they are exactly equivalent.\n",{"id":1822,"difficulty":106,"q":1823,"a":1824},"nullif","What does NULLIF do and when is it useful?","`NULLIF(expr1, expr2)` returns `NULL` if the two expressions are equal,\notherwise returns `expr1`. It is the inverse of `COALESCE` — converting a\nspecific value to `NULL`.\n\n```sql\n-- Prevent division-by-zero: replace 0 with NULL before dividing\nSELECT numerator \u002F NULLIF(denominator, 0) AS ratio FROM metrics;\n-- If denominator = 0 → NULL instead of ERROR\n\n-- Convert a sentinel value to NULL\nSELECT NULLIF(phone, 'N\u002FA') AS phone FROM contacts;\n-- 'N\u002FA' → NULL; other values pass through unchanged\n\n-- Combine with COALESCE for clean defaults\nSELECT COALESCE(NULLIF(TRIM(notes), ''), 'No notes') AS notes FROM tickets;\n-- Empty-string or whitespace-only → 'No notes'\n```\n\n**Rule of thumb:** reach for `NULLIF(col, 0)` any time you are dividing\nby a column that may be zero. 
Pair with `COALESCE` when you also want to\nreplace the resulting NULL with a default.\n",{"id":1826,"difficulty":114,"q":1827,"a":1828},"ifnull-nvl-isnull","What are IFNULL, NVL, and ISNULL — and how do they compare to COALESCE?","These are two-argument shortcuts for replacing NULL with a default — they\nare equivalent to `COALESCE(expr, default)` but are database-specific:\n\n| Function | Database |\n|---|---|\n| `IFNULL(expr, default)` | MySQL |\n| `NVL(expr, default)` | Oracle |\n| `ISNULL(expr, default)` | SQL Server \u002F Sybase |\n| `COALESCE(expr, default)` | All (ANSI SQL) |\n\n```sql\n-- MySQL\nSELECT IFNULL(discount, 0) FROM orders;\n\n-- SQL Server\nSELECT ISNULL(discount, 0) FROM orders;\n\n-- COALESCE — works everywhere and accepts more than two arguments\nSELECT COALESCE(discount, 0) FROM orders;\n```\n\n**Rule of thumb:** use `COALESCE` in any code that needs to be portable.\nUse `IFNULL`\u002F`ISNULL` only in database-specific stored procedures where\nportability is not a concern. Avoid `NVL` outside Oracle.\n",{"id":1830,"difficulty":114,"q":1831,"a":1832},"iif","What is IIF and when can you use it?","`IIF(condition, true_value, false_value)` is a compact inline if\u002Felse\navailable in **SQL Server** and **MySQL** (as `IF`). It is syntactic sugar\nfor a two-branch `CASE`.\n\n```sql\n-- SQL Server IIF\nSELECT order_id,\n       IIF(total > 100, 'Large', 'Small') AS size\nFROM orders;\n\n-- MySQL IF (same concept, different name)\nSELECT order_id,\n       IF(total > 100, 'Large', 'Small') AS size\nFROM orders;\n\n-- Equivalent CASE (works everywhere)\nSELECT order_id,\n       CASE WHEN total > 100 THEN 'Large' ELSE 'Small' END AS size\nFROM orders;\n```\n\n**Rule of thumb:** use `CASE` for portability. 
Use `IIF` (SQL Server) or\n`IF` (MySQL) in database-specific scripts where you prefer the terser\nsyntax, but be aware that porting the query later requires rewriting it.\n",{"id":1834,"difficulty":106,"q":1835,"a":1836},"null-in-comparisons","Why do comparisons with NULL often return unexpected results?","`NULL` represents an unknown value. Any comparison involving `NULL` evaluates\nto **UNKNOWN** (not TRUE or FALSE) under SQL's three-valued logic. Since\n`WHERE` clauses only keep rows where the condition is TRUE, rows with NULL\nin a comparison are silently excluded.\n\n```sql\n-- All of these evaluate to UNKNOWN, not TRUE:\nSELECT NULL = NULL;      -- UNKNOWN\nSELECT NULL \u003C> 1;        -- UNKNOWN\nSELECT NULL > 0;         -- UNKNOWN\nSELECT 1 = NULL;         -- UNKNOWN\n\n-- Correct tests for NULL\nSELECT NULL IS NULL;     -- TRUE\nSELECT NULL IS NOT NULL; -- FALSE\n\n-- Common mistake: missing IS NULL rows\nSELECT * FROM orders WHERE discount \u003C> 0;\n-- Excludes rows where discount IS NULL — those rows have unknown discount\n\n-- Fix: explicitly include the NULL case\nSELECT * FROM orders WHERE discount \u003C> 0 OR discount IS NULL;\n```\n\n**Rule of thumb:** whenever a column may be NULL, check whether your\n`WHERE` clause correctly handles it. Any condition using `=`, `\u003C>`, `>`,\n`\u003C`, `LIKE`, or `IN` silently drops NULL rows.\n",{"id":1838,"difficulty":106,"q":1839,"a":1840},"null-in-aggregates","How does NULL affect aggregate functions?","All aggregate functions (except `COUNT(*)`) **ignore NULL values** in their\ninput. 
This is usually what you want, but it can produce surprising results.\n\n```sql\n-- Setup: discount column has some NULLs\n-- | id | total | discount |\n-- | 1  | 100   | 10       |\n-- | 2  | 200   | NULL     |\n-- | 3  | 300   | 20       |\n\nSELECT COUNT(*)        AS total_rows,       -- → 3 (counts all rows)\n       COUNT(discount) AS non_null_discount, -- → 2 (skips NULLs)\n       SUM(discount)   AS total_discount,    -- → 30 (NULLs ignored)\n       AVG(discount)   AS avg_discount       -- → 15 (30 \u002F 2, not 30 \u002F 3!)\nFROM orders;\n-- AVG denominator is the COUNT of non-NULL rows, not total rows.\n\n-- Fix: use COALESCE to treat NULL as 0 in AVG\nSELECT AVG(COALESCE(discount, 0)) AS avg_discount FROM orders; -- → 10\n```\n\n**Rule of thumb:** `AVG(col)` divides by the count of non-NULL values, so\nit can silently exclude missing data from the average. When NULLs should\nbe treated as 0 in averages, use `AVG(COALESCE(col, 0))`.\n",{"id":1842,"difficulty":106,"q":1843,"a":1844},"conditional-aggregation","What is conditional aggregation and how do you use it?","**Conditional aggregation** uses a `CASE` expression inside an aggregate\nfunction to count, sum, or average only rows matching a condition, pivoting\nrow data into columns without a JOIN.\n\n```sql\n-- Count orders by status in a single pass\nSELECT COUNT(*)                                         AS total,\n       COUNT(CASE WHEN status = 'pending'   THEN 1 END) AS pending,\n       COUNT(CASE WHEN status = 'shipped'   THEN 1 END) AS shipped,\n       COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered,\n       SUM(CASE WHEN status = 'pending' THEN total ELSE 0 END) AS pending_revenue\nFROM orders;\n\n-- Postgres \u002F SQLite shorthand: FILTER clause (not supported by SQL Server or MySQL)\nSELECT COUNT(*) FILTER (WHERE status = 'pending')   AS pending,\n       COUNT(*) FILTER (WHERE status = 'shipped')   AS shipped,\n       SUM(total) FILTER (WHERE status = 'pending') AS pending_revenue\nFROM orders;\n```\n\n**Rule of 
thumb:** use conditional aggregation to pivot data in a single\nquery instead of multiple subqueries or UNION. The `FILTER` clause\n(Postgres\u002FSQLite) is cleaner than `CASE WHEN … THEN 1 END`; prefer\nit when available.\n",{"id":1846,"difficulty":114,"q":1847,"a":1848},"greatest-least","What are GREATEST and LEAST?","`GREATEST(val1, val2, …)` returns the largest value from its arguments.\n`LEAST(val1, val2, …)` returns the smallest. They work across any\ncomparable types. NULL handling is dialect-dependent: MySQL returns NULL if any\nargument is NULL, while Postgres ignores NULL arguments and returns NULL only\nwhen all of them are NULL. SQL Server added both functions in SQL Server 2022.\n\n```sql\nSELECT GREATEST(10, 20, 5);          -- → 20\nSELECT LEAST(10, 20, 5);             -- → 5\nSELECT GREATEST(NULL, 10, 20);       -- → NULL in MySQL; 20 in Postgres\n\n-- Practical: clamp a value within a min\u002Fmax range\nSELECT LEAST(GREATEST(user_rating, 1), 5) AS clamped_rating FROM reviews;\n-- Ensures rating is always between 1 and 5\n\n-- Use COALESCE to make NULL handling explicit when comparing columns\nSELECT GREATEST(COALESCE(a, 0), COALESCE(b, 0)) AS max_ab FROM t;\n```\n\nEquivalent using `CASE` for SQL Server versions before 2022:\n```sql\nSELECT CASE WHEN a > b THEN a ELSE b END AS greatest_ab FROM t;\n```\n\n**Rule of thumb:** use `GREATEST`\u002F`LEAST` for clamping values to a range\nor comparing columns from the same row; they are cleaner than nested\n`CASE` expressions for this pattern.\n",{"id":1850,"difficulty":106,"q":1851,"a":1852},"null-safe-equals","How do you compare two values that may both be NULL?","Standard `=` returns UNKNOWN when either side is NULL. 
To check whether two\nnullable columns are \"equal\" (including both being NULL), use database-\nspecific NULL-safe equality.\n\n```sql\n-- Standard SQL: verbose but portable\nSELECT *\nFROM   t1\nJOIN   t2 ON (t1.col = t2.col) OR (t1.col IS NULL AND t2.col IS NULL);\n\n-- Postgres: IS NOT DISTINCT FROM (NULL-safe equality)\nSELECT * FROM t1 JOIN t2 ON t1.col IS NOT DISTINCT FROM t2.col;\n-- NULL IS NOT DISTINCT FROM NULL → TRUE\n-- 1   IS NOT DISTINCT FROM 1    → TRUE\n-- 1   IS NOT DISTINCT FROM NULL → FALSE\n\n-- MySQL: \u003C=> (spaceship operator)\nSELECT * FROM t1 JOIN t2 ON t1.col \u003C=> t2.col;\n-- NULL \u003C=> NULL → 1 (TRUE)\n\n-- SQL Server: no shorthand — use the verbose ANSI form\n```\n\n**Rule of thumb:** use `IS NOT DISTINCT FROM` (Postgres) or `\u003C=>` (MySQL)\nwhen joining or comparing nullable columns where two NULLs should be\nconsidered equal. This is common in upsert logic and change-detection\nqueries.\n",{"id":1854,"difficulty":127,"q":1855,"a":1856},"boolean-logic-nulls","How does three-valued logic affect AND and OR with NULLs?","SQL uses three truth values: TRUE, FALSE, and UNKNOWN (NULL). `AND` and\n`OR` follow specific rules when UNKNOWN is involved:\n\n```\nTRUE  AND UNKNOWN = UNKNOWN\nFALSE AND UNKNOWN = FALSE     ← FALSE wins in AND\nTRUE  OR  UNKNOWN = TRUE      ← TRUE wins in OR\nFALSE OR  UNKNOWN = UNKNOWN\nNOT UNKNOWN       = UNKNOWN\n```\n\n```sql\n-- Pitfall: NOT IN with a NULL in the subquery\nSELECT * FROM products\nWHERE  id NOT IN (SELECT product_id FROM discontinued WHERE product_id IS NULL);\n-- If the subquery returns even one NULL, NOT IN always returns UNKNOWN\n-- → zero rows returned! 
(UNKNOWN is treated as FALSE in WHERE)\n\n-- Fix: use NOT EXISTS instead\nSELECT * FROM products p\nWHERE NOT EXISTS (\n  SELECT 1 FROM discontinued d WHERE d.product_id = p.id\n);\n-- NOT EXISTS correctly handles NULLs: returns TRUE if no match found\n```\n\n**Rule of thumb:** never use `NOT IN (subquery)` when the subquery can\nreturn NULLs; it silently returns zero rows. Always use `NOT EXISTS` as\nthe safe alternative.\n",{"id":1858,"difficulty":114,"q":1859,"a":1860},"decode-decode","How do you handle multiple conditional mappings cleanly?","For mapping one value to another across many cases, `CASE` is the standard\napproach. Some databases also offer compact alternatives.\n\n```sql\n-- Standard CASE: readable and portable\nSELECT status,\n       CASE status\n         WHEN 1 THEN 'Active'\n         WHEN 2 THEN 'Inactive'\n         WHEN 3 THEN 'Banned'\n         ELSE        'Unknown'\n       END AS status_label\nFROM users;\n\n-- Oracle DECODE (Oracle-specific, not portable)\nSELECT DECODE(status, 1, 'Active', 2, 'Inactive', 3, 'Banned', 'Unknown')\nFROM users;\n\n-- Postgres has no value-mapping DECODE: use CASE or a lookup table\n-- Alternative: join to a status lookup table (preferred for long lists)\nSELECT u.id, s.label\nFROM users u\nJOIN status_codes s ON s.code = u.status;\n```\n\n**Rule of thumb:** for short, stable mappings (\u003C 6 values), use `CASE`.\nFor longer or changeable mappings, maintain a lookup\u002Freference table and\njoin to it; the mapping is then data, not code, and can be updated without\nchanging any query.\n",{"id":1862,"difficulty":106,"q":1863,"a":1864},"try-convert","How do you safely cast a string to a number or date without raising an error?","If a string column contains non-numeric values and you try to `CAST` it to\na number, the query fails. SQL Server offers safe-cast functions\nthat return NULL on failure, and MySQL coerces bad input instead of raising an error. 
Postgres requires a different approach.\n\n```sql\n-- SQL Server: TRY_CAST \u002F TRY_CONVERT (return NULL on failure)\nSELECT TRY_CAST('123'   AS INT);        -- → 123\nSELECT TRY_CAST('abc'   AS INT);        -- → NULL (no error)\nSELECT TRY_CONVERT(DATE, '2026-06-20'); -- → '2026-06-20'\nSELECT TRY_CONVERT(DATE, 'not-a-date'); -- → NULL\n\n-- MySQL: CAST silently coerces and produces 0 or NULL\nSELECT CAST('abc' AS UNSIGNED);  -- → 0 (silent coercion, with a warning)\n\n-- Postgres: no TRY_CAST built-in; use regexp check before casting\nSELECT CASE\n         WHEN value ~ '^\\d+$' THEN value::INT\n         ELSE NULL\n       END AS safe_int\nFROM input_data;\n-- Or wrap in a PL\u002FpgSQL function that catches exceptions\n```\n\n**Rule of thumb:** validate and sanitise data at the application boundary\nbefore it enters the database. When cleaning dirty data inside SQL, use\n`TRY_CAST` (SQL Server) or a regex guard (Postgres) to avoid query-aborting\ncast errors.\n",{"id":1866,"difficulty":106,"q":1867,"a":1868},"case-in-order-by","How can you use CASE in ORDER BY to create custom sort orders?","`CASE` inside `ORDER BY` lets you define a custom sort priority that is not\npossible with a simple column sort — for example, putting a specific status\nfirst, sorting NULL values to the bottom, or combining multiple conditions.\n\n```sql\n-- Sort 'pending' orders first, then 'shipped', then all others alphabetically\nSELECT order_id, status\nFROM   orders\nORDER  BY CASE status\n            WHEN 'pending' THEN 1\n            WHEN 'shipped' THEN 2\n            ELSE 3\n          END,\n         status ASC;  -- secondary sort within the ELSE group\n\n-- Sort NULLs to the bottom (Postgres also has NULLS LAST natively)\nSELECT id, priority\nFROM   tasks\nORDER  BY CASE WHEN priority IS NULL THEN 1 ELSE 0 END,\n         priority ASC;\n\n-- Postgres native null ordering (cleaner):\nSELECT id, priority FROM tasks ORDER BY priority ASC NULLS LAST;\n```\n\n**Rule of thumb:** use 
`CASE` in `ORDER BY` when the sort requirement\ncannot be expressed as `ASC`\u002F`DESC` on existing columns. For `NULL` ordering\nspecifically, prefer `NULLS FIRST` \u002F `NULLS LAST` in databases that support\nit (such as Postgres and Oracle); it is more readable.\n",{"id":1870,"difficulty":127,"q":1871,"a":1872},"pivot-with-case","How do you pivot rows into columns using CASE?","**Pivoting** transforms row values into column headers. Not every database has a\nnative `PIVOT` syntax, but conditional aggregation with `CASE`\nachieves the same result and is portable.\n\n```sql\n-- Data: (year, quarter, revenue)\n-- Goal: one row per year with columns q1, q2, q3, q4\n\nSELECT year,\n       SUM(CASE WHEN quarter = 1 THEN revenue END) AS q1,\n       SUM(CASE WHEN quarter = 2 THEN revenue END) AS q2,\n       SUM(CASE WHEN quarter = 3 THEN revenue END) AS q3,\n       SUM(CASE WHEN quarter = 4 THEN revenue END) AS q4\nFROM   quarterly_revenue\nGROUP  BY year\nORDER  BY year;\n\n-- SQL Server native PIVOT syntax (not portable):\nSELECT year, [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4\nFROM   quarterly_revenue\nPIVOT (SUM(revenue) FOR quarter IN ([1],[2],[3],[4])) AS p;\n```\n\n**Rule of thumb:** use `CASE`-based conditional aggregation for portability\nand readability. 
Use SQL Server's `PIVOT` syntax only in SQL Server-specific\ncode where dynamic pivot (unknown column count) is needed — even then, it\nrequires dynamic SQL to handle a variable number of pivot columns.\n",{"description":104},"SQL conditional and NULL function interview questions — CASE, COALESCE, NULLIF, IIF, GREATEST, LEAST, NVL, NULL handling pitfalls, and conditional aggregation across Postgres, MySQL, and SQL Server.","sql\u002Ffunctions\u002Fconditional-null-functions","Conditional & NULL Functions","dBvP1iGJ3yvGCVGzunS29UEwxdb3VGDQxovXGTlNlQQ",{"id":1879,"title":1880,"body":1881,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":1885,"navigation":109,"order":39,"path":1886,"questions":1887,"questionsCount":323,"related":247,"seo":1948,"seoDescription":1949,"stem":1950,"subtopic":1951,"topic":47,"topicSlug":48,"updated":328,"__hash__":1952},"qa\u002Fsql\u002Fschema\u002Fconstraints.md","Constraints",{"type":101,"value":1882,"toc":1883},[],{"title":104,"searchDepth":30,"depth":30,"links":1884},[],{},"\u002Fsql\u002Fschema\u002Fconstraints",[1888,1892,1896,1900,1904,1908,1912,1916,1920,1924,1928,1932,1936,1940,1944],{"id":1889,"difficulty":114,"q":1890,"a":1891},"what-is-constraint","What is a constraint and why use one?","A **constraint** is a rule enforced by the database engine that limits\nwhat values can be stored in a column or set of columns. 
Constraints catch\nbad data at write time — before it ever enters the database — so application\ncode never has to defensively re-validate what the schema already guarantees.\n\n```sql\nCREATE TABLE employees (\n  id         INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- unique, not null\n  email      TEXT NOT NULL UNIQUE,                          -- no nulls, no dupes\n  salary     NUMERIC(10,2) CHECK (salary > 0),             -- must be positive\n  dept_id    INT  REFERENCES departments(id)               -- must exist in parent\n);\n```\n\n**Rule of thumb:** encode every invariant you can in the schema. A constraint\nthat runs in 0 ms at insert time is cheaper than debugging corrupt data in\nproduction.\n",{"id":1893,"difficulty":114,"q":1894,"a":1895},"primary-key","What is a PRIMARY KEY constraint?","A **PRIMARY KEY** uniquely identifies each row in a table. It is a\ncombination of two implied constraints: `UNIQUE` (no two rows share the same\nvalue) and `NOT NULL` (the key column(s) can never be `NULL`). Each table\ncan have **at most one** primary key.\n\n```sql\n-- Single-column PK\nCREATE TABLE users (\n  id   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  name TEXT NOT NULL\n);\n\n-- Composite PK (natural key)\nCREATE TABLE order_items (\n  order_id   INT NOT NULL,\n  product_id INT NOT NULL,\n  quantity   INT NOT NULL,\n  PRIMARY KEY (order_id, product_id)\n);\n```\n\nThe database automatically creates a **unique index** on the PK column(s),\nwhich makes lookups by PK fast.\n\n**Rule of thumb:** every table should have a primary key. 
Prefer a single\nsurrogate integer\u002FUUID PK; use composite PKs for pure join tables\n(many-to-many mappings).\n",{"id":1897,"difficulty":114,"q":1898,"a":1899},"foreign-key","What is a FOREIGN KEY constraint and what does it enforce?","A **FOREIGN KEY** (FK) constraint ensures that every non-NULL value in the\nreferencing column exists in the referenced column of the parent table.\nIt enforces **referential integrity** — you cannot have an `order` that\npoints to a `customer_id` that does not exist.\n\n```sql\nCREATE TABLE orders (\n  id          INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  customer_id INT  NOT NULL REFERENCES customers(id),\n  total       NUMERIC(10,2) NOT NULL\n);\n\n-- Explicit form with named constraint\nALTER TABLE orders\n  ADD CONSTRAINT fk_orders_customer\n  FOREIGN KEY (customer_id) REFERENCES customers(id);\n```\n\nOn insert, the DB checks that `customer_id` exists in `customers.id`. On\ndelete of a customer row, the FK's **referential action** decides what\nhappens (see CASCADE \u002F RESTRICT question).\n\n**Rule of thumb:** always add FK constraints on foreign-key columns — they\nare the database's guarantee that your data is consistent, not just a\ndocumentation comment.\n",{"id":1901,"difficulty":106,"q":1902,"a":1903},"referential-actions","What are ON DELETE \u002F ON UPDATE referential actions on a foreign key?","Referential actions define what happens to child rows when the parent row\nis deleted or its PK updated:\n\n| Action | Effect on child rows |\n|---|---|\n| `RESTRICT` | Raises an error — cannot delete\u002Fupdate if children exist |\n| `NO ACTION` | Like RESTRICT but checked at end of statement (default) |\n| `CASCADE` | Deletes (or updates) all matching child rows automatically |\n| `SET NULL` | Sets the FK column(s) to `NULL` in child rows |\n| `SET DEFAULT` | Sets the FK column(s) to their default value |\n\n```sql\nCREATE TABLE order_items (\n  id         INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  
order_id   INT NOT NULL\n    REFERENCES orders(id) ON DELETE CASCADE,  -- items gone when order deleted\n  product_id INT NOT NULL\n    REFERENCES products(id) ON DELETE RESTRICT -- block if items still reference product\n);\n```\n\n**Rule of thumb:** use `ON DELETE CASCADE` for owned child records (items,\nline-items, comments). Use `RESTRICT` for shared reference data you never\nwant to accidentally wipe. Avoid `SET NULL` unless `NULL` is semantically\nmeaningful in the child.\n",{"id":1905,"difficulty":114,"q":1906,"a":1907},"unique-constraint","What is a UNIQUE constraint and how does it differ from a PRIMARY KEY?","A **UNIQUE** constraint ensures that no two rows have the same value(s) in\nthe constrained column(s). Unlike `PRIMARY KEY`, a table can have **multiple**\nunique constraints, and unique columns **can** contain `NULL` (Postgres and\nMySQL treat each `NULL` as distinct, so many `NULL`s are allowed; SQL Server allows only one `NULL`).\n\n```sql\nCREATE TABLE users (\n  id       INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  email    TEXT NOT NULL UNIQUE,\n  username TEXT NOT NULL UNIQUE,\n  phone    TEXT UNIQUE          -- nullable; two users can have NULL phone\n);\n\n-- Multi-column unique constraint\nALTER TABLE team_members\n  ADD CONSTRAINT uq_team_user UNIQUE (team_id, user_id);\n```\n\nA `UNIQUE` constraint creates a **unique index** automatically, so it also\nspeeds up equality lookups on those columns.\n\n**Rule of thumb:** add `UNIQUE` to every natural business key (email,\nusername, SSN) in addition to the surrogate PK — it is the database's\nguarantee that your deduplication logic is not bypassed.\n",{"id":1909,"difficulty":106,"q":1910,"a":1911},"check-constraint","What is a CHECK constraint and what can it express?","A **CHECK** constraint specifies a boolean expression that every row must\nsatisfy. The insert or update is rejected if the expression evaluates to\n`FALSE`. 
(An expression that evaluates to `UNKNOWN` because an operand is `NULL` also passes, so pair `CHECK` with `NOT NULL` where the value is required.)\n\n```sql\nCREATE TABLE products (\n  id       INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  price    NUMERIC(10,2) NOT NULL CHECK (price >= 0),\n  discount NUMERIC(5,2)  CHECK (discount BETWEEN 0 AND 100),\n  status   TEXT NOT NULL CHECK (status IN ('draft', 'active', 'archived'))\n);\n\n-- Cross-column check\nALTER TABLE bookings\n  ADD CONSTRAINT chk_dates CHECK (check_out > check_in);\n```\n\nLimitations: `CHECK` expressions cannot reference other tables (use triggers\nor application logic for cross-table validation). Some databases (MySQL before 8.0.16)\nparsed but silently ignored `CHECK` constraints.\n\n**Rule of thumb:** use `CHECK` to enforce domain rules on a single row\n(non-negative price, valid status enum, date ordering). Prefer an `ENUM` type\nor a FK to a lookup table when the valid set of values is managed by\nthe business and may change.\n",{"id":1913,"difficulty":114,"q":1914,"a":1915},"not-null-constraint","What does NOT NULL enforce and when should you omit it?","`NOT NULL` prevents the column from storing a `NULL`. 
An attempt to insert\nor update a row with `NULL` in that column raises an error.\n\n```sql\nCREATE TABLE events (\n  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  name        TEXT    NOT NULL,            -- must always be present\n  description TEXT,                        -- optional (nullable)\n  occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\nAdd `NOT NULL` when:\n- The value is required for every row to be meaningful.\n- You need aggregate functions (like `SUM`, `AVG`) to behave predictably\n  (`NULL` is skipped by aggregates).\n\nOmit `NOT NULL` (allow `NULL`) when:\n- The value is genuinely optional (\"middle name\", \"description\").\n- You want to distinguish \"not yet provided\" from a default value.\n\n**Rule of thumb:** default to `NOT NULL`; make a column nullable only when\nthe absence of a value has a distinct meaning you intend to query for.\n",{"id":1917,"difficulty":127,"q":1918,"a":1919},"deferrable-constraints","What are deferrable constraints and when do you need them?","By default, constraints are checked **immediately** after each statement.\nA **deferrable constraint** can be postponed until `COMMIT`, which is\nnecessary when two statements in the same transaction temporarily violate\nthe constraint before reaching a consistent state.\n\n```sql\n-- Postgres: declare the FK as deferrable\nALTER TABLE nodes\n  ADD CONSTRAINT fk_parent\n  FOREIGN KEY (parent_id) REFERENCES nodes(id)\n  DEFERRABLE INITIALLY DEFERRED;\n\n-- Now you can insert in any order within a transaction\nBEGIN;\n  INSERT INTO nodes (id, parent_id) VALUES (2, 1);  -- parent 1 doesn't exist yet\n  INSERT INTO nodes (id, parent_id) VALUES (1, NULL);\nCOMMIT;  -- FK checked here — both rows now exist, so it passes\n```\n\nCommon use case: self-referential trees, circular FK graphs, or bulk\nimports where you cannot guarantee insert order.\n\n**Rule of thumb:** use `DEFERRABLE INITIALLY DEFERRED` for FK constraints\nin self-referential or circular 
relationship tables. Leave constraints\n`NOT DEFERRABLE` by default — immediate checking catches bugs faster.\n",{"id":1921,"difficulty":127,"q":1922,"a":1923},"exclusion-constraint","What is an exclusion constraint in Postgres?","An **exclusion constraint** (Postgres-specific) generalizes `UNIQUE` to\narbitrary operators, not just equality. It ensures that for any two rows,\nat least one of the specified conditions is false. The classic use case is\npreventing **overlapping time ranges**.\n\n```sql\n-- Requires btree_gist extension for mixed operator support\nCREATE EXTENSION IF NOT EXISTS btree_gist;\n\nCREATE TABLE room_bookings (\n  room_id    INT  NOT NULL,\n  reserved   TSTZRANGE NOT NULL,  -- time range type\n  EXCLUDE USING GIST (\n    room_id  WITH =,              -- same room\n    reserved WITH &&              -- overlapping periods\n  )\n);\n\n-- This pair of inserts will succeed\nINSERT INTO room_bookings VALUES (1, '[2026-06-01, 2026-06-03)');\nINSERT INTO room_bookings VALUES (1, '[2026-06-05, 2026-06-07)');\n\n-- This overlaps and will fail the exclusion constraint\nINSERT INTO room_bookings VALUES (1, '[2026-06-02, 2026-06-06)');\n```\n\n**Rule of thumb:** use exclusion constraints for scheduling, resource\nallocation, or any domain where overlapping intervals must be prevented.\nThey are more expressive than triggers for this pattern.\n",{"id":1925,"difficulty":106,"q":1926,"a":1927},"naming-constraints","Why should you name your constraints explicitly?","When you omit a name, the database generates one automatically\n(`orders_customer_id_fkey` in Postgres, a name ending in a system-generated hex suffix in SQL Server).\nAuto-named constraints are hard to reference in migrations, error messages, and\napplication code.\n\n```sql\n-- Anonymous (avoid)\nALTER TABLE orders ADD FOREIGN KEY (customer_id) REFERENCES customers(id);\n\n-- Named (preferred)\nALTER TABLE orders\n  ADD CONSTRAINT fk_orders_customer\n  FOREIGN KEY (customer_id) REFERENCES customers(id);\n\n-- Now you can drop 
it cleanly\nALTER TABLE orders DROP CONSTRAINT fk_orders_customer;\n```\n\nA consistent naming convention (`fk_\u003Ctable>_\u003Ccol>`, `uq_\u003Ctable>_\u003Ccol>`,\n`chk_\u003Ctable>_\u003Crule>`) makes the schema self-documenting and makes migration\nscripts reliable across environments.\n\n**Rule of thumb:** always name constraints explicitly with a predictable\nconvention. Anonymous constraints force you to query the system catalog every\ntime you need to alter or drop one.\n",{"id":1929,"difficulty":127,"q":1930,"a":1931},"partial-unique-index","What is a partial unique index and when does it replace a constraint?","A **partial unique index** applies the uniqueness guarantee only to rows\nthat satisfy a `WHERE` condition. This is useful when uniqueness should apply\nonly to a subset of rows — for example, only *active* records.\n\n```sql\n-- Only one active record per user (soft-delete pattern)\nCREATE UNIQUE INDEX uq_subscriptions_active_user\n  ON subscriptions (user_id)\n  WHERE cancelled_at IS NULL;\n\n-- Two rows for user 1, one cancelled and one active — both allowed\nINSERT INTO subscriptions (user_id, cancelled_at) VALUES (1, '2025-01-01');\nINSERT INTO subscriptions (user_id, cancelled_at) VALUES (1, NULL);  -- OK\n\n-- A second active row for user 1 — fails\nINSERT INTO subscriptions (user_id, cancelled_at) VALUES (1, NULL);  -- ERROR\n```\n\n**Rule of thumb:** use a partial unique index when the uniqueness rule\napplies to a subset of rows (e.g., non-deleted, active-status). 
Full `UNIQUE`\nconstraints cannot express this; partial indexes can.\n",{"id":1933,"difficulty":106,"q":1934,"a":1935},"disable-enable-constraint","Can you disable a constraint temporarily and should you?","In **SQL Server** you can `DISABLE` a foreign key or check constraint and\nlater `ENABLE` it, optionally with `WITH NOCHECK` (skip validation of\nexisting data) or `WITH CHECK` (validate existing data on re-enable).\n\nIn **Postgres**, you can `SET CONSTRAINTS ALL DEFERRED` for the current\ntransaction, or `ALTER TABLE … DISABLE TRIGGER ALL` to disable trigger-based\nconstraints. You can also drop and recreate constraints for bulk loads.\n\nIn **MySQL**, `SET FOREIGN_KEY_CHECKS = 0` disables FK checks session-wide.\n\n```sql\n-- SQL Server bulk load workaround\nALTER TABLE orders NOCHECK CONSTRAINT fk_orders_customer;\nBULK INSERT orders FROM 'orders.csv';\nALTER TABLE orders WITH CHECK CHECK CONSTRAINT fk_orders_customer;\n\n-- MySQL bulk load\nSET FOREIGN_KEY_CHECKS = 0;\nLOAD DATA INFILE 'orders.csv' INTO TABLE orders;\nSET FOREIGN_KEY_CHECKS = 1;\n```\n\n**Rule of thumb:** disabling constraints is acceptable for controlled bulk\nloads, but always re-enable and validate immediately. 
Never disable\nconstraints permanently — you will eventually have corrupt data.\n",{"id":1937,"difficulty":106,"q":1938,"a":1939},"index-vs-constraint","How does a UNIQUE constraint relate to a unique index?","A `UNIQUE` constraint is enforced by a **unique index** under the hood.\nIn Postgres and SQL Server, a `UNIQUE` constraint and a `CREATE UNIQUE INDEX`\non the same column are nearly equivalent — the constraint simply gives the\nindex a constraint-associated name.\n\n```sql\n-- These two are functionally equivalent in Postgres:\nALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);\n\nCREATE UNIQUE INDEX uq_users_email ON users (email);\n```\n\nDifferences in Postgres:\n- Constraints can be `DEFERRABLE`; indexes cannot.\n- You can `DROP CONSTRAINT` but must `DROP INDEX` for a standalone index.\n- Partial unique indexes cannot be declared as a `UNIQUE` constraint\n  (you need the `CREATE UNIQUE INDEX … WHERE` form).\n\n**Rule of thumb:** use the `CONSTRAINT … UNIQUE` form when you need\ndeferred checking or want the DDL to be clearly declarative. 
Use\n`CREATE UNIQUE INDEX` when you need a partial unique condition.\n",{"id":1941,"difficulty":114,"q":1942,"a":1943},"constraint-violations-errors","What error do you get when a constraint is violated, and how do you handle it?","Each constraint violation raises a specific error that application code can\ncatch and present meaningfully:\n\n| Constraint | Postgres SQLSTATE | Typical message |\n|---|---|---|\n| NOT NULL | 23502 | null value in column \"x\" violates not-null constraint |\n| UNIQUE | 23505 | duplicate key value violates unique constraint \"uq_…\" |\n| FOREIGN KEY | 23503 | insert or update on table \"x\" violates foreign key constraint |\n| CHECK | 23514 | new row for relation \"x\" violates check constraint \"chk_…\" |\n\n```python\n# Python + psycopg2 example\nfrom psycopg2 import errors\ntry:\n    cur.execute(\"INSERT INTO users (email) VALUES (%s)\", (email,))\nexcept errors.UniqueViolation:\n    return {\"error\": \"That email is already registered\"}\n```\n\n**Rule of thumb:** catch constraint violations by **SQLSTATE code**, not by\nparsing the error message string — message text can change between database\nversions. Map each violation to a user-friendly message in the application layer.\n",{"id":1945,"difficulty":114,"q":1946,"a":1947},"table-vs-column-constraint","What is the difference between a column-level and a table-level constraint?","A **column-level constraint** is declared inline with the column definition\nand can only reference that single column. 
A **table-level constraint** is\ndeclared separately (after all column definitions) and can reference multiple\ncolumns.\n\n```sql\nCREATE TABLE order_items (\n  order_id   INT NOT NULL,               -- column-level NOT NULL\n  product_id INT NOT NULL,               -- column-level NOT NULL\n  quantity   INT NOT NULL CHECK (quantity > 0),  -- column-level CHECK\n\n  -- Table-level: composite PK (must reference both columns — impossible column-level)\n  PRIMARY KEY (order_id, product_id),\n\n  -- Table-level: cross-column check\n  CONSTRAINT chk_valid_quantity CHECK (quantity \u003C= 1000 OR order_id \u003C 1000)\n);\n```\n\n**Rule of thumb:** use column-level constraints for single-column rules\n(NOT NULL, UNIQUE, CHECK on one column). Use table-level constraints\nfor composite keys, composite unique indexes, and cross-column CHECK\nexpressions.\n",{"description":104},"SQL constraints interview questions — PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, referential actions, deferrable constraints, and constraint best practices across Postgres, MySQL, and SQL Server.","sql\u002Fschema\u002Fconstraints","Constraints & Integrity","rq8jkLiLHCb4HPydx8hdz0jqAzhIQYLbWY0jcAKimbE",{"id":1954,"title":1955,"body":1956,"description":104,"difficulty":127,"extension":107,"framework":10,"frameworkSlug":12,"meta":2017,"navigation":109,"order":39,"path":2018,"questions":2019,"questionsCount":946,"related":247,"seo":2092,"seoDescription":2093,"stem":2094,"subtopic":2095,"topic":38,"topicSlug":40,"updated":328,"__hash__":2096},"qa\u002Fsql\u002Fwindow-functions\u002Fframes-and-offsets.md","Frames And Offsets",{"type":101,"value":1957,"toc":2014},[1958,1962],[639,1959,1961],{"id":1960},"about-frames-offset-functions","About Frames & Offset Functions",[644,1963,1964,1965,853,1968,1971,1972,1606,1975,1619,1978,1606,1981,1606,1984,1987,1988,1996,1997,2000,2001,2004,2005,2009,2010,2013],{},"Frame clauses (",[653,1966,1967],{},"ROWS",[653,1969,1970],{},"RANGE BETWEEN ...",") and offset 
functions (",[653,1973,1974],{},"LAG",[653,1976,1977],{},"LEAD",[653,1979,1980],{},"FIRST_VALUE",[653,1982,1983],{},"LAST_VALUE",[653,1985,1986],{},"NTH_VALUE",") power SQL's most advanced analytics — moving\naverages, period-over-period deltas, and gaps-and-islands. The highest-value interview\ntraps are the ",[648,1989,1990,1992,1993],{},[653,1991,1967],{}," vs ",[653,1994,1995],{},"RANGE"," distinction, the ",[648,1998,1999],{},"default frame"," (",[653,2002,2003],{},"RANGE ... CURRENT ROW","), and the ",[648,2006,2007],{},[653,2008,1983],{}," gotcha that needs an explicit ",[653,2011,2012],{},"UNBOUNDED FOLLOWING"," frame.",{"title":104,"searchDepth":30,"depth":30,"links":2015},[2016],{"id":1960,"depth":30,"text":1961},{},"\u002Fsql\u002Fwindow-functions\u002Fframes-and-offsets",[2020,2024,2028,2032,2036,2040,2044,2048,2052,2056,2060,2064,2068,2072,2076,2080,2084,2088],{"id":2021,"difficulty":106,"q":2022,"a":2023},"what-is-a-window-frame","What is a window frame?","A **window frame** narrows the window to a **subset of rows around the current\nrow**, within its partition. It's defined with a `ROWS` or `RANGE` clause inside\n`OVER`, and controls exactly which rows an aggregate like `SUM` or `AVG` sees.\n\n```sql\n-- 3-row moving average: current row + the two before it\nSELECT day, amount,\n       AVG(amount) OVER (ORDER BY day\n                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg\nFROM   sales;\n```\n\nFrames enable **moving averages, running totals over a sliding window, and\nlookback\u002Flookahead aggregates**.\n\nRule of thumb: a frame is the sliding sub-window an aggregate operates on — set it\nwith `ROWS`\u002F`RANGE BETWEEN ... 
AND ...`.\n",{"id":2025,"difficulty":127,"q":2026,"a":2027},"rows-vs-range","What is the difference between ROWS and RANGE in a frame?","Both define the frame boundaries, but they count differently:\n\n- `ROWS` works on **physical row positions** — \"the 2 rows before this one,\"\n  regardless of their values.\n- `RANGE` works on **value ranges** of the `ORDER BY` column — rows whose ordering\n  value falls within an offset, treating **ties (peers) as a unit**.\n\n```sql\n-- ROWS: exactly 3 physical rows\nSUM(x) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)\n\n-- RANGE: all rows with the same day are included together\nSUM(x) OVER (ORDER BY day RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)\n```\n\nWith duplicate `ORDER BY` values, `RANGE` includes **all** peer rows; `ROWS`\ncounts each physically.\n\nRule of thumb: `ROWS` = count physical rows; `RANGE` = include all rows with values\nin range (peers grouped).\n",{"id":2029,"difficulty":127,"q":2030,"a":2031},"default-frame","What is the default window frame when you specify ORDER BY but no frame?","When you add `ORDER BY` to an aggregate window **without** an explicit frame, the\ndefault is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. This gives a\n**running (cumulative)** aggregate — but because it's `RANGE`, **tied rows are\nincluded together** through the current row.\n\n```sql\n-- default frame = running total, but RANGE means peers are summed together\nSUM(amount) OVER (ORDER BY day)\n```\n\nThis trips people up: with duplicate `day` values, each tied row shows the **same**\ncumulative total (all peers included). Use `ROWS` for true row-by-row accumulation.\n\nRule of thumb: `ORDER BY` with no frame = `RANGE ... 
UNBOUNDED PRECEDING AND CURRENT\nROW`; switch to `ROWS` to avoid peer-grouping surprises.\n",{"id":2033,"difficulty":106,"q":2034,"a":2035},"no-order-by-frame","What is the window frame when there is no ORDER BY?","With **no `ORDER BY`** in the `OVER` clause, the frame is the **entire partition** —\nevery row sees all rows in its partition. The aggregate is the same for all rows\nin that partition.\n\n```sql\n-- every row gets the department total (whole partition)\nSUM(salary) OVER (PARTITION BY dept_id) AS dept_total\n```\n\nThis is why `SUM(x) OVER (PARTITION BY g)` gives a group total on each row, while\nadding `ORDER BY` turns it into a running total.\n\nRule of thumb: no `ORDER BY` → frame is the full partition (a flat group aggregate);\nadding `ORDER BY` makes it cumulative.\n",{"id":2037,"difficulty":127,"q":2038,"a":2039},"frame-boundaries","What are the possible frame boundary keywords?","A frame is `BETWEEN \u003Cstart> AND \u003Cend>`, where each bound is one of:\n\n- `UNBOUNDED PRECEDING` — the first row of the partition.\n- `n PRECEDING` — n rows (or values) before the current row.\n- `CURRENT ROW` — the current row (or its peers under `RANGE`).\n- `n FOLLOWING` — n rows (or values) after the current row.\n- `UNBOUNDED FOLLOWING` — the last row of the partition.\n\n```sql\n-- centered 3-row average: one before, current, one after\nAVG(x) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)\n\n-- whole partition\nSUM(x) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n```\n\nRule of thumb: combine `UNBOUNDED PRECEDING`\u002F`n PRECEDING`, `CURRENT ROW`, `n FOLLOWING`\u002F\n`UNBOUNDED FOLLOWING` to frame any sliding or anchored window.\n",{"id":2041,"difficulty":106,"q":2042,"a":2043},"moving-average","How do you compute a moving average?","Use `AVG` with a `ROWS` frame spanning the desired window of rows around the\ncurrent row.\n\n```sql\n-- 7-day moving average (current day + previous 6)\nSELECT day, amount,\n       AVG(amount) 
OVER (ORDER BY day\n                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7\nFROM   sales;\n```\n\nUse `ROWS` (not `RANGE`) for a fixed count of rows. For a **centered** average,\nframe `BETWEEN 3 PRECEDING AND 3 FOLLOWING`. Early rows average fewer rows unless\nyou handle warm-up.\n\nRule of thumb: moving average = `AVG(x) OVER (ORDER BY t ROWS BETWEEN n PRECEDING\nAND CURRENT ROW)`.\n",{"id":2045,"difficulty":114,"q":2046,"a":2047},"lag-function","What does the LAG() function do?","`LAG(col, offset, default)` returns the value of `col` from a **previous row**\nwithin the partition — `offset` rows back (default 1). If there's no such row, it\nreturns `default` (or `NULL`). It's the classic tool for **comparing a row to the\nprior one**.\n\n```sql\n-- compare each day's sales to the previous day\nSELECT day, amount,\n       LAG(amount, 1, 0) OVER (ORDER BY day) AS prev_amount,\n       amount - LAG(amount) OVER (ORDER BY day) AS day_over_day\nFROM   sales;\n```\n\nRule of thumb: `LAG()` looks **backward** to the previous row — perfect for\nperiod-over-period deltas.\n",{"id":2049,"difficulty":114,"q":2050,"a":2051},"lead-function","What does the LEAD() function do?","`LEAD(col, offset, default)` is the mirror of `LAG` — it returns a value from a\n**following row**, `offset` rows ahead (default 1), or `default`\u002F`NULL` past the\nend. 
Use it to look **forward**.\n\n```sql\n-- gap until the customer's next order\nSELECT customer_id, order_date,\n       LEAD(order_date) OVER (PARTITION BY customer_id\n                              ORDER BY order_date) AS next_order,\n       LEAD(order_date) OVER (PARTITION BY customer_id\n                              ORDER BY order_date) - order_date AS days_gap\nFROM   orders;\n```\n\nRule of thumb: `LEAD()` looks **forward** to the next row — use it for \"time until\nnext event\" or next-value comparisons.\n",{"id":2053,"difficulty":106,"q":2054,"a":2055},"lag-lead-offset-default","What are the optional arguments of LAG() and LEAD()?","Both take three arguments: `LAG(expr [, offset [, default]])`.\n\n- `expr` — the column\u002Fexpression to fetch.\n- `offset` — how many rows back\u002Fforward (default `1`).\n- `default` — value returned when the offset falls outside the partition\n  (default `NULL`).\n\n```sql\n-- value 2 rows back; if none, use 0 instead of NULL\nLAG(amount, 2, 0) OVER (ORDER BY day)\n```\n\nSupplying a `default` is the clean way to avoid `NULL`s at partition edges (e.g.\nthe first row's \"previous\" value).\n\nRule of thumb: pass `offset` to jump multiple rows and `default` to replace the\nedge `NULL`s.\n",{"id":2057,"difficulty":106,"q":2058,"a":2059},"first-value-last-value","What do FIRST_VALUE() and LAST_VALUE() return?","`FIRST_VALUE(col)` returns `col` from the **first row of the frame**;\n`LAST_VALUE(col)` from the **last row of the frame**. They're handy for putting a\npartition's boundary value on every row.\n\n```sql\nSELECT name, dept_id, salary,\n       FIRST_VALUE(name) OVER (PARTITION BY dept_id\n                               ORDER BY salary DESC) AS highest_paid\nFROM   employees;\n```\n\n**Gotcha:** with the default frame (`... 
CURRENT ROW`), `LAST_VALUE` returns the\ncurrent row, not the partition's true last — you must widen the frame (see next\nquestion).\n\nRule of thumb: `FIRST_VALUE`\u002F`LAST_VALUE` grab a frame boundary value — mind the\nframe, especially for `LAST_VALUE`.\n",{"id":2061,"difficulty":127,"q":2062,"a":2063},"last-value-frame-gotcha","Why does LAST_VALUE() often return the current row instead of the last?","Because the default frame with `ORDER BY` is `RANGE BETWEEN UNBOUNDED PRECEDING AND\nCURRENT ROW` — the frame **ends at the current row**, so `LAST_VALUE` sees the\ncurrent row as the last. To get the partition's true last value, **extend the frame\nto `UNBOUNDED FOLLOWING`**.\n\n```sql\n-- WRONG: returns current row's value\nLAST_VALUE(salary) OVER (PARTITION BY dept_id ORDER BY salary)\n\n-- RIGHT: frame covers the whole partition\nLAST_VALUE(salary) OVER (PARTITION BY dept_id ORDER BY salary\n                         ROWS BETWEEN UNBOUNDED PRECEDING\n                                  AND UNBOUNDED FOLLOWING)\n```\n\nRule of thumb: always set an explicit `... UNBOUNDED FOLLOWING` frame with\n`LAST_VALUE`, or it just returns the current row.\n",{"id":2065,"difficulty":106,"q":2066,"a":2067},"nth-value","What does NTH_VALUE() do?","`NTH_VALUE(col, n)` returns `col` from the **nth row of the frame** (1-based). Like\n`LAST_VALUE`, it's frame-sensitive, so widen the frame to see the whole partition.\n\n```sql\n-- the 2nd highest-paid employee's name, shown on every row of the dept\nSELECT name, dept_id,\n       NTH_VALUE(name, 2) OVER (PARTITION BY dept_id ORDER BY salary DESC\n                                ROWS BETWEEN UNBOUNDED PRECEDING\n                                         AND UNBOUNDED FOLLOWING) AS second_top\nFROM   employees;\n```\n\nWith the default frame, rows before the nth position return `NULL`; with the full-partition frame above, `NULL` appears only when the partition has fewer than n rows. 
(Supported in PostgreSQL, MySQL 8+;\nnot in SQL Server.)\n\nRule of thumb: `NTH_VALUE(col, n)` fetches the nth frame row — widen the frame to\ntarget the whole partition.\n",{"id":2069,"difficulty":106,"q":2070,"a":2071},"running-total-vs-frame","How does the frame turn an aggregate into a running vs windowed total?","The frame decides what an aggregate accumulates:\n\n- **Running total** — `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`\n  (everything up to now).\n- **Sliding window total** — `ROWS BETWEEN n PRECEDING AND CURRENT ROW`\n  (last n+1 rows).\n- **Full partition total** — `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED\n  FOLLOWING` (or omit `ORDER BY`).\n\n```sql\nSUM(x) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -- running\nSUM(x) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)         -- last 7\n```\n\nRule of thumb: the frame's start bound sets the accumulation window — `UNBOUNDED\nPRECEDING` for running, `n PRECEDING` for sliding.\n",{"id":2073,"difficulty":106,"q":2074,"a":2075},"lag-lead-vs-self-join","Why use LAG\u002FLEAD instead of a self-join for row comparisons?","Comparing each row to its neighbor with a **self-join** is verbose, needs a way to\ndefine \"previous\" (often a correlated subquery with `MAX(... \u003C current)`), and is\nslow. 
`LAG`\u002F`LEAD` do it in a **single ordered pass**, clearly and efficiently.\n\n```sql\n-- self-join approach (awkward, slower)\nSELECT a.day, a.amount - b.amount AS delta\nFROM sales a LEFT JOIN sales b ON b.day = a.day - 1;\n\n-- LAG approach (clean, one pass)\nSELECT day, amount - LAG(amount) OVER (ORDER BY day) AS delta FROM sales;\n```\n\nRule of thumb: prefer `LAG`\u002F`LEAD` over self-joins for previous\u002Fnext comparisons —\nsimpler and faster.\n",{"id":2077,"difficulty":127,"q":2078,"a":2079},"gaps-and-islands","How do offset\u002Franking functions solve gaps-and-islands problems?","**Gaps-and-islands** problems (finding consecutive runs or missing ranges) are\nsolved by spotting where sequences break — using `LAG`\u002F`LEAD` to detect gaps, or the\n\"`ROW_NUMBER` difference\" trick to label islands.\n\n```sql\n-- group consecutive login days into \"islands\"\nWITH marked AS (\n    SELECT user_id, login_date,\n           login_date - (ROW_NUMBER() OVER (PARTITION BY user_id\n                                            ORDER BY login_date)) * INTERVAL '1 day'\n           AS grp\n    FROM logins\n)\nSELECT user_id, MIN(login_date) AS streak_start, MAX(login_date) AS streak_end,\n       COUNT(*) AS streak_len\nFROM marked GROUP BY user_id, grp;\n```\n\nRows in the same run share a constant `grp` value because the date and row number\nadvance together.\n\nRule of thumb: detect gaps with `LAG`\u002F`LEAD`; identify islands with the\ndate-minus-`ROW_NUMBER` constant-group trick.\n",{"id":2081,"difficulty":127,"q":2082,"a":2083},"frame-with-range-interval","Can RANGE frames use intervals (e.g. time-based windows)?","Yes — `RANGE` frames can use a **value offset**, including time intervals, so the\nwindow is defined by the `ORDER BY` value rather than a row count. 
This gives true\ntime-based windows even when rows are irregularly spaced.\n\n```sql\n-- sum of sales over the trailing 7 calendar days (current day + 6 before), by date value\nSELECT day, amount,\n       SUM(amount) OVER (ORDER BY day\n                         RANGE BETWEEN INTERVAL '6 days' PRECEDING\n                                   AND CURRENT ROW) AS rolling_7d\nFROM sales;\n```\n\nUnlike `ROWS`, this correctly handles missing days and multiple rows per day.\n(PostgreSQL 11+ and MySQL 8+ support `RANGE` with numeric\u002Finterval offsets, with\ndialect-specific interval syntax; SQL Server does not.)\n\nRule of thumb: use `RANGE` with an interval\u002Fnumeric offset for value-based windows\n(e.g. \"last 7 days\") that don't depend on row counts.\n",{"id":2085,"difficulty":106,"q":2086,"a":2087},"offset-functions-null-handling","How do LAG\u002FLEAD and FIRST_VALUE handle NULLs and partition edges?","At **partition edges**, `LAG`\u002F`LEAD` return their `default` argument (or `NULL`) —\ne.g. the first row has no previous row. They also return the **actual stored value**\neven if it's `NULL`; they don't skip `NULL` data values.\n\n```sql\n-- first row's prev is 0 (the default), not an error\nLAG(amount, 1, 0) OVER (ORDER BY day)\n```\n\nSome databases (Oracle and newer engines that implement the SQL-standard option)\nsupport `IGNORE NULLS` to skip `NULL` data values and fetch the nearest non-null.\n\nRule of thumb: edges yield the `default`\u002F`NULL`; supply a `default` to clean them\nup, and use `IGNORE NULLS` (where supported) to skip null data.\n",{"id":2089,"difficulty":106,"q":2090,"a":2091},"combining-frames-with-partition","How do frames interact with PARTITION BY?","A frame is always **bounded by the partition** — it never crosses partition\nboundaries. 
`UNBOUNDED PRECEDING` means the first row **of the current partition**,\nand `LAG`\u002F`LEAD` stop at the partition edge.\n\n```sql\n-- running total resets at each customer; frame stays within the partition\nSELECT customer_id, order_date, amount,\n       SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date\n                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rt\nFROM orders;\n```\n\nSo per-customer running totals start fresh for each customer automatically.\n\nRule of thumb: frames live inside their partition — `UNBOUNDED` and offsets are\nrelative to the partition, not the whole table.\n",{"description":104},"SQL window frame interview questions — ROWS vs RANGE, frame boundaries, LAG\u002FLEAD, FIRST_VALUE\u002FLAST_VALUE, moving averages, and default frame gotchas.","sql\u002Fwindow-functions\u002Fframes-and-offsets","Frames & Offset Functions","3jC4V3JmOMKHMW1anUBJYQ68qbQi1ZbUGVcd0ktsBo4",{"id":2098,"title":2099,"body":2100,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":2104,"navigation":109,"order":11,"path":2105,"questions":2106,"questionsCount":1030,"related":247,"seo":2175,"seoDescription":2176,"stem":2177,"subtopic":2178,"topic":21,"topicSlug":22,"updated":328,"__hash__":2179},"qa\u002Fsql\u002Fbasics\u002Faggregation.md","Aggregation",{"type":101,"value":2101,"toc":2102},[],{"title":104,"searchDepth":30,"depth":30,"links":2103},[],{},"\u002Fsql\u002Fbasics\u002Faggregation",[2107,2111,2115,2119,2123,2127,2131,2135,2139,2143,2147,2151,2155,2159,2163,2167,2171],{"id":2108,"difficulty":114,"q":2109,"a":2110},"what-is-aggregation","What is an aggregate function?","An **aggregate function** collapses **many rows into a single value** —\n`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`. 
Used alone, they summarize the whole table;\nwith `GROUP BY`, they summarize each group.\n\n```sql\nSELECT COUNT(*)   AS total_orders,\n       SUM(total) AS revenue,\n       AVG(total) AS avg_order\nFROM orders;\n```\n\nAggregates run **after `WHERE`** (on the filtered rows) and **before `HAVING`**.\nWithout `GROUP BY`, an aggregate over the whole result returns exactly one row.\n\nRule of thumb: aggregates turn a set of rows into one summary value.\n",{"id":2112,"difficulty":114,"q":2113,"a":2114},"group-by","What does GROUP BY do?","`GROUP BY` **partitions rows into groups** that share the same values in the grouping\ncolumns, then computes one aggregate result **per group**. The result has one row per\ndistinct group.\n\n```sql\nSELECT user_id, COUNT(*) AS orders, SUM(total) AS spent\nFROM orders\nGROUP BY user_id;          -- one row per user\n```\n\nYou can group by multiple columns (`GROUP BY country, city`) — groups are then the\ndistinct **combinations**. Grouping happens after `WHERE` filters the rows.\n\nRule of thumb: `GROUP BY` defines what \"one row of the result\" means; aggregates\nsummarize each.\n",{"id":2116,"difficulty":106,"q":2117,"a":2118},"group-by-rule","Why must non-aggregated SELECT columns appear in GROUP BY?","In standard SQL, every column in the `SELECT` list must either be **inside an\naggregate** or **listed in `GROUP BY`**. Otherwise the column has many values per\ngroup and the database can't pick one.\n\n```sql\n-- ERROR in standard SQL: name isn't grouped or aggregated\nSELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;\n\n-- fix: group by it too, or aggregate it\nSELECT user_id, MAX(name) AS name, COUNT(*) FROM orders GROUP BY user_id;\n```\n\nMySQL historically allowed this (returning an arbitrary value) but now rejects it by\ndefault under `ONLY_FULL_GROUP_BY`. 
Postgres allows ungrouped columns only if they're\nfunctionally dependent on the primary key.\n\nRule of thumb: if it's in `SELECT` and not aggregated, it must be in `GROUP BY`.\n",{"id":2120,"difficulty":106,"q":2121,"a":2122},"count-star-vs-col","What is the difference between COUNT(*) and COUNT(column)?","`COUNT(*)` counts **rows**; `COUNT(column)` counts rows where that column is **not\nNULL**. The difference shows up whenever the column has NULLs.\n\n```sql\nSELECT COUNT(*)        AS rows,        -- every row\n       COUNT(phone)    AS with_phone,  -- rows where phone IS NOT NULL\n       COUNT(DISTINCT country) AS countries\nFROM users;\n```\n\n`COUNT(DISTINCT col)` counts distinct non-NULL values. This matters after outer\njoins, where `COUNT(*)` would count the NULL-filled placeholder row.\n\nRule of thumb: `COUNT(*)` for rows, `COUNT(col)` for non-NULL values, `COUNT(DISTINCT\ncol)` for unique values.\n",{"id":2124,"difficulty":106,"q":2125,"a":2126},"aggregates-ignore-null","How do aggregate functions handle NULLs?","All aggregates **ignore NULLs** (except `COUNT(*)`). `SUM`, `AVG`, `MIN`, `MAX`, and\n`COUNT(col)` skip rows where the value is NULL — they don't treat NULL as zero.\n\n```sql\n-- AVG divides by the count of NON-NULL scores, not all rows\nSELECT AVG(score) FROM tests;       -- NULL scores excluded entirely\n```\n\nThis is usually right, but watch `AVG`: if you want NULLs counted as 0, convert them\nfirst with `AVG(COALESCE(score, 0))`. `SUM` of all-NULL (or no) rows returns `NULL`,\nnot 0.\n\nRule of thumb: aggregates skip NULLs; use `COALESCE` first if NULLs should count as\nzero.\n",{"id":2128,"difficulty":106,"q":2129,"a":2130},"having-clause","What does the HAVING clause do?","`HAVING` filters **groups after aggregation**, the way `WHERE` filters rows before\nit. 
It's the only place you can filter on an aggregate's result.\n\n```sql\nSELECT user_id, COUNT(*) AS orders\nFROM orders\nGROUP BY user_id\nHAVING COUNT(*) >= 10;     -- only users with 10+ orders\n```\n\n`HAVING` can reference aggregates and grouping columns. Put per-row conditions in\n`WHERE` (cheaper, runs first) and reserve `HAVING` for conditions on the aggregates.\n\nRule of thumb: filter rows in `WHERE`, filter aggregated groups in `HAVING`.\n",{"id":2132,"difficulty":106,"q":2133,"a":2134},"where-vs-having-agg","Why can't you use an aggregate in WHERE?","`WHERE` runs **before** grouping and aggregation, so the aggregate values don't\nexist yet. Referencing `COUNT()`\u002F`SUM()` in `WHERE` is an error; those belong in\n`HAVING`, which runs after.\n\n```sql\n-- ERROR: aggregate not allowed in WHERE\nSELECT user_id FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;\n\n-- correct\nSELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;\n```\n\nThe logical order `WHERE → GROUP BY → HAVING` is the whole reason for the split.\n\nRule of thumb: aggregate condition → `HAVING`; raw-column condition → `WHERE`.\n",{"id":2136,"difficulty":127,"q":2137,"a":2138},"conditional-aggregation-basics","How do you aggregate conditionally (pivot)?","Wrap a `CASE` inside the aggregate so it only counts\u002Fsums rows meeting a condition.\nThis produces a **pivot** — multiple conditional columns in one pass.\n\n```sql\nSELECT user_id,\n       COUNT(*) FILTER (WHERE status = 'paid')      AS paid,     -- Postgres\n       SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded\nFROM orders\nGROUP BY user_id;\n```\n\nThe portable form is `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`; Postgres\u002FSQLite offer\nthe cleaner `FILTER (WHERE ...)` clause. 
One scan yields several conditional totals.\n\nRule of thumb: `CASE` inside an aggregate (or `FILTER`) turns rows into pivoted\ncolumns.\n",{"id":2140,"difficulty":106,"q":2141,"a":2142},"avg-integer-trap","Why might AVG return a truncated value?","If the column is an **integer type**, some databases compute `AVG` (or the underlying\n`SUM`\u002Fcount division) using integer arithmetic and truncate the fraction. You get\n`3` instead of `3.5`.\n\n```sql\n-- cast to numeric to keep the fraction\nSELECT AVG(rating)          AS maybe_truncated,\n       AVG(rating::numeric) AS exact            -- Postgres cast\nFROM reviews;\n```\n\nPostgres's `AVG` actually returns numeric for integer input, but manual `SUM(x)\u002F\nCOUNT(x)` will truncate. SQL Server's `AVG` returns an integer for integer input, so\nit truncates; MySQL's `AVG` and `\u002F` return decimals, though its `DIV` operator\ntruncates.\n\nRule of thumb: cast integer columns to decimal before averaging or manually\ndividing.\n",{"id":2144,"difficulty":114,"q":2145,"a":2146},"grouping-multiple-columns","How do you group by more than one column?","List several columns in `GROUP BY`; groups become the distinct **combinations** of\nthose columns. The result has one row per unique combination.\n\n```sql\nSELECT country, city, COUNT(*) AS users\nFROM users\nGROUP BY country, city\nORDER BY country, users DESC;\n```\n\nAdding a column to `GROUP BY` makes the groups **finer** (more, smaller groups).\nEvery grouped column may appear bare in `SELECT`.\n\nRule of thumb: grouping by more columns = more, smaller groups.\n",{"id":2148,"difficulty":106,"q":2149,"a":2150},"count-distinct","How do you count distinct values?","Use `COUNT(DISTINCT col)`, which counts unique **non-NULL** values. 
You can combine\nit with grouping to count distinct values per group.\n\n```sql\n-- distinct products bought by each user\nSELECT user_id, COUNT(DISTINCT product_id) AS unique_products\nFROM order_items\nGROUP BY user_id;\n```\n\n`COUNT(DISTINCT ...)` is more expensive than `COUNT(*)` because it must deduplicate.\nFor huge tables, approximate counts (`APPROX_COUNT_DISTINCT`, Postgres `HLL`) trade\naccuracy for speed.\n\nRule of thumb: `COUNT(DISTINCT col)` for unique counts; consider approximate\nvariants at scale.\n",{"id":2152,"difficulty":114,"q":2153,"a":2154},"min-max","What do MIN and MAX return on different types?","`MIN`\u002F`MAX` return the smallest\u002Flargest value by the type's natural ordering: numeric\norder for numbers, chronological for dates, and lexicographic for strings. They\nignore NULLs.\n\n```sql\nSELECT MIN(created_at) AS first_signup,\n       MAX(total)      AS biggest_order,\n       MIN(name)       AS alphabetically_first\nFROM orders;\n```\n\nA common pattern is `MAX(created_at)` per group to find the latest event time — but\nthat only gives the *time*, not the *whole row* (use a window function or\n`DISTINCT ON` for that).\n\nRule of thumb: `MIN`\u002F`MAX` give the extreme **value**, not the row it came from.\n",{"id":2156,"difficulty":106,"q":2157,"a":2158},"sum-null-result","Why does SUM sometimes return NULL instead of 0?","`SUM` over **zero rows** (or all-NULL values) returns `NULL`, not `0`, because\nthere's nothing to add. 
This bites after filters or outer joins that leave a group\nempty.\n\n```sql\n-- returns NULL if the user has no matching orders\nSELECT COALESCE(SUM(total), 0) AS spent\nFROM orders\nWHERE user_id = 42 AND status = 'paid';\n```\n\nWrap with `COALESCE(SUM(x), 0)` whenever a missing\u002Fempty group should read as zero.\n\nRule of thumb: `COALESCE(SUM(x), 0)` to turn \"no rows\" into a 0 instead of NULL.\n",{"id":2160,"difficulty":106,"q":2161,"a":2162},"group-by-expression","Can you group by a computed expression?","Yes — `GROUP BY` accepts expressions, not just bare columns. This is how you bucket\ncontinuous values (by month, by range, by derived category).\n\n```sql\n-- orders per month\nSELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS n\nFROM orders\nGROUP BY DATE_TRUNC('month', created_at)\nORDER BY month;\n```\n\nRepeat the same expression in `GROUP BY` and `SELECT`. Some dialects let you group by\nthe alias or position number, but repeating the expression is the portable form.\n\nRule of thumb: group by the same expression you select to bucket continuous data.\n",{"id":2164,"difficulty":127,"q":2165,"a":2166},"rollup-cube","What do GROUPING SETS, ROLLUP and CUBE do?","They compute **multiple grouping levels in one query**, adding subtotal\u002Fgrand-total\nrows. 
`ROLLUP` makes hierarchical subtotals; `CUBE` makes every combination;\n`GROUPING SETS` lists exactly the groupings you want.\n\n```sql\n-- subtotals per (country, city), per country, and a grand total\nSELECT country, city, SUM(total)\nFROM sales\nGROUP BY ROLLUP (country, city);\n```\n\nThe subtotal rows have `NULL` in the rolled-up columns; `GROUPING()` distinguishes a\n\"real NULL\" from a subtotal marker.\n\nRule of thumb: `ROLLUP`\u002F`CUBE` add subtotal rows without multiple `UNION`ed queries.\n",{"id":2168,"difficulty":106,"q":2169,"a":2170},"filter-clause","What is the FILTER clause on an aggregate?","`FILTER (WHERE condition)` restricts an aggregate to rows matching the condition —\nthe standard, readable alternative to `SUM(CASE WHEN ...)`. Supported in\nPostgres and SQLite.\n\n```sql\nSELECT\n  COUNT(*)                               AS total,\n  COUNT(*) FILTER (WHERE status = 'paid') AS paid,\n  AVG(total) FILTER (WHERE total > 0)     AS avg_nonzero\nFROM orders;\n```\n\nEach aggregate can have its own `FILTER`, so one query produces several differently\nfiltered metrics. Where unsupported, fall back to `CASE` inside the aggregate.\n\nRule of thumb: prefer `FILTER (WHERE ...)` over `CASE`-inside-aggregate where your\ndatabase supports it.\n",{"id":2172,"difficulty":127,"q":2173,"a":2174},"aggregate-over-join","How do you avoid double-counting when aggregating over joins?","Joining a parent to a one-to-many child multiplies the parent's rows, so\n`SUM`\u002F`COUNT` over the joined result **double-counts**. 
Pre-aggregate the child in a\nsubquery first, then join.\n\n```sql\nSELECT u.name, o.order_count, o.revenue\nFROM users u\nLEFT JOIN (\n  SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue\n  FROM orders GROUP BY user_id\n) o ON o.user_id = u.id;\n```\n\nJoining two different one-to-many children to the same parent is the classic \"fan\ntrap\" that inflates sums — pre-aggregate each child separately.\n\nRule of thumb: pre-aggregate one-to-many children before joining to avoid inflated\ntotals.\n",{"description":104},"SQL aggregation interview questions — COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, NULL handling, COUNT(*) vs COUNT(col) and conditional aggregation.","sql\u002Fbasics\u002Faggregation","Aggregation & GROUP BY","yqRLOYzJKFw3JP7IDpqvPVRPdqSrjTrYXCTLbGafvl4",{"id":2181,"title":2182,"body":2183,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":2187,"navigation":109,"order":11,"path":2188,"questions":2189,"questionsCount":323,"related":247,"seo":2250,"seoDescription":2251,"stem":2252,"subtopic":2182,"topic":47,"topicSlug":48,"updated":328,"__hash__":2253},"qa\u002Fsql\u002Fschema\u002Fnormalization.md","Normalization",{"type":101,"value":2184,"toc":2185},[],{"title":104,"searchDepth":30,"depth":30,"links":2186},[],{},"\u002Fsql\u002Fschema\u002Fnormalization",[2190,2194,2198,2202,2206,2210,2214,2218,2222,2226,2230,2234,2238,2242,2246],{"id":2191,"difficulty":114,"q":2192,"a":2193},"what-is-normalization","What is database normalization and why does it matter?","**Normalization** is the process of organizing a relational schema to\n**reduce data redundancy** and **prevent update anomalies**. The theory\nwas introduced by E.F. 
Codd and is expressed as a series of **normal forms**\n(1NF, 2NF, 3NF, BCNF …) — each one stricter than the last.\n\n```sql\n-- Unnormalized: storing multiple values in one cell (violates 1NF)\n-- orders: | id | customer | items              |\n-- data:   | 1  | Alice    | 'Pen, Pencil, Pad' |\n\n-- After normalization: separate tables, one fact per cell\n-- orders: | id | customer_id |\n-- order_items: | order_id | product_id | qty |\n```\n\nBenefits: no redundant copies to keep in sync, constraints are enforceable,\nqueries are composable.\n\n**Rule of thumb:** normalize to at least **3NF** for transactional\n(OLTP) schemas; selectively denormalize for read-heavy reporting (OLAP) only\nwhen profiling proves it necessary.\n",{"id":2195,"difficulty":106,"q":2196,"a":2197},"update-anomalies","What are the three update anomalies normalization prevents?","Un-normalized schemas suffer from three anomalies that make data unreliable:\n\n1. **Insertion anomaly** — you cannot store a fact without also storing\n   another unrelated fact. E.g., you cannot record a new department unless\n   you also have an employee for it.\n2. **Update anomaly** — the same fact appears in multiple rows. Changing a\n   manager's name requires updating every row for every employee in that\n   department. Miss one row → inconsistent data.\n3. 
**Deletion anomaly** — deleting a row removes more facts than intended.\n   Delete the last employee in a department and you lose the department's\n   name\u002Flocation too.\n\n```sql\n-- Bad: employee table stores department info in every row\n-- | emp_id | emp_name | dept_id | dept_name  | dept_location |\n-- If you rename the dept, you must update EVERY employee row.\n\n-- Fixed (normalized): dept facts live in one place\nCREATE TABLE departments (id INT PRIMARY KEY, name TEXT, location TEXT);\nCREATE TABLE employees (id INT PRIMARY KEY, name TEXT, dept_id INT REFERENCES departments(id));\n```\n\n**Rule of thumb:** if the same value must be updated in more than one row\nto keep the data consistent, you have a normalization problem.\n",{"id":2199,"difficulty":114,"q":2200,"a":2201},"first-normal-form","What is First Normal Form (1NF)?","A table is in **1NF** when:\n1. Every column contains **atomic** (indivisible) values — no sets, lists,\n   or repeating groups inside a single cell.\n2. Every column contains values of a **single type**.\n3. Each row is **uniquely identifiable** (there is a primary key).\n\n```sql\n-- Violates 1NF: multiple phone numbers in one column\n-- | id | name  | phones               |\n-- | 1  | Alice | '555-1234, 555-5678' |\n\n-- 1NF compliant: separate table for multi-valued attribute\nCREATE TABLE contacts (id INT PRIMARY KEY, name TEXT);\nCREATE TABLE contact_phones (\n  contact_id INT REFERENCES contacts(id),\n  phone      TEXT NOT NULL,\n  PRIMARY KEY (contact_id, phone)\n);\n```\n\n**Rule of thumb:** if a cell contains comma-separated values or you find\nyourself doing `LIKE '%value%'` to search within a column, the table\nviolates 1NF and needs to be split.\n",{"id":2203,"difficulty":106,"q":2204,"a":2205},"functional-dependency","What is a functional dependency?","A **functional dependency** (FD) `A → B` means that knowing the value of\n`A` uniquely determines the value of `B`. 
In a table, a functional\ndependency is a constraint on which combinations of values are valid.\n\n```\n-- In an orders table:\norder_id → customer_id      (each order has exactly one customer)\norder_id → order_date\n(order_id, product_id) → quantity   (composite key determines quantity)\n\n-- Problematic FD in an unnormalized table:\ndept_id → dept_name         (department name depends only on dept_id,\n                             not on the full PK of the employee row)\n```\n\nUnderstanding FDs is the foundation of 2NF and 3NF: each normal form\nremoves a class of problematic FDs from the schema.\n\n**Rule of thumb:** draw out the FDs before designing a schema. Every\nnon-key column should depend on **the whole key and nothing but the key**.\n(This is essentially the definition of 3NF in plain English.)\n",{"id":2207,"difficulty":106,"q":2208,"a":2209},"second-normal-form","What is Second Normal Form (2NF) and what does it fix?","A table is in **2NF** when it is in 1NF and every non-key column is\n**fully functionally dependent on the whole primary key** — not just part\nof it. 
2NF only matters when the PK is composite.\n\n```sql\n-- Violates 2NF: PK is (order_id, product_id) but product_name\n-- depends only on product_id (partial dependency)\n-- | order_id | product_id | product_name | quantity |\n\n-- Fix: split product facts into their own table\nCREATE TABLE products (\n  id   INT PRIMARY KEY,\n  name TEXT NOT NULL\n);\nCREATE TABLE order_items (\n  order_id   INT REFERENCES orders(id),\n  product_id INT REFERENCES products(id),\n  quantity   INT NOT NULL,\n  PRIMARY KEY (order_id, product_id)\n);\n```\n\n**Rule of thumb:** if any non-key column depends on *part* of a composite\nprimary key, move those columns to a table where that partial key *is* the\nfull primary key.\n",{"id":2211,"difficulty":106,"q":2212,"a":2213},"third-normal-form","What is Third Normal Form (3NF) and what does it eliminate?","A table is in **3NF** when it is in 2NF and **no non-key column determines\nanother non-key column** (no transitive dependencies).\n\n```sql\n-- Violates 3NF: zip_code → city (transitive: emp_id → zip_code → city)\n-- | emp_id | name  | zip_code | city     |\n\n-- Fix: move the transitive dependency to its own table\nCREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT NOT NULL);\nCREATE TABLE employees (\n  id       INT PRIMARY KEY,\n  name     TEXT NOT NULL,\n  zip_code TEXT REFERENCES zip_codes(zip)\n);\n```\n\nThe classic mnemonic: **\"Every non-key attribute must depend on the key,\nthe whole key, and nothing but the key — so help me Codd.\"**\n\n**Rule of thumb:** if changing one non-key value (like a zip code) should\nautomatically update another non-key value (the city), those values belong\nin a separate table joined by a foreign key.\n",{"id":2215,"difficulty":127,"q":2216,"a":2217},"bcnf","What is Boyce-Codd Normal Form (BCNF) and how does it differ from 3NF?","**BCNF** (sometimes called 3.5NF) is a stricter version of 3NF. 
A table\nis in BCNF if, for every non-trivial functional dependency `A → B`, `A`\nis a **superkey** (a set of columns that uniquely identifies a row).\n\nBCNF and 3NF differ only when a table has **multiple overlapping candidate\nkeys**. 3NF allows a non-key column to determine part of another candidate\nkey; BCNF does not.\n\n```\n-- Classic BCNF violation: Tutors(student, subject, tutor)\n-- Candidate keys: (student, subject) and (student, tutor)\n-- FD: tutor → subject (a tutor teaches exactly one subject)\n-- This violates BCNF because 'tutor' is not a superkey.\n\n-- Fix (decompose):\n-- TutorSubject(tutor PK, subject)\n-- StudentTutor(student, tutor, FK tutor → TutorSubject)\n```\n\n**Rule of thumb:** BCNF matters in schemas with multiple candidate keys.\nIn practice, 3NF is the target for most applications; BCNF is pursued when\nredundancy in multi-key tables causes real anomalies.\n",{"id":2219,"difficulty":106,"q":2220,"a":2221},"denormalization","What is denormalization and when is it justified?","**Denormalization** intentionally introduces redundancy into a schema to\nimprove **read performance** — typically by precomputing joins or\naggregations and caching their results in additional columns or tables.\n\n```sql\n-- Normalized: count must be computed with a JOIN every time\nSELECT u.id, COUNT(o.id) AS order_count\nFROM users u LEFT JOIN orders o ON o.user_id = u.id\nGROUP BY u.id;\n\n-- Denormalized: cache the count in the users table\nALTER TABLE users ADD COLUMN order_count INT NOT NULL DEFAULT 0;\n\n-- Maintain via trigger or application logic on each insert\u002Fdelete\nUPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;\n```\n\nTrade-offs:\n- ✅ Faster reads, simpler queries, reduced join cost.\n- ❌ Write complexity — must keep redundant copies in sync.\n- ❌ Risk of inconsistency if update logic is missed.\n\n**Rule of thumb:** normalize first; denormalize only after profiling shows\nthat a specific query is a bottleneck *and* 
the added write complexity is\nworth the read gain. Document every denormalized column with a comment\nexplaining what it caches and how it is maintained.\n",{"id":2223,"difficulty":106,"q":2224,"a":2225},"star-schema","What is a star schema and how does it differ from a normalized OLTP schema?","A **star schema** is a denormalized dimensional model used in **data\nwarehouses (OLAP)**. It centers on a large **fact table** (events\u002Ftransactions)\nsurrounded by smaller **dimension tables** (descriptive attributes). Dimensions\nare intentionally denormalized for fast, simple queries.\n\n```sql\n-- Fact table: one row per sale event\nCREATE TABLE fact_sales (\n  sale_id     BIGINT PRIMARY KEY,\n  date_key    INT REFERENCES dim_date(date_key),\n  product_key INT REFERENCES dim_product(product_key),\n  store_key   INT REFERENCES dim_store(store_key),\n  quantity    INT NOT NULL,\n  revenue     NUMERIC(12,2) NOT NULL\n);\n\n-- Dimension: denormalized (city + country in same row, no 3NF)\nCREATE TABLE dim_store (\n  store_key INT PRIMARY KEY,\n  name      TEXT,\n  city      TEXT,\n  country   TEXT\n);\n```\n\nOLTP schemas normalize to avoid write anomalies. OLAP star schemas\ndenormalize to minimize joins and maximize scan throughput for analytics.\n\n**Rule of thumb:** use a normalized 3NF schema for transactional\napplications; use a star or snowflake schema for analytical\u002FBI workloads.\nDon't mix them — ETL pipelines transform data between the two.\n",{"id":2227,"difficulty":106,"q":2228,"a":2229},"normalization-vs-performance","How do you balance normalization and query performance in practice?","Fully normalized schemas can require many joins, which hurts read\nperformance on large datasets. Common pragmatic trade-offs:\n\n1. **Add indexes before denormalizing** — a join on indexed FKs is fast.\n   Denormalization should only be considered after indexes fail to help.\n2. 
**Materialized views \u002F summary tables** — precompute expensive aggregates\n   without changing the base schema.\n3. **Selective redundancy** — add a `cached_count` column or a denormalized\n   `status` flag where read frequency vastly exceeds write frequency.\n4. **Separate OLAP schema** — replicate data nightly into a star schema for\n   reporting; keep OLTP tables normalized.\n\n```sql\n-- Before denormalizing: try an index on the join column (orders.user_id)\nCREATE INDEX idx_orders_user_id ON orders (user_id);\n-- EXPLAIN ANALYZE to verify it is used before adding redundant columns\nEXPLAIN ANALYZE\n  SELECT u.name, COUNT(o.id)\n  FROM users u JOIN orders o ON o.user_id = u.id\n  GROUP BY u.id, u.name;\n```\n\n**Rule of thumb:** profile with real data before denormalizing. A missing\nindex is the most common \"normalization performance problem\" — and it is\na 30-second fix compared to the ongoing cost of maintaining denormalized data.\n",{"id":2231,"difficulty":106,"q":2232,"a":2233},"surrogate-vs-natural-key","What is the difference between a surrogate key and a natural key?","- **Natural key**: a column (or set of columns) from the real-world domain\n  that uniquely identifies an entity — e.g., email address, ISBN, social\n  security number. Natural keys carry meaning but can change over time.\n- **Surrogate key**: a system-generated identifier with no business meaning —\n  e.g., an auto-increment `id` or UUID. It never changes and has no\n  domain semantics.\n\n```sql\n-- Natural key PK (email can change → cascading updates on all FKs)\nCREATE TABLE users (email TEXT PRIMARY KEY, name TEXT);\n\n-- Surrogate PK (id never changes; email change is isolated to one row)\nCREATE TABLE users (\n  id    INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  email TEXT NOT NULL UNIQUE,\n  name  TEXT NOT NULL\n);\n```\n\n**Rule of thumb:** use a surrogate PK for every table; declare natural keys\nas `UNIQUE` constraints alongside the surrogate. 
This gives you the integrity\nguarantee of natural keys without the cascade pain of changing a PK.\n",{"id":2235,"difficulty":127,"q":2236,"a":2237},"fourth-normal-form","What is Fourth Normal Form (4NF) and what does it address?","**4NF** eliminates **multi-valued dependencies (MVDs)** — independent\nmany-to-many relationships stored in a single table, which causes\nmultiplicative row explosion.\n\n```\n-- Violates 4NF: Employee can have many Skills AND many Projects, independently.\n-- | emp_id | skill      | project   |\n-- | 1      | SQL        | Alpha     |\n-- | 1      | SQL        | Beta      |  ← duplicating the SQL row for Beta\n-- | 1      | Python     | Alpha     |  ← duplicating Alpha row for Python\n-- | 1      | Python     | Beta      |\n\n-- 4NF: split into two separate tables\n-- EmployeeSkills(emp_id, skill)\n-- EmployeeProjects(emp_id, project)\n```\n\nAdding a new skill in the original table requires adding rows for every\nproject combination, creating redundancy and anomalies.\n\n**Rule of thumb:** when two many-to-many relationships are independent of\neach other but share the same entity, split them into two separate\njoin tables rather than combining them into one.\n",{"id":2239,"difficulty":114,"q":2240,"a":2241},"normalization-checklist","How do you quickly check if a table is in 3NF?","Run through this checklist:\n\n1. **1NF check**: every cell contains one atomic value; there is a primary key.\n2. **2NF check**: if the PK is composite, every non-key column depends on the\n   *entire* PK, not just part of it.\n3. **3NF check**: no non-key column determines another non-key column\n   (no transitive dependencies — e.g., `zip → city` when `zip` is not the PK).\n\n```sql\n-- Red flags that indicate a 3NF violation:\n-- 1. A column stores concatenated values ('red,blue,green')\n-- 2. Repeated column groups (phone1, phone2, phone3)\n-- 3. A non-PK column appears in WHERE of a JOIN as if it were a PK\n-- 4. 
Updating one row's value requires updating dozens of other rows\n-- 5. You cannot add a fact without inserting an unrelated row\n```\n\n**Rule of thumb:** if you can answer \"what does each column tell you about?\"\nand the answer is always \"it tells you something about the primary key (and\nonly the primary key)\", the table is in 3NF.\n",{"id":2243,"difficulty":114,"q":2244,"a":2245},"junction-table","What is a junction (bridge) table and when do you use one?","A **junction table** (also called a bridge or associative table) resolves a\n**many-to-many relationship** between two entities into two one-to-many\nrelationships. It stores the *association* as rows rather than as repeated\ncolumns.\n\n```sql\n-- Many students can enroll in many courses\nCREATE TABLE students (id INT PRIMARY KEY, name TEXT NOT NULL);\nCREATE TABLE courses  (id INT PRIMARY KEY, title TEXT NOT NULL);\n\n-- Junction table: one row per (student, course) pair\nCREATE TABLE enrollments (\n  student_id  INT NOT NULL REFERENCES students(id) ON DELETE CASCADE,\n  course_id   INT NOT NULL REFERENCES courses(id)  ON DELETE CASCADE,\n  enrolled_at DATE NOT NULL DEFAULT CURRENT_DATE,\n  PRIMARY KEY (student_id, course_id)\n);\n```\n\nThe junction table can carry **payload columns** (enrolled_at, grade) that\ndescribe the relationship itself — something impossible to store on either\nside alone.\n\n**Rule of thumb:** whenever two entities have a many-to-many relationship,\nalways model it with a junction table. Never store comma-separated IDs in a\nsingle column as an alternative.\n",{"id":2247,"difficulty":106,"q":2248,"a":2249},"when-not-to-normalize","When should you deliberately stop normalizing?","Normalization is not always the right answer. Practical reasons to stop\nbefore reaching 3NF or BCNF:\n\n1. **Read performance is the primary concern** — heavily queried analytical\n   tables benefit from fewer joins, even at the cost of redundancy.\n2. 
**The relationship is stable** — if a denormalized value (e.g., a country\n   name embedded in every row) will almost never change, the update anomaly\n   risk is negligible.\n3. **External schema constraints** — integrating with a vendor schema or\n   legacy system you cannot change.\n4. **Simplicity for small, short-lived data** — a temporary staging table or\n   a one-off report table does not need 3NF rigor.\n\n```sql\n-- Acceptable denormalization: reporting snapshot\n-- Copies customer_name at the time of the order; intentionally redundant\n-- so historical reports are stable even if the customer renames.\nCREATE TABLE order_snapshots (\n  order_id      BIGINT PRIMARY KEY,\n  customer_id   INT    NOT NULL,\n  customer_name TEXT   NOT NULL,   -- denormalized snapshot\n  total         NUMERIC(12,2) NOT NULL,\n  snapped_at    TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\n**Rule of thumb:** normalize transactional data to 3NF by default. Deviate\ndeliberately, document the reason, and ensure the update path (trigger, ETL,\napplication code) is clearly owned and tested.\n",{"description":104},"SQL normalization interview questions — 1NF through 3NF\u002FBCNF, functional dependencies, update anomalies, denormalization trade-offs, and practical schema design patterns.","sql\u002Fschema\u002Fnormalization","yogAW6WtW466-6r7gxd1LZhJEPh_v0Vd470WBErGp4Y",{"id":2255,"title":2256,"body":2257,"description":104,"difficulty":106,"extension":107,"framework":10,"frameworkSlug":12,"meta":2261,"navigation":109,"order":56,"path":2262,"questions":2263,"questionsCount":323,"related":247,"seo":2324,"seoDescription":2325,"stem":2326,"subtopic":2256,"topic":21,"topicSlug":22,"updated":328,"__hash__":2327},"qa\u002Fsql\u002Fbasics\u002Fset-operations.md","Set 
Operations",{"type":101,"value":2258,"toc":2259},[],{"title":104,"searchDepth":30,"depth":30,"links":2260},[],{},"\u002Fsql\u002Fbasics\u002Fset-operations",[2264,2268,2272,2276,2280,2284,2288,2292,2296,2300,2304,2308,2312,2316,2320],{"id":2265,"difficulty":114,"q":2266,"a":2267},"what-are-set-operations","What are SQL set operations?","Set operations combine the results of **two or more `SELECT` queries vertically** —\nstacking or comparing rows rather than joining columns. The three are `UNION`\n(combine), `INTERSECT` (common rows), and `EXCEPT`\u002F`MINUS` (difference).\n\n```sql\nSELECT id FROM current_users\nUNION\nSELECT id FROM archived_users;   -- all ids from either table\n```\n\nBoth queries must produce **compatible columns** (same count, compatible types).\nSet operations work on whole rows, like mathematical set algebra.\n\nRule of thumb: joins combine tables side-by-side; set operations stack them\ntop-to-bottom.\n",{"id":2269,"difficulty":114,"q":2270,"a":2271},"union-vs-union-all","What is the difference between UNION and UNION ALL?","`UNION` combines the rows of both queries and **removes duplicates**; `UNION ALL`\nkeeps **every row**, including duplicates. Because dedup requires a sort\u002Fhash,\n`UNION` is slower.\n\n```sql\n-- deduped: a user in both lists appears once\nSELECT email FROM newsletter UNION SELECT email FROM customers;\n\n-- all rows kept; faster, but duplicates remain\nSELECT email FROM newsletter UNION ALL SELECT email FROM customers;\n```\n\nUse `UNION ALL` when you know the inputs are disjoint or duplicates are fine — it\nskips the expensive dedup step.\n\nRule of thumb: default to `UNION ALL` unless you specifically need duplicates\nremoved.\n",{"id":2273,"difficulty":106,"q":2274,"a":2275},"column-compatibility","What rules must the queries in a UNION satisfy?","Each query must have the **same number of columns**, in the **same order**, with\n**compatible data types** positionally. 
Column *names* needn't match — the result\ntakes its names from the **first** query.\n\n```sql\nSELECT id, name        FROM employees\nUNION ALL\nSELECT id, full_name   FROM contractors;   -- result columns: id, name\n```\n\nType mismatches either error or force an implicit cast. If a query has fewer\ncolumns, add literals\u002F`NULL`s to line them up.\n\nRule of thumb: align column count, order, and types; names come from the first\nquery.\n",{"id":2277,"difficulty":106,"q":2278,"a":2279},"intersect","What does INTERSECT do?","`INTERSECT` returns only the rows that appear in **both** result sets, deduplicated.\nIt's the set intersection of the two queries.\n\n```sql\n-- users who are both newsletter subscribers AND customers\nSELECT email FROM newsletter\nINTERSECT\nSELECT email FROM customers;\n```\n\nLike `UNION`, it dedupes by default; `INTERSECT ALL` (where supported) keeps the\nminimum count of duplicates. MySQL gained `INTERSECT` in 8.0.31; older versions\nemulate it with `IN`\u002F`EXISTS`.\n\nRule of thumb: `INTERSECT` = \"rows present in both queries.\"\n",{"id":2281,"difficulty":106,"q":2282,"a":2283},"except-minus","What do EXCEPT and MINUS do?","`EXCEPT` (Postgres\u002FSQL Server\u002Fstandard) and `MINUS` (Oracle) return rows from the\n**first** query that are **not** in the second — set difference. They're the same\noperation under two names.\n\n```sql\n-- users who signed up but never ordered\nSELECT id FROM users\nEXCEPT\nSELECT user_id FROM orders;\n```\n\nOrder matters: `A EXCEPT B` ≠ `B EXCEPT A`. It dedupes by default; `EXCEPT ALL`\nkeeps duplicate multiplicity. 
It's a clean way to express an anti-join when you only\nneed the keys.\n\nRule of thumb: `EXCEPT`\u002F`MINUS` = \"rows in the first query but not the second.\"\n",{"id":2285,"difficulty":106,"q":2286,"a":2287},"union-null-handling","How do set operations treat NULLs in deduplication?","For deduplication, set operations treat two `NULL`s as **equal** (unlike `=`, where\n`NULL = NULL` is unknown). So `UNION`\u002F`INTERSECT`\u002F`EXCEPT` collapse duplicate\nNULL-containing rows.\n\n```sql\n-- the two NULL rows are treated as duplicates and collapse to one\nSELECT NULL AS x UNION SELECT NULL;   -- returns a single NULL row\n```\n\nThis \"NULL-as-equal\" rule is why `EXCEPT` works as an anti-join even with NULLs,\nwhereas `NOT IN` famously breaks on them.\n\nRule of thumb: set operations consider NULLs equal for dedup; row comparisons (`=`)\ndon't.\n",{"id":2289,"difficulty":106,"q":2290,"a":2291},"order-by-with-union","How do you sort or limit the result of a UNION?","Apply `ORDER BY` \u002F `LIMIT` to the **whole combined result**, once, at the very end —\nnot to the individual `SELECT`s. The sort refers to the result columns (by name or\nposition).\n\n```sql\nSELECT name, created_at FROM users\nUNION ALL\nSELECT name, created_at FROM archived_users\nORDER BY created_at DESC\nLIMIT 20;\n```\n\nTo order *within* a branch (e.g. limit each side), wrap each `SELECT` in a subquery\nor CTE first.\n\nRule of thumb: one trailing `ORDER BY`\u002F`LIMIT` applies to the entire set result.\n",{"id":2293,"difficulty":114,"q":2294,"a":2295},"union-vs-join","When do you use UNION instead of a JOIN?","Use `UNION` to **append rows** from sources with the **same shape** (current +\narchived orders, multiple regions' tables). 
Use a `JOIN` to **add columns** by\nrelating rows across tables on a key.\n\n```sql\n-- UNION: more rows, same columns\nSELECT id, total FROM orders_2025\nUNION ALL\nSELECT id, total FROM orders_2026;\n```\n\nA telltale sign you want `UNION` is \"combine these similarly-structured datasets into\none list.\"\n\nRule of thumb: same columns, more rows → `UNION`; related tables, more columns →\n`JOIN`.\n",{"id":2297,"difficulty":106,"q":2298,"a":2299},"union-performance","Why is UNION ALL faster than UNION?","`UNION` must **deduplicate** the combined result, which means sorting or hashing all\nrows — extra CPU and memory. `UNION ALL` simply concatenates the inputs and streams\nthem out, with no dedup step.\n\n```sql\n-- no dedup work; fastest when you know rows can't overlap\nSELECT id FROM a UNION ALL SELECT id FROM b;\n```\n\nOn large datasets the difference is significant. Only pay for `UNION` when duplicates\nare actually possible and unwanted.\n\nRule of thumb: prefer `UNION ALL` and only upgrade to `UNION` when dedup is truly\nneeded.\n",{"id":2301,"difficulty":127,"q":2302,"a":2303},"emulate-intersect-mysql","How do you emulate INTERSECT or EXCEPT without native support?","Where `INTERSECT`\u002F`EXCEPT` aren't available (older MySQL), use `IN`\u002F`EXISTS` for\nintersection and `NOT EXISTS`\u002F`LEFT JOIN ... 
IS NULL` for difference.\n\n```sql\n-- INTERSECT emulation\nSELECT DISTINCT email FROM newsletter n\nWHERE EXISTS (SELECT 1 FROM customers c WHERE c.email = n.email);\n\n-- EXCEPT emulation\nSELECT DISTINCT id FROM users u\nWHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);\n```\n\nPrefer `EXISTS`\u002F`NOT EXISTS` over `IN`\u002F`NOT IN` here to stay NULL-safe.\n\nRule of thumb: emulate `INTERSECT` with `EXISTS`, `EXCEPT` with `NOT EXISTS`.\n",{"id":2305,"difficulty":127,"q":2306,"a":2307},"precedence-set-ops","What is the precedence of set operators when you chain them?","The SQL standard gives `INTERSECT` **higher precedence** than `UNION` and `EXCEPT`,\nso `INTERSECT` binds first. When mixing them, use **parentheses** to make the\nintended grouping explicit.\n\n```sql\n-- ambiguous to readers — parenthesize instead\n(SELECT id FROM a UNION SELECT id FROM b)\nEXCEPT\nSELECT id FROM c;\n```\n\nDifferent databases have followed the rule inconsistently over the years, so never\nrely on implicit precedence in a chain.\n\nRule of thumb: parenthesize any query that mixes `UNION`\u002F`INTERSECT`\u002F`EXCEPT`.\n",{"id":2309,"difficulty":114,"q":2310,"a":2311},"union-distinct-keyword","Is there a difference between UNION and UNION DISTINCT?","No — `UNION` and `UNION DISTINCT` are **identical**; `DISTINCT` is just the explicit\ndefault. Some teams write `UNION DISTINCT` to make the dedup intent obvious next to\n`UNION ALL`.\n\n```sql\nSELECT id FROM a UNION DISTINCT SELECT id FROM b;  -- same as plain UNION\n```\n\nLikewise `INTERSECT`\u002F`EXCEPT` default to `DISTINCT`, with optional `ALL` variants.\n\nRule of thumb: `UNION` already means `UNION DISTINCT`; write `ALL` when you want\nduplicates.\n",{"id":2313,"difficulty":106,"q":2314,"a":2315},"combining-different-tables","How do you tag which source each UNION row came from?","Add a **literal column** in each branch to label the source. 
After combining, that\ncolumn tells you where each row originated.\n\n```sql\nSELECT id, total, 'online'  AS channel FROM online_orders\nUNION ALL\nSELECT id, total, 'instore' AS channel FROM instore_orders;\n```\n\nThis is handy for merging similar feeds while keeping provenance, and lets you later\nfilter or group by `channel`.\n\nRule of thumb: add a constant label column per branch to track each row's source.\n",{"id":2317,"difficulty":106,"q":2318,"a":2319},"distinct-vs-union","Is SELECT DISTINCT the same as UNION for one query?","They overlap but aren't the same. `UNION` dedups across **two** queries; for a\n**single** query, `SELECT DISTINCT` dedups its rows. A `UNION` of a query with\nitself is just a slow `DISTINCT`.\n\n```sql\n-- these return the same rows; the second is clearer\nSELECT city FROM users UNION SELECT city FROM users;\nSELECT DISTINCT city FROM users;\n```\n\nUse `DISTINCT` for single-query dedup and reserve `UNION` for combining separate\nresult sets.\n\nRule of thumb: dedup one query with `DISTINCT`; combine queries with `UNION`.\n",{"id":2321,"difficulty":127,"q":2322,"a":2323},"type-coercion-union","What happens when column types differ across a UNION?","The database tries to find a **common type** and implicitly casts both sides to it\n(e.g. `int` and `numeric` → `numeric`). 
If no safe common type exists (text vs date),\nit raises an error.\n\n```sql\n-- int + numeric unify to numeric; fine\nSELECT 1 AS n UNION ALL SELECT 2.5;\n\n-- text + date: cast explicitly to avoid surprises\u002Ferrors\nSELECT created_at::text FROM a UNION ALL SELECT label FROM b;\n```\n\nImplicit coercion can also change precision\u002Fformatting unexpectedly, so cast\nexplicitly when the types aren't obviously compatible.\n\nRule of thumb: cast mismatched columns explicitly so the unified type is intentional.\n",{"description":104},"SQL set operation interview questions — UNION vs UNION ALL, INTERSECT, EXCEPT, column compatibility rules, deduplication cost and dialect differences.","sql\u002Fbasics\u002Fset-operations","DS1hmRt82KqRfZsGCop5-Jpe-0c6XL0qn_l2POiZR3I",1782244097882]