[{"data":1,"prerenderedAt":110},["ShallowReactive",2],{"qa-\u002Fsql\u002Fschema\u002Fnormalization":3},{"page":4,"siblings":94,"blog":107},{"id":5,"title":6,"body":7,"description":11,"difficulty":14,"extension":15,"framework":16,"frameworkSlug":17,"meta":18,"navigation":19,"order":20,"path":21,"questions":22,"questionsCount":85,"related":86,"seo":87,"seoDescription":88,"stem":89,"subtopic":6,"topic":90,"topicSlug":91,"updated":92,"__hash__":93},"qa\u002Fsql\u002Fschema\u002Fnormalization.md","Normalization",{"type":8,"value":9,"toc":10},"minimark",[],{"title":11,"searchDepth":12,"depth":12,"links":13},"",2,[],"medium","md","SQL","sql",{},true,4,"\u002Fsql\u002Fschema\u002Fnormalization",[23,28,32,36,40,44,48,53,57,61,65,69,73,77,81],{"id":24,"difficulty":25,"q":26,"a":27},"what-is-normalization","easy","What is database normalization and why does it matter?","**Normalization** is the process of organizing a relational schema to\n**reduce data redundancy** and **prevent update anomalies**. The theory\nwas introduced by E.F. Codd and is expressed as a series of **normal forms**\n(1NF, 2NF, 3NF, BCNF …) — each one stricter than the last.\n\n```sql\n-- Unnormalized: storing multiple values in one cell (violates 1NF)\n-- orders: | id | customer | items              |\n-- data:   | 1  | Alice    | 'Pen, Pencil, Pad' |\n\n-- After normalization: separate tables, one fact per cell\n-- orders: | id | customer_id |\n-- order_items: | order_id | product_id | qty |\n```\n\nBenefits: no redundant copies to keep in sync, constraints are enforceable,\nqueries are composable.\n\n**Rule of thumb:** normalize to at least **3NF** for transactional\n(OLTP) schemas; selectively denormalize for read-heavy reporting (OLAP) only\nwhen profiling proves it necessary.\n",{"id":29,"difficulty":14,"q":30,"a":31},"update-anomalies","What are the three update anomalies normalization prevents?","Un-normalized schemas suffer from three anomalies that make data unreliable:\n\n1. 
**Insertion anomaly** — you cannot store a fact without also storing\n   another unrelated fact. E.g., you cannot record a new department unless\n   you also have an employee for it.\n2. **Update anomaly** — the same fact appears in multiple rows. Changing a\n   manager's name requires updating every row for every employee in that\n   department. Miss one row → inconsistent data.\n3. **Deletion anomaly** — deleting a row removes more facts than intended.\n   Delete the last employee in a department and you lose the department's\n   name\u002Flocation too.\n\n```sql\n-- Bad: employee table stores department info in every row\n-- | emp_id | emp_name | dept_id | dept_name  | dept_location |\n-- If you rename the dept, you must update EVERY employee row.\n\n-- Fixed (normalized): dept facts live in one place\nCREATE TABLE departments (id INT PRIMARY KEY, name TEXT, location TEXT);\nCREATE TABLE employees (id INT PRIMARY KEY, name TEXT, dept_id INT REFERENCES departments(id));\n```\n\n**Rule of thumb:** if the same value must be updated in more than one row\nto keep the data consistent, you have a normalization problem.\n",{"id":33,"difficulty":25,"q":34,"a":35},"first-normal-form","What is First Normal Form (1NF)?","A table is in **1NF** when:\n1. Every column contains **atomic** (indivisible) values — no sets, lists,\n   or repeating groups inside a single cell.\n2. Every column contains values of a **single type**.\n3. 
Each row is **uniquely identifiable** (there is a primary key).\n\n```sql\n-- Violates 1NF: multiple phone numbers in one column\n-- | id | name  | phones               |\n-- | 1  | Alice | '555-1234, 555-5678' |\n\n-- 1NF compliant: separate table for multi-valued attribute\nCREATE TABLE contacts (id INT PRIMARY KEY, name TEXT);\nCREATE TABLE contact_phones (\n  contact_id INT REFERENCES contacts(id),\n  phone      TEXT NOT NULL,\n  PRIMARY KEY (contact_id, phone)\n);\n```\n\n**Rule of thumb:** if a cell contains comma-separated values or you find\nyourself doing `LIKE '%value%'` to search within a column, the table\nviolates 1NF and needs to be split.\n",{"id":37,"difficulty":14,"q":38,"a":39},"functional-dependency","What is a functional dependency?","A **functional dependency** (FD) `A → B` means that knowing the value of\n`A` uniquely determines the value of `B`. In a table, a functional\ndependency is a constraint on which combinations of values are valid.\n\n```\n-- In an orders table:\norder_id → customer_id      (each order has exactly one customer)\norder_id → order_date\n(order_id, product_id) → quantity   (composite key determines quantity)\n\n-- Problematic FD in an unnormalized table:\ndept_id → dept_name         (department name depends only on dept_id,\n                             not on the full PK of the employee row)\n```\n\nUnderstanding FDs is the foundation of 2NF and 3NF: each normal form\nremoves a class of problematic FDs from the schema.\n\n**Rule of thumb:** draw out the FDs before designing a schema. Every\nnon-key column should depend on **the whole key and nothing but the key**.\n(This is essentially the definition of 3NF in plain English.)\n",{"id":41,"difficulty":14,"q":42,"a":43},"second-normal-form","What is Second Normal Form (2NF) and what does it fix?","A table is in **2NF** when it is in 1NF and every non-key column is\n**fully functionally dependent on the whole primary key** — not just part\nof it. 
2NF only matters when the PK is composite.\n\n```sql\n-- Violates 2NF: PK is (order_id, product_id) but product_name\n-- depends only on product_id (partial dependency)\n-- | order_id | product_id | product_name | quantity |\n\n-- Fix: split product facts into their own table\nCREATE TABLE products (\n  id   INT PRIMARY KEY,\n  name TEXT NOT NULL\n);\nCREATE TABLE order_items (\n  order_id   INT REFERENCES orders(id),\n  product_id INT REFERENCES products(id),\n  quantity   INT NOT NULL,\n  PRIMARY KEY (order_id, product_id)\n);\n```\n\n**Rule of thumb:** if any non-key column depends on *part* of a composite\nprimary key, move those columns to a table where that partial key *is* the\nfull primary key.\n",{"id":45,"difficulty":14,"q":46,"a":47},"third-normal-form","What is Third Normal Form (3NF) and what does it eliminate?","A table is in **3NF** when it is in 2NF and **no non-key column determines\nanother non-key column** (no transitive dependencies).\n\n```sql\n-- Violates 3NF: zip_code → city (transitive: emp_id → zip_code → city)\n-- | emp_id | name  | zip_code | city     |\n\n-- Fix: move the transitive dependency to its own table\nCREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT NOT NULL);\nCREATE TABLE employees (\n  id       INT PRIMARY KEY,\n  name     TEXT NOT NULL,\n  zip_code TEXT REFERENCES zip_codes(zip)\n);\n```\n\nThe classic mnemonic: **\"Every non-key attribute must depend on the key,\nthe whole key, and nothing but the key — so help me Codd.\"**\n\n**Rule of thumb:** if changing one non-key value (like a zip code) should\nautomatically update another non-key value (the city), those values belong\nin a separate table joined by a foreign key.\n",{"id":49,"difficulty":50,"q":51,"a":52},"bcnf","hard","What is Boyce-Codd Normal Form (BCNF) and how does it differ from 3NF?","**BCNF** (sometimes called 3.5NF) is a stricter version of 3NF. 
A table\nis in BCNF if, for every non-trivial functional dependency `A → B`, `A`\nis a **superkey** (a set of columns that uniquely identifies a row).\n\nBCNF and 3NF differ only when a table has **multiple overlapping candidate\nkeys**. 3NF tolerates a dependency whose determinant is not a superkey,\nprovided the dependent column is part of some candidate key; BCNF does not.\n\n```\n-- Classic BCNF violation: Tutors(student, subject, tutor)\n-- Candidate keys: (student, subject) and (student, tutor)\n-- FD: tutor → subject (a tutor teaches exactly one subject)\n-- This violates BCNF because 'tutor' is not a superkey.\n\n-- Fix (decompose):\n-- TutorSubject(tutor PK, subject)\n-- StudentTutor(student, tutor, FK tutor → TutorSubject)\n-- Cost: (student, subject) → tutor is no longer enforceable by a key.\n```\n\n**Rule of thumb:** BCNF matters in schemas with multiple candidate keys.\nIn practice, 3NF is the target for most applications; BCNF is pursued when\nredundancy in multi-key tables causes real anomalies.\n",{"id":54,"difficulty":14,"q":55,"a":56},"denormalization","What is denormalization and when is it justified?","**Denormalization** intentionally introduces redundancy into a schema to\nimprove **read performance** — typically by precomputing joins or\naggregations and caching their results in additional columns or tables.\n\n```sql\n-- Normalized: count must be computed with a JOIN every time\nSELECT u.id, COUNT(o.id) AS order_count\nFROM users u LEFT JOIN orders o ON o.user_id = u.id\nGROUP BY u.id;\n\n-- Denormalized: cache the count in the users table\nALTER TABLE users ADD COLUMN order_count INT NOT NULL DEFAULT 0;\n\n-- Maintain via trigger or application logic on each insert\u002Fdelete.\n-- NEW is only available inside a trigger body on orders (AFTER INSERT);\n-- decrement with OLD.user_id in the matching AFTER DELETE trigger.\nUPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;\n```\n\nTrade-offs:\n- ✅ Faster reads, simpler queries, reduced join cost.\n- ❌ Write complexity — must keep redundant copies in sync.\n- ❌ Risk of inconsistency if update logic is missed.\n\n**Rule of thumb:** normalize first; denormalize only after profiling shows\nthat a specific query is a bottleneck *and* the 
added write complexity is\nworth the read gain. Document every denormalized column with a comment\nexplaining what it caches and how it is maintained.\n",{"id":58,"difficulty":14,"q":59,"a":60},"star-schema","What is a star schema and how does it differ from a normalized OLTP schema?","A **star schema** is a denormalized dimensional model used in **data\nwarehouses (OLAP)**. It centers on a large **fact table** (events\u002Ftransactions)\nsurrounded by smaller **dimension tables** (descriptive attributes). Dimensions\nare intentionally denormalized for fast, simple queries.\n\n```sql\n-- Fact table: one row per sale event\nCREATE TABLE fact_sales (\n  sale_id     BIGINT PRIMARY KEY,\n  date_key    INT REFERENCES dim_date(date_key),\n  product_key INT REFERENCES dim_product(product_key),\n  store_key   INT REFERENCES dim_store(store_key),\n  quantity    INT NOT NULL,\n  revenue     NUMERIC(12,2) NOT NULL\n);\n\n-- Dimension: denormalized (city + country in same row, no 3NF)\nCREATE TABLE dim_store (\n  store_key INT PRIMARY KEY,\n  name      TEXT,\n  city      TEXT,\n  country   TEXT\n);\n```\n\nOLTP schemas normalize to avoid write anomalies. OLAP star schemas\ndenormalize to minimize joins and maximize scan throughput for analytics.\n\n**Rule of thumb:** use a normalized 3NF schema for transactional\napplications; use a star or snowflake schema for analytical\u002FBI workloads.\nDon't mix them — ETL pipelines transform data between the two.\n",{"id":62,"difficulty":14,"q":63,"a":64},"normalization-vs-performance","How do you balance normalization and query performance in practice?","Fully normalized schemas can require many joins, which hurts read\nperformance on large datasets. Common pragmatic trade-offs:\n\n1. **Add indexes before denormalizing** — a join on indexed FKs is fast.\n   Denormalization should only be considered after indexes fail to help.\n2. 
**Materialized views \u002F summary tables** — precompute expensive aggregates\n   without changing the base schema.\n3. **Selective redundancy** — add a `cached_count` column or a denormalized\n   `status` flag where read frequency vastly exceeds write frequency.\n4. **Separate OLAP schema** — replicate data nightly into a star schema for\n   reporting; keep OLTP tables normalized.\n\n```sql\n-- Before denormalizing: try an index on the join column\n-- (orders.user_id, the column used in the JOIN below)\nCREATE INDEX idx_orders_user_id ON orders (user_id);\n-- EXPLAIN ANALYZE to verify it is used before adding redundant columns\nEXPLAIN ANALYZE\n  SELECT u.name, COUNT(o.id)\n  FROM users u JOIN orders o ON o.user_id = u.id\n  GROUP BY u.id;\n```\n\n**Rule of thumb:** profile with real data before denormalizing. A missing\nindex is the most common \"normalization performance problem\" — and it is\na 30-second fix compared to the ongoing cost of maintaining denormalized data.\n",{"id":66,"difficulty":14,"q":67,"a":68},"surrogate-vs-natural-key","What is the difference between a surrogate key and a natural key?","- **Natural key**: a column (or set of columns) from the real-world domain\n  that uniquely identifies an entity — e.g., email address, ISBN, social\n  security number. Natural keys carry meaning but can change over time.\n- **Surrogate key**: a system-generated identifier with no business meaning —\n  e.g., an auto-increment `id` or UUID. It never changes and has no\n  domain semantics.\n\n```sql\n-- Natural key PK (email can change → cascading updates on all FKs)\nCREATE TABLE users (email TEXT PRIMARY KEY, name TEXT);\n\n-- Surrogate PK (id never changes; email change is isolated to one row)\nCREATE TABLE users (\n  id    INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n  email TEXT NOT NULL UNIQUE,\n  name  TEXT NOT NULL\n);\n```\n\n**Rule of thumb:** use a surrogate PK for every table; declare natural keys\nas `UNIQUE` constraints alongside the surrogate. 
This gives you the integrity\nguarantee of natural keys without the cascade pain of changing a PK.\n",{"id":70,"difficulty":50,"q":71,"a":72},"fourth-normal-form","What is Fourth Normal Form (4NF) and what does it address?","**4NF** eliminates **multi-valued dependencies (MVDs)** — independent\nmany-to-many relationships stored in a single table, which causes\nmultiplicative row explosion.\n\n```\n-- Violates 4NF: Employee can have many Skills AND many Projects, independently.\n-- | emp_id | skill      | project   |\n-- | 1      | SQL        | Alpha     |\n-- | 1      | SQL        | Beta      |  ← duplicating the SQL row for Beta\n-- | 1      | Python     | Alpha     |  ← duplicating Alpha row for Python\n-- | 1      | Python     | Beta      |\n\n-- 4NF: split into two separate tables\n-- EmployeeSkills(emp_id, skill)\n-- EmployeeProjects(emp_id, project)\n```\n\nAdding a new skill in the original table requires adding rows for every\nproject combination, creating redundancy and anomalies.\n\n**Rule of thumb:** when two many-to-many relationships are independent of\neach other but share the same entity, split them into two separate\njoin tables rather than combining them into one.\n",{"id":74,"difficulty":25,"q":75,"a":76},"normalization-checklist","How do you quickly check if a table is in 3NF?","Run through this checklist:\n\n1. **1NF check**: every cell contains one atomic value; there is a primary key.\n2. **2NF check**: if the PK is composite, every non-key column depends on the\n   *entire* PK, not just part of it.\n3. **3NF check**: no non-key column determines another non-key column\n   (no transitive dependencies — e.g., `zip → city` when `zip` is not the PK).\n\n```sql\n-- Red flags that indicate a 3NF violation:\n-- 1. A column stores concatenated values ('red,blue,green')\n-- 2. Repeated column groups (phone1, phone2, phone3)\n-- 3. A non-PK column appears in WHERE of a JOIN as if it were a PK\n-- 4. 
Updating one row's value requires updating dozens of other rows\n-- 5. You cannot add a fact without inserting an unrelated row\n```\n\n**Rule of thumb:** if you can answer \"what does each column tell you about?\"\nand the answer is always \"it tells you something about the primary key (and\nonly the primary key)\", the table is in 3NF.\n",{"id":78,"difficulty":25,"q":79,"a":80},"junction-table","What is a junction (bridge) table and when do you use one?","A **junction table** (also called a bridge or associative table) resolves a\n**many-to-many relationship** between two entities into two one-to-many\nrelationships. It stores the *association* as rows rather than as repeated\ncolumns.\n\n```sql\n-- Many students can enroll in many courses\nCREATE TABLE students (id INT PRIMARY KEY, name TEXT NOT NULL);\nCREATE TABLE courses  (id INT PRIMARY KEY, title TEXT NOT NULL);\n\n-- Junction table: one row per (student, course) pair\nCREATE TABLE enrollments (\n  student_id  INT NOT NULL REFERENCES students(id) ON DELETE CASCADE,\n  course_id   INT NOT NULL REFERENCES courses(id)  ON DELETE CASCADE,\n  enrolled_at DATE NOT NULL DEFAULT CURRENT_DATE,\n  PRIMARY KEY (student_id, course_id)\n);\n```\n\nThe junction table can carry **payload columns** (enrolled_at, grade) that\ndescribe the relationship itself — something impossible to store on either\nside alone.\n\n**Rule of thumb:** whenever two entities have a many-to-many relationship,\nalways model it with a junction table. Never store comma-separated IDs in a\nsingle column as an alternative.\n",{"id":82,"difficulty":14,"q":83,"a":84},"when-not-to-normalize","When should you deliberately stop normalizing?","Normalization is not always the right answer. Practical reasons to stop\nbefore reaching 3NF or BCNF:\n\n1. **Read performance is the primary concern** — heavily queried analytical\n   tables benefit from fewer joins, even at the cost of redundancy.\n2. 
**The value is stable** — if a denormalized value (e.g., a country\n   name embedded in every row) will almost never change, the update anomaly\n   risk is negligible.\n3. **External schema constraints** — integrating with a vendor schema or\n   legacy system you cannot change.\n4. **Simplicity for small, short-lived data** — a temporary staging table or\n   a one-off report table does not need 3NF rigor.\n\n```sql\n-- Acceptable denormalization: reporting snapshot\n-- Copies customer_name at the time of the order; intentionally redundant\n-- so historical reports stay stable even if the customer's name later changes.\nCREATE TABLE order_snapshots (\n  order_id      BIGINT PRIMARY KEY,\n  customer_id   INT    NOT NULL,\n  customer_name TEXT   NOT NULL,   -- denormalized snapshot\n  total         NUMERIC(12,2) NOT NULL,\n  snapped_at    TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\n**Rule of thumb:** normalize transactional data to 3NF by default. Deviate\ndeliberately, document the reason, and ensure the update path (trigger, ETL,\napplication code) is clearly owned and tested.\n",15,null,{"description":11},"SQL normalization interview questions — 1NF through 3NF\u002FBCNF, functional dependencies, update anomalies, denormalization trade-offs, and practical schema design patterns.","sql\u002Fschema\u002Fnormalization","Schema & Data Types","schema","2026-06-20","yogAW6WtW466-6r7gxd1LZhJEPh_v0Vd470WBErGp4Y",[95,99,102,106],{"subtopic":96,"path":97,"order":98},"Data Types","\u002Fsql\u002Fschema\u002Fdata-types",1,{"subtopic":100,"path":101,"order":12},"DDL — Creating & Altering Tables","\u002Fsql\u002Fschema\u002Fddl",{"subtopic":103,"path":104,"order":105},"Constraints & Integrity","\u002Fsql\u002Fschema\u002Fconstraints",3,{"subtopic":6,"path":21,"order":20},{"path":108,"title":109},"\u002Fblog\u002Fsql-normalization-1nf-2nf-3nf","Database Normalization — 1NF, 2NF, 3NF, and When to Denormalize",1782244107185]