Skip to content

SQL · Schema & Data Types

Database Normalization — 1NF, 2NF, 3NF, and When to Denormalize

5 min read Updated 2026-06-20 Share:

Practice Normalization interview questions

Why normalization exists

Normalization is a process of structuring a relational database to reduce data redundancy and prevent update anomalies — bugs where the same fact is stored in multiple places and they fall out of sync. The formal definition comes in numbered normal forms (NF). In practice, reaching 3NF is the standard goal for OLTP databases.

The unnormalized starting point

Imagine you started with a single spreadsheet for orders:

order_id | customer_name | customer_email       | product_ids   | product_names       | total
---------|---------------|----------------------|---------------|---------------------|-------
1001     | Alice Brown   | alice@example.com    | 42, 55        | Keyboard, Mouse     | 150.00
1002     | Alice Brown   | alice@example.com    | 42            | Keyboard            | 75.00
1003     | Bob Smith     | bob@example.com      | 55            | Mouse               | 75.00

Problems: Alice's email is duplicated. product_ids and product_names are comma-lists in one cell. If Alice changes her email, we must update every row. If a product is renamed, we must find every order containing it.

First Normal Form (1NF) — no repeating groups

1NF requires that each column holds a single atomic value — no comma-lists, arrays, or JSON blobs where each element represents a separate fact.

Fix: split product_ids / product_names into a separate order_items table with one row per product per order.

CREATE TABLE orders (
    id           BIGINT       PRIMARY KEY,
    customer_name  VARCHAR(100) NOT NULL,
    customer_email VARCHAR(254) NOT NULL,
    total_amount   NUMERIC(10,2) NOT NULL,
    created_at     TIMESTAMPTZ  NOT NULL
);

CREATE TABLE order_items (
    order_id    BIGINT NOT NULL REFERENCES orders(id),
    product_id  INT    NOT NULL,
    product_name VARCHAR(200) NOT NULL,
    unit_price  NUMERIC(10,2) NOT NULL,
    quantity    INT NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Now each cell holds one value. But customer_name and customer_email are still duplicated across every order for the same customer.

Second Normal Form (2NF) — no partial dependencies

2NF applies to tables with composite primary keys. It requires that every non-key column depends on the whole key, not just part of it.

In order_items, product_name depends only on product_id (part of the key), not on (order_id, product_id). That is a partial dependency — a 2NF violation.

Fix: move product_name to a products table keyed on product_id.

CREATE TABLE products (
    id           INT          PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL,
    unit_price   NUMERIC(10,2) NOT NULL
);

CREATE TABLE order_items (
    order_id    BIGINT NOT NULL REFERENCES orders(id),
    product_id  INT    NOT NULL REFERENCES products(id),
    quantity    INT    NOT NULL,
    unit_price  NUMERIC(10,2) NOT NULL,  -- price at time of order (snapshot)
    PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF) — no transitive dependencies

3NF requires that non-key columns depend only on the primary key, not on other non-key columns (a transitive dependency).

In the orders table, customer_email depends on customer_name, not on order_id. If we store customer_namecustomer_email in every order, updating the email requires finding every order for that customer.

Fix: extract customers into their own table.

CREATE TABLE customers (
    id         BIGINT       PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name       VARCHAR(100) NOT NULL,
    email      VARCHAR(254) NOT NULL UNIQUE,
    created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
    id           BIGINT        PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    customer_id  BIGINT        NOT NULL REFERENCES customers(id),
    total_amount NUMERIC(10,2) NOT NULL,
    created_at   TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

Now updating a customer's email requires changing exactly one row in customers. All their orders automatically reflect the new email via the join.

The final 3NF schema

-- customers (one row per customer)
-- products  (one row per product)
-- orders    (one row per order, references customers)
-- order_items (one row per product per order, references orders + products)

-- Query: full order details
SELECT
    o.id             AS order_id,
    c.name           AS customer_name,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price AS line_total
FROM orders      o
JOIN customers   c  ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products    p  ON p.id = oi.product_id
WHERE o.id = 1001;

When to denormalize

Normalization is right for OLTP (transactional writes). For analytics and reporting (OLAP), denormalization trades write anomaly risk for read performance.

-- Denormalized analytics table (pre-joined, no further joins needed)
CREATE TABLE order_facts AS
SELECT
    o.id,
    o.created_at,
    c.country_code,
    p.category,
    oi.quantity * oi.unit_price AS line_revenue
FROM orders o
JOIN customers   c  ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products    p  ON p.id = oi.product_id;

This table is a snapshot — data warehouse tables, materialized views, and reporting replicas are acceptable places for denormalization. The normalized source tables remain the system of record for writes.

Recap

Normalize your OLTP schema to at least 3NF: 1NF eliminates repeating groups (no arrays in cells); 2NF eliminates partial dependencies (non-key columns must depend on the whole key); 3NF eliminates transitive dependencies (non-key columns must depend only on the primary key). Denormalization is a deliberate optimisation for read-heavy workloads — always denormalize a copy, not the source of truth.

More ways to practice

The self-quiz is live. Get notified when mock interviews and new question packs drop.

or
Join our WhatsApp Channel