DDL (Data Definition Language) defines and modifies the structure of
database objects — tables, indexes, views, sequences, schemas. The core
statements are CREATE, ALTER, DROP, and TRUNCATE.
DML (Data Manipulation Language) operates on the data inside those
objects — SELECT, INSERT, UPDATE, DELETE.
-- DDL: define structure
CREATE TABLE products (id SERIAL PRIMARY KEY, name TEXT NOT NULL);
ALTER TABLE products ADD COLUMN price NUMERIC(10,2);
DROP TABLE products;
-- DML: manipulate data
INSERT INTO products (name, price) VALUES ('Widget', 9.99);
UPDATE products SET price = 8.99 WHERE id = 1;
DELETE FROM products WHERE id = 1;
Rule of thumb: DDL changes persist after a transaction commits (and in most databases auto-commit immediately). DML changes can be rolled back within a transaction.
A CREATE TABLE statement names the table and declares each column with
its type and optional constraints. Common additions: a primary key, NOT NULL
markers, defaults, and foreign keys.
CREATE TABLE orders (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customers(id),
status TEXT NOT NULL DEFAULT 'pending',
total NUMERIC(10, 2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Key parts:
- Column name + type — required for every column.
NOT NULL— rejectsNULLinserts; always add it unlessNULLis semantically meaningful.DEFAULT— value used when the column is omitted inINSERT.PRIMARY KEY— unique non-null identifier; creates an index automatically.REFERENCES— foreign-key constraint linking to a parent table.
Rule of thumb: be explicit about NOT NULL and DEFAULT on every
column — relying on implicit NULLability makes the schema ambiguous.
ALTER TABLE modifies an existing table: add/drop/rename columns, change
types, add/drop constraints, rename the table itself.
-- Add a new column with a default (safe — no full rewrite in Postgres 11+)
ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMPTZ;
-- Drop a column (removes data permanently)
ALTER TABLE orders DROP COLUMN legacy_field;
-- Rename a column
ALTER TABLE orders RENAME COLUMN status TO order_status;
-- Change a type (may rewrite the whole table and lock it)
ALTER TABLE orders ALTER COLUMN total TYPE NUMERIC(14, 4);
-- Add a NOT NULL constraint (safe only if the column has no NULLs)
ALTER TABLE orders ALTER COLUMN shipped_at SET NOT NULL;
Risks:
- Type changes and adding
NOT NULLto an existing column may lock the table and run a full rewrite on large tables. - Dropping a column is irreversible without a backup.
Rule of thumb: for large production tables, test ALTER TABLE on a copy
first; use CONCURRENTLY options or online schema change tools (pt-online-schema-change,
gh-ost) where supported.
DROP TABLEremoves the table definition and all its data permanently. The table no longer exists.TRUNCATEremoves all rows from a table but leaves the table structure intact. It resets identity columns/sequences (in most databases) and is much faster thanDELETE FROM tablebecause it deallocates data pages rather than deleting row by row.
-- Remove the table entirely
DROP TABLE staging_data;
-- Empty the table but keep its structure
TRUNCATE TABLE staging_data;
-- TRUNCATE is faster than DELETE for clearing a whole table
-- DELETE FROM staging_data; -- slow: generates undo/redo for every row
In Postgres, TRUNCATE can be inside a transaction and rolled back; in
MySQL, TRUNCATE is DDL and implicitly commits.
Rule of thumb: use TRUNCATE to reset a staging/temp table between
runs; use DROP TABLE only when you no longer need the schema.
A schema is a named namespace inside a database that groups related
tables, views, functions, and other objects. In Postgres the default schema
is public; SQL Server uses dbo.
-- Create a schema
CREATE SCHEMA reporting;
-- Create a table inside it
CREATE TABLE reporting.monthly_revenue (
month DATE PRIMARY KEY,
total NUMERIC(14, 2) NOT NULL
);
-- Search path (Postgres): sets which schemas to look in without qualifying
SET search_path TO reporting, public;
SELECT * FROM monthly_revenue; -- resolves to reporting.monthly_revenue
Benefits:
- Logical grouping (
app,reporting,audit). - Separate permissions per schema.
- Avoids name collisions between different subsystems.
Rule of thumb: use schemas to separate concerns within a single
database (e.g., app for application tables, etl for staging tables,
audit for history).
A sequence is a database object that generates a monotonically
increasing series of integers. Auto-increment columns (SERIAL, IDENTITY)
are backed by a sequence internally.
-- Explicit sequence (Postgres)
CREATE SEQUENCE order_id_seq START 1000 INCREMENT 1;
-- Use it as a default
CREATE TABLE orders (
id INT DEFAULT nextval('order_id_seq') PRIMARY KEY
);
-- Advance and read the current value
SELECT nextval('order_id_seq'); -- 1000 (first call)
SELECT currval('order_id_seq'); -- 1000 (same session, same call)
SELECT lastval(); -- most recent nextval in this session
Sequences are non-transactional by design — a rolled-back transaction still consumes a number, so gaps in IDs are normal and expected.
Rule of thumb: never rely on sequence values being gap-free; use them only as unique opaque identifiers, not as row-count proxies.
-- Standard index (locks the table for writes during build)
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users (email);
-- Composite index
CREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);
-- Postgres: build without blocking writes (slower, but safe in production)
CREATE INDEX CONCURRENTLY idx_orders_total ON orders (total);
-- Drop
DROP INDEX idx_orders_customer;
DROP INDEX CONCURRENTLY idx_orders_total; -- Postgres
CREATE INDEX CONCURRENTLY builds the index in multiple passes while the
table stays writable, avoiding the write lock. The trade-off: it takes longer
and cannot be run inside a transaction block.
Rule of thumb: always use CONCURRENTLY when adding indexes to large
live production tables to avoid blocking reads and writes.
A temporary table exists only for the duration of a session (or transaction, depending on the database and declaration). It is invisible to other sessions and is automatically dropped when the session ends.
-- Postgres / SQL Server
CREATE TEMP TABLE staging_orders AS
SELECT * FROM orders WHERE status = 'pending';
-- Manipulate without touching the real table
UPDATE staging_orders SET status = 'processed';
-- MySQL uses a slightly different syntax
CREATE TEMPORARY TABLE staging_orders SELECT * FROM orders WHERE status = 'pending';
Use cases:
- Breaking a complex multi-step ETL into readable steps.
- Storing an intermediate result that is referenced multiple times (avoids rerunning a slow subquery).
- Isolating work from other sessions in long-running scripts.
Rule of thumb: prefer CTEs for single-query decomposition; use temp tables when the intermediate result must be indexed, updated, or reused across multiple queries.
It depends on the database:
- Postgres: DDL is fully transactional.
CREATE TABLE,ALTER TABLE,DROP TABLEinside aBEGINblock can be rolled back if the transaction aborts. - MySQL: DDL auto-commits. Any active transaction is committed before
the DDL executes; there is no way to roll back a
CREATE TABLEin MySQL. - SQL Server: DDL is transactional (like Postgres).
-- Postgres: safe rollback of schema change
BEGIN;
ALTER TABLE orders ADD COLUMN notes TEXT;
-- something fails...
ROLLBACK;
-- The column was never added
Rule of thumb: in Postgres and SQL Server, wrap schema migrations in
transactions to get atomic, all-or-nothing deploys. In MySQL, deploy each
DDL statement separately and use idempotent migration scripts (IF NOT EXISTS).
CREATE TABLE IF NOT EXISTS creates the table only if no table with that
name already exists in the current schema. If the table already exists, the
statement succeeds silently (no error, no data changed).
-- Safe to run multiple times — won't fail on re-run
CREATE TABLE IF NOT EXISTS audit_log (
id BIGSERIAL PRIMARY KEY,
event TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Similarly: DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS (Postgres 9.5+).
Rule of thumb: use IF NOT EXISTS / IF EXISTS in every migration
script to make migrations idempotent — safe to re-run after a partial
failure without manual cleanup.
-- Postgres / MySQL
ALTER TABLE old_name RENAME TO new_name;
-- SQL Server (stored procedure)
EXEC sp_rename 'old_name', 'new_name';
Renaming a table does not automatically update views, stored procedures, or application code that reference the old name. Views in Postgres become invalid; in SQL Server they may silently continue working via the internal object ID until they are recompiled.
Rule of thumb: after renaming a table, search for all references to the old name in views, functions, application code, and ORM models and update them in the same migration.
A generated column is a column whose value is automatically computed from other columns. The expression is evaluated by the database, not the application. There are two variants:
STORED— the computed value is physically stored and updated on each write (can be indexed).VIRTUAL— computed on read, not stored (MySQL and SQL Server support this).
-- Postgres (STORED only)
CREATE TABLE rectangles (
width NUMERIC NOT NULL,
height NUMERIC NOT NULL,
area NUMERIC GENERATED ALWAYS AS (width * height) STORED
);
-- MySQL (VIRTUAL — no storage cost)
CREATE TABLE rectangles (
width DECIMAL(10,2) NOT NULL,
height DECIMAL(10,2) NOT NULL,
area DECIMAL(10,2) AS (width * height) VIRTUAL
);
Rule of thumb: use generated columns for values always derivable from other columns (area, full name, tax amount) to keep the value consistent and avoid application-level bugs from forgetting to update the derived field.
Table partitioning splits a logically single table into multiple physical storage chunks (partitions) based on a column value. The database routes rows transparently and can prune irrelevant partitions from queries.
-- Declarative partitioning (Postgres 10+)
CREATE TABLE events (
id BIGINT NOT NULL,
created_at DATE NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
-- Create child partitions (one per quarter)
CREATE TABLE events_2026_q1
PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026_q2
PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
-- Queries automatically exclude irrelevant partitions
SELECT * FROM events WHERE created_at >= '2026-04-01';
-- Only scans events_2026_q2
Rule of thumb: partition very large tables (100 M+ rows) on the column most commonly used in range filters (usually a timestamp). Add indexes on each partition individually.
A view is a named, stored SELECT statement. Querying a view executes
the underlying query at runtime. Views simplify complex joins, restrict
column access, or present a stable interface over a changing schema.
-- Create a view
CREATE VIEW active_customers AS
SELECT id, name, email
FROM customers
WHERE deleted_at IS NULL;
-- Query it like a table
SELECT * FROM active_customers WHERE name ILIKE 'smith%';
-- Replace (redefine) without dropping
CREATE OR REPLACE VIEW active_customers AS
SELECT id, name, email, tier
FROM customers
WHERE deleted_at IS NULL;
-- Remove
DROP VIEW active_customers;
Rule of thumb: use CREATE OR REPLACE VIEW in migrations so
dependent objects (other views, grants) are preserved. Only DROP VIEW when
you are removing the view entirely.
A materialized view (Postgres, Oracle, SQL Server as "indexed view") is a view whose results are stored on disk like a table. This makes reads fast but the data is stale until explicitly refreshed.
-- Create
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT date_trunc('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY 1;
-- Refresh (blocks reads in Postgres by default)
REFRESH MATERIALIZED VIEW monthly_sales;
-- Non-blocking refresh (requires a unique index)
CREATE UNIQUE INDEX ON monthly_sales (month);
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
Rule of thumb: use a materialized view for expensive aggregations or
reports that can tolerate slightly stale data. Schedule REFRESH in a
background job; use CONCURRENTLY on large views to avoid read downtime.
More Schema & Data Types interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.