INSERT INTO adds one or more rows to a table. You list the target columns
explicitly so the statement stays correct if the table schema changes later.
-- Single row, explicit column list (preferred)
INSERT INTO users (name, email, created_at)
VALUES ('Alice', 'alice@example.com', now());
-- Multiple rows in one statement (more efficient than many single inserts)
INSERT INTO users (name, email, created_at)
VALUES
('Bob', 'bob@example.com', now()),
('Carol', 'carol@example.com', now());
Omitting the column list makes the VALUES order dependent on the physical
column order, which breaks silently when a column is added.
Rule of thumb: always list target columns explicitly in every INSERT
statement, even when inserting into all columns.
INSERT INTO … SELECT copies rows produced by a SELECT directly into the
target table without materializing them in the application.
-- Copy active users to an archive table
INSERT INTO users_archive (id, name, email, archived_at)
SELECT id, name, email, now()
FROM users
WHERE deactivated_at IS NOT NULL;
-- Create a staging table from a production table
INSERT INTO orders_staging
SELECT * FROM orders WHERE created_at >= '2026-01-01';
The SELECT can be as complex as needed — it can join tables, apply
functions, filter, and aggregate. The column count and types must match
the target column list.
Rule of thumb: prefer INSERT … SELECT over fetching rows in
application code and re-inserting them — it keeps the data movement inside
the database and avoids round-trip latency.
An UPSERT (update-or-insert) inserts a row if it does not already exist, or updates it if it does — determined by a unique/primary key conflict.
-- Postgres: ON CONFLICT DO UPDATE (INSERT ... ON CONFLICT)
INSERT INTO page_views (page_id, date, views)
VALUES (42, '2026-06-20', 1)
ON CONFLICT (page_id, date)
DO UPDATE SET views = page_views.views + EXCLUDED.views;
-- MySQL: INSERT ... ON DUPLICATE KEY UPDATE
INSERT INTO page_views (page_id, date, views)
VALUES (42, '2026-06-20', 1)
ON DUPLICATE KEY UPDATE views = views + VALUES(views);
-- SQL Server: MERGE statement
MERGE page_views AS target
USING (VALUES (42, '2026-06-20', 1)) AS src (page_id, date, views)
ON target.page_id = src.page_id AND target.date = src.date
WHEN MATCHED THEN UPDATE SET views = target.views + src.views
WHEN NOT MATCHED THEN INSERT (page_id, date, views) VALUES (src.page_id, src.date, src.views);
Rule of thumb: use UPSERT for idempotent writes — counters, caches, configuration, event deduplication. It avoids the race condition of a separate SELECT-then-INSERT in application code.
Use ON CONFLICT DO NOTHING (Postgres) or INSERT IGNORE (MySQL) to
skip rows that would violate a unique constraint rather than raising an error.
-- Postgres
INSERT INTO tags (name)
VALUES ('sql'), ('database'), ('sql') -- duplicate 'sql'
ON CONFLICT (name) DO NOTHING;
-- Inserts 'database'; skips the second 'sql' silently
-- MySQL
INSERT IGNORE INTO tags (name)
VALUES ('sql'), ('database'), ('sql');
In SQL Server, the closest equivalent is MERGE … WHEN NOT MATCHED THEN INSERT.
Rule of thumb: use DO NOTHING for idempotent seed data or batch
imports where duplicates are expected and harmless. Avoid it when you need
to know how many rows were actually inserted.
UPDATE modifies column values in rows that satisfy the WHERE condition.
Omitting WHERE updates every row in the table.
-- Update a single row by PK
UPDATE users
SET name = 'Alice Smith', updated_at = now()
WHERE id = 1;
-- Update multiple rows matching a condition
UPDATE orders
SET status = 'archived'
WHERE created_at < now() - INTERVAL '2 years';
-- Update using a value from the same row (relative update)
UPDATE products
SET stock = stock - 1
WHERE id = 99 AND stock > 0;
Rule of thumb: always write and test the WHERE clause of an UPDATE
as a SELECT first to confirm exactly which rows will be affected before
committing the change.
Updating based on a related table requires different syntax per database.
-- Postgres: UPDATE ... FROM
UPDATE orders o
SET discount = c.tier_discount
FROM customers c
WHERE c.id = o.customer_id
AND c.tier = 'gold';
-- MySQL: UPDATE with JOIN
UPDATE orders o
JOIN customers c ON c.id = o.customer_id
SET o.discount = c.tier_discount
WHERE c.tier = 'gold';
-- SQL Server: UPDATE with FROM/JOIN
UPDATE o
SET o.discount = c.tier_discount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.tier = 'gold';
Rule of thumb: always alias both tables and double-check the join
condition — a wrong ON clause can fan out rows and cause each target row
to be updated multiple times (non-deterministically in Postgres).
DELETE FROM removes rows that match the WHERE predicate. Without
WHERE, all rows in the table are deleted (the table structure remains).
-- Delete a single row
DELETE FROM users WHERE id = 42;
-- Delete with a condition
DELETE FROM sessions WHERE expires_at < now();
-- Delete all rows (use TRUNCATE for large tables — it's faster)
DELETE FROM staging_data;
Unlike TRUNCATE, DELETE fires ON DELETE triggers, respects FK
ON DELETE CASCADE / RESTRICT rules, and is logged row-by-row — making
it slower but more controllable.
Rule of thumb: run SELECT * FROM … WHERE <condition> before any
DELETE to preview what will be removed. Wrap destructive deletes in a
transaction and ROLLBACK first to verify the row count.
-- Postgres: DELETE ... USING
DELETE FROM order_items oi
USING orders o
WHERE oi.order_id = o.id
AND o.status = 'cancelled';
-- MySQL: DELETE with JOIN
DELETE oi
FROM order_items oi
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'cancelled';
-- SQL Server: DELETE with FROM/JOIN
DELETE oi
FROM order_items oi
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'cancelled';
-- ANSI alternative using a subquery (all databases)
DELETE FROM order_items
WHERE order_id IN (
SELECT id FROM orders WHERE status = 'cancelled'
);
Rule of thumb: the subquery form (WHERE … IN (SELECT …)) is the
most portable across databases; use the JOIN-based form when the subquery
is slow (the optimizer may not push it down efficiently).
RETURNING appends a SELECT-like clause to INSERT, UPDATE, or
DELETE and returns the affected rows — without a separate query. This
eliminates a round-trip and avoids race conditions from a subsequent
SELECT.
-- Get the generated ID after INSERT
INSERT INTO orders (customer_id, total)
VALUES (7, 149.99)
RETURNING id, created_at;
-- See the old values before UPDATE overwrites them
UPDATE products
SET price = price * 0.9
WHERE category = 'clearance'
RETURNING id, name, price AS new_price;
-- Confirm which rows were deleted
DELETE FROM sessions
WHERE expires_at < now()
RETURNING id, user_id;
MySQL uses LAST_INSERT_ID() for inserts only. SQL Server uses the
OUTPUT clause (OUTPUT INSERTED.id, DELETED.old_col).
Rule of thumb: use RETURNING (Postgres) or OUTPUT (SQL Server)
to retrieve generated IDs and audit old/new values atomically — never
follow a mutation with a separate SELECT when the database can return
the data in the same statement.
Soft delete marks rows as deleted without physically removing them,
preserving history and enabling recovery. Instead of DELETE, you
UPDATE a deleted_at timestamp (or a boolean flag).
-- Schema: nullable deleted_at column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
-- Soft delete
UPDATE users SET deleted_at = now() WHERE id = 42;
-- Query active rows only
SELECT * FROM users WHERE deleted_at IS NULL;
-- Partial unique index so email stays unique among active users only
CREATE UNIQUE INDEX uq_users_email_active
ON users (email)
WHERE deleted_at IS NULL;
-- View to hide deleted rows from most queries
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
Rule of thumb: use soft delete when you need an audit trail, recoverability,
or regulatory retention. Add a partial index on (natural_key) WHERE deleted_at IS NULL
to keep uniqueness constraints working correctly.
Deleting millions of rows in one statement locks large portions of the table, fills the transaction log, and may time out. The solution is to delete in small batches inside a loop.
-- Postgres: batch delete loop (run from application or a DO block)
DO $$
DECLARE deleted_count INT;
BEGIN
LOOP
DELETE FROM events
WHERE id IN (
SELECT id FROM events
WHERE created_at < now() - INTERVAL '1 year'
LIMIT 1000 -- batch size
);
GET DIAGNOSTICS deleted_count = ROW_COUNT;
EXIT WHEN deleted_count = 0;
PERFORM pg_sleep(0.05); -- brief pause to release locks
END LOOP;
END $$;
Each batch commits independently, so the table remains available for reads and writes between batches. The lock held per batch is small and short-lived.
Rule of thumb: never delete more than ~5 000–10 000 rows per transaction on a live table. Use batching with a pause between iterations to keep replication lag and lock contention low.
TRUNCATE removes all rows by deallocating data pages rather than
logging each deletion — it is orders of magnitude faster than
DELETE FROM table with no WHERE clause.
-- Slow: logs every row deletion
DELETE FROM staging_orders;
-- Fast: drops and recreates the data pages
TRUNCATE TABLE staging_orders;
-- TRUNCATE also resets identity/sequence counters
TRUNCATE TABLE events RESTART IDENTITY;
-- TRUNCATE multiple tables atomically (Postgres)
TRUNCATE TABLE staging_orders, staging_items RESTART IDENTITY CASCADE;
Trade-offs vs DELETE:
TRUNCATEdoes not fire row-level triggers.TRUNCATEcannot have aWHEREclause — it always removes all rows.- In MySQL,
TRUNCATEis DDL and auto-commits; in Postgres it is transactional.
Rule of thumb: use TRUNCATE to reset staging, temp, or test-fixture
tables between runs. Use DELETE when you need WHERE filtering, trigger
firing, or row-count reporting.
In MySQL, REPLACE INTO works like an UPSERT but via a delete-then-insert
strategy rather than an in-place update. If a row with the same primary key
(or unique key) exists, MySQL deletes it and inserts the new row. All columns
not listed in the REPLACE get their default values — not the existing values.
-- Suppose users(id PK, name, email, created_at DEFAULT now())
REPLACE INTO users (id, name, email)
VALUES (1, 'Alice Updated', 'alice@example.com');
-- If id=1 existed: deletes old row, inserts new row.
-- created_at will be reset to now(), NOT kept from the old row!
-- Safer alternative that preserves untouched columns:
INSERT INTO users (id, name, email)
VALUES (1, 'Alice Updated', 'alice@example.com')
ON DUPLICATE KEY UPDATE
name = VALUES(name),
email = VALUES(email);
Rule of thumb: avoid REPLACE INTO — it silently wipes columns you
didn't list and can cause unexpected data loss. Use INSERT … ON DUPLICATE KEY UPDATE
instead for in-place upserts.
Use a CASE expression inside SET to apply different values depending
on each row's state — more efficient than multiple UPDATE statements.
-- Apply tiered discounts in a single pass
UPDATE orders
SET discount_pct = CASE
WHEN total >= 500 THEN 20
WHEN total >= 200 THEN 10
WHEN total >= 100 THEN 5
ELSE 0
END
WHERE status = 'pending';
-- Flip a boolean flag
UPDATE tasks
SET is_done = CASE WHEN is_done THEN FALSE ELSE TRUE END
WHERE id = 7;
Rule of thumb: use SET col = CASE … END when multiple rows need
different values updated in one pass. It avoids multiple round-trips and
is atomic — all rows are updated in the same transaction.
A writeable CTE (Postgres, SQL Server) lets you stage intermediate
results or chain mutations. The CTE body can be a DELETE or UPDATE
with RETURNING, whose output feeds the outer INSERT.
-- Postgres: move rows from one table to another atomically
WITH deleted AS (
DELETE FROM job_queue
WHERE status = 'pending'
AND locked_by IS NULL
LIMIT 10
RETURNING *
)
INSERT INTO job_archive
SELECT *, now() AS archived_at
FROM deleted;
-- CTE as a data source for UPDATE
WITH price_hike AS (
SELECT id, price * 1.05 AS new_price
FROM products
WHERE category = 'premium'
)
UPDATE products p
SET price = ph.new_price
FROM price_hike ph
WHERE p.id = ph.id;
Rule of thumb: use writeable CTEs to express complex multi-step mutations as a single atomic statement. They are far safer than chaining separate DML statements across multiple round-trips where partial failure can leave data in an inconsistent state.
More Modifying Data interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.