Regular Expressions Interview Questions & Answers
5 questions Updated 2026-06-18
Python interview questions on the re module — match vs search vs fullmatch, capture and named groups, re.compile, greedy vs non-greedy, re.sub, and raw strings.
re.match, re.search, and re.fullmatch differ in where the pattern must
match. re.match anchors at the start of the string (but not the end).
re.search scans for the pattern anywhere in the string. re.fullmatch
requires the pattern to match the entire string. All three return a
Match object on success or None on failure.
import re
re.match("ab", "abcd") # match — starts with 'ab'
re.match("cd", "abcd") # None — not at the start
re.search("cd", "abcd") # match — found anywhere
re.fullmatch("ab", "abcd") # None — must match the whole string
re.fullmatch("abcd", "abcd") # match
A common bug is using match for whole-string validation: it only anchors
the start, so trailing characters are silently accepted. Rule of thumb:
use search to find, fullmatch to validate, and match only when you
specifically mean "begins with".
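The validation pitfall above can be sketched as follows. The five-digit ZIP pattern and the two helper names are assumptions chosen only for illustration; they are not part of the re module.

```python
import re

# Hypothetical pattern for illustration: exactly five digits.
ZIP = r"\d{5}"

def is_zip_with_match(s):
    # re.match anchors only the start, so trailing junk slips through.
    return re.match(ZIP, s) is not None

def is_zip_with_fullmatch(s):
    # re.fullmatch requires the whole string to match the pattern.
    return re.fullmatch(ZIP, s) is not None

print(is_zip_with_match("12345abc"))      # True (a false positive)
print(is_zip_with_fullmatch("12345abc"))  # False
print(is_zip_with_fullmatch("12345"))     # True
```

Appending $ to the pattern and keeping re.match gets close to the same result, but $ also matches just before a trailing newline, which is one reason fullmatch is usually the safer choice for validation.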
Parentheses ( ) create a capture group you retrieve by number
(1-based; group 0 is the whole match). (?P<name>...) creates a named
group you retrieve by name — far more readable. (?:...) groups without
capturing when you only need it for grouping/alternation.
import re
m = re.search(r"(\d{4})-(\d{2})", "2026-06")
m.group(0) # '2026-06' — whole match
m.group(1) # '2026' — first group
m.groups() # ('2026', '06')
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-06")
m.group("year") # '2026'
m.groupdict() # {'year': '2026', 'month': '06'}
Named groups make patterns self-documenting and resilient to reordering.
Rule of thumb: use (?P<name>...) for anything you'll extract, and
(?:...) when grouping is structural only.
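The snippets above do not show (?:...) in action, so here is a minimal sketch of how it keeps group numbering stable. The URL-like pattern and the sample text are made up for illustration.

```python
import re

text = "see https://example.com now"

# Capturing alternation: the scheme becomes group 1 and pushes the host to group 2.
capturing = re.search(r"(http|https)://([\w.]+)", text)

# Non-capturing alternation: only the host is captured, as group 1.
non_capturing = re.search(r"(?:http|https)://([\w.]+)", text)

print(capturing.groups())      # ('https', 'example.com')
print(non_capturing.groups())  # ('example.com',)
```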
re.compile(pattern) builds a reusable pattern object once, then you call
methods (.search, .match, .findall, .sub) on it. The module-level
functions compile internally and cache recent patterns, so the main win
is clarity and reuse, plus a small speedup when a pattern is used many
times in a loop, since each call skips the cache lookup.
import re
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})") # compile once
lines = ["released 2026-06", "no date here"] # sample input
for line in lines:
    m = DATE.search(line) # reuse the compiled object
    if m:
        print(m.group("year"))
Compiling also lets you attach flags (e.g. re.IGNORECASE, re.VERBOSE) in
one place. Rule of thumb: compile patterns used repeatedly or shared
across a module; for one-off use the module functions are fine.
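As a sketch of attaching flags at compile time, the example below combines re.IGNORECASE and re.VERBOSE. The hex-colour pattern and the name HEX_COLOR are hypothetical, chosen only to give the flags something to do.

```python
import re

# Hypothetical pattern for illustration: a hex colour such as #1a2B3c.
HEX_COLOR = re.compile(
    r"""
    \#                    # literal hash, escaped because VERBOSE treats a bare one as a comment
    (?P<rgb>[0-9a-f]{6})  # six hex digits; IGNORECASE also accepts A-F
    """,
    re.IGNORECASE | re.VERBOSE,
)

m = HEX_COLOR.search("background: #1A2b3C;")
print(m.group("rgb"))  # 1A2b3C
```

With re.VERBOSE, whitespace in the pattern is ignored and comments are allowed, so a long pattern can be laid out one piece per line.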
By default, quantifiers (*, +, ?, {m,n}) are greedy: they match as
much as possible, then backtrack. Adding a trailing ? makes them
non-greedy (lazy): they match as little as possible. This matters most
when a delimiter can appear more than once in the input.
import re
text = "<a><b>"
re.search(r"<.*>", text).group() # '<a><b>' — greedy, grabs everything
re.search(r"<.*?>", text).group() # '<a>' — lazy, stops at first '>'
Greedy over-matching is the classic "regex ate too much" bug. Rule of
thumb: when matching content between delimiters, reach for the lazy *?
/ +? (or a negated character class like [^>]*).
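The negated character class mentioned in the rule of thumb can be compared with the greedy and lazy forms side by side. This is a small sketch using re.findall on made-up tag-like input, not an HTML parser.

```python
import re

text = '<a href="x"><b>'

greedy = re.findall(r"<.*>", text)       # one match spanning both tags
lazy = re.findall(r"<.*?>", text)        # each tag separately
negated = re.findall(r"<[^>]*>", text)   # each tag separately, never crosses '>'

print(greedy)   # ['<a href="x"><b>']
print(lazy)     # ['<a href="x">', '<b>']
print(negated)  # ['<a href="x">', '<b>']

# One difference: '.' does not match a newline by default, but [^>] does.
multiline = "<a\nhref>"
print(re.findall(r"<.*?>", multiline))    # []
print(re.findall(r"<[^>]*>", multiline))  # ['<a\nhref>']
```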
re.sub(pattern, repl, string) returns a new string with all matches
replaced. The replacement can reference captured groups with \1 or
\g<name>, or be a function that receives each Match for dynamic
replacement. You should write patterns as raw strings (r"...") so that
backslash escapes like \d and \b reach the regex engine instead of being
interpreted by Python first.
import re
re.sub(r"\s+", " ", "a b\tc") # 'a b c' — collapse whitespace
re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2026-06") # '06/2026' — reorder groups
re.sub(r"\d+", lambda m: f"[{m.group()}]", "x9") # 'x[9]' — function repl
"\d" # in a normal string this is an invalid escape (warns)
r"\d" # raw string — passes \d straight to the engine
Without r"", "\b" becomes a backspace character, not a word boundary —
a subtle, hard-to-spot bug. Rule of thumb: always prefix regex patterns
with r.
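Two points from this answer are not covered by the snippet above: the \g<name> replacement syntax and the "\b" pitfall. The sketch below shows both, reusing the year/month pattern from earlier; the sample strings are made up.

```python
import re

# \g<name> refers to a named group inside the replacement string.
swapped = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})",
    r"\g<month>/\g<year>",
    "2026-06",
)
print(swapped)  # 06/2026

# Without the r prefix, "\b" is a single backspace character (U+0008),
# so the pattern looks for backspaces around 'cat' and finds nothing.
print(re.search("\bcat\b", "a cat sat"))           # None
print(re.search(r"\bcat\b", "a cat sat").group())  # cat
print(len("\b"), len(r"\b"))                       # 1 2
```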