Regular Expressions Interview Questions & Answers

5 questions | Updated 2026-06-18

Python interview questions on the re module — match vs search vs fullmatch, capture and named groups, re.compile, greedy vs non-greedy, re.sub, and raw strings.

They differ in where the pattern must match. re.match anchors at the start of the string (but not the end). re.search scans for the pattern anywhere in the string. re.fullmatch requires the pattern to match the entire string. All three return a Match object on success or None on failure.

import re
re.match("ab", "abcd")      # match — starts with 'ab'
re.match("cd", "abcd")      # None  — not at the start
re.search("cd", "abcd")     # match — found anywhere
re.fullmatch("ab", "abcd")  # None  — must match the whole string
re.fullmatch("abcd", "abcd")  # match

A common bug is using match expecting whole-string validation — it only anchors the start. Rule of thumb: use search to find, fullmatch to validate, and match only when you specifically mean "begins with".
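To make the validation pitfall concrete, here is a minimal sketch using a hypothetical 5-digit code validator (the function names is_zip_with_match and is_zip are illustrative, not from any library). It also shows why bolting a trailing $ onto re.match is not a perfect substitute for re.fullmatch: in the default mode, $ also matches just before a final newline.

```python
import re

# Hypothetical validator for a 5-digit code, written two ways.
def is_zip_with_match(s: str) -> bool:
    # Buggy: re.match anchors only at the start, so trailing junk passes.
    return re.match(r"\d{5}", s) is not None

def is_zip(s: str) -> bool:
    # Correct: re.fullmatch requires the pattern to cover the whole string.
    return re.fullmatch(r"\d{5}", s) is not None

print(is_zip_with_match("12345abc"))  # True  (the bug)
print(is_zip("12345abc"))             # False
print(is_zip("12345"))                # True

# Patching match with a trailing $ is not quite equivalent:
# $ also matches just before a final newline; fullmatch does not.
print(re.match(r"\d{5}$", "12345\n") is not None)  # True
print(re.fullmatch(r"\d{5}", "12345\n") is None)   # True
```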

Parentheses ( ) create a capture group you retrieve by number (1-based; group 0 is the whole match). (?P<name>...) creates a named group you retrieve by name — far more readable. (?:...) groups without capturing when you only need it for grouping/alternation.

import re
m = re.search(r"(\d{4})-(\d{2})", "2026-06")
m.group(0)   # '2026-06'  — whole match
m.group(1)   # '2026'     — first group
m.groups()   # ('2026', '06')

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-06")
m.group("year")   # '2026'
m.groupdict()     # {'year': '2026', 'month': '06'}

Named groups make patterns self-documenting and resilient to reordering. Rule of thumb: use (?P<name>...) for anything you'll extract, and (?:...) when grouping is structural only.
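The (?:...) form is easiest to appreciate next to its capturing counterpart. This sketch parses a hypothetical request line (the pattern and the group name item_id are made up for illustration) and shows that the non-capturing alternation adds no group, while named groups still get a number as well as a name.

```python
import re

# (?:...) groups the alternation without creating a capture group,
# so the only captured group is the numeric id.
pat = re.compile(r"(?:GET|POST) /items/(?P<item_id>\d+)")

m = pat.match("POST /items/42")
print(m.groups())          # ('42',)  only one capturing group
print(m.group("item_id"))  # 42
print(m.group(1))          # 42  named groups are numbered too

# A capturing (...) around the alternation shifts every group number.
m2 = re.match(r"(GET|POST) /items/(\d+)", "POST /items/42")
print(m2.groups())         # ('POST', '42')
```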

re.compile(pattern) builds a reusable pattern object once, then you call methods (.search, .match, .findall, .sub) on it. The module-level functions actually compile internally and cache recent patterns, so the main win is clarity and reuse — plus a small speedup when a pattern is used many times in a loop.

import re
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")  # compile once

lines = ["released 2026-06", "no date here", "patched 2026-07"]  # sample input
for line in lines:
    m = DATE.search(line)   # reuse the compiled object
    if m:
        print(m.group("year"))  # prints 2026 twice

It also lets you attach flags (e.g. re.IGNORECASE, re.VERBOSE) in one place. Rule of thumb: compile patterns used repeatedly or shared across a module; for one-off use the module functions are fine.
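As a sketch of attaching flags at compile time, the pattern below (HEX_COLOR is a hypothetical name) combines re.VERBOSE, which ignores unescaped whitespace and allows # comments inside the pattern, with re.IGNORECASE. Note that the literal # must be escaped as \# because in verbose mode a bare # starts a comment.

```python
import re

# Flags are set once, here, instead of at every call site.
HEX_COLOR = re.compile(
    r"""
    \#                     # literal hash (a bare one would start a comment)
    (?P<rgb>[0-9a-f]{6})   # six hex digits; IGNORECASE also accepts A-F
    """,
    re.VERBOSE | re.IGNORECASE,
)

for token in ["#1A2b3C", "#zzzzzz", "color: #ff0000"]:
    m = HEX_COLOR.search(token)
    print(token, "->", m.group("rgb") if m else None)
```

The loop prints 1A2b3C, None, and ff0000 for the three inputs; the compiled object's flags attribute records the flags it was built with.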

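By default the quantifiers *, +, ?, and {m,n} are greedy: they match as much as possible, then give characters back (backtrack) only if the rest of the pattern would otherwise fail. Adding a trailing ? makes them non-greedy (lazy): they match as little as possible and extend only when forced to. The difference shows up whenever the closing delimiter can appear more than once in the input.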

import re
text = "<a><b>"
re.search(r"<.*>", text).group()    # '<a><b>'  — greedy, grabs everything
re.search(r"<.*?>", text).group()   # '<a>'     — lazy, stops at first '>'

Greedy patterns over-matching is a classic "regex ate too much" bug. Rule of thumb: when matching content between delimiters, reach for the lazy *? / +? (or a negated character class like [^>]*).
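Here is a slightly larger toy example on an HTML-like string (for illustration only; a real HTML document should go through a proper parser, not a regex). It compares greedy .*, lazy .*?, and a negated character class with re.findall, and also shows the lazy form of a bounded {m,n} quantifier.

```python
import re

html = '<a href="x">one</a> and <a href="y">two</a>'

# Greedy .* runs to the LAST '</a>', swallowing both links.
print(re.findall(r"<a[^>]*>(.*)</a>", html))     # ['one</a> and <a href="y">two']
# Lazy .*? stops at the FIRST '</a>' after each opening tag.
print(re.findall(r"<a[^>]*>(.*?)</a>", html))    # ['one', 'two']
# Negated class: link text that contains no '<' at all.
print(re.findall(r"<a[^>]*>([^<]*)</a>", html))  # ['one', 'two']

# The trailing ? works on bounded quantifiers too.
print(re.match(r"a{2,4}", "aaaa").group())   # aaaa
print(re.match(r"a{2,4}?", "aaaa").group())  # aa
```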

re.sub(pattern, repl, string) returns a new string with all matches replaced. The replacement can reference captured groups with \1 or \g<name>, or be a function that receives each Match for dynamic replacement. You should write patterns as raw strings (r"...") so that backslash escapes like \d and \b reach the regex engine instead of being interpreted by Python first.

import re
re.sub(r"\s+", " ", "a   b\tc")          # 'a b c'  — collapse whitespace
re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2026-06")  # '06/2026' — reorder groups
re.sub(r"\d+", lambda m: f"[{m.group()}]", "x9")  # 'x[9]' — function repl
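A few more replacement forms, sketched with made-up inputs: \g<name> referencing a named group in the template, a named function (double is a hypothetical helper) as the replacement, the count argument that caps the number of substitutions, and re.subn, which also reports how many were made.

```python
import re

# \g<name> refers to a named group in the replacement template.
iso_to_us = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<m>/\g<d>/\g<y>",
    "due 2026-06-18, paid 2026-07-01",
)
print(iso_to_us)  # due 06/18/2026, paid 07/01/2026

# A function replacement receives each Match and returns the new text.
def double(m: re.Match) -> str:
    return str(int(m.group()) * 2)

print(re.sub(r"\d+", double, "3 apples, 10 pears"))  # 6 apples, 20 pears

# count limits how many replacements are made (0, the default, means all).
print(re.sub(r"o", "0", "foo boo", count=2))  # f00 boo

# re.subn returns (new_string, number_of_substitutions).
print(re.subn(r"\s+", " ", "a   b\tc"))  # ('a b c', 2)
```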

"\d"     # in a normal string this is an invalid escape (warns)
r"\d"    # raw string — passes \d straight to the engine

Without r"", "\b" becomes a backspace character, not a word boundary — a subtle, hard-to-spot bug. Rule of thumb: always prefix regex patterns with r.
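The backspace bug is silent, which is what makes it dangerous: no error, just no matches. This sketch runs the same word-boundary search with a normal string, a raw string, and a normal string with doubled backslashes. "concatenate" is included to show that \b keeps the embedded "cat" from matching.

```python
import re

text = "cat concatenate cat"

# In a normal string "\b" is Python's backspace escape (\x08), so the
# pattern looks for literal backspace characters and finds nothing.
print("\b" == "\x08")                   # True
print(re.findall("\bcat\b", text))      # []

# A raw string passes \b through to the engine as a word boundary.
print(re.findall(r"\bcat\b", text))     # ['cat', 'cat']

# Equivalent but noisier: double every backslash in a normal string.
print(re.findall("\\bcat\\b", text))    # ['cat', 'cat']
```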
