Python regular expressions, explained
Regular expressions match patterns in text, and Python's re module is the standard tool.
The hard part isn't the regex syntax itself but knowing which re function to call and how
groups work. This guide covers the everyday API and the traps that catch people.
Always use raw strings
Regex uses backslashes heavily (\d, \b, \w), and so do Python string escapes. Write
patterns as raw strings (r"...") so the backslashes reach the regex engine intact.
import re
text = "a word here"
re.search(r"\bword\b", text) # raw string: \b reaches the regex engine as a word boundary
re.search("\bword\b", text) # bug: Python turns \b into a backspace character first
Make r"..." a reflex for every pattern — it avoids a whole class of silent mismatches.
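To see why the non-raw version fails, it can help to inspect the strings themselves. The sketch below is illustrative only; the sample text and variable names are made up for the demonstration.

```python
import re

text = "a word here"  # sample input, invented for this example

# Normal string: Python converts each \b into one backspace character (\x08).
plain = "\bword\b"
# Raw string: the backslash and the 'b' both survive as two characters.
raw = r"\bword\b"

print(len(plain))  # 6: backspace + "word" + backspace
print(len(raw))    # 8: \ b w o r d \ b

# The plain pattern looks for literal backspace characters, which the text lacks.
print(re.search(plain, text))        # None
print(re.search(raw, text).group())  # 'word'
```

The failure is silent: no exception is raised, the pattern just never matches.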
match vs search vs fullmatch
These three differ in where they look. match anchors at the start, search finds the
pattern anywhere, and fullmatch requires the whole string to match.
re.match(r"\d+", "123abc") # matches "123" — start only
re.search(r"\d+", "abc123") # matches "123" — anywhere
re.fullmatch(r"\d+", "123abc") # None — not the entire string
All three return a match object (truthy) or None, so test the result in an if before
calling methods such as .group(). A common bug is expecting match to scan the whole
string; it only tries at position 0.
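One way to keep the three apart is to wrap each in a small predicate and run the same inputs through all of them. The helper names below are hypothetical, chosen only for this sketch.

```python
import re

def starts_with_digits(s):
    """True if digits appear at the very start of s."""
    return re.match(r"\d+", s) is not None

def contains_digits(s):
    """True if digits appear anywhere in s."""
    return re.search(r"\d+", s) is not None

def is_all_digits(s):
    """True only if the entire string is digits."""
    return re.fullmatch(r"\d+", s) is not None

for s in ["123", "123abc", "abc123", "abc"]:
    print(s, starts_with_digits(s), contains_digits(s), is_all_digits(s))
# 123 True True True
# 123abc True True False
# abc123 False True False
# abc False False False
```

For input validation, fullmatch is usually the one you want, since match would accept "123abc".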
Extracting with groups
Parentheses create capturing groups. .group(0) is the whole match; .group(n) is the
nth group. Named groups (?P<name>...) make the result self-documenting.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "date: 2026-06-19")
m.group(0) # '2026-06-19' — full match
m.group(1) # '2026' — first group
m.groups() # ('2026', '06', '19')
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-06")
m.group("year") # '2026'
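Because search returns None when nothing matches, calling .group() on the result without checking raises AttributeError. A minimal sketch of a safe extraction, using a hypothetical parse_date helper and the match object's groupdict() method to collect the named groups:

```python
import re

DATE = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

def parse_date(s):
    """Hypothetical helper: return the date parts as ints, or None if absent."""
    m = re.search(DATE, s)
    if m is None:  # guard first: None has no .group() or .groupdict()
        return None
    # groupdict() maps each group name to the text it captured
    return {name: int(value) for name, value in m.groupdict().items()}

print(parse_date("date: 2026-06-19"))  # {'year': 2026, 'month': 6, 'day': 19}
print(parse_date("no date here"))      # None
```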
Finding all matches
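findall returns a list of strings: the full matches if the pattern has no groups, the
group's text if it has exactly one group, and tuples of groups if it has more than one.
finditer yields match objects lazily, which is the better choice when you need positions
or the input is large.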
re.findall(r"\d+", "a1 b22 c333") # ['1', '22', '333']
re.findall(r"(\w)=(\d)", "a=1 b=2") # [('a','1'), ('b','2')] — group tuples
for m in re.finditer(r"\d+", "a1 b22"):
    print(m.group(), m.start()) # value and index
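The shape of findall's result depends on how many groups the pattern has, which is easy to get wrong. The sketch below runs three variants of one pattern over the same made-up input, then uses finditer to recover positions.

```python
import re

text = "a=1 b=22 c=333"  # sample input, invented for this example

no_groups = re.findall(r"\w=\d+", text)
one_group = re.findall(r"\w=(\d+)", text)
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_groups)   # ['a=1', 'b=22', 'c=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('a', '1'), ('b', '22'), ('c', '333')]

# finditer gives match objects, so start/end offsets are available too.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(spans)       # [('1', (2, 3)), ('22', (6, 8)), ('333', (11, 14))]
```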
Substituting with re.sub
re.sub replaces matches. The replacement can reference groups (\1 or \g<name>) or be a
function for computed replacements.
re.sub(r"\s+", " ", "too   many    spaces") # 'too many spaces'
re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2026-06") # '06/2026' — reorder groups
# function replacement:
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2") # 'a2 b4'
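The \g<name> form pairs naturally with named groups, and re.sub also accepts a count argument to limit how many matches are replaced. A short sketch with made-up inputs; the double function is a hypothetical helper:

```python
import re

# Named-group references in the replacement string
iso = "2026-06-19"
us = re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})", r"\g<m>/\g<d>/\g<y>", iso)
print(us)  # '06/19/2026'

# Function replacement: called once per match, must return a string
def double(m):
    return str(int(m.group()) * 2)

prices = "apple 3, pear 12"
doubled = re.sub(r"\d+", double, prices)
print(doubled)  # 'apple 6, pear 24'

# count=1 replaces only the first match
first_only = re.sub(r"\d+", "N", prices, count=1)
print(first_only)  # 'apple N, pear 12'
```

Note that the replacement string is also written as a raw string, for the same backslash reasons as the pattern.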
Greedy vs non-greedy, and compiling
By default quantifiers are greedy: they match as much as possible. Add ? after the
quantifier (*?, +?, ??) to make it lazy, so it matches as little as possible. Greediness
is the most common cause of the "why did it match too much" bug.
re.search(r"<.*>", "<a><b>").group() # '<a><b>' — greedy, grabs everything
re.search(r"<.*?>", "<a><b>").group() # '<a>' — lazy, stops early
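The difference is easiest to see with findall on a string that contains several delimited pieces. This is an illustrative sketch on made-up input, not a recommendation to parse HTML with regex.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # sample input, invented for this example

greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']: one match, first '<' to last '>'
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']: each tag separately
```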
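When reusing a pattern many times (e.g. in a loop), compile it once. The re module already
caches recently used patterns, so the gain is mostly clarity plus a small speedup:
pat = re.compile(r"\d+")
pat.findall("a1 b22") # ['1', '22']
pat.search("abc 42").group() # '42'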
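A typical place for a compiled pattern is a loop over many lines. The sketch below assumes a simple "LEVEL: message" line format; the pattern name and the sample lines are made up for illustration.

```python
import re

# Compiled once, outside the loop. The log format here is an assumption.
LEVEL = re.compile(r"(?P<level>[A-Z]+): (?P<msg>.*)")

lines = ["INFO: started", "not a log line", "ERROR: disk full"]

parsed = []
for line in lines:
    m = LEVEL.fullmatch(line)  # compiled patterns have the same methods as re
    if m:                      # skip lines that do not fit the format
        parsed.append((m.group("level"), m.group("msg")))

print(parsed)  # [('INFO', 'started'), ('ERROR', 'disk full')]
```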
Recap
Use raw strings (r"...") for every pattern. Pick the right function: match
(start), search (anywhere), fullmatch (whole string) — all return a match object
or None. Capture data with groups (positional or (?P<name>...)), get every hit with
findall/finditer, and rewrite text with re.sub (group refs or a function).
Remember quantifiers are greedy unless you add ?, and compile patterns you reuse.