Python Regular Expressions Explained — The re Module, Groups, and Common Patterns

Python regular expressions, explained

Regular expressions match patterns in text, and Python's re module is the standard tool. The hard part isn't the regex syntax itself but knowing which re function to call and how groups work. This guide covers the everyday API and the traps that catch people.

Always use raw strings

Regex uses backslashes heavily (\d, \b, \w), and so do Python string escapes. Write patterns as raw strings (r"...") so the backslashes reach the regex engine intact.

import re

text = "a word here"
re.search(r"\bword\b", text)    # match: \b is a word boundary
re.search("\bword\b", text)     # bug! Python turns \b into a backspace char first, so this returns None

Make r"..." a reflex for every pattern — it avoids a whole class of silent mismatches.
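You can see the difference by inspecting the strings themselves. This sketch uses a made-up sample sentence. It shows that the non-raw pattern silently finds nothing instead of raising an error.

```python
import re

text = "a word here"  # made-up sample input

# In a raw string, "\b" stays two characters: a backslash and "b".
# In a regular string, Python turns "\b" into one backspace char (U+0008).
raw_len = len(r"\b")      # 2
plain_len = len("\b")     # 1

good = re.search(r"\bword\b", text)  # \b reaches re as a word boundary
bad = re.search("\bword\b", text)    # re looks for literal backspace chars

print(raw_len, plain_len)  # 2 1
print(good.group())        # word
print(bad)                 # None
```

Python 3.12 and later emit a SyntaxWarning for unrecognized escapes such as "\d" in a regular string. However, "\b" is a valid escape (backspace), so this particular mistake produces no warning at all.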

match vs search vs fullmatch

These three differ in where they look. match anchors at the start, search finds the pattern anywhere, and fullmatch requires the whole string to match.

re.match(r"\d+", "123abc")      # matches "123" — start only
re.search(r"\d+", "abc123")     # matches "123" — anywhere
re.fullmatch(r"\d+", "123abc")  # None — not the entire string

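All three return a match object (which is truthy) or None, so the result can go straight into an if test. A common bug is expecting match to scan the whole string. It only checks at the start, so use search when the pattern can appear anywhere.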
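The match-versus-fullmatch difference matters most for validation. Below is a minimal sketch with two hypothetical validators for a five-digit code; the format is an assumption chosen for illustration. The match-based validator accepts trailing junk. The last lines use the walrus operator (Python 3.8+) to test and bind the match object in one step.

```python
import re

# Hypothetical validators for a 5-digit code (format assumed for illustration).
def is_code_match(s):
    # match() only anchors at the start, so "12345abc" still passes.
    return re.match(r"\d{5}", s) is not None

def is_code_full(s):
    # fullmatch() requires the pattern to cover the entire string.
    return re.fullmatch(r"\d{5}", s) is not None

print(is_code_match("12345abc"))  # True  (bug: trailing junk accepted)
print(is_code_full("12345abc"))   # False
print(is_code_full("12345"))      # True

# A match object is truthy, so it can drive an if directly.
if m := re.search(r"\d+", "order 66"):
    print(m.group())  # 66
```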

Extracting with groups

Parentheses create capturing groups. .group(0) is the whole match; .group(n) is the nth group. Named groups (?P<name>...) make the result self-documenting.

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "date: 2026-06-19")
m.group(0)      # '2026-06-19'  — full match
m.group(1)      # '2026'        — first group
m.groups()      # ('2026', '06', '19')

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-06")
m.group("year") # '2026'
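The examples above assume the search succeeds. When it doesn't, search returns None, and calling .group() on None raises AttributeError. This sketch wraps the date pattern in a hypothetical helper, parse_date. The helper checks for None first and uses groupdict(), which returns all named groups as a dict.

```python
import re

def parse_date(text):
    """Hypothetical helper: return the first YYYY-MM-DD date as a dict, or None."""
    m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
    if m is None:          # check before touching .group()/.groupdict()
        return None
    return m.groupdict()   # named groups -> {name: matched text}

print(parse_date("date: 2026-06-19"))  # {'year': '2026', 'month': '06', 'day': '19'}
print(parse_date("no date here"))      # None
```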

Finding all matches

findall returns a list of matched strings. If the pattern has exactly one group, it returns that group's text instead; with several groups, it returns tuples of groups. finditer yields match objects lazily, which is better when you need positions or when the input is large.

re.findall(r"\d+", "a1 b22 c333")           # ['1', '22', '333']
re.findall(r"(\w)=(\d)", "a=1 b=2")          # [('a','1'), ('b','2')] — group tuples

for m in re.finditer(r"\d+", "a1 b22"):
    print(m.group(), m.start())              # value and index
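This small sketch runs on a made-up log line. It shows the single-group return shape of findall and the extra position data that finditer provides through m.span().

```python
import re

log = "id=7 user=ana id=42"  # made-up sample input

# One capturing group: findall returns just the group's text.
ids = re.findall(r"id=(\d+)", log)            # ['7', '42']

# finditer yields match objects, so groups and positions are both available.
pairs = [(m.group(1), m.group(2), m.span())
         for m in re.finditer(r"(\w+)=(\w+)", log)]

print(ids)
for key, value, span in pairs:
    print(key, value, span)
# id 7 (0, 4)
# user ana (5, 13)
# id 42 (14, 19)
```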

Substituting with re.sub

re.sub replaces matches. The replacement can reference groups (\1 or \g<name>) or be a function for computed replacements.

re.sub(r"\s+", " ", "too   many    spaces")     # 'too many spaces'
re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2026-06")  # '06/2026' — reorder groups

# function replacement:
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2")   # 'a2 b4'
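The paragraph above mentions \g<name>, but the examples only use \1. This sketch adds four things:

- a named-group reference
- \g<0> for the whole match
- the count= keyword to limit replacements
- re.subn, which returns the new string together with the number of substitutions made

```python
import re

# Named-group references in the replacement use \g<name>.
swapped = re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})", r"\g<month>/\g<year>", "2026-06")

# \g<0> is the whole match; count=2 stops after two replacements.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1 b22 c333", count=2)

# subn returns (new_string, number_of_substitutions).
result_and_count = re.subn(r"\s+", " ", "too   many    spaces")

print(swapped)           # 06/2026
print(bracketed)         # a[1] b[22] c333
print(result_and_count)  # ('too many spaces', 2)
```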

Greedy vs non-greedy, and compiling

By default quantifiers are greedy: they match as much as possible. Add ? after a quantifier (*?, +?, ??, {m,n}?) to make it lazy, so it matches as little as possible. Greediness is the most common cause of "why did it match too much" bugs.

re.search(r"<.*>", "<a><b>").group()    # '<a><b>' — greedy, grabs everything
re.search(r"<.*?>", "<a><b>").group()   # '<a>'    — lazy, stops early
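Running both forms through findall makes the difference easier to see. The sketch also shows lazy versions of + and {m,n}. The negated character class [^>]* is a common alternative to a lazy quantifier: a class that excludes ">" cannot run past the first closing bracket.

```python
import re

html = "<a><b>"

greedy = re.findall(r"<.*>", html)      # ['<a><b>']    one long match
lazy = re.findall(r"<.*?>", html)       # ['<a>', '<b>'] each tag separately
negated = re.findall(r"<[^>]*>", html)  # ['<a>', '<b>'] alternative technique

# Lazy also works on other quantifiers: take the minimum allowed.
one_digit = re.search(r"\d+?", "12345").group()       # '1'
two_digits = re.search(r"\d{2,4}?", "12345").group()  # '12'

print(greedy, lazy, negated, one_digit, two_digits)
```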

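When reusing a pattern many times (e.g. in a loop), compile it once with re.compile. The re module already caches recently used patterns internally, so the speedup is small. The main gain is clarity: you get a named, reusable pattern object.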

pat = re.compile(r"\d+")
pat.findall("a1 b22")   # ['1', '22']
pat.search("x9")        # match object for '9'
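A compiled pattern object has the same methods as the module-level functions (search, match, fullmatch, findall, finditer, sub, and so on), just without the pattern argument. Here is a minimal sketch, assuming a hypothetical ID format of two uppercase letters followed by four digits. It compiles the pattern once and reuses it inside a list comprehension.

```python
import re

# Hypothetical ID format (assumed for illustration): 2 uppercase letters + 4 digits.
ID_RE = re.compile(r"[A-Z]{2}\d{4}")

ids = ["AB1234", "ab1234", "XY12345", "QZ0001"]
# The compiled object is reused for every item; fullmatch rejects extra digits.
valid = [s for s in ids if ID_RE.fullmatch(s)]

print(valid)          # ['AB1234', 'QZ0001']
print(ID_RE.pattern)  # [A-Z]{2}\d{4}
```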

Recap

Use raw strings (r"...") for every pattern. Pick the right function: match (start), search (anywhere), fullmatch (whole string) — all return a match object or None. Capture data with groups (positional or (?P<name>...)), get every hit with findall/finditer, and rewrite text with re.sub (group refs or a function). Remember quantifiers are greedy unless you add ?, and compile patterns you reuse.
