Python strings, explained
Strings are everywhere, and interviewers use them to probe whether you understand
immutability, text-vs-bytes, and performance. This guide covers the three formatting styles,
the str/bytes divide, the build-a-string trap, and the format spec mini-language.
Three ways to format — and which to use
Python has accumulated three formatting styles:
name, score = "Ada", 98.5
f"{name} scored {score}" # f-string (3.6+, preferred)
"{} scored {}".format(name, score) # str.format
"%s scored %.1f" % (name, score) # %-formatting (oldest)
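Prefer f-strings: they are usually the fastest of the three, the most readable, and they
evaluate expressions inline. .format() is useful when the template is separate from the data
(e.g. loaded from a config). %-formatting is legacy but still common in logging, where the
logger takes a %-style template and fills it in only if the message is actually emitted.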
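As a rough sketch of the two cases where an f-string is not the natural fit, the snippet below fills a template that is defined apart from its data and then passes a %-style template to a logger. The template text and the logger name "demo" are made up for illustration.

```python
import logging

# Hypothetical template, as it might be loaded from a config file.
template = "{name} scored {score:.1f}"

# str.format fills the template later, once the data is known.
message = template.format(name="Ada", score=98.5)
print(message)

# logging accepts a %-style template plus arguments and builds the
# final string only if the record is actually emitted.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")  # hypothetical logger name
log.info("%s scored %.1f", "Ada", 98.5)
```

An f-string cannot be stored and filled in later, because it is evaluated at the point where it appears; that is the gap .format() covers here.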
f-strings can do more than interpolate
f-strings evaluate any expression and support the = debugging specifier (Python 3.8+) and format specs:
x = 42
f"{x * 2}" # '84' — arbitrary expressions
f"{x=}" # 'x=42' — name and value (great for debugging)
f"{x:>6}" # '    42' — right-align in 6 columns
f"{3.14159:.2f}" # '3.14' — 2 decimal places
f"{1000000:,}" # '1,000,000' — thousands separator
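A few more combinations, offered as a small sketch rather than a complete tour: the !r conversion, and the = form mixed with spacing and a format spec. The = form assumes Python 3.8 or later; the variable names are illustrative only.

```python
name, x = "Ada", 42

# !r applies repr(), which makes quotes and stray whitespace visible.
quoted = f"{name!r}"

# The = form echoes the expression text, including its spacing (3.8+).
doubled = f"{x * 2 = }"

# A format spec can follow the = form.
padded = f"{x=:>6}"

print(quoted, doubled, padded, sep="\n")
```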
str vs bytes
This is a crucial distinction. str is a sequence of Unicode code points (text);
bytes is a sequence of raw 8-bit values. You convert between them with an explicit
encoding:
s = "café"
b = s.encode("utf-8") # b'caf\xc3\xa9' — str -> bytes
b.decode("utf-8") # 'café' — bytes -> str
len(s) # 4 (characters)
len(b) # 5 (bytes — é is 2 bytes in UTF-8)
You can't mix them: "a" + b"b" raises TypeError. Files opened in binary mode, sockets, and
network APIs deal in bytes; your program logic should deal in str. Decode on the way in,
encode on the way out.
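The "decode on the way in, encode on the way out" rule can be sketched as follows. The incoming payload is a hypothetical stand-in for bytes read from a socket or a binary file.

```python
# Hypothetical payload, as if it had just arrived from a socket.
incoming = b"caf\xc3\xa9,na\xc3\xafve"

# Decode once at the boundary, then work purely with str.
text = incoming.decode("utf-8")
words = [w.upper() for w in text.split(",")]

# Encode again only when the data leaves the program.
outgoing = ";".join(words).encode("utf-8")

# A codec that cannot represent the bytes raises instead of guessing.
try:
    incoming.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = type(exc).__name__

print(words, outgoing, error)
```

Keeping the conversion at the edges means that code in the middle never has to ask which encoding it is holding.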
Strings are immutable
You can't change a string in place — every "modification" creates a new string:
s = "hello"
s[0] = "H" # TypeError — strings are immutable
s = "H" + s[1:] # 'Hello' — a brand-new string
Immutability is why strings are hashable (usable as dict keys) and why interning is safe.
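A short sketch of both consequences: a "modified" string is a new object, so other names bound to the old one are unaffected, and strings can serve as dict keys. The sample words are arbitrary.

```python
s = "hello"
t = s                    # a second name for the same string object
s = s.replace("h", "H")  # builds a new string; t still sees the old one

# Item assignment is rejected outright.
try:
    t[0] = "H"
    assigned = True
except TypeError:
    assigned = False

# Strings are hashable, so they work as dict keys.
counts = {}
for word in ["spam", "eggs", "spam"]:
    counts[word] = counts.get(word, 0) + 1

print(s, t, assigned, counts)
```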
Why join beats += in a loop
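Because strings are immutable, += in a loop can copy the accumulated string on every
iteration, which is O(n²) total work in the worst case. (CPython can sometimes extend the
string in place, but that is an implementation detail, not a guarantee.) str.join does it
in one pass: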
words = ["a", "b", "c"]  # sample input
# Slow: builds and throws away a string each iteration
result = ""
for word in words:
    result += word
# Fast and idiomatic: O(n)
result = "".join(words)
join is a method on the separator: ", ".join(["a", "b", "c"]) → 'a, b, c'. The
items must all be strings — map(str, ...) first if they aren't.
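The two points above, sketched with made-up data: join rejects non-string items until they are converted, and the usual pattern is to collect pieces in a list and join once at the end.

```python
numbers = [1, 2, 3]

# join raises TypeError on non-str items...
try:
    ", ".join(numbers)
    rejected = False
except TypeError:
    rejected = True

# ...so convert them first.
joined = ", ".join(map(str, numbers))

# Collect pieces in a list, then join once at the end.
parts = []
for n in numbers:
    parts.append(f"item{n}")
report = "\n".join(parts)

print(rejected, joined, report, sep="\n")
```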
Common methods and the format spec
The everyday string toolkit:
" hi ".strip() # 'hi' — also lstrip/rstrip
"a,b,c".split(",") # ['a', 'b', 'c']
"Hello".lower() # 'hello' — also upper, title
"file.txt".endswith(".txt") # True — also startswith
"abc".replace("a", "x") # 'xbc'
"name".center(10, "-") # '---name---'
The format spec mini-language (after the : in f-strings and .format) controls
alignment, padding, precision, and type:
f"{42:08.2f}" # '00042.00' — zero-padded, width 8, 2 decimals
f"{255:#x}" # '0xff' — hex with prefix
f"{0.25:.1%}" # '25.0%' — percentage
f"{'hi':^10}" # '    hi    ' — centered
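One place the mini-language pays off is lining up columns. The sketch below assumes a small, invented list of (name, score) rows and uses a nested field so the column width is computed at run time. The same spec strings also work with the built-in format() and with str.format.

```python
rows = [("Ada", 98.5), ("Grace", 7.25)]  # invented sample data

# Width of the widest name, used as a nested field inside the spec.
width = max(len(name) for name, _ in rows)

# Left-align names, right-align scores to 6 columns with 2 decimals.
lines = [f"{name:<{width}} | {score:>6.2f}" for name, score in rows]
print("\n".join(lines))

# The same specs outside an f-string.
hexed = format(255, "#x")
centered = "{:^10}".format("hi")
```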
Raw strings
A raw string (r"...") disables backslash escapes — essential for regex patterns and
Windows paths:
r"\d+\n" # literally backslash-d-plus-backslash-n
"\d+\n" # \d stays literal (with an invalid-escape warning), but \n becomes a newline
Recap
Use f-strings for almost all formatting — they're fast, readable, and support inline
expressions, the {x=} debug form, and the format spec mini-language. Keep str
(Unicode text) and bytes (raw octets) separate, converting with explicit .encode()
/.decode(). Strings are immutable, so build them with "".join(parts) rather than
+= in a loop to avoid O(n²) behaviour. Reach for raw strings (r"...") whenever
backslashes should be literal.
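To make the raw-string advice concrete, here is a minimal sketch with a regex and a Windows-style path. The sample text and the path are invented for illustration.

```python
import re

# Raw pattern: the regex engine receives the backslash in \d unchanged.
digits = re.compile(r"\d+")
found = digits.findall("order 66, aisle 7")

# Hypothetical Windows path: in a normal literal, \n and \t would be escapes.
path = r"C:\new\table.txt"

# The raw form keeps both characters; the normal form is a single newline.
raw_len, cooked_len = len(r"\n"), len("\n")

print(found, path, raw_len, cooked_len)
```

One limit worth knowing: a raw string literal still cannot end in a single backslash, so a trailing path separator has to be added another way.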