Python serialization, explained
Serialization turns in-memory objects into bytes you can store or send, and back again. Python's standard library covers the three formats you'll meet constantly: JSON for interchange, CSV for tabular data, and pickle for arbitrary Python objects. Each has a clear use case, and pickle comes with a serious security warning.
JSON for interchange
json is the go-to format for web APIs and config files: it's language-agnostic and human-readable. Its four core functions split by target: dumps/loads work with strings, dump/load with file objects.
```python
import json

data = {"name": "Ada", "langs": ["python", "c"], "active": True}
text = json.dumps(data, indent=2)  # object → JSON string
back = json.loads(text)            # JSON string → object

# JSON text is UTF-8 by convention; don't rely on the platform default encoding
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)   # write to file
with open("config.json", encoding="utf-8") as f:
    cfg = json.load(f)             # read from file
```
The mnemonic: the s in dumps/loads is for string.
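Because dump and load only need an object with a write() or read() method, an in-memory io.StringIO works just as well as a real file, which is handy in tests. A small sketch (the variable names are illustrative):

```python
import io
import json

data = {"name": "Ada", "langs": ["python", "c"], "active": True}

# dump() writes to any object with a write() method; StringIO stands in for a file
buffer = io.StringIO()
json.dump(data, buffer, indent=2)

# The text dump() wrote matches what dumps() returns for the same arguments
assert buffer.getvalue() == json.dumps(data, indent=2)

# Rewind and read it back with load(), exactly as you would from an open file
buffer.seek(0)
restored = json.load(buffer)
print(restored == data)  # True
```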
JSON's type limitations
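JSON only knows objects, arrays, strings, numbers, booleans, and null. Python types like datetime, set, and Decimal have no JSON equivalent, so json.dumps raises TypeError unless you pass a default function that converts them. Two other conversions happen silently: tuples come back as lists, and int, float, bool, and None dict keys are turned into strings.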
```python
import json
from datetime import datetime

json.dumps({"t": datetime.now()}, default=str)  # serialize datetime via str()
# int dict keys become strings on a round trip:
json.loads(json.dumps({1: "a"}))  # {'1': 'a'}: the key is now a str!
```
That key stringification is a frequent surprise; for JSON-bound data, use string keys from the start or convert them back after loading.
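A default function is called only for objects json can't serialize on its own; it should return something JSON-friendly or raise TypeError. Here is a sketch that handles the three types named above. The function name to_jsonable and the chosen encodings (ISO 8601 strings, sorted lists, decimal strings) are assumptions, not the only reasonable choices.

```python
import json
from datetime import datetime
from decimal import Decimal

def to_jsonable(obj):
    """Hypothetical default= hook: convert types json doesn't know about."""
    if isinstance(obj, datetime):
        return obj.isoformat()   # ISO 8601 text, e.g. "2024-01-02T03:04:05"
    if isinstance(obj, set):
        return sorted(obj)       # sets become (sorted) JSON arrays
    if isinstance(obj, Decimal):
        return str(obj)          # a string keeps every digit of precision
    # Anything else is still unsupported; json expects a TypeError here
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

record = {
    "when": datetime(2024, 1, 2, 3, 4, 5),
    "tags": {"b", "a"},
    "price": Decimal("19.99"),
}
text = json.dumps(record, default=to_jsonable, sort_keys=True)
print(text)
# {"price": "19.99", "tags": ["a", "b"], "when": "2024-01-02T03:04:05"}
```

The conversion is one-way: json.loads hands these values back as plain strings and lists, so you restore them yourself, for example with datetime.fromisoformat.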
CSV for tabular data
The csv module handles the quoting and escaping that make hand-rolled comma-splitting fragile. DictReader and DictWriter map each row to a dict keyed by column name, which is usually the clearest interface.
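To see that quoting in action, here is a sketch that writes one row whose fields contain a comma, a double quote, and a newline to an in-memory buffer, then reads it back. A naive line.split(",") would tear this row apart; the sample values are made up.

```python
import csv
import io

row = ["Ada, Countess", 'said "hi"', "line one\nline two"]

# newline="" mirrors the advice for real CSV files: no newline translation
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
print(repr(buffer.getvalue()))
# '"Ada, Countess","said ""hi""","line one\nline two"\r\n'

# The reader undoes the quoting, including the embedded newline
buffer.seek(0)
parsed = next(csv.reader(buffer))
print(parsed == row)  # True
```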
```python
import csv

with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerow({"name": "Ada", "age": 36})

with open("people.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["age"])  # values are strings; convert as needed
```
Two gotchas: always open CSV files with newline="" (without it, newlines inside quoted fields are misread, and on Windows the writer emits a blank row after every record), and remember that every value reads back as a string.
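Since every field arrives as a string, a common pattern is to convert columns right after reading. A minimal sketch using the people.csv layout from above, with an in-memory buffer standing in for the file; the converters table is an illustrative assumption, not a csv feature.

```python
import csv
import io

# In-memory stand-in for people.csv (same columns as the example above)
source = io.StringIO("name,age\r\nAda,36\r\nAlan,41\r\n", newline="")

# Hypothetical per-column converters; columns not listed stay as strings
converters = {"age": int}

people = []
for row in csv.DictReader(source):
    # Apply the converter registered for each column, defaulting to str
    people.append({key: converters.get(key, str)(value) for key, value in row.items()})

print(people)  # [{'name': 'Ada', 'age': 36}, {'name': 'Alan', 'age': 41}]
```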
pickle for Python objects
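pickle serializes most Python objects, including nested containers and instances of your own classes along with their state, into bytes while preserving types exactly. It's Python-specific and binary. A few things can't be pickled at all, such as lambdas, open files, and sockets.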
```python
import pickle

obj = {"set": {1, 2, 3}, "tuple": (1, 2)}  # types JSON can't round-trip
blob = pickle.dumps(obj)       # → bytes
restored = pickle.loads(blob)  # exact types preserved

with open("state.pkl", "wb") as f:  # note binary mode
    pickle.dump(obj, f)
with open("state.pkl", "rb") as f:  # read back, also in binary mode
    from_file = pickle.load(f)
```
Unlike JSON, sets and tuples survive the round-trip intact.
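The same holds for instances of your own classes. A sketch using a hypothetical Point dataclass; note that pickle stores a reference to the class by name, so the class must be importable wherever the data is loaded. The last lines show one of the limits mentioned above: functions are also pickled by name, so an anonymous lambda fails.

```python
import pickle
from dataclasses import dataclass

@dataclass
class Point:
    """Hypothetical example class; must be importable at load time."""
    x: int
    y: int

original = {"origin": Point(0, 0), "path": (Point(1, 2), Point(3, 4))}

restored = pickle.loads(pickle.dumps(original))
print(restored == original)                      # True: dataclass __eq__ compares fields
print(type(restored["path"]).__name__)           # tuple (JSON would give a list)
print(restored["origin"] is original["origin"])  # False: a new, equal object

# A lambda has no importable name, so pickling it raises an error
try:
    pickle.dumps(lambda n: n + 1)
except (pickle.PicklingError, AttributeError) as exc:
    print("not picklable:", type(exc).__name__)
```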
Never unpickle untrusted data
This is the critical rule: unpickling can execute arbitrary code. A malicious pickle can run anything on your machine with your program's privileges, so only load pickles you created or fully trust.
```python
# DANGER: a crafted pickle can run os.system(...) during loads()
pickle.loads(data_from_the_internet)  # ← never do this
```
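The mechanism behind this warning is the __reduce__ hook: an object can tell pickle to rebuild it by calling any importable callable with arguments of its choosing, and loads() makes that call. This harmless sketch uses print where an attacker would use something like os.system; the Payload class name is made up.

```python
import pickle

class Payload:
    """Hypothetical attacker-style object; here the 'attack' is just print()."""
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call print(...) with these arguments"
        return (print, ("this ran during pickle.loads",))

blob = pickle.dumps(Payload())

# Unpickling calls print(...) right away; the result is print's return value
result = pickle.loads(blob)  # prints: this ran during pickle.loads
print(result)                # None, not a Payload instance
```

The receiving side doesn't need the Payload class at all: the bytes alone name the callable, which is why checking the type of whatever loads() returns is already too late.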
For data that crosses a trust boundary or must be read by other languages, use JSON. Reserve pickle for caches and internal Python-to-Python transfer you control.
Choosing a format
JSON: interchange, APIs, config; portable and readable, but limited types. CSV: tabular data for spreadsheets and data tools; strings only. pickle: full-fidelity Python objects, for internal and trusted use only. When you need both cross-language support and richer types, third-party formats such as MessagePack or Protocol Buffers are worth a look, but the standard-library three cover most needs.
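One way to see these trade-offs side by side is to push the same record through all three formats and inspect what comes back. A rough sketch; the record and the roundtrip_* helper names are illustrative.

```python
import csv
import io
import json
import pickle

record = {"name": "Ada", "age": 36, "langs": ("python", "c")}

def roundtrip_json(obj):
    return json.loads(json.dumps(obj))

def roundtrip_csv(obj):
    # CSV holds flat rows of text; DictWriter writes the tuple via str()
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(obj))
    writer.writeheader()
    writer.writerow(obj)
    buffer.seek(0)
    return next(csv.DictReader(buffer))

def roundtrip_pickle(obj):
    return pickle.loads(pickle.dumps(obj))

for name, fn in [("json", roundtrip_json), ("csv", roundtrip_csv), ("pickle", roundtrip_pickle)]:
    back = fn(record)
    print(f"{name:7} age={back['age']!r:6} langs={back['langs']!r}")
```

JSON keeps the int but turns the tuple into a list, CSV turns every value into a string (the tuple becomes the text "('python', 'c')"), and pickle returns the record unchanged.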
Recap
Pick the format by purpose. json is for portable interchange (dumps/loads for strings, dump/load for files) but only handles basic types and stringifies non-string dict keys. csv handles tabular data safely: use DictReader/DictWriter, open files with newline="", and convert the string values yourself. pickle serializes most Python objects with exact types, but it is binary, Python-only, and must never be used on untrusted data because unpickling can run arbitrary code.