Python Serialization Explained — JSON, CSV, and pickle for Saving and Exchanging Data

Python serialization, explained

Serialization turns in-memory objects into bytes you can store or send, and back again. Python's standard library covers the three formats you'll meet constantly: JSON for interchange, CSV for tabular data, and pickle for arbitrary Python objects. Each has a clear use case — and pickle has a sharp security warning.

JSON for interchange

json is the go-to for talking to web APIs and config files — it's language-agnostic and human-readable. The four functions split by string vs file: dumps/loads work with strings, dump/load with file objects.

import json

data = {"name": "Ada", "langs": ["python", "c"], "active": True}

text = json.dumps(data, indent=2)        # object → JSON string
back = json.loads(text)                  # JSON string → object

with open("config.json", "w") as f:
    json.dump(data, f, indent=2)         # write to file
with open("config.json") as f:
    cfg = json.load(f)                   # read from file

The mnemonic: the s in dumps/loads is for string.

JSON's type limitations

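JSON only knows objects, arrays, strings, numbers, booleans, and null. Python types like datetime, set, and Decimal have no JSON equivalent, so json.dumps raises TypeError on them unless you pass a default function that converts them to something serializable. Two conversions also happen silently: tuples are written as arrays and come back as lists, and dict keys that are int, float, bool, or None become strings.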

import json
from datetime import datetime

json.dumps({"t": datetime.now()}, default=str)   # serialize datetime as its string

# dict int keys become strings on round-trip:
json.loads(json.dumps({1: "a"}))                 # {'1': 'a'} — key is now str!

That key-stringification is a frequent surprise; design around it for JSON-bound data.
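As a minimal sketch of both points, the example below passes a custom fallback to default and then converts stringified keys back to int after loading. The function name encode_extra and the choice of representations (ISO strings for datetimes, sorted lists for sets, strings for Decimals) are assumptions for illustration, not the only reasonable encoding.

```python
import json
from datetime import datetime
from decimal import Decimal


def encode_extra(value):
    """Fallback for json.dumps: called only for objects json can't serialize."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, set):
        return sorted(value)          # sets have no order, so pick a stable one
    if isinstance(value, Decimal):
        return str(value)             # a string avoids float rounding
    raise TypeError(f"Cannot serialize {type(value).__name__}")


record = {
    "created": datetime(2024, 1, 15, 9, 30),
    "tags": {"b", "a"},
    "price": Decimal("19.99"),
}
text = json.dumps(record, default=encode_extra, sort_keys=True)
decoded = json.loads(text)

# Int keys come back as strings, so convert them explicitly after loading.
scores = json.loads(json.dumps({1: "a", 2: "b"}))
scores_by_id = {int(key): value for key, value in scores.items()}
```

Note that the conversion is one-way: decoded holds plain strings and lists, so the reader has to know which fields to turn back into datetime, set, or Decimal.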

CSV for tabular data

The csv module handles the quoting and escaping that make hand-rolled comma-splitting fragile. DictReader/DictWriter work with column names, which is usually clearest.

import csv

with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerow({"name": "Ada", "age": 36})

with open("people.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["age"])   # values are strings — convert as needed

Two gotchas: always open CSV files with newline="" (prevents blank rows on Windows), and remember every value reads back as a string.
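To make the second gotcha concrete, here is a small sketch that writes two rows and converts the age column back to int while reading. It uses an in-memory io.StringIO in place of people.csv so it runs without touching disk, and the extra note column is a made-up field added to show that commas and quotes inside values are handled by the module.

```python
import csv
import io

# An in-memory file stands in for people.csv so the sketch is self-contained.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "age", "note"])
writer.writeheader()
writer.writerow({"name": "Ada", "age": 36, "note": "likes commas, quotes"})
writer.writerow({"name": "Grace", "age": 45, "note": 'said "hello"'})

buffer.seek(0)
people = []
for row in csv.DictReader(buffer):
    row["age"] = int(row["age"])     # every field arrives as str; convert explicitly
    people.append(row)
```

Splitting each line on "," by hand would break the note field of the first row into two columns; csv quotes the field on write and restores it on read.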

pickle for Python objects

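pickle serializes most Python objects, including nested structures and instances of custom classes, into bytes, preserving types exactly. The main exceptions are objects tied to the running process, such as lambdas, open files, and sockets. Classes and functions are stored by reference to their importable name, so the code that defines them must be available when loading. The format is Python-specific and binary.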

import pickle

obj = {"set": {1, 2, 3}, "tuple": (1, 2)}        # types JSON can't represent

blob = pickle.dumps(obj)                          # → bytes
restored = pickle.loads(blob)                     # exact types preserved

with open("state.pkl", "wb") as f:                # note binary mode
    pickle.dump(obj, f)

Unlike with JSON, sets and tuples survive the round trip intact.
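A side-by-side sketch of that difference, using the same obj as above. The variable names are illustrative only.

```python
import json
import pickle

obj = {"set": {1, 2, 3}, "tuple": (1, 2)}

# pickle: the restored object compares equal and keeps its container types.
restored = pickle.loads(pickle.dumps(obj))
same_types = isinstance(restored["set"], set) and isinstance(restored["tuple"], tuple)

# json: tuples are written as arrays, so they come back as lists...
tuple_via_json = json.loads(json.dumps({"tuple": (1, 2)}))["tuple"]

# ...and a set is rejected outright unless a default function is supplied.
try:
    json.dumps({"set": {1, 2, 3}})
    set_error = None
except TypeError as exc:
    set_error = exc
```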

Never unpickle untrusted data

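This is the critical rule: unpickling can execute arbitrary code. A pickle is a small program that tells the loader which callables to import and call, so a malicious one can run anything on your machine with your privileges. Only load pickles you created or fully trust.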

# DANGER: a crafted pickle can run os.system(...) during loads()
pickle.loads(data_from_the_internet)    # ← never do this

For data that crosses a trust boundary or has to be read by other languages, use JSON. Reserve pickle for caches and for Python-to-Python transfers where you control both ends.
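The mechanism is easy to see with a harmless stand-in. In the sketch below, the Payload class uses the __reduce__ hook to tell pickle to call sorted on load; a real attack would name something like os.system instead. The NoGlobalsUnpickler subclass is a hypothetical helper built on the documented Unpickler.find_class hook, shown only to illustrate that such lookups can be refused. It reduces the attack surface but is not a substitute for avoiding untrusted pickles.

```python
import io
import pickle


class Payload:
    """Stand-in for a hostile object: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # Harmless here (sorted); an attacker would name os.system or similar.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# loads() calls sorted([3, 1, 2]) and returns its result, not a Payload.
result = pickle.loads(blob)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuses every global lookup, so no callables load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


try:
    NoGlobalsUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Here result is the sorted list, which shows that loading ran a function chosen by whoever wrote the bytes, and blocked is True because the restricted loader refused to resolve sorted at all.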

Choosing a format

JSON: interchange, APIs, and config; portable and readable, but limited to basic types. CSV: tabular data for spreadsheets and data tools; every value is a string. pickle: full-fidelity Python objects, for internal and trusted use only. When you need both cross-language support and rich types, look at formats like MessagePack or Protocol Buffers, but the three stdlib modules cover most needs.

Recap

Pick the format by purpose. json is for portable interchange (dumps/loads for strings, dump/load for files) but only handles basic types, turns tuples into lists, and stringifies dict keys. csv handles tabular data safely: use DictReader/DictWriter, open files with newline="", and convert the string values yourself. pickle serializes most Python objects with exact types, but it is binary, Python-only, and must never be used on untrusted data because unpickling can run arbitrary code.
