Python dataclasses, explained
Writing __init__, __repr__, and __eq__ by hand for a simple data-holding class is
tedious boilerplate. @dataclass generates them from your type annotations. Pair it with
__slots__ for memory savings and you have Python's modern answer to "I just need a record".
What @dataclass generates
Decorate a class with @dataclass and annotate its fields — Python writes the dunder methods
for you:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
p                  # Point(x=1, y=2), from the generated __repr__
p == Point(1, 2)   # True, from the generated __eq__
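By default it generates __init__, __repr__, and __eq__. Passing order=True, as in
@dataclass(order=True), also generates the ordering methods (__lt__, __le__, __gt__,
__ge__), which compare instances field by field in declaration order, and frozen=True
makes instances immutable.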
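As a minimal sketch of what order=True buys you, the example below sorts a few instances. The Version class and its fields are hypothetical names chosen for illustration; they are not part of the dataclasses module.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical example class
    # Fields are compared in declaration order, like the tuple (major, minor).
    major: int
    minor: int


releases = [Version(3, 10), Version(2, 7), Version(3, 8)]
oldest_first = sorted(releases)  # relies on the generated __lt__

print(oldest_first[0])                 # Version(major=2, minor=7)
print(Version(3, 8) < Version(3, 10))  # True
```

Because comparison follows declaration order, put the most significant field first.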
frozen dataclasses
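frozen=True makes instances immutable: assigning to a field raises FrozenInstanceError.
Because __eq__ is generated as well, a frozen dataclass also gets a __hash__ computed from
its fields, so instances can be dict keys or set members as long as every field value is
itself hashable: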
@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(1, 2)
p.x = 5              # FrozenInstanceError
{p: "origin-ish"}    # hashable, so it works as a dict key
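A runnable version of the same idea might look like the sketch below, which catches the error instead of letting it stop the script. GridPoint is a hypothetical name used here only to keep the example self-contained.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GridPoint:  # hypothetical example class
    x: int
    y: int


p = GridPoint(1, 2)

try:
    p.x = 5  # any field assignment is rejected on a frozen instance
except FrozenInstanceError:
    blocked = True
else:
    blocked = False
print(blocked)  # True

# Equal instances hash equally, so a set collapses the duplicate.
unique_points = {GridPoint(1, 2), GridPoint(1, 2), GridPoint(3, 4)}
print(len(unique_points))  # 2
```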
The mutable default trap — field(default_factory)
Just like mutable default arguments, a mutable default on a dataclass field would be shared
across instances. Dataclasses reject the common cases (a list, dict, or set default) with a
ValueError when the class is defined, and make you use field(default_factory=...) instead:
from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = []   # ValueError at class definition!

@dataclass
class Cart:
    items: list = field(default_factory=list)   # correct: fresh list per instance
The factory is a zero-argument callable (list, dict, or a lambda). It is called each time
an instance is created without an explicit value for that field.
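To see that each instance really gets its own object, here is a small sketch. Basket and its fields are hypothetical names for illustration; the lambda shows one way to supply a default that starts with content.

```python
from dataclasses import dataclass, field


@dataclass
class Basket:  # hypothetical example class
    items: list = field(default_factory=list)
    # A lambda works when the default should start with some content.
    tags: list = field(default_factory=lambda: ["new"])


a = Basket()
b = Basket()
a.items.append("apple")
a.tags.append("fruit")

print(a.items, b.items)  # ['apple'] []
print(b.tags)            # ['new']
```

Mutating a's lists leaves b untouched, which is exactly what a shared mutable default would not guarantee.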
__post_init__ for derived fields and validation
__init__ is generated, so to run extra logic after the fields are set, define
__post_init__:
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)   # not a constructor argument

    def __post_init__(self):
        # Runs right after the generated __init__ has set width and height.
        if self.width <= 0:
            raise ValueError("width must be positive")
        self.area = self.width * self.height
field(init=False) keeps area out of the constructor signature so you can compute it here.
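The same pattern can be exercised end to end, as in the sketch below. LineItem and its fields are hypothetical names, picked so the example stands on its own.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:  # hypothetical example class
    quantity: int
    unit_price: float
    total: float = field(init=False)  # derived, so not a constructor argument

    def __post_init__(self):
        # Validate first, then compute the derived field.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        self.total = self.quantity * self.unit_price


item = LineItem(3, 2.5)
print(item.total)  # 7.5

try:
    LineItem(0, 2.5)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # quantity must be positive
```

Note that the derived value is computed once at construction; since the class is mutable, changing quantity later does not update total.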
__slots__: smaller, faster instances
By default each instance stores its attributes in a per-instance __dict__, which is
flexible but memory-hungry. __slots__ replaces it with a fixed layout, cutting memory and
speeding attribute access — at the cost of being unable to add new attributes:
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
p.z = 3   # AttributeError: z is not in __slots__
Since Python 3.10 you can combine it with dataclasses via @dataclass(slots=True). Use
__slots__ when you create huge numbers of small objects.
dataclass vs namedtuple vs NamedTuple
All three model records; pick by mutability and behaviour:
- @dataclass: mutable by default (or frozen), supports methods, defaults, and inheritance. The general-purpose choice.
- namedtuple / typing.NamedTuple: immutable, tuple-based (indexable and unpackable), lighter weight. Best when you want tuple behaviour and immutability.
# Want tuple unpacking and immutability with little code -> NamedTuple
# Want mutability, methods, or rich behaviour -> dataclass
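The difference is easiest to see side by side. The sketch below uses two hypothetical classes, Coord and Marker, with made-up field values, purely to contrast the behaviours described above.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Coord(NamedTuple):  # hypothetical example class
    lat: float
    lon: float


@dataclass
class Marker:  # hypothetical example class
    lat: float
    lon: float
    label: str = ""


c = Coord(51.5, -0.1)
lat, lon = c               # tuple unpacking
print(c[0])                # 51.5
print(c == (51.5, -0.1))   # True: it compares equal to a plain tuple

try:
    c.lat = 0.0            # named tuples are immutable
except AttributeError:
    immutable = True
else:
    immutable = False
print(immutable)           # True

m = Marker(51.5, -0.1)
m.label = "London"         # dataclass fields are mutable by default
print(m)                   # Marker(lat=51.5, lon=-0.1, label='London')
```

One consequence worth keeping in mind: a named tuple equals any tuple with the same values, whereas the generated dataclass __eq__ only treats instances of the same class as equal.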
Recap
@dataclass generates __init__, __repr__, and __eq__ from your annotations;
frozen=True makes instances immutable and hashable. Mutable defaults are banned — use
field(default_factory=list) for a fresh object per instance, and __post_init__ for
validation or derived fields (field(init=False)). __slots__ drops the per-instance
__dict__ to save memory and speed access for large object counts. Reach for a dataclass
for general records, and a named tuple when you want immutable, tuple-like behaviour.