The collections module, explained
The built-in list, dict, set, and tuple cover most needs, but the collections
module adds specialised containers that make common tasks shorter and faster. Knowing them
is a quick way to look fluent — Counter and defaultdict in particular show up
constantly.
Counter — tallying made trivial
Counter is a dict subclass that counts hashable items. It turns a manual tally loop into
one line:
from collections import Counter
c = Counter("mississippi")
c # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
c["s"] # 4
c["z"] # 0 — missing keys return 0, not KeyError
c.most_common(2) # [('i', 4), ('s', 4)]; ties keep the order first encountered
It also does arithmetic — Counter(a) + Counter(b) merges counts — which is great for
combining tallies.
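The other operators follow the same pattern. The sketch below uses made-up tallies (morning and evening are illustrative names, not from the text). Note that the binary operators keep only positive results: + and - drop any key whose count ends at zero or below. The in-place methods c.update() and c.subtract() keep such counts instead.

```python
from collections import Counter

# Illustrative tallies, e.g. drinks ordered in two shifts
morning = Counter(["tea", "tea", "coffee"])
evening = Counter(["coffee", "water", "tea"])

combined = morning + evening  # add counts key by key
diff = morning - evening      # subtract; keys that drop to <= 0 are removed
both = morning & evening      # intersection: minimum of each count
either = morning | evening    # union: maximum of each count

print(combined)  # Counter({'tea': 3, 'coffee': 2, 'water': 1})
print(diff)      # Counter({'tea': 1})
```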
defaultdict — no more "key not found"
defaultdict calls a factory function to supply a default the first time you touch a
missing key. The classic use is grouping:
from collections import defaultdict
groups = defaultdict(list)
for name in ["Ada", "Alan", "Brian"]:
    groups[name[0]].append(name)  # no need to check if the key exists first
groups  # defaultdict(<class 'list'>, {'A': ['Ada', 'Alan'], 'B': ['Brian']})
The factory can be list, set, int (for counting), or any zero-argument callable.
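A quick sketch of the three kinds of factory mentioned above, on an illustrative word list. The nested-counter lambda is just one example of "any zero-argument callable".

```python
from collections import defaultdict

# Illustrative data for this sketch
words = ["apple", "avocado", "banana", "apple", "blueberry"]

# int() returns 0, so defaultdict(int) behaves like a simple counter
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# set() gives each new key an empty set; duplicates collapse automatically
by_letter = defaultdict(set)
for w in words:
    by_letter[w[0]].add(w)

# Any zero-argument callable works; this lambda builds a nested counter per letter
nested = defaultdict(lambda: defaultdict(int))
for w in words:
    nested[w[0]][w] += 1

print(dict(counts))            # {'apple': 2, 'avocado': 1, 'banana': 1, 'blueberry': 1}
print(sorted(by_letter["a"]))  # ['apple', 'avocado']
print(nested["a"]["apple"])    # 2
```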
defaultdict vs dict.setdefault
Both handle missing keys, but differently. setdefault works on a plain dict, but its default
argument is evaluated on every call, even when the key already exists; defaultdict only invokes the factory on a miss:
d = {}
d.setdefault("a", []).append(1) # works, but builds a [] every call
dd = defaultdict(list)
dd["a"].append(1) # cleaner; factory called only on miss
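To make the difference concrete, here is a sketch that counts how often the default is built. CountingFactory is a hypothetical helper written for this illustration, not part of collections.

```python
from collections import defaultdict

class CountingFactory:
    """Hypothetical helper: a zero-argument callable that counts its own calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return []

# setdefault: the default expression is evaluated before every call, hit or miss
sd_factory = CountingFactory()
d = {}
for value in range(3):
    d.setdefault("a", sd_factory()).append(value)

# defaultdict: the factory runs only when the key is missing
dd_factory = CountingFactory()
dd = defaultdict(dd_factory)
for value in range(3):
    dd["a"].append(value)

print(sd_factory.calls, dd_factory.calls)  # 3 1
print(d == dd)                             # True: both hold {'a': [0, 1, 2]}
```

For a cheap default like [] the extra allocations rarely matter; the gap only becomes noticeable when the default is expensive to build.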
One caveat: simply reading dd[missing] creates the key in a defaultdict. Use
dd.get(k) if you want to look up a key without inserting it; testing k in dd is also safe.
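A short sketch of that caveat, using throwaway keys ("present", "missing" and "other" are illustrative). It also sets default_factory, a real attribute of defaultdict, to None, which stops auto-insertion once the dict has been built.

```python
from collections import defaultdict

dd = defaultdict(list)
dd["present"].append(1)

print(dd.get("missing"))  # None: get() never calls the factory
print("missing" in dd)    # False: membership tests do not insert either
value = dd["missing"]     # subscripting a missing key runs the factory...
print("missing" in dd)    # True: ...and stores the new [] under that key
print(dict(dd))           # {'present': [1], 'missing': []}

# Setting default_factory to None switches auto-insertion off;
# missing keys then raise KeyError like a plain dict.
dd.default_factory = None
try:
    dd["other"]
except KeyError:
    print("KeyError for 'other'")
```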
deque — fast operations at both ends
A deque (double-ended queue) gives O(1) appends and pops at both ends, unlike a list
where insert(0, x) and pop(0) are O(n):
from collections import deque
q = deque([1, 2, 3])
q.appendleft(0) # deque([0, 1, 2, 3])
q.popleft() # 0 — O(1), great for queues
q.append(4) # deque([1, 2, 3, 4])
dq = deque(maxlen=3) # bounded — old items drop off automatically
Use deque for queues, BFS frontiers, and sliding windows (maxlen).
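Two of those uses are sketched below. bfs_order and moving_average are hypothetical names, and the graph is assumed to be a plain dict mapping each node to a list of neighbours.

```python
from collections import deque

def bfs_order(graph, start):
    """Breadth-first traversal over an adjacency dict (assumed graph format)."""
    seen = {start}
    frontier = deque([start])
    order = []
    while frontier:
        node = frontier.popleft()  # O(1) pop from the front
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)  # O(1) push onto the back
    return order

def moving_average(values, window):
    """Average of the last `window` values, emitted once the window is full."""
    recent = deque(maxlen=window)  # appending past maxlen discards the oldest item
    averages = []
    for v in values:
        recent.append(v)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))               # ['A', 'B', 'C', 'D']
print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

Here sum(recent) re-adds the whole window at each step, which is fine for small windows. A running total updated on each append would make every step O(1).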
OrderedDict — still useful?
Since Python 3.7, regular dicts guarantee insertion order, so you rarely need
OrderedDict just for ordering. It still offers a couple of unique features:
from collections import OrderedDict
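od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")  # reorder explicitly: now b, c, a
od.popitem(last=False)  # ('b', 2): pop from the front (LRU-cache pattern)
OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)  # False: equality is order-sensitive
Plain dict equality ignores order (dict(a=1, b=2) == dict(b=2, a=1) is True); equality
between two OrderedDicts respects it, while comparing an OrderedDict with a plain dict ignores order again.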
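The LRU-cache pattern can be sketched in a few lines. LRUCache is a hypothetical class written for illustration (not thread-safe), built only on move_to_end() and popitem(last=False).

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a", so "b" is now the oldest
cache.put("c", 3)  # over capacity: "b" is evicted
print(list(cache.data))  # ['a', 'c']
```

If you only need to memoise function results, functools.lru_cache is usually simpler. A hand-rolled class like this pays off mainly when you need direct control over the stored entries.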
ChainMap — layered lookups
ChainMap groups several dicts into one view, searching them in order — perfect for layered
configuration (CLI args over env over defaults):
from collections import ChainMap
defaults = {"color": "red", "size": "M"}
overrides = {"color": "blue"}
config = ChainMap(overrides, defaults)
config["color"] # 'blue' (found in overrides first)
config["size"] # 'M' (falls through to defaults)
Writes, updates, and deletions go to the first mapping only; lookups search every mapping in order, and the underlying dicts are never copied or merged.
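Continuing the configuration example, this sketch shows where writes and deletions land. It also uses new_child(), a real ChainMap method that returns a new ChainMap with an extra mapping in front. The "XL" scope is illustrative.

```python
from collections import ChainMap

defaults = {"color": "red", "size": "M"}
overrides = {"color": "blue"}
config = ChainMap(overrides, defaults)

config["size"] = "L"  # the write lands in overrides, the first mapping
print(overrides)      # {'color': 'blue', 'size': 'L'}
print(defaults)       # {'color': 'red', 'size': 'M'} (unchanged)

del config["color"]     # deletes from overrides only...
print(config["color"])  # red: ...so the default shows through again

# new_child() puts a fresh mapping in front, e.g. for a temporary scope
scoped = config.new_child({"size": "XL"})
print(scoped["size"], config["size"])  # XL L
```

Deleting a key that exists only in a later mapping raises KeyError, because deletions never reach past the first mapping.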
Recap
collections gives you sharper tools than the built-ins for common jobs: Counter for
tallying (missing keys are 0, plus most_common), defaultdict for grouping without
existence checks (factory runs only on a miss — but reads create keys), deque for O(1)
operations at both ends and bounded sliding windows, OrderedDict for the few
order-sensitive operations dicts lack, and ChainMap for layered lookups across multiple
dictionaries.