A HashMap stores entries in an array of buckets (Node<K,V>[] table).
Each bucket holds a chain of Node objects, where a Node packs the key's
hash, the key, the value, and a next pointer to the following
node in the same bucket.
static class Node<K,V> {
final int hash; // cached spread hash of the key
final K key;
V value;
Node<K,V> next; // chains entries in the same bucket
}
On put, the key's hash picks a bucket index; the entry is stored there (or
appended to the chain on a collision). On get, the same index is computed and
the chain is scanned with equals to find the matching key. With a good hash,
operations are average O(1); the array is resized as it fills to keep chains
short.
The bucket index is computed as (n - 1) & hash, which only keeps the low
bits of the hash (since n is a power of two). If two keys differ only in
their high bits, they'd collide. To mix high bits down into the low ones,
HashMap applies a spread function before indexing.
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
XORing the hash with its own upper 16 bits is cheap (one shift, one XOR) yet
meaningfully reduces collisions for hashes whose entropy lives in the high bits.
Note a null key is mapped to hash 0, which is why it always lands in
bucket 0.
It uses index = (n - 1) & hash, where n is the table length. Because n
is always a power of two, n - 1 is a mask of all-ones in the low bits, so the
bitwise & is equivalent to hash % n but far faster than a modulo.
int n = 16; // table length, a power of two
int idx = (n - 1) & hash; // 15 & hash -> keeps the low 4 bits
This is the whole reason capacity must be a power of two: the & trick only
distributes entries evenly across all buckets when n - 1 is a clean low-bit
mask. A non-power-of-two n would leave some buckets unreachable.
Two reasons, both about the (n - 1) & hash indexing. First, speed: a
bitwise AND replaces an expensive modulo. Second, distribution: when n is
a power of two, n - 1 is 0b...0111…1, so the AND keeps a contiguous block of
low bits and every bucket is reachable. If n weren't a power of two, the mask
would have gaps and some buckets could never be hit.
// HashMap rounds any requested capacity up to the next power of two
new HashMap<>(10); // actual table capacity becomes 16
new HashMap<>(20); // becomes 32
The constructor runs tableSizeFor to round your requested initial capacity up
to the nearest power of two, so you can never actually create a non-power-of-two
table.
- Compute the spread hash of the key.
- Compute the bucket index
(n - 1) & hash. - If the bucket is empty, drop the new node in.
- If occupied, scan the chain: a node matching by
hashandequalsmeans an existing key — overwrite its value and return the old one. - Otherwise append a new node; if the chain length crosses 8 (and the table is ≥ 64), treeify the bucket.
- Increment
size; ifsize > threshold(capacity × load factor), resize.
map.put("a", 1); // new bucket
map.put("a", 2); // same key -> value replaced, returns 1
map.put("b", 3); // collision? append to chain
The match step is why equals is essential: same bucket does not mean same
key — the chain must be walked and compared.
get mirrors put's lookup half: spread the key's hash, compute the index, then
scan the bucket. For each node it first compares the cached hash (a cheap
int compare) and only then calls equals — the hash check short-circuits
most mismatches.
V get(Object key) {
// hash -> index -> walk chain
// return node where node.hash == h && (k == node.key || k.equals(node.key))
}
In a linked-list bucket this is O(chain length); in a treeified bucket it's
O(log n). A missing key returns null — which is indistinguishable from a
key mapped to null, so use containsKey when that difference matters.
When two keys map to the same bucket (a collision), HashMap chains them.
Originally every bucket is a singly linked list of Nodes. As of Java 8,
once a single bucket's chain grows past a threshold the bucket is converted to a
red-black tree so lookups within it stay logarithmic rather than linear.
bucket[5] -> Node(k1) -> Node(k2) -> Node(k3) // linked list
// after treeification:
bucket[5] -> (balanced red-black tree of TreeNodes)
So a collision degrades the bucket from O(1) toward O(log n) at worst (with a
reasonable hashCode), not toward O(n). Distinct keys can collide even with
perfect hashCodes simply because many hashes fold into the same small index.
Treeification converts a bucket from a linked list to a red-black tree when it gets too long, bounding worst-case lookups at O(log n) instead of O(n). Two conditions must both hold:
| Constant | Value | Meaning |
|---|---|---|
TREEIFY_THRESHOLD |
8 | chain length that triggers treeify |
MIN_TREEIFY_CAPACITY |
64 | table must be at least this big |
UNTREEIFY_THRESHOLD |
6 | tree shrinks back to a list at/below this |
// if a bucket reaches 8 nodes but table < 64,
// HashMap resizes instead of treeifying
If the table is smaller than 64, a long chain usually means the table is just too
small, so HashMap resizes rather than treeifying. The gap between 8 and 6
(hysteresis) avoids flip-flopping between tree and list on repeated add/remove.
The defaults are initial capacity 16 and load factor 0.75. The load
factor is the fullness fraction at which the table grows: the threshold =
capacity × load factor, so a default map resizes when it holds more than
16 × 0.75 = 12 entries.
new HashMap<>(); // capacity 16, threshold 12
new HashMap<>(32, 0.5f); // capacity 32, threshold 16
0.75 is a deliberate time/space trade-off: lower factors waste memory but
reduce collisions; higher factors save memory but lengthen chains. If you know
the final size, presize with new HashMap<>(expected / 0.75 + 1) to avoid
repeated resizes.
When size exceeds the threshold, HashMap doubles the capacity (a new
power-of-two table) and rehashes every entry into the larger table. Doubling
keeps the power-of-two invariant so (n - 1) & hash stays valid.
// Java 8 split trick: with capacity doubled, an entry either stays
// at index i, or moves to index i + oldCapacity — decided by one bit:
if ((node.hash & oldCapacity) == 0) // stays in "low" bucket
else // moves to "high" bucket
Because only the single new high bit of the hash decides where an entry goes, Java 8 splits each old bucket's chain into exactly two ordered sub-chains without recomputing indexes from scratch. Resizing is O(n) and rebuilds the table, which is why presizing matters for large maps.
The contract: if a.equals(b) then a.hashCode() == b.hashCode()
must hold (the reverse need not — unequal objects may share a hash). equals
must also be reflexive, symmetric, transitive, and consistent.
class Point {
int x, y;
@Override public boolean equals(Object o) { /* compare x,y */ }
@Override public int hashCode() { return Objects.hash(x, y); }
}
HashMap relies on this directly: it uses hashCode to find the bucket and
equals to find the key within it. Override only one and a key you stored
becomes unfindable — get lands in the wrong bucket (or the right bucket but
fails the equals check) and returns null. Always override the two
together.
If hashCode returns the same value for every key (e.g. return 42; or
return 0;), every entry hashes to one bucket. That single bucket becomes one
giant chain, so get/put must scan it linearly — O(n) per operation,
defeating the whole point of a hash map.
class Bad {
@Override public int hashCode() { return 1; } // all collide!
}
// 10,000 Bad keys -> one bucket of 10,000 nodes -> O(n) lookups
Treeification softens but doesn't fix this: the bucket becomes a red-black tree,
improving to O(log n) — and only if the keys are also Comparable, otherwise
it falls back to a list. A well-distributed hashCode is the real cure.
Yes — HashMap allows one null key and any number of null values. The
null key is special-cased to hash 0, so it always sits in bucket 0.
Map<String, String> m = new HashMap<>();
m.put(null, "x"); // legal — single null key
m.put("a", null); // legal — null value
m.get("missing"); // null — but is the key absent or mapped to null?
System.out.println(m.containsKey("a")); // true, even though value is null
The ambiguity of a null return is the catch: get returns null both for an
absent key and a key mapped to null. Use containsKey to tell them apart.
Contrast with Hashtable and ConcurrentHashMap, which forbid nulls entirely.
HashMap's iterators are fail-fast: they track a modCount and throw
ConcurrentModificationException if the map is structurally modified (add or
remove) during iteration through any path other than the iterator itself.
for (String k : map.keySet()) {
if (k.startsWith("x")) map.remove(k); // throws CME
}
// safe: remove through the iterator
Iterator<String> it = map.keySet().iterator();
while (it.hasNext()) { if (it.next().startsWith("x")) it.remove(); }
Fail-fast is a best-effort bug detector, not a guarantee — it's specified as
"throws on a best-effort basis," so never write logic that depends on the
exception. To mutate while iterating, use iterator.remove(), collect-then-
remove, or removeIf.
HashMap has no synchronization, so concurrent puts can interleave and
corrupt the table — lost updates, wrong size, or a thread seeing a half-built
state. The most infamous case was an infinite loop on resize in pre-Java-8
JDKs.
old resize() reversed each bucket's chain while transferring it;
two threads resizing at once could splice A -> B -> A into a cycle,
so a later get() would spin forever at 100% CPU.
Java 8 changed resize to preserve chain order (the low/high split), which removed
that specific cycle — but HashMap is still not thread-safe and can lose
data or throw under concurrency. For shared maps use ConcurrentHashMap;
Collections.synchronizedMap works but locks the whole map.
A key's hash is computed when it's inserted and used to pick its bucket. If
you then mutate a field that affects hashCode/equals, the key's new hash
points to a different bucket than where it's actually stored — so the map can
no longer find it.
Set<List<Integer>> s = new HashSet<>();
List<Integer> key = new ArrayList<>(List.of(1, 2));
s.add(key);
key.add(3); // mutates the key's hashCode!
System.out.println(s.contains(key)); // false — lost in the wrong bucket
The entry becomes a "ghost": present in the map but unreachable by get or
contains, and possibly duplicated on re-insert. Prefer immutable keys
(String, Integer, records, frozen value objects) so the hash can never drift.
All implement Map, but they differ in ordering and complexity:
| Map | Ordering | get/put | Backing structure |
|---|---|---|---|
HashMap |
none (arbitrary) | O(1) avg | hash table |
LinkedHashMap |
insertion (or access) order | O(1) avg | hash table + linked list |
TreeMap |
sorted by key | O(log n) | red-black tree |
new HashMap<>(); // fastest, no order guarantee
new LinkedHashMap<>(); // predictable iteration order
new TreeMap<>(); // keys come out sorted; needs Comparable/Comparator
Pick HashMap by default, LinkedHashMap when you need stable iteration order
or an LRU cache (access-order mode + removeEldestEntry), and TreeMap when
you need sorted keys or range queries (firstKey, headMap, subMap).
| Feature | HashMap |
Hashtable |
ConcurrentHashMap |
|---|---|---|---|
| Thread-safe | no | yes (whole-method synchronized) |
yes (fine-grained) |
| Null key/values | 1 null key, null values | none | none |
| Locking | — | one lock for the whole table | per-bucket / CAS |
| Status | preferred single-thread | legacy | preferred concurrent |
Map<String,Integer> m = new ConcurrentHashMap<>();
m.compute("k", (k, v) -> v == null ? 1 : v + 1); // atomic, no external lock
Hashtable is a legacy class that locks every method, so it serializes all
access — slow and effectively retired. ConcurrentHashMap locks at the
bucket/bin level (with CAS for hot paths), so many threads can write to
different buckets in parallel. Its iterators are weakly consistent (no
ConcurrentModificationException).
Modern ConcurrentHashMap (Java 8+) abandoned the old segment design for the
same Node[] table as HashMap, plus per-bin locking and CAS. An empty
bin is filled with a lock-free compare-and-swap; a non-empty bin is updated
under a synchronized block on the bin's first node — so contention is
limited to the keys colliding in that one bin.
ConcurrentHashMap<String,Long> counts = new ConcurrentHashMap<>();
counts.merge("hit", 1L, Long::sum); // atomic increment, thread-safe
Reads are non-blocking (fields are volatile), so get never locks. Resizes
are cooperative: multiple threads help transfer bins concurrently. This is
why it scales far better than Hashtable's single lock while still being fully
correct.
A treeified bucket is a red-black tree, which normally orders nodes by the
key's compareTo. If the keys don't implement Comparable, HashMap can't
sort them, so it falls back to comparing identity hash codes and uses a
tie-breaking routine (tieBreakOrder) to keep the tree balanced — searches then
effectively scan rather than binary-search within ties.
// keys with equal hash but no Comparable order:
// tree falls back to System.identityHashCode() ordering for placement
The practical takeaway: treeification still bounds worst case better than a
plain list, but you only get the full O(log n) lookup benefit when colliding
keys are mutually Comparable. A good hashCode (so buckets never treeify) is
still the best defense.
HashSet is a thin wrapper around a HashMap: it stores each element as a
key, mapping every key to a single shared dummy value (a constant PRESENT
object). So all of HashSet's behavior — hashing, buckets, load factor,
treeification — is just HashMap's.
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT) == null; // null means key was new
}
This is why HashSet has the same O(1) average operations, the same
null-element-allowed rule, and the same requirement that elements have a
proper hashCode/equals. LinkedHashSet and TreeSet likewise wrap
LinkedHashMap and TreeMap.
Each resize is an O(n) rehash, so for a map you'll fill to a known size,
presize it. Because resize triggers at capacity × 0.75, request an initial
capacity of about expected / 0.75 + 1 so the table never crosses its
threshold during loading.
int expected = 1000;
Map<String,Integer> m = new HashMap<>((int)(expected / 0.75f) + 1);
// Java 19+: clearer intent, does the math for you
Map<String,Integer> m2 = HashMap.newHashMap(expected);
Passing the raw expected size (new HashMap<>(1000)) is a common mistake — it
rounds to capacity 1024 but still resizes at 768 entries. Java 19's
HashMap.newHashMap(n) factory expresses "I will hold n entries" directly.
More Collections interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.