Hash Tables
Hash tables provide O(1) average-case insert, delete, and search by mapping keys to array indices using a hash function. They are one of the most widely used data structures, powering Python dicts, Java HashMaps, and database indexes.
How Hash Tables Work
A hash table (also called hash map or dictionary) stores key-value pairs. The core idea:
1. Hash function — Converts a key into an integer index: hash(key) % table_size 2. Array — The underlying storage. The hash function determines which slot to use. 3. Collision resolution — Handles the case when two keys hash to the same index
A good hash function: - Is deterministic — Same key always produces the same hash - Is fast to compute — O(1) - Distributes uniformly — Minimizes collisions - Avalanche effect — Small changes in input produce very different outputs
Collision Resolution Strategies
Collisions are inevitable (birthday paradox). Two main strategies:
- Chaining (Separate Chaining) — Each slot holds a linked list of all key-value pairs that hash to that index. Simple to implement. Performance degrades gracefully. Used by Java HashMap.
- Open Addressing — When a collision occurs, probe for the next available slot. Variants: Linear Probing (next slot), Quadratic Probing (i^2 steps), Double Hashing (second hash function).
- Load Factor — ratio of stored elements to table size. When load factor exceeds a threshold (typically 0.75), the table is resized (rehashed) to maintain O(1) performance.
- Python dicts use open addressing with a sophisticated probing scheme and resize at 2/3 load factor.
Hash Table Implementation with Chaining
A hash table implementation using separate chaining:
class HashTable:
def __init__(self, initial_capacity=16, load_factor=0.75):
self._capacity = initial_capacity
self._load_factor = load_factor
self._size = 0
self._buckets = [[] for _ in range(self._capacity)]
def _hash(self, key):
return hash(key) % self._capacity
def put(self, key, value):
idx = self._hash(key)
bucket = self._buckets[idx]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value) # Update existing
return
bucket.append((key, value)) # Insert new
self._size += 1
if self._size / self._capacity > self._load_factor:
self._resize()
def get(self, key):
idx = self._hash(key)
for k, v in self._buckets[idx]:
if k == key:
return v
raise KeyError(key)
def delete(self, key):
idx = self._hash(key)
bucket = self._buckets[idx]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self._size -= 1
return
raise KeyError(key)
def _resize(self):
old_buckets = self._buckets
self._capacity *= 2
self._buckets = [[] for _ in range(self._capacity)]
self._size = 0
for bucket in old_buckets:
for key, value in bucket:
self.put(key, value)
print(f"Resized to capacity: {self._capacity}")
def __contains__(self, key):
try:
self.get(key)
return True
except KeyError:
return False
# Test
ht = HashTable()
ht.put("name", "Alice")
ht.put("age", 30)
ht.put("city", "NYC")
print(ht.get("name")) # Alice
print("age" in ht) # True
ht.delete("city")
print("city" in ht) # FalseHash Functions for Different Key Types
Different key types require different hashing approaches:
# Python's built-in hash() works for most types
print(hash("hello")) # Consistent within a session
print(hash(42)) # 42
print(hash((1, 2, 3))) # Tuples are hashable (immutable)
# print(hash([1, 2, 3])) # TypeError: lists are not hashable!
# ---- Custom hash for strings (djb2 algorithm) ----
def djb2_hash(s, table_size):
h = 5381
for char in s:
h = ((h << 5) + h) + ord(char) # h * 33 + char
return h % table_size
# ---- Rolling hash for substring search (Rabin-Karp) ----
def rabin_karp(text, pattern):
n, m = len(text), len(pattern)
base, mod = 256, 10**9 + 7
pattern_hash = 0
window_hash = 0
h = pow(base, m - 1, mod)
for i in range(m):
pattern_hash = (base * pattern_hash + ord(pattern[i])) % mod
window_hash = (base * window_hash + ord(text[i])) % mod
for i in range(n - m + 1):
if pattern_hash == window_hash:
if text[i:i+m] == pattern: # Verify (handle hash collision)
return i
if i < n - m:
window_hash = (base * (window_hash - ord(text[i]) * h) + ord(text[i+m])) % mod
return -1
print(rabin_karp("hello world", "world")) # 6Key Takeaways
- Hash tables provide O(1) average-case insert, delete, and search by mapping keys to array indices.
- Collisions are resolved via chaining (linked lists per bucket) or open addressing (probing).
- Load factor (size/capacity) must be kept below a threshold (0.75) to maintain O(1) performance.
- Only immutable objects can be dictionary keys in Python — they must be hashable.
- Worst-case hash table performance is O(n) when all keys collide — a good hash function prevents this.
Contact Us
Have a question or feedback? Fill out the form below or reach us directly at support@nvaitraining.com