Skip to main content
Data StructuresIntermediate15 min readUpdated March 2025

Hash Tables

Hash tables provide O(1) average-case insert, delete, and search by mapping keys to array indices using a hash function. They are one of the most widely used data structures, powering Python dicts, Java HashMaps, and database indexes.

How Hash Tables Work

A hash table (also called hash map or dictionary) stores key-value pairs. The core idea:

1. Hash function — Converts a key into an integer index: hash(key) % table_size 2. Array — The underlying storage. The hash function determines which slot to use. 3. Collision resolution — Handles the case when two keys hash to the same index

A good hash function: - Is deterministic — Same key always produces the same hash - Is fast to compute — O(1) - Distributes uniformly — Minimizes collisions - Avalanche effect — Small changes in input produce very different outputs

Collision Resolution Strategies

Collisions are inevitable (birthday paradox). Two main strategies:

  • Chaining (Separate Chaining) — Each slot holds a linked list of all key-value pairs that hash to that index. Simple to implement. Performance degrades gracefully. Used by Java HashMap.
  • Open Addressing — When a collision occurs, probe for the next available slot. Variants: Linear Probing (next slot), Quadratic Probing (i^2 steps), Double Hashing (second hash function).
  • Load Factor — ratio of stored elements to table size. When load factor exceeds a threshold (typically 0.75), the table is resized (rehashed) to maintain O(1) performance.
  • Python dicts use open addressing with a sophisticated probing scheme and resize at 2/3 load factor.

Hash Table Implementation with Chaining

A hash table implementation using separate chaining:

python
class HashTable:
    def __init__(self, initial_capacity=16, load_factor=0.75):
        self._capacity = initial_capacity
        self._load_factor = load_factor
        self._size = 0
        self._buckets = [[] for _ in range(self._capacity)]

    def _hash(self, key):
        return hash(key) % self._capacity

    def put(self, key, value):
        idx = self._hash(key)
        bucket = self._buckets[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # Update existing
                return
        bucket.append((key, value))         # Insert new
        self._size += 1
        if self._size / self._capacity > self._load_factor:
            self._resize()

    def get(self, key):
        idx = self._hash(key)
        for k, v in self._buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        idx = self._hash(key)
        bucket = self._buckets[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self._size -= 1
                return
        raise KeyError(key)

    def _resize(self):
        old_buckets = self._buckets
        self._capacity *= 2
        self._buckets = [[] for _ in range(self._capacity)]
        self._size = 0
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)
        print(f"Resized to capacity: {self._capacity}")

    def __contains__(self, key):
        try:
            self.get(key)
            return True
        except KeyError:
            return False

# Test
ht = HashTable()
ht.put("name", "Alice")
ht.put("age", 30)
ht.put("city", "NYC")

print(ht.get("name"))    # Alice
print("age" in ht)       # True
ht.delete("city")
print("city" in ht)      # False

Hash Functions for Different Key Types

Different key types require different hashing approaches:

python
# Python's built-in hash() works for most types
print(hash("hello"))         # Consistent within a session
print(hash(42))              # 42
print(hash((1, 2, 3)))       # Tuples are hashable (immutable)
# print(hash([1, 2, 3]))     # TypeError: lists are not hashable!

# ---- Custom hash for strings (djb2 algorithm) ----
def djb2_hash(s, table_size):
    h = 5381
    for char in s:
        h = ((h << 5) + h) + ord(char)   # h * 33 + char
    return h % table_size

# ---- Rolling hash for substring search (Rabin-Karp) ----
def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    base, mod = 256, 10**9 + 7
    pattern_hash = 0
    window_hash = 0
    h = pow(base, m - 1, mod)

    for i in range(m):
        pattern_hash = (base * pattern_hash + ord(pattern[i])) % mod
        window_hash  = (base * window_hash  + ord(text[i])) % mod

    for i in range(n - m + 1):
        if pattern_hash == window_hash:
            if text[i:i+m] == pattern:   # Verify (handle hash collision)
                return i
        if i < n - m:
            window_hash = (base * (window_hash - ord(text[i]) * h) + ord(text[i+m])) % mod
    return -1

print(rabin_karp("hello world", "world"))   # 6

Key Takeaways

  • Hash tables provide O(1) average-case insert, delete, and search by mapping keys to array indices.
  • Collisions are resolved via chaining (linked lists per bucket) or open addressing (probing).
  • Load factor (size/capacity) must be kept below a threshold (0.75) to maintain O(1) performance.
  • Only immutable objects can be dictionary keys in Python — they must be hashable.
  • Worst-case hash table performance is O(n) when all keys collide — a good hash function prevents this.

Contact Us

Have a question or feedback? Fill out the form below or reach us directly at support@nvaitraining.com