All Posts

What are Hash Tables? Basics Explained

Hash Tables are the magic behind O(1) lookups. We explain Hash Functions, Collisions, and why the...

Abstract AlgorithmsAbstract Algorithms
Β·Β·5 min read
Share
Share on X / Twitter
Share on LinkedIn
Copy link

TLDR: A hash table gives you near-O(1) lookups, inserts, and deletes by using a hash function to map keys to array indices. The tradeoff: collisions (when two keys hash to the same slot) must be handled, and a full hash table must be resized.


πŸ“– The Magic Mailroom Stamp

Imagine a mailroom with 100 drawers. A magic stamp converts any employee name to a number 1–100. Want John's mail? Stamp "John" β†’ you get 42 β†’ open Drawer 42. No search needed.

That stamp is a hash function. The row of drawers is the array (the backing store). The combination is a hash table.


πŸ”’ Building Blocks: Array + Hash Function

flowchart LR
    Key["Key: 'cat'"] --> HashFn["Hash Function\nh('cat') = 2"]
    HashFn --> Array["Array\n[0][1][2: 'cat→meow'][3]..."]
    Array --> Value["Value: 'meow'"]

A hash function must be:

  1. Deterministic: Same input always produces the same output.
  2. Fast: O(1) computation.
  3. Uniform: Keys distribute evenly to minimize collisions.

Simple hash function (NOT production-quality):

def simple_hash(key: str, size: int) -> int:
    return sum(ord(c) for c in key) % size

Python's built-in hash() uses SipHash (cryptographically secure version) to prevent hash flooding attacks.


βš™οΈ Collisions: When Two Keys Land in the Same Slot

No hash function is perfect. Given a finite array, two different keys will eventually map to the same slot.

Example:

  • hash('cat') = 2 β†’ Stored at index 2.
  • hash('bird') = 2 β†’ Collision at index 2!

Two collision resolution strategies:

Separate Chaining

Each slot holds a linked list of key-value pairs. On collision, append to the list.

index 2: [('cat', 'meow') β†’ ('bird', 'tweet')]

O(1) average, O(n) worst case if all keys hash to the same slot.

Open Addressing (Linear Probing)

No linked lists. On collision, step forward until you find an empty slot.

hash('cat') = 2 β†’ store at 2
hash('bird') = 2 (full) β†’ probe 3 β†’ store at 3

Pros: Better cache performance (all data in one array). Cons: Clustering β€” many consecutive full slots slow probing.

StrategySpaceLookup (avg)Cache FriendlyUsed In
Separate ChainingExtra memory for listsO(1)NoJava HashMap (until JDK 8)
Linear ProbingCompact arrayO(1)YesPython dict, Java HashMap (JDK 8+)
Quadratic ProbingCompact arrayO(1)PartialOpen-addressing tables
Robin Hood HashingCompact arrayO(1)YesRust HashMap

🧠 Load Factor and Rehashing

Load factor = (number of entries) / (array capacity).

As the table fills up (load factor β†’ 1.0), collisions increase and performance degrades.

Standard practice: rehash when load factor exceeds 0.75.

Rehashing:

  1. Create a new array 2Γ— the size.
  2. Re-insert every existing key into the new array (re-hash all keys).
  3. Old array is garbage collected.

Cost: O(N) for one rehash. But since it happens at doubly-spaced intervals, the amortized cost per insert is O(1).


βš™οΈ Why Python Dicts Are Fast

Python dict uses open addressing with compact arrays (since Python 3.6). Entries are stored in an incrementally-growing compact array indexed by insertion order, with a separate hash table of indices pointing into it. This gives:

  • Fast iteration (compact memory, no pointer chasing).
  • Memory-efficient (no linked list overhead).
  • Dict guaranteed ordered by insertion order.

βš–οΈ Hash Tables vs. Other Data Structures

OperationHash TableBST (e.g., TreeMap)Array (sorted)
Lookup by keyO(1) avgO(log n)O(log n) binary search
InsertO(1) avgO(log n)O(n) shift
DeleteO(1) avgO(log n)O(n) shift
Range query❌ Not supportedβœ… O(log n + k)βœ… O(log n + k)
Ordered iteration❌ No orderβœ… In-orderβœ… Sorted

Choose hash table when: You need fast key-value lookup and don't need ordering or range queries. Choose BST when: You need sorted iteration or range queries (e.g., "find all keys between 10 and 50").


πŸ“Œ Summary

  • Hash function maps key β†’ array index in O(1). Must be deterministic and uniform.
  • Collisions are inevitable; Separate Chaining uses linked lists; Linear Probing uses the array itself.
  • Rehash at load factor 0.75 to keep average O(1); amortized cost is O(1) per insert.
  • Hash tables have no order β€” use a BST when you need sorted iteration or range queries.

πŸ“ Practice Quiz

  1. What is the primary purpose of a hash function in a hash table?

    • A) To encrypt the key for security.
    • B) To convert a key to an integer array index in O(1) time, enabling direct-access storage.
    • C) To sort the keys in the table.
      Answer: B
  2. A hash table has 100 slots and 90 entries. What is the load factor, and what should happen next?

    • A) Load factor = 0.9; rehash now (exceeds 0.75 threshold) β€” create a 200-slot table and re-insert all entries.
    • B) Load factor = 90; nothing β€” the table still has 10 empty slots.
    • C) Load factor = 0.9; delete the 15 least-recently-used entries to stay under capacity.
      Answer: A
  3. When would you choose a Binary Search Tree over a Hash Table?

    • A) When you need O(1) lookup.
    • B) When you need sorted iteration or range queries (e.g., "all keys between 50 and 100").
    • C) When your dataset is too small for a hash table.
      Answer: B

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms