2026-04-01
arrayqueue.py on Discord. PDF file on Canvas.
Hash Tables
Speed Comparison:
| Data Structure | Search Time |
|---|---|
| Array | O(N) |
| Binary Tree | O(log N) |
| Hash Table | O(1) |
Hash tables provide constant-time insertion and searching on average!
Challenge: Store 50,000 English words in memory with fast access.
Create a custom code for lowercase letters:
| Character | Code | Character | Code |
|---|---|---|---|
| (space) | 0 | n | 14 |
| a | 1 | o | 15 |
| b | 2 | … | … |
| c | 3 | z | 26 |
Total: 27 characters (26 letters + space)
Convert each letter to its code and sum them.
Example: “elf”
e = 5
l = 12
f = 6
---------
Sum = 23
Store “elf” at array index 23.
For words of up to 10 letters (shorter words padded with spaces, code 0):
- Minimum: “a” → 0+0+0+0+0+0+0+0+0+1 = 1
- Maximum: “zzzzzzzzzz” → 26×10 = 260
Range: 1 to 260
Problem: Only 260 possible indices for 50,000 words!
- Average: ~192 words per cell
- Too many collisions!
All these words hash to 23:
acne, ago, aim, baked, cable, elf, hack, ...
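The additive scheme can be sketched in a few lines of Python. The helper names `letter_code` and `additive_hash` are hypothetical, chosen only for this illustration; the letter codes follow the table above.

```python
def letter_code(ch):
    """Map a-z to 1-26; anything else (including space) to 0."""
    ch = ch.lower()
    if 'a' <= ch <= 'z':
        return ord(ch) - ord('a') + 1
    return 0

def additive_hash(word):
    """Sum the letter codes. Position is ignored, so many words share a sum."""
    return sum(letter_code(ch) for ch in word)

words = ["acne", "ago", "aim", "baked", "cable", "elf", "hack"]
for w in words:
    print(w, additive_hash(w))  # each of these words sums to 23
```

Because the sum ignores letter order, any two anagrams are guaranteed to collide under this scheme.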
Make each position contribute uniquely to the final number.
Like decimal numbers:
7,546 = 7×10³ + 5×10² + 4×10¹ + 6×10⁰
For words (base 27):
"elf" = e×27² + l×27¹ + f×27⁰
= 5×729 + 12×27 + 6×1
= 3,645 + 324 + 6
= 3,975
Guarantee: Every word gets a unique number!
Example: “zzzzzzzzzz” (10 z’s)
26×27⁹ + 26×27⁸ + ... + 26×27⁰
Just 27⁹ alone = 7,625,597,484,987
Problem: Array can’t have 7+ trillion cells!
- Most cells would be empty (for non-words)
- Huge waste of memory
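To see how large the range really gets, here is a small sketch that computes the base-27 code with Horner's rule (multiply the running total by 27, then add the next letter code). This is an alternative way of evaluating the same polynomial, not necessarily the method used in class, and `positional_code` is a hypothetical name.

```python
def positional_code(word, base=27):
    """Base-27 positional code, evaluated with Horner's rule."""
    total = 0
    for ch in word.lower():
        digit = ord(ch) - ord('a') + 1 if 'a' <= ch <= 'z' else 0
        total = total * base + digit
    return total

print(positional_code("elf"))                    # 3975, matching the worked example
print(positional_code("z" * 10) == 27**10 - 1)   # True
```

Ten z's give 27¹⁰ − 1 = 205,891,132,094,648, so the full range is far larger than the 27⁹ term alone suggests.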
| Method | Range | Problem |
|---|---|---|
| Adding digits | 1-260 | Too small - massive collisions |
| Powers of 27 | 1 to ~206 trillion (27¹⁰ − 1) | Too large - wasted memory |
We need something in between!
Use the modulo operator (%) to squeeze large numbers into a smaller range:
Example: Range 0-199 → Range 0-9
The modulo operation gives us the remainder, effectively “wrapping” large numbers into a smaller range.
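A minimal sketch of the 0-199 → 0-9 example; the variable names `small_size` and `compressed` are assumptions made for this illustration.

```python
small_size = 10                      # target range is 0-9
compressed = [n % small_size for n in range(200)]  # inputs 0-199

print(compressed[:12])               # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
print(min(compressed), max(compressed))  # 0 9
print(157 % small_size)              # 7
```

Every input lands in 0-9, and on average 20 different inputs share each output value, which is exactly where collisions come from.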
```python
def encode_letter(letter):
    """Encode letters a-z as 1-26; space (and anything else) as 0."""
    letter = letter.lower()
    if 'a' <= letter <= 'z':
        return ord(letter) - ord('a') + 1
    return 0

def unique_encode_word(word):
    """Encode word uniquely using powers of 27."""
    return sum(encode_letter(word[i]) * 27 ** (len(word) - 1 - i)
               for i in range(len(word)))

# Hash to array index
word = "elf"          # example key
arraySize = 100000    # 2× the 50,000 words
arrayIndex = unique_encode_word(word) % arraySize  # 3975 for "elf"
```
Rule of thumb: Array should be 2× the number of items
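Trade-off: a larger array uses more memory but produces fewer collisions; a smaller array saves memory but collides more often.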
Collision: Two different keys hash to the same array index
Example: Three words hash to index 24,122:
| Word | Unique Code | Hash (mod 100,000) |
|---|---|---|
| bring | 1,424,122 | 24,122 |
| abductor | 11,303,824,122 | 24,122 |
| missable | 139,754,124,122 | 24,122 |
Collisions are unavoidable when compressing a large range into a smaller one!
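The table above can be checked with a short sketch that groups words by their array index. The names `word_code` and `buckets` are hypothetical, and "elf" is added only as a non-colliding control.

```python
from collections import defaultdict

ARRAY_SIZE = 100_000  # same array size as in the notes

def word_code(word):
    """Base-27 positional code (a-z → 1-26, anything else → 0)."""
    code = 0
    for ch in word.lower():
        code = code * 27 + (ord(ch) - ord('a') + 1 if 'a' <= ch <= 'z' else 0)
    return code

# Group words by the index they hash to.
buckets = defaultdict(list)
for w in ["bring", "abductor", "missable", "elf"]:
    buckets[word_code(w) % ARRAY_SIZE].append(w)

# Any index holding more than one word is a collision.
collisions = {i: ws for i, ws in buckets.items() if len(ws) > 1}
print(collisions)  # {24122: ['bring', 'abductor', 'missable']}
```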
Question: How many people are needed before two of them are likely to share a birthday?
Think about pairs, not just people:
| People | Pairs | Collision Probability |
|---|---|---|
| 1 | 0 | 0% |
| 2 | 1 | 0.27% |
| 3 | 3 | 0.82% |
| 10 | 45 | 11.7% |
| 23 | 253 | 50.7% |
| 30 | 435 | 70.6% |
| 50 | 1,225 | 97.0% |
Number of comparisons (pairs) for n people:
\[\text{Pairs} = \binom{n}{2} = \frac{n(n-1)}{2}\]
Why this matters:
- Each pair is a chance for collision
- The number of pairs grows quadratically (O(n²))
- 23 people → 253 comparisons across 365 days
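The pair counts and probabilities in the table can be reproduced with a short sketch. It assumes birthdays are uniformly distributed over 365 days; the function names `pairs` and `collision_probability` are hypothetical.

```python
def pairs(n):
    """Number of distinct pairs among n people: n(n-1)/2."""
    return n * (n - 1) // 2

def collision_probability(n_items, n_cells=365):
    """P(at least two items share a cell), assuming uniform random placement."""
    p_all_distinct = 1.0
    for i in range(n_items):
        # The (i+1)-th item must avoid the i cells already taken.
        p_all_distinct *= (n_cells - i) / n_cells
    return 1.0 - p_all_distinct

for n in (1, 2, 3, 10, 23, 30, 50):
    print(f"{n:>2} people  {pairs(n):>5,} pairs  {collision_probability(n):6.1%}")
```

The `n_cells` parameter makes the same function usable for a hash table: replace 365 with the array size.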
The connection:
| Birthday Analogy | Hash Table |
|---|---|
| 365 days | Array cells |
| People | Items inserted |
| Shared birthday | Collision |
Key Insight: Even below 10% capacity, collisions are highly likely!
- 23 items in 365 cells = 6.3% full → 50% collision chance
- With 50,000 words in 100,000 cells → collisions are inevitable
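To put a number on "inevitable", the sketch below computes the probability of zero collisions in log space, since the value itself is too small for a float. It assumes items are placed uniformly at random, which a real hash function only approximates; `log10_no_collision` is a hypothetical name.

```python
import math

def log10_no_collision(n_items, n_cells):
    """log10 of P(no two items share a cell), assuming uniform random placement."""
    ln_p = sum(math.log1p(-i / n_cells) for i in range(n_items))
    return ln_p / math.log(10)

print(log10_no_collision(23, 365))          # about -0.31, i.e. P(no collision) ≈ 0.49
print(log10_no_collision(50_000, 100_000))  # a negative exponent in the thousands
```

Under that assumption, the chance of storing 50,000 words in 100,000 cells with no collision at all is a number with thousands of zeros after the decimal point.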
When a collision occurs, we must have a strategy:
- Open addressing: find another empty cell in the array
- Separate chaining: store multiple items at the same index using linked lists
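Both strategies can be sketched as tiny classes. This is a minimal illustration under stated assumptions, not a full implementation: it uses Python lists in place of linked lists, Python's built-in `hash()` as the hash function, linear probing as the open-addressing variant, and it omits deletion and resizing. The class names are hypothetical.

```python
class ChainedTable:
    """Separate chaining: each cell holds a list of (key, value) pairs."""
    def __init__(self, size=11):
        self.cells = [[] for _ in range(size)]

    def insert(self, key, value):
        chain = self.cells[hash(key) % len(self.cells)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key: extend the chain

    def find(self, key):
        for k, v in self.cells[hash(key) % len(self.cells)]:
            if k == key:
                return v
        return None


class ProbingTable:
    """Open addressing with linear probing: on collision, try the next cell."""
    def __init__(self, size=11):
        self.cells = [None] * size

    def insert(self, key, value):
        start = hash(key) % len(self.cells)
        for step in range(len(self.cells)):
            i = (start + step) % len(self.cells)   # wrap around the array
            if self.cells[i] is None or self.cells[i][0] == key:
                self.cells[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def find(self, key):
        start = hash(key) % len(self.cells)
        for step in range(len(self.cells)):
            i = (start + step) % len(self.cells)
            if self.cells[i] is None:              # empty cell: key is absent
                return None
            if self.cells[i][0] == key:
                return self.cells[i][1]
        return None
```

With 11 cells, the integer keys 1, 12, and 23 all hash to index 1: the chained table stores all three in one list, while the probing table places them in cells 1, 2, and 3.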
Hash tables provide O(1) operations by:
- Using hash functions to map keys → array indices
- Compressing large ranges with the modulo operator
- Handling inevitable collisions with resolution strategies
Collision Resolution:
- Open addressing: linear probing, quadratic probing, double hashing
- Separate chaining: linked lists at each index
Collisions are MORE common than intuition suggests!
Design implication: Always plan for collisions from the start