Requirements
Hash map is a data structure which helps us store, delete and retrieve values by keys in O(1).
Practical Use Case of hash map
- Lookup contact by name in phone book list
- Database index
- Dictionaries
- Associative arrays
- Caches
- Sets
Without HashMap:
finding a key in a list of values, we need to go through all indexes until we find a node matching a key in O(n).
If the list is sorted then we can use binary search to find the key in log(n).
Tree structure can also be used for key value storage with O(log(n)) for insert, delete and lookup.
If we have a small set of keys we can use direct addressing.
For O(1) on average we need dynamic addressing AKA hash map.
Prerequisite Knowledge of Hash Map
- The basics of hash map as a key (of type T1) and value (of type T2) storage
- O(1) for insert, delete, lookup operations
- Key value types validation
- Able to imaging practical use cases of hash map (see above)
- Hash map load factor
- Possible hash function implementations
- Hash function alignment to index by mod or multiply
- Interpreting keys as real numbers
- Collision resolution types
- Chaining
- Separate Chaining
- Coalesced chaining
- Direct addressing
- Linear Probing
- Quadratic Probing
- Double Hashing
- Chaining
See references to learn these subjects
Challenge constraints & assumptions
For simplicity, are the keys integers only?
Yes
For collision resolution, can we use chaining?
Yes
Do we have to worry about load factors?
No
Can we assume inputs are valid or do we have to validate them?
Assume they’re valid
Can we assume this fits memory?
Yes
Do we need to make sure that keys are spread evenly across indexes?
No
Does adding same key twice updates or throw duplicate error?
Add is also set – update if exists
Analysis
Lets simplify & break the above to use cases and other valuable facts/functionality:
- hash map (noun) is a (relationship) key (noun) value (noun) storage (noun)
- Array of buckets or slots (noun) from which the value can be found
- has a (relationship) function (noun) that computes (verb) the index (noun)- hash code (noun)
- Might cause hash collisions where the hash function generates some index for more than one key and that should be accommodated in some way – we will use chaining for collision resolution.
- Add (verb) item in O(1)
- Remove (verb) Item in O(1)
- Search (verb) Item in O(1)
Design
Implementation
Future Possible Features
- EnlargeCapacity+Reindexing
- LoadFactor calculate
- Parallel testing best suited index size by current capacity
Hash Functions Examples
- Djb2x – https://brilliant.org/wiki/hash-tables/
- String hash table – https://cs.nyu.edu/courses/fall17/CSCI-UA.0102-001/Notes/HashTableImplementation.html
- SHA-256 and SHA3-256 Hashing in Java – https://www.baeldung.com/sha-256-hashing-java
# a simple hashing algorithm with acceptable collision rate
def _djb2x_hash(string):
hash = 5381
byte_array = string.encode(‘utf-8’)
for byte in byte_array:
# the modulus keeps it 32-bit, python ints don’t overflow
hash = ((hash * 33) ^ byte) % 0x100000000
return hash
# String hash table
NOTE: The hash function for Strings given is pretty good. It views the string as essentially a numeral in base 37 (or base 43, if the hash table size is close to a multiple of 37) and then reduces that number mod the table size. This hash function is a little slower than ideal, i.e. for German words, which are often long.
// Hashes string S to a value between 0 and L-1;
// This is a polynomial hash code : See Goodrich, Tamassia, and Goldwasser
// p. 412 (Mult here corresponds to their parameter a).
//
public int MyHash(String S) {
int Mult = 37;
int Rem = table.length % Mult;
if (Rem==0 || Rem==1 || Rem == table.length-1) Mult=43;
int Sum = 0;
for (int i=0; i<S.length(); i++)
Sum = (Mult*Sum + (int)S.charAt(i)) % table.length;
return Sum;
}
References:
(1) Challenge by: https://github.com/donnemartin/system-design-primer/blob/master/solutions/object_oriented_design/hash_table/hash_map.ipynb
(2) Hash Table wikipedia: https://en.wikipedia.org/wiki/Hash_table
(3) From the book: LEISERSON, C H A R L E S E., and R O N A L D. “Hash Tables.” Introduction to Algorithms, by Thomas H. Cormen, MIT Press, 2009, pp. 253–253. https://web.iiit.ac.in/~pratik.kamble/storage/Algorithms/Cormen_Algorithms_3rd.pdf
(4) Collision handling - https://stackoverflow.com/questions/4980757/how-do-hashtables-deal-with-collisions
(5) Hash table implementation - New York University - https://cs.nyu.edu/courses/fall17/CSCI-UA.0102-001/Notes/HashTableImplementation.html
(6) Hash Function - Wolfram - http://mathworld.wolfram.com/HashFunction.html