Challenge: Design a HashMap

Requirements

Hash map is a data structure which helps us store, delete and retrieve values by keys in O(1).

Practical Use Case of hash map

  • Lookup contact by name in phone book list
  • Database index
  • Dictionaries
  • Associative arrays
  • Caches
  • Sets

Without HashMap:
finding a key in a list of values, we need to go through all indexes until we find a node matching a key in O(n).
If the list is sorted then we can use binary search to find the key in log(n).
Tree structure can also be used for key value storage with O(log(n)) for insert, delete and lookup.
If we have a small set of keys we can use direct addressing.
For O(1) on average we need dynamic addressing AKA hash map.

Prerequisite Knowledge of Hash Map

  • The basics of hash map as a key (of type T1) and value (of type T2) storage
  • O(1) for insert, delete, lookup operations
  • Key value types validation
  • Able to imaging practical use cases of hash map (see above)
  • Hash map load factor
  • Possible hash function implementations 
    • Hash function alignment to index by mod or multiply
    • Interpreting keys as real numbers
  • Collision resolution types
    • Chaining
      • Separate Chaining
      • Coalesced chaining
    • Direct addressing 
      • Linear Probing
      • Quadratic Probing
      • Double Hashing

See references to learn these subjects

Challenge constraints & assumptions

For simplicity, are the keys integers only?
Yes

For collision resolution, can we use chaining?
Yes

Do we have to worry about load factors?
No

Can we assume inputs are valid or do we have to validate them?
Assume they’re valid

Can we assume this fits memory?
Yes

Do we need to make sure that keys are spread evenly across indexes?
No

Does adding same key twice updates or throw duplicate error?
Add is also set – update if exists

Analysis

Lets simplify & break the above to use cases and other valuable facts/functionality: 

  • hash map (noun) is a (relationship) key (noun) value (noun) storage (noun)
  • Array of buckets or slots (noun) from which the value can be found
  • has a (relationship) function (noun) that computes (verb) the index (noun)- hash code (noun)
    • Might cause hash collisions where the hash function generates some index for more than one key and that should be accommodated in some way – we will use chaining for collision resolution.
  • Add (verb) item in O(1)
  • Remove (verb) Item in O(1)
  • Search (verb) Item in O(1)

Design

Implementation

Future Possible Features

  • EnlargeCapacity+Reindexing
  • LoadFactor calculate
  • Parallel testing best suited index size by current capacity

Hash Functions Examples

# a simple hashing algorithm with acceptable collision rate

def _djb2x_hash(string):
    hash = 5381
    byte_array = string.encode(‘utf-8’)
    for byte in byte_array:
        # the modulus keeps it 32-bit, python ints don’t overflow
        hash = ((hash * 33) ^ byte) % 0x100000000
    return hash

# String hash table

NOTE: The hash function for Strings given is pretty good. It views the string as essentially a numeral in base 37 (or base 43, if the hash table size is close to a multiple of 37) and then reduces that number mod the table size. This hash function is a little slower than ideal, i.e. for German words, which are often long.

// Hashes string S to a value between 0 and L-1;
// This is a polynomial hash code : See Goodrich, Tamassia, and Goldwasser
// p. 412 (Mult here corresponds to their parameter a).
//
     public int MyHash(String S) {
       int Mult = 37;
       int Rem = table.length % Mult;
       if (Rem==0 || Rem==1 || Rem == table.length-1) Mult=43;
       int Sum = 0;
       for (int i=0; i<S.length(); i++)
         Sum = (Mult*Sum +  (int)S.charAt(i)) % table.length;
       return Sum;
      }

References:

(1) Challenge by: https://github.com/donnemartin/system-design-primer/blob/master/solutions/object_oriented_design/hash_table/hash_map.ipynb
(2) Hash Table wikipedia: https://en.wikipedia.org/wiki/Hash_table
(3) From the book: LEISERSON, C H A R L E S E., and R O N A L D. “Hash Tables.” Introduction to Algorithms, by Thomas H. Cormen, MIT Press, 2009, pp. 253–253. https://web.iiit.ac.in/~pratik.kamble/storage/Algorithms/Cormen_Algorithms_3rd.pdf
(4) Collision handling - https://stackoverflow.com/questions/4980757/how-do-hashtables-deal-with-collisions
(5) Hash table implementation - New York University - https://cs.nyu.edu/courses/fall17/CSCI-UA.0102-001/Notes/HashTableImplementation.html
(6) Hash Function - Wolfram - http://mathworld.wolfram.com/HashFunction.html