Hashing¶
Hashing is an algorithm to map an input value (key) to an output value. It is
usually defined as a function with a single parameter and a return value. The
mappings are usually expressed using arrows like string -> int, int ->
int.
Hash function¶
- Domain of a hash function: The set of all possible inputs to a hash function 
- Range of a hash function: The set of all possible outputs of a hash function 
- Required properties - deterministic (required!) 
- uniform distribution - hash function that maps all possible input values (keys) with roughly equal probability to each possible output value (hash code) 
- ensures minimal collision 
- hard to guarantee and prove 
 
 
- Other properties (application specific) - variable length key to fixed length hash value 
- constant time computation \(\Theta(1)\) 
- squeeze a large domain to a smaller range 
- chance of collision 
- asymmetric 
 
- Wellknown examples: MD5, SHA-1, SHA-2 
- Make your own hash function 
Applications¶
- Hash table data structure - Hash Map (Map ADT, dictionary, key-value store) 
- Hash Set (Set ADT) 
 
- Fingerprinting - Short summary of a large data set 
- Checksum 
- Password management 
 
- Cryptography 
- Advanced data structures - Bloom filter 
- Cuckoo filter 
- Merkle tree (e.g. BitCoin) 
 
Hashing Glossary¶
- Hash Function¶
- A function that maps a variable sized value to a fixed sized value. 
- Key (hashing)¶
- The input value to the hash function. 
- Value (hashing)¶
- A.k.a hash value, hash code or simply hash. The output of the hash function. 
- Key space / Domain (hashing)¶
- The set of all possible keys. 
- Value space / Range (hashing)¶
- The set of all possible values. 
- Collision (hashing)¶
- Multiple distinct keys maps to a same output value.