Hashing¶
Hashing is an algorithm to map an input value (key) to an output value. It is
usually defined as a function with a single parameter and a return value. The
mappings are usually expressed using arrows like string -> int
, int ->
int
.
Hash function¶
Domain of a hash function: The set of all possible inputs to a hash function
Range of a hash function: The set of all possible outputs of a hash function
Required properties
deterministic (required!)
uniform distribution
hash function that maps all possible input values (keys) with roughly equal probability to each possible output value (hash code)
ensures minimal collision
hard to guarantee and prove
Other properties (application specific)
variable length key to fixed length hash value
constant time computation \(\Theta(1)\)
squeeze a large domain to a smaller range
chance of collision
asymmetric
Wellknown examples: MD5, SHA-1, SHA-2
Make your own hash function
Applications¶
Hash table data structure
Hash Map (Map ADT, dictionary, key-value store)
Hash Set (Set ADT)
Fingerprinting
Short summary of a large data set
Checksum
Password management
Cryptography
Advanced data structures
Bloom filter
Cuckoo filter
Merkle tree (e.g. BitCoin)
Hashing Glossary¶
- Hash Function¶
A function that maps a variable sized value to a fixed sized value.
- Key (hashing)¶
The input value to the hash function.
- Value (hashing)¶
A.k.a hash value, hash code or simply hash. The output of the hash function.
- Key space / Domain (hashing)¶
The set of all possible keys.
- Value space / Range (hashing)¶
The set of all possible values.
- Collision (hashing)¶
Multiple distinct keys maps to a same output value.