Hashing

Hashing is an algorithm to map an input value (key) to an output value. It is usually defined as a function with a single parameter and a return value. The mappings are usually expressed using arrows like string -> int, int -> int.

Hash function

  • Domain of a hash function: The set of all possible inputs to a hash function

  • Range of a hash function: The set of all possible outputs of a hash function

  • Required properties

    • deterministic (required!)

    • uniform distribution

      • hash function that maps all possible input values (keys) with roughly equal probability to each possible output value (hash code)

      • ensures minimal collision

      • hard to guarantee and prove

  • Other properties (application specific)

    • variable length key to fixed length hash value

    • constant time computation \(\Theta(1)\)

    • squeeze a large domain to a smaller range

    • chance of collision

    • asymmetric

  • Wellknown examples: MD5, SHA-1, SHA-2

  • Make your own hash function

Applications

  • Hash table data structure

    • Hash Map (Map ADT, dictionary, key-value store)

    • Hash Set (Set ADT)

  • Fingerprinting

    • Short summary of a large data set

    • Checksum

    • Password management

  • Cryptography

  • Advanced data structures

    • Bloom filter

    • Cuckoo filter

    • Merkle tree (e.g. BitCoin)

Hashing Glossary

Hash Function

A function that maps a variable sized value to a fixed sized value.

Key (hashing)

The input value to the hash function.

Value (hashing)

A.k.a hash value, hash code or simply hash. The output of the hash function.

Key space / Domain (hashing)

The set of all possible keys.

Value space / Range (hashing)

The set of all possible values.

Collision (hashing)

Multiple distinct keys maps to a same output value.