Hash Table

Hash table uses hashing to implement a data structure that maps a key to an index so the value can be stored in the corresponding location (a bucket) in a sequential data structure like an array.

Quick question: why linked-list is not a good way to store buckets in a hash table? (Because the most frequent usage is to access by index, which is slow with linked-list data structure)

Characteristics

  • fast lookup \(\Theta(1)\)

  • redundant storage (certain ratio of empty buckets)

    • performance degradation

Hash function for hash tables

  • \(h(x)\)

  • a key of any type mapping to integer from 0 to N - 1 (indices in a size N array)

  • modulo operator on any integer generating hash function

Collision resolution

Methods to handle collision when multiple keys are mapped to a same bucket.

Chaining

  • each bucket is a list to store multiple values (chained)

  • Nodes in the linked-list keeps key, value and next pointer

  • unlimited storage space

  • Performance degradation when over-filled

Open addressing

  • find another available bucket to store the value when a collision occurs

  • probing sequence: the sequence of the buckets to check when collision happens

  • simplest linear probing:

    \((h1(key) + i) \% NUM\_BUCKETS, i = 0, 1, 2, 3 \dots\)

  • other probing (optional): linear probing that skips, quadratic probing, and double hashing

  • implementation

    • each bucket keeps both key and value

    • search in the probing sequence until find the key or an empty bucket.

    • insertion

      • if the key is already in the hash table, update the value

      • if the key is not in the hash table, find an empty bucket in the probing sequence and store the key and value there

    • deletion

      • will create a hole in the probing sequence that interferes with future searches, insertion and deletion

      • rearrange after every deletion

      • differentiate empty-since-start vs empty-after-removal (Optional)

Hash Table Glossary

Hash Table

An associative data structure that models mapping from keys to values. It is usually implemented as an array. A hash function is employed to map the key to the index of the array element where the value is stored

Key (hash table)

The input to the hash function to calculate the index of the buckets in the hash table

Value (hash table)

The values stored in the buckets in tha hash table

Collision (hash table)

A collision occurs when multiple keys result in the same index in the array

Collision resolution

The approach to resolve collisions in hash tables

Chaining

A collision resolution approach in which every element in an array is a linked-list

Open addressing

A collision resolution approach in which an empty bucket elsewhere in the hash table is used when collision happens

Probing

The mechanism to determine the next index of bucket to check in the open addressing collision resolution approach

Hash map

An implementation of the Map ADT based on the hash table data structure

Hash set

An implementation of the Set ADT based on the hash table data structure