Hash Table in C++¶

Hash table uses hashing to implement a data structure that maps a key to an index so the value can be stored in the corresponding location (a bucket) in a sequential data structure like an array.

Quick question: why linked-list is not a good way to store buckets in a hash table? (Because the most frequent usage is to access by index, which is slow with linked-list data structure)

Requirements
- Keys are unique (no duplicate keys)
- Keys are immutable
Properties
- Fast lookup, insert, and delete \(\Theta(1)\)
- Unordered
- Redundant storage to reduce collision and ensure performance

Hash function for hash tables¶

\(h(key)\) a special type of hash function used by hash table
Maps a key of a certain type to an integer in a certain range \(1, 2, \ldots, size - 1\), where \(size\) is the size of the hash table
Modulo arithmetic
Considerations
- Deterministic
- Fast to compute
- Uniform distribution

Hash Table Applications¶

Hash map
- Implementation of the Map ADT using hash table
- A.k.a dictionary, associative array, symbol table, etc.
- A collection of key-value pairs
- Key is unique and immutable
- Behaviors
  - Set a key-value pair
  - Get the value of a key (key lookup)
  - Delete a key-value pair
Hash set
- Implementation of the Set ADT using hash table
- A collection of unique elements
- A special case of key-value pairs when no value is present
- Only store keys (or say keys themselves are considered the values)
- Keys are unique and immutable
- Behaviors
  - Add a key
  - Remove a key
  - Check if a key exists
  - Set operations: Union, intersection, difference, subset, etc.

Collision resolution¶

Chaining¶

A.k.a separate chaining
Each bucket can store multiple key-value pairs
- Linked-list
- Dynamic array (vector)
unlimited storage space
Performance degradation when over-filled
C++ implementation
- vector<vector<pair<keyType, valueType>>> hashTable
  
  Vector of vectors of pair objects.
- vector<LinkedListNode<keyType, valueType> *> hashTable
  
  Vector of pointers to linked list.
- LinkedListNode<keyType, valueType> ** hashTable
  
  One dimensional dynamic array of pointers to a linked list.
```
1template <typename keyType, typename valueType>
2struct LikedListNode {
3  keyType key;
4  valueType value;
5  LinkedListNode * next;
6  LinkedListNode * prev;
7};
```

Open addressing¶

find another available bucket to store the value when a collision occurs
probing sequence: the sequence of the buckets to check when collision happens
Linear probing:

\((h1(key) + C_{1} * i) \% size, i = 0, 1, 2, 3 \dots size - 1\)
quadratic probing:

\((h1(key) + C_{1} * i^2) \% size, i = 0, 1, 2, 3 \dots size - 1\)
double hashing probing:

\((h1(key) + C_{1} * i * h2(key)) \% size, i = 0, 1, 2, 3 \dots size - 1\)

\(h1(key)\) is the primary hash function
\(h2(key)\) is the secondary hash function
\(C_{1}\) and \(C_{2}\), etc. are constants

lookup/insertion/removal implementation
- each bucket keeps both key and value
- differentiate empty-since-start vs empty-after-removal

C++ implementation

Bucket<keyType, valueType> * hashTable

One dimensional dynamic array of bucket objects.

vector<Bucket<keyType, valueType>> hashTable

Vector of bucket objects.

template <typename keyType, typename valueType>
struct Bucket {
  keyType key;
  valueType value;
  bool isEmpty = true;
  bool isDeleted = false;
};

Chaining Hash Map Example¶

class HashMap1 {
  struct Node {
    int key;
    int value;
    Node *next;
  };
  int capacity;
  int size;
  Node **buckets;  // 1D array of pointers to linked lists
  int hashFunction(int key) const;
public:
  HashMap1(int capacity = 101);  // prime number
  HashMap1(const HashMap1 &other);
  HashMap1 &operator=(const HashMap1 &other);
  ~HashMap1();
  void resize(int newCapacity); // rehashing
  void put(int key, int value);
  bool remove(int key);
  int get(int key) const;
  int size() const;
};

Open addressing example¶

class Bucket {
  int key;
  int value;
  bool emptySinceStart;
  bool emptySinceRemoval;
 public:
  Bucket();
  void set(int key, int value);
  void remove();
  bool isEmptySinceStart() const;
  bool isEmptySinceRemoval() const;
  bool isEmpty() const;
  int getKey() const;
  int getValue() const;
};

// HashMap
// Linear probing
class HashMap {
  int capacity;
  Bucket *buckets;  // 1D array of buckets
  int hashFunction(int key);
  // probing algorithm
  int probe(int hash, int probed);
 public:
  HashMap(int capacity = 100);
  HashMap(const HashMap &other);
  HashMap &operator=(const HashMap &other);
  ~HashMap();
  bool set(int key, int value);
  bool remove(int key);
  int lookUp(int key);
};

Hash Table Glossary¶

Key (hash table)¶: It is the unique identifier to locate the data stored in a hash table. The input to the hash function to calculate the index of the buckets in the hash table. The key is unique and immutable.
Value (hash table)¶: The data associated with the key in the hash table.
Hash function (hash table)¶: A special type of hash function used by hash table. It maps a key of an arbitrary type to an index to be used to refer to a bucket in the hash table.
Bucket (hash table)¶: The storage location in the hash table where the value is stored. It is usually an array element.
Collision (hash table)¶: A collision occurs when multiple keys result in the same index in the array. It is a common issue to be solved routinely in hash tables.
Collision resolution (hash table)¶: The approach to resolve collisions in hash tables. It is a critical part of the hash table implementation. There are two main approaches: chaining and open addressing.
Load factor (hash table)¶: The ratio of the number of keys to the number of buckets in the hash table. It is used to measure the health of a hash table and determine when to resize the hash table.
Rehashing (hash table)¶: The process of resizing the hash table to a new size when the load factor exceeds a certain threshold. It is used to maintain the performance of the hash table.
Chaining¶: A collision resolution approach in which every element in an array is a variable-length list of elements.
Open addressing¶: A collision resolution approach in which an empty bucket elsewhere in the hash table is used when collision happens
Probing¶: The mechanism to determine the next index of bucket to check in the open addressing collision resolution approach
Hash map¶: An implementation of the Map ADT based on the hash table data structure
Hash set¶: An implementation of the Set ADT based on the hash table data structure

Hash Table in C++¶

Hash function for hash tables¶

Hash Table Applications¶

Collision resolution¶

Chaining¶

Open addressing¶

Chaining Hash Map Example¶

Open addressing example¶

Hash Table Glossary¶

Table of Contents

Previous topic

Next topic

This Page