Module 5 Symbol Table, Binary Search tree¶
In this module we will start approaching the topic of searching
. Searching
problems are a group of problems that require the algorithm to locate a certain
record by a key
. You should already know methods to search in a sequential
data structure such as an array or a linked-list. In this module, we will
discuss the topic in more complicate scenarios.
Symbol Table¶
Symbol table
is a term used historically in compiler to refer to a data structure that keeps track of all the identifiers in a program and how they are used. Under this context, this term refers to a data structure.It is defined as an ADT to store key-value pairs by some authors.
Alternative names are
dictionary
andmap
.Key must be unique
Behaviors
put: put a key-value pair into the table
get: return value with a given key
contains: is there a value paired with a given key?
delete: remove key-value pair with a given key
size: number of key-value pairs
isEmpty: is the table empty?
Conventions for Java
use of
null
equals()
key should be immutable (use final keyword)
key should be comparable
Elementary implementations¶
Store key-value pairs in a sequential data structure (array or linked-list)
put, get, contains and remove are all based on key lookup
put: search for key, if found, replace value, else add new key-value pair
get: search for key, if found, return associated value, else return null
remove: search for key, if found, remove key-value pair, else do nothing
Lookup algorithm
Unsorted list + Sequential search: \(\Theta(n)\)
Sorted list + Binary search: \(\Theta(\log n)\)
Advanced implementations of symbol table¶
Binary search tree implementation
Binary search tree
(BST) is a binary tree in symmetric order.Symmetric order: each node has a key, and every node’s key is
larger than all keys in its left subtree
smaller than all keys in its right subtree
BST implementation
Node class: key, value, left, right
BST class: root node
put
get
delete
size
isEmpty
min, max, floor, ceiling, rank, select (optional)
Efficiency depends on the height of the tree
Worst case: \(\Theta(n)\) (unbalanced tree)
Average case: \(\Theta(\log n)\) (balanced tree)
Self-balancing BST to guarantee Theta(log n) performance
AVL tree (optional)
Red-black tree (next module)
Hash table (next module)