Data Structure and Algorithm Design II

Module 4

Xingang (Ian) Fang

Sections

B tree and variants
Introduction to problem and algorithm classification

B-Tree and Variants

Xingang (Ian) Fang

Outline

Overview
Operations
Variants

Overview

Invented by Bayer and McCreight in 1972.
A M-ary search tree that stores data in order.
A balanced tree with all leaves at the same depth.
Definition: A B-tree of order \(m\) is a tree which satisfies the following properties:
- Every node has at most \(m\) children.
- Every internal node has at least \(\lceil m/2 \rceil\) children.
- The root has at least 2 children if it is not a leaf node.
- All leaves appear on the same level.
- A non-leaf node with \(k\) children contains \(k-1\) keys.

Operations

B-Tree Visualization

Insertion

Start at the root node.
Find the leaf node where the key should be inserted.
If the node is not full, insert the key into the node.
If the node is full, split the node into two nodes.
Insert the median key into the parent node.
If the parent node is full, split the parent node.
Repeat until the root node is reached

Deletion

Start at the root node.
Find the node where the key should be deleted.
If the node is a leaf node, delete the key from the node.
If the node is an internal node, replace the key with the predecessor or successor key. Then delete the predecessor or successor key from the leaf node.
If the node is underfilled, borrow a key from a sibling node.
If the node is underfilled and no sibling node can lend a key, merge the node with a sibling node.
If the merge causes the parent node to be underfilled, repeat the process on the parent node.
Repeat until the root node is reached

Checkpoint

Why do all internal nodes have a lower bound of \(\lceil m/2 \rceil\) children?
Why does the room node have a lower bound of 2 children instead?
Is insertion always happening at the leaf node?
Is deletion always happening at the leaf node?
How are the implementation of the behaviors compared to that of a binary search tree?

Variants

2-3 Tree: A B-tree of order 3.
2-3-4 Tree: A B-tree of order 4.
B+ Tree: A B-tree of order \(m\) where all keys are stored in the leaves.
B* Tree: A B-tree of order \(m\) where all leaves are at least half full.
B# Tree: A B-tree of order \(m\) where all leaves are at least 2/3 full.
etc.

2-3 Tree

FYI
A B-tree of order 3.
Can be implemented as a binary search tree (red-black tree).

B+ Tree

Definition: A B+ tree of order \(m\) satisfies the following properties:
- All data are stored in the leaves.
- All leaves are linked together.
- All internal nodes are keys only for faster search.
- All internal nodes have between \(\lceil m/2 \rceil\) and \(m\) children.
- All leaves have between \(\lceil l/2 \rceil\) and \(l\) keys.
Works well with sequential access.
Ideal for file system and database indexing.
Reduced number of disk accesses.
Reduced redistribution of data when inserting or deleting.
B+ Tree visualization

Checkpoint

What is the difference between a B-tree and a B+ tree?
Why the sequential access is faster in a B+ tree?

Problem Classification

Xingang (Ian) Fang

Overview

Essential to understand the problem
Most problems can be solved by multiple algorithms
Many problems can be solved by similar algorithms
Essential to choose or design the algorithms to solve the problem

By The Goal

Find a(n) optimal/near-optimal solution - optimization
Find whether something is true or false - decision
Find a solution that satisfies given criteria - search
Find all solutions that satisfies given criteria - enumeration
Find how many ways to do something - counting
Find the output of a mathematical or logical function - evaluation problem

Algorithmic Classification

Xingang (Ian) Fang

Outline

Overview
Algorithmic Paradigms
TSP Example
Groups To Be Covered

Overview

Many ways to classify algorithms
In this course
- algorithmic paradigms
  - Brute force
  - Backtracking
  - Divide and conquer
  - Dynamic programming
  - Greedy algorithms
- data structure based algorithms
  - Self-balancing binary search tree related
  - Graph related
- other groups
  - Randomized algorithms

Algorithmic Paradigms

Algorithmic Paradigm: Fundamental approaches, strategies, or methodologies used in the design and analysis of algorithms. These paradigms provide high-level templates or frameworks for solving various types of computational problems.
The choice of an algorithmic paradigm depends on the nature of the problem and the resources available to solve it.

TSP Example

Given a list of cities and the distances or costs between each pair of cities, the goal of the Traveling Salesman Problem (TSP) is to find the shortest possible route that visits each city exactly once and returns to the starting city. This is an NP-hard problem!

Brute Force: Try all possible permutations of the cities and find the one with the minimum cost. \(\Theta((n-1)!)\)
Dynamic programming: Use a table to store the minimum cost of visiting a subset of cities.
Branch and bound: Use a tree to store the partial solutions and prune the branches that cannot lead to a better solution.
Greedy algorithms: Start with an empty tour and add the next closest city to the tour.
Metaheuristic algorithms: Genetic algorithms, simulated annealing, etc.

Credit: https://optimization.cbe.cornell.edu/index.php?title=Traveling_salesman_problem

Paradigms To Be Covered

Brute Force
- Exhaustive Search on all possible solutions.
- No heuristic, no pruning, no optimization.
- Only feasible for small problem instances.
- Exact algorithms.
- Solves enumeration problems thus can also solve searching, optimization, decision and counting problems.
- Examples: Combinations, Permutations, TSP, knapsack, etc.
Backtracking
- Try out possibilities until a solution is found or all possibilities are exhausted.
- Backtrack to the previous decision point if a partial solution cannot lead to a solution.
- Examples: N-queens, Sudoku, graph coloring, etc.
Greedy
- Make the locally optimal choice at each step.
- Usually leads to a suboptimal solution.
- Examples: Prim’s algorithm, Kruskal’s algorithm, Dijkstra’s algorithm, Huffman coding, etc.

Paradigms To Be Covered (cont’d)

Divide and Conquer
- Divide the problem into smaller subproblems.
- Solve the subproblems recursively.
- Solves independent subproblems or ignore the dependency and overlapping factors.
- Combine the solutions to the subproblems to obtain the solution to the original problem.
- Examples: Merge sort, quick sort, binary search, etc.
Dynamic Programming
- Divide the problem into smaller overlapping subproblems.
- Systematically solve the subproblems and store the solutions in a table.
- Top-down (recursive with memoization) or bottom-up (iterative).
- Examples: Fibonacci numbers, knapsack, TSP, etc.

Other Groups To Be Covered

Self-balancing tree related
- AVL tree
- B-tree
Graph related
- Minimum spanning tree
- Shortest path
Randomized Algorithms
- Introduce randomness into the algorithm.
- Pseudo-random number generator and seed.
- Examples: Genetic Algorithm, Randomized quick sort, Monte Carlo algorithms, Las Vegas algorithms, etc.