Data Structure and Algorithm Design II
Chapter 9
Xingang (Ian) Fang
Sections
Greedy Algorithms
Greedy Algorithms
Xingang (Ian) Fang
Outline
Overview
Related Algorithmic Paradigms
Applications
Bin Packing Problem (BPP)
Huffman Coding
Overview
An algorithm that makes the locally optimal choice at each stage with the hopes of finding the global optimum
Characteristics
Heuristic
Local optimum
Intuitive
No long term planning
Approximate, no guarantee to find the best solution
Less time complexity than exact algorithms
Related Algorithmic Paradigms
Greedy algorithms are related to backtracking because they both make decisions at each step.
Greedy |
Backtracking |
|---|---|
Approximate |
Exact |
Make a decision |
Make a decision |
Never look back |
Look back |
No long term planning |
Long term planning |
Less time complexity |
More time complexity |
May fail to find the best solution |
Always find the best solution by finding all solutions |
Applications
Suitable for optimization problems that are too complex to solve with exact algorithms.
Graph problems
Shortest path: Dijkstra’s algorithm
Minimum spanning tree: Prim’s algorithm, Kruskal’s algorithm
Graph coloring
Bin packing problem
Huffman coding
Activity selection problem
Fractional knapsack problem
Bin Packing Problem (BPP)
Given a set of items with different weights and bins with fixed capacity, find the minimum number of bins required to pack all items.
Formal definition
Items have size between 0 and 1
Bins have capacity 1
Each item must be packed into a bin
Minimize the number of bins used to pack all items
Classic combinatorial optimization problem
NP-hard problem
Applications in logistics, scheduling, resource allocation, etc.
Two variations of the problem
Online: Items arrive one at a time and must be immediately packed
Offline: All items are known in advance
Algorithms to Solve Bin Packing Problem
Exact algorithms
Brute force
Dynamic programming
Branch and bound
Integer linear programming
Approximation algorithms
Greedy algorithms
Why is not backtracking a good idea for solving BPP?
Baseline Solution
Brute force solution serves as the baseline solution
Create all possible permutations of items
Pack every permutation into bins in a deterministic order (e.g. online next fit)
Find the permutation that uses the fewest bins
Exercise
Which step is the determining factor for the time complexity of the algorithm?
What are the time complexities of each step?
Greedy Solution
Process items one by one, in each step, decide which bin to put the item in. Create a new bin if necessary.
Heuristics for online bin packing
Next Fit: only check the last bin
First Fit: check all previous bins for first fit
Best Fit: check all previous bins for best (tightest) fit
A simple greedy solution for offline bin packing:
Sort items in decreasing order of size
Pack items into bins using online method (first fit or best fit)
Huffman Coding
A lossless data compression algorithm
Variable length encoding
Based on the frequency of characters
Motivation
ASCII uses 8 bits to encode each character
Some characters are used more frequently than others
Less bits for more frequent characters to save space
Applications
Building-block for other algorithms
JPEG, MP3, etc.
Optimal Prefix Code
Huffman coding is an optimal prefix code
Prefix code (a.k.a. prefix-free code)
no code is a prefix of another code
no ambiguity when decoding
Optimal
Minimizes the expected length of the encoded message
The most frequent characters are encoded with the fewest bits
Employ a binary tree to represent the Huffman coding scheme
Full binary tree
Leaf nodes represent characters - guarantee prefix code
Each internal node means a bit in the codeword
Encoding Tree Examples
Left: Fixed length encoding
Total space: 3 * (45 + 13 + 12 + 16 + 9 + 5) = 300 bits
Right: Huffman coding
Total space: 1 * 45 + 3 * (13 + 12 + 16) + 4 * (9 + 5) = 224 bits
Contents in nodes:
Leaf node: <character>: <frequency>
Internal node: <frequency of subtree>
Credit: Introduction to Algorithms, 4th Ed., T. Cormen et al.
Algorithm to build a Huffman coding tree
This is a greedy algorithm that builds the tree bottom-up from the least frequent characters to the most frequent characters.
Pseudo-code adopted from Wikipedia
Create a leaf node for each symbol and add it to the min priority priority queue.
While there is more than one node in the min priority queue:
Dequeue two nodes with the lowest frequencies from the min priority queue
Create a new internal node with these two nodes as its children and with frequency equal to the sum of the two nodes’ frequency
Add the new node to the min priority queue
The remaining node is the root node and the tree is complete
Steps to build the example tree
Credit: Introduction to Algorithms, 4th Ed., T. Cormen et al.