B-Tree¶
Overview¶
The name B-Tree is after the name of its inventors, Bayer and McCreight. It also suggests that it is “balanced”.
A B-tree, short for “balanced tree,” is a self-balancing tree data structure used in computer science for organizing and storing data efficiently. It is designed to provide fast access to data, especially in situations where the data set is large and dynamic. B-trees are commonly used in database management systems and file systems.
Definition¶
A B-tree of order \(m\) is a tree which satisfies the following properties:
Every node has at most \(m\) children and \(m-1\) keys.
Every internal node has at least \(\lceil m/2 \rceil\) children.
The root has at least 2 children if it is not a leaf node.
All leaves appear on the same level.
A non-leaf node with \(k\) children contains \(k-1\) keys.
Characteristics¶
Balanced
All leaves appear on the same level.
The height of the tree is always \(\Theta(\log n)\).
N-ary search tree
In-order traversal of a B-tree will provide a sorted sequence.
Each node defines a range of keys stored in the subtree rooted at that node.
The keys in each node partitions the keys in the subtree it rooted at.
\(m-1\) keys partition keys into \(m\) sections that correspond to \(m\) children.
Self-balancing
The tree is always balanced.
The balance is maintained by redistribution of keys
splitting and merging nodes.
borrowing keys from sibling nodes.
The height of the tree changes only when the root is split or merged.
Terminology¶
There are two sets of terminology for B-trees. One focus on “order” and another focus on “degree”. They are not the same as what they mean in general tree and graph theory. Be careful when reading the literature.
Order (B-tree) Preferred
The maximum number of children a node can have: \(m\).
The maximum number of keys a node can have is always one less: \(m-1\).
This is the same as what we defined as “degree of a tree” in general tree theory.
Sometimes referred as “maximum degree”.
Degree \(t\) (B-tree)
It is actually short for “minimum degree”.
The minimum number of children an internal node may have.
Sometimes referred as \(t\) in the literature.
This term is very confusing because it is not the same as what we defined as “degree” in tree theory.
It is even more confusing in some places where it is called “order”.
This term cannot describe all possible B-trees.
We will prefer to stay away from this term!
We will use the term “order” in this document.
Characteristics¶
Balanced
N-ary search tree
Sorted
Behavior¶
Insertion
Start at the root node.
Find the leaf node where the key should be inserted.
If the node is not full, insert the key into the node.
If the node is full, split the node into two nodes.
Insert the key into the appropriate child.
Insert the median key into the parent node.
If the parent node is full after insertion of the median key, split the parent node.
Repeat until the root node is reached.
In this method, when split a full node, the split happens before the key is inserted. This method does not work well with 2-3 trees because finding a median out of 2 keys will create an empty child.
There is another way to insert a key to a full node. In that method, the key is inserted to the node first, which causes a temporary overflow, then the node is split. After that, the median key is inserted to the parent node. An equivalent version is to take the new key into consideration when finding the median key. This method works well with 2-3 trees.
Deletion
Start at the root node.
Find the node where the key should be deleted.
If the node is a leaf node, delete the key from the node.
If the node is an internal node, replace the key with the predecessor or successor key. Then delete the predecessor or successor key from the leaf node.
If the node is underfilled, borrow a key from a sibling node.
If the node is underfilled and no sibling node can lend a key, merge the node with a sibling node.
If the merge causes the parent node to be underfilled, repeat the process on the parent node.
Repeat until the root node is reached.
Visualization
Variations¶
B+ Tree (Textbook example)
A B+ tree of order \(m\) satisfies the following properties:
All data are stored in the leaves.
All leaves are linked together for better sequential access.
All internal nodes are keys only for faster search.
All internal nodes have between \(\lceil m/2 \rceil\) and \(m\) children.
All leaves have between \(\lceil l/2 \rceil\) and \(l\) keys.
The \(l\) can be adjusted according to the block size of the disk.
Ideal as a file system or database index.
Reduced number of disk accesses.
Reduced potential key redistribution when inserting or deleting.
Visualization
B* Tree (FYI)