Data Structure and Algorithm Design II

Module 3

Xingang (Ian) Fang

Sections

  • Tree data structures

  • Binary search trees

  • AVL trees

Tree Data Structure

Xingang (Ian) Fang

Outline

  • Definitions

  • Tree Concepts

  • Applications

  • Operations

  • Implementation variations

  • Binary tree

Definitions

  • In graph theory, a tree is a connected and acyclic undirected graph.

    • focus on connectivity and acyclicity

  • In computer science, a tree is a hierarchical data structure composed of nodes connected by edges, where each node has a parent (except for the root node) and zero or more children.

    • focus on hierarchy and parent-child relationship

    • Recursive definition: root and subtrees, each of which is also a tree

    Recursive definition of a tree

Tree Concepts

Tree concepts

Concepts: root, internal node, leaf, edge, subtree, sibling, parent, child, level, height

Other Concepts

  • Degree of a node or a tree

  • Depth

  • Ancestor and descendant

  • Path

  • Distance

  • Ordered Unordered

Applications

  • File system (directory, sub-directory, and file)

  • Decision making (decision tree)

  • Indexing (binary search tree, B-tree)

  • Parsing tree

  • Heap (complete binary tree)

  • Blockchain (Merkle tree)

  • Search-based Artificial Intelligence (game tree)

Merkle tree

Operations

  • Search

  • Insertion

  • Deletion

  • Traversal

    • Breadth-first traversal

    • Depth-first traversal

      • Pre-order traversal

      • Post-order traversal

  • Height

  • Degree

Implementation Variations

  • Each node stores a variable-length list of references to child nodes

  • Each node stores a reference to its first child and a reference to its next sibling

  • Each node stores a reference to its parent

First child next sibling representation

First child next sibling representation

Credit: Data Structures and Algorithm Analysis in C++, 4th Edition. Mark Allen Weiss.

Binary Tree

Only concepts and operations that are different from general trees are covered.

  • Definition: Degree 2 ordered tree.

  • Concepts:

    • Left child and right child

  • Operations:

    • In-order traversal

  • Variations:

    • Full binary tree

    • Complete binary tree

    • Perfect binary tree

Full, complete, and perfect binary tree
  • Applications:

    • Binary search tree

    • Expression parsing

    • Decision tree

    • Heap

    • Huffman coding

Decision tree

Binary Search Tree

Xingang (Ian) Fang

Outline

  • Definition

  • Behaviors

  • Implementation

  • Drawbacks

  • Applications

Definition and Behaviors

Definition A binary search tree (BST) is a binary tree, in which for every node, its left child has a smaller or equal key than itself while its right child has a greater or equal key than itself.

BST

Behaviors

  • Insertion

  • Deletion

  • Search

  • Traversal

    • In-order traversal: left subtree, root, right subtree

      • Ascending ordered output

Implementation

  • Algorithms that happens from top to bottom

    Easy to implement either iteratively or recursively

    • search

    • insert

  • Algorithms that happens from bottom to top

    Easy to implement recursively; Hard to implement iteratively (must use stack)

    • subtree removal (may be used in the destruction of a tree)

    • find min node or find max node (part of remove)

    • in-order traversal

    • pre-order traversal

Implementation (cont’d)

  • Search

    • if equal, found

    • if smaller, search left subtree

    • if greater, search right subtree

  • Insertion

    • if equal, do nothing

    • if smaller, insert to left subtree

      • if left is empty, insert

    • if greater, insert to right subtree

      • if right is empty, insert

  • Removal

    • remove leaf node: simply remove it

    • remove node with one child: replace the node with its child

    • remove node with two children

      • find the min node in the right subtree (successor)

      • replace the node to be removed with the min node

      • remove the min node (min node has at most one child)

Drawbacks of naive BST

  • Algorithm efficiency depends on the average height of the BST

  • BST may degrade a very unbalanced form or even to a linked list

  • Solution 1: randomize the order of insertion

  • Solution 2: add balancing mechanism

    • Self-balancing

    • External balancing

  • Solution 3: move the most recently accessed node toward the root

    • Only optimize the search performance for a specific access pattern

    • No guarantee for the worst case

    • Splay tree

Degraded BST

Applications

  • Indexing: store a set of references to records in a BST and use the key as the search key.

  • Tree Sort: sort a sequence of elements by building a BST and then traversing it in-order.

  • Tree Map: store a set of (key, value) pairs in a BST and use the key as the search key.

  • Tree Set: store a set of elements in a BST and use the element as the search key.

Self-balancing Trees

Xingang (Ian) Fang

Outline

  • Overview

  • Variations

Overview

  • Definition: A self-balancing tree is a type of tree data structure in which the balance or height of the tree is automatically adjusted during insertions and deletions to ensure that the tree remains reasonably balanced.

  • Motivation: To overcome the potential performance issues that can occur in trees due to unbalanced structures.

  • Alternative approach: External balancing on general trees.

Variations

  • BST family

    • AVL tree

      • Balance factor: height(left subtree) - height(right subtree)

      • Balance factor of each node is either -1, 0, or 1.

      • Insertion and deletion may trigger a balance operation.

      • Balance operations achieved by rotations.

    • Red-black tree (FYI)

      • A BST implementation of 2-3 tree (Algorithm, 4th Ed.) or 2-3-4 tree (older versions).

      • Each node is colored red or black.

      • Insertion and deletion may trigger adjustment.

      • Adjustment achieved by a re-color or a rotation.

  • B-tree family

    • B-tree (B tree, not B minus tree)

      • varying but bounded number of children per node

      • always balanced

      • maintain balance by redistributing keys among nodes

      • 2-3 tree is a B-tree with 2 or 3 children per node

    • B+ tree (FYI)

    • B* tree (FYI)

AVL Tree

Xingang (Ian) Fang

Outline

  • Overview

  • Implementation details

Overview

  • Definition: An AVL (Adelson-Velsky and Landis) tree is a self-balancing binary search tree data structure. It is designed to maintain a balanced tree structure at all times, ensuring that the height difference (balance factor) between the left and right subtrees of any node is limited to no more than one.

  • Worst case guarantee for search, insert, and delete is \(\Theta(\log n)\)

Characteristics

  • Balance factor = height(left subtree) - height(right subtree)

  • Balance factor of 0, -1, or 1 is considered balanced

  • Insertion and deletion operations may cause the tree to become unbalanced, in which cases the balance factor of one or more nodes will be -2 or +2.

  • Self-balancing: balance restored by rotations during insertion and deletion

  • Balancing operations are performed bottom-up, starting with the deepest unbalanced node.

Rotations to restore balance

Left rotation to restore right-right (RR) imbalance.

../_images/RR.png

Right rotation to restore left-left (LL) imbalance.

../_images/LL.png

Credit: Data Structures and Algorithm Analysis in C++, 4th Edition. Mark Allen Weiss.

Rotations to restore balance (cont.)

Left-right double rotation to restore left-right (LR) imbalance.

../_images/LR.png

Right-left double rotation to restore right-left (RL) imbalance.

../_images/RL.png

Credit: Data Structures and Algorithm Analysis in C++, 4th Edition. Mark Allen Weiss.