COMP1927 - Balanced BST's

Binary Search Trees are an incredibly useful structure, but the time complexity of their operations, along with their usefulness, decreases dramatically with the degradation of their balance.

If a tree is balanced, as in, it is the minimal height possible for the number of nodes, then operations happen in O(lg(n)) time. As it degrades to being the maximal height possible operations become O(n).

When nodes are inserted in a random order, they normally end up approximately balanced. But if nodes are inserted in decreasing or increasing order (e.g. 1,2,3,4,5) they end up as a linked list.

To try and ensure that BSTs stay as balanced as possible there are methods we can take to create self-balancing BSTs.

Tree Rotation

Tree_rotation.png

Tree rotations involve exactly what their name suggests; rotating notes within the tree so as to keep it ordered, but change the location of various nodes.

Left Rotation

Tree rotateLeft (Tree t) {
 
   if ((t != NULL) && (t->right != NULL)) {
 
      // Swap t with the right child
      int temp = t->data;
      t->data = t->right->data;
      t->right->data = temp;
 
      // Swap the left child and the right's right
      Tree tempTree = t->left;
      t->left = t->right->right;
      t->right->right = tempTree;
 
      // Swap the right's two children
      tempTree = t->right->left;
      t->right->left = t->right->right;
      t->right->right = tempTree;
 
      // Swap the left and right child
      tempTree = t->left;
      t->left = t->right;
      t->right = tempTree;         
 
   }
 
   return t;
 
}

Right Rotation

Tree rotateRight (Tree t) {
 
   if ((t != NULL) && (t->left != NULL)) {
 
      // Swap the parent with the left child
      int temp = t->data;
      t->data = t->left->data;
      t->left->data = temp;
 
      // Swap the right child with the left's left
      Tree tempTree = t->right;
      t->right = t->left->left;
      t->left->left = tempTree;
 
      // Swap the left's two children around
      tempTree = t->left->left;
      t->left->left = t->left->right;
      t->left->right = tempTree;
 
      // Swap the left and right child
      tempTree = t->left;
      t->left = t->right;
      t->right = tempTree;        
 
   }
 
   return t;
 
}

Inserting At Root

Rotations can be used to insert a node at the root of the tree (by inserting at leaf and recursively rotating the parents until it is the root).

Inserting at the root means that the most touched nodes or the newest nodes move to the top to be the most accessible.

// insertRoot takes in a Tree and an int
// It then inserts the int as a leafnode
// Before rotating left/right until the newnode is rotated with t
// It then returns the newnode
Tree insertRoot (Tree t, int data) {
 
   Tree newNode = insert(t, data);
 
   // While the newNode isn't root
   while (newNode->parent != NULL) {
      // If the node lies to its parents left, rotate right
      if (newNode->parent->left == newNode) {
          rotateRight(newNode->parent);
      // Otherwise rotate left
      } else {
          rotateLeft(newNode->parent);
      }
   }
 
   return newNode;
 
}

Global Rebalancing (Balancing by Partitioning)

The concept of global rebalancing is simple; if we can pick the ith node of a subtree, then we can work out the middle node of the tree, the middle of each of its children's trees, etc. By doing so we can then manually rebalance the tree by rotating those nodes to be at the root.

To do this the tree must be created, and nodes inserted, with a count element of the struct.

Algorithms

  1. Look at the count of the root node. Our median is rootCount/2.
  2. Select the medianth item from the tree.
  3. Rotate it to root.
  4. Call the function on the left and right children.
  5. Repeat until we're at a leaf/null node.

Code

Problems

The cost of rebalancing the tree is O(n) (as every node has to be put into the correct place).
If we rebalance after every insert we guarantee that the tree is always balanced, but this takes too long. If we rebalance after every k inserts then we have to deal with a non-optimally balanced tree for some time as well.

Local Rebalancing

The local approach to self-balancing BSTs consists of making small incremental operations along the way to improve the overall balance.

Three Approaches

  • Randomising the input reduces the chance of worst-case ordering occurring.
  • Amortising consists of doing more work at insertion time to lead to less work afterwards. It reduces the average operation cost (putting an upper bound on it), but increases some quite dramatically.
  • Optimising means performing all operations within certain performance bands. We modify all functions individually (and sometimes the data structure) to provide guaranteed performance.

Local Approach Random BSTs

The order of insertion into a BST can determine the shape quite dramatically:

  • Ordered input creates a link-list.
  • Perfectly split median-input will always result in a perfectly balanced tree.
  • Random input creates O(lg(n)) performance times.

Insertion

As the ADT cannot control the order of insertion, we must find an algorithm to induce randomness.

We can do this with a function that randomly decides whether to insert a node at a leaf position or as the root. This has an insertion complexity of O(lg(n)).

Deletion

Deleting a node has been done in one of two was; DLMD or DRMD. Whereas before the choice was arbitrary, we could always pick the side with the highest count, and hence improve the balance of the tree.

Local Approach Splay Trees

Splay Trees

2-3-4 Trees

2-3-4 Trees are guaranteed to have a O(lg(n)) behaviour for insertions and searches.