Balanced Search Trees

COS 265 - Data Structures & Algorithms

symbol table review

implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops? | key interface
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | no | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | yes | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | yes | compareTo()
goal | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | yes | compareTo()

\(^*\)guarantee, \(^\dagger\)average


Challenge: Guarantee performance

This lecture: 2-3 trees, left-leaning red-black BSTs, B-trees

balanced search trees

2-3 search trees

2-3 tree

Allow 1 or 2 keys per node

Symmetric order: Inorder traversal yields keys in ascending order

Perfect balance: Every path from root to null link has same length (how to maintain?)
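
The three properties above can be sketched in code. This is only an illustration: the class name `TwoThreeNode`, its array-based layout, and the helper `balancedHeight` are assumptions made for this example and do not come from the lecture.

```java
// Hypothetical sketch of a 2-3 tree node; names and layout are illustrative only.
class TwoThreeNode {
    final String[] keys;        // 1 key (2-node) or 2 keys (3-node), in ascending order
    final TwoThreeNode[] kids;  // keys.length + 1 children, or null at the bottom level

    TwoThreeNode(String[] keys, TwoThreeNode[] kids) {
        this.keys = keys;
        this.kids = kids;
    }

    // Search: compare against the key(s) in the node, then follow the matching link.
    static boolean contains(TwoThreeNode x, String key) {
        while (x != null) {
            int i = 0;
            while (i < x.keys.length && key.compareTo(x.keys[i]) > 0) i++;
            if (i < x.keys.length && key.equals(x.keys[i])) return true;
            x = (x.kids == null) ? null : x.kids[i];
        }
        return false;
    }

    // Perfect balance: returns the common root-to-null path length, or -1 if paths differ.
    static int balancedHeight(TwoThreeNode x) {
        if (x == null) return 0;
        if (x.kids == null) return 1;
        int h = balancedHeight(x.kids[0]);
        for (TwoThreeNode kid : x.kids) {
            if (h < 0 || balancedHeight(kid) != h) return -1;
        }
        return h + 1;
    }
}
```

Search makes at most two compares per node before following a link, which is one reason the direct representation is more cumbersome than a plain BST.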

2-3 tree demo

Search

2-3 tree: insertion

Insertion into a 2-node at bottom

Insertion into a 3-node at bottom

2-3 tree: global properties

Invariants: Maintains symmetric order and perfect balance

Pf: Each transformation maintains symmetric order and perfect balance

2-3 tree: performance

Splitting a 4-node is a local transformation: constant number of operations

balanced search trees: quiz 1

What is the range of heights of a 2-3 tree with \(N\) keys (best / worst case)?

A. \(\sim \log_4 N\) / \(\sim \log_3 N\)

B. \(\sim \log_3 N\) / \(\sim \log_2 N\)

C. \(\sim \log_3 N\) / \(\sim 2 \log_2 N\)

D. \(\sim \log_3 N\) / \(\sim N\)

E. I don't know

2-3 tree: performance

Perfect balance: Every path from root to null link has same length

Tree height: worst case \(\lg N\) (all 2-nodes); best case \(\log_3 N \approx 0.631 \lg N\) (all 3-nodes)

Bottom line: Guaranteed logarithmic performance for search and insert
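
As a worked example of the height range, assume the extremes are a tree of only 2-nodes (tallest) and a tree of only 3-nodes (shortest). The figures below are rounded approximations for one million and one billion keys.

```latex
\[
\log_3 N \;\lesssim\; h \;\lesssim\; \log_2 N
\]
\[
N = 10^6:\quad \log_3 10^6 \approx 12.6, \qquad \log_2 10^6 \approx 19.9
\]
\[
N = 10^9:\quad \log_3 10^9 \approx 18.9, \qquad \log_2 10^9 \approx 29.9
\]
```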

ST implementation: summary

implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops? | key interface
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | no | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | yes | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | yes | compareTo()
2-3 tree\(^\ddagger\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | yes | compareTo()

\(^*\)guarantee, \(^\dagger\)average

\(^\ddagger\)but hidden constant \(c\) is large (depends upon implementation)

2-3 tree: implementation??

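Direct implementation is complicated, because it must maintain multiple node types, make multiple compares per node on the way down, and handle many cases when splitting 4-nodes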

// fantasy code
public void put(Key key, Value val) {
    Node x = root;
    while(x.getTheCorrectChild(key) != null) {
        x = x.getTheCorrectChild(key);
        if(x.is4Node()) x.split();
    }
    if     (x.is2Node()) x.make3Node(key, val);
    else if(x.is3Node()) x.make4Node(key, val);
}

Bottom line: Could do it, but there's a better way

balanced search trees

red-black BSTs

how to implement 2-3 trees with binary trees?

Challenge: How to represent a 3-node?

Approach 1: Regular BST

Approach 2: Regular BST with red "glue" nodes

Approach 3: Regular BST with red "glue" links

left-leaning red-black BSTs

[ Guibas-Sedgewick 1979 and Sedgewick 2007 ]

A 2-3 tree and corresponding red-black BST

llrb BSTs: 1-1 correspondence with 2-3 trees

Key property: 1-1 correspondence between 2-3 and LLRB

an equivalent definition

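A BST such that: no node has two red links connected to it; every path from root to null link has the same number of black links ("perfect black balance"); red links lean left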
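
The definition can be turned into a checker. The sketch below assumes the usual LLRB conditions (no right-leaning red links, no two red links in a row, equal black-link count on every path). The class name `LlrbCheck`, the `int` keys, and the method names are hypothetical choices for this example.

```java
// Hypothetical invariant checker for a left-leaning red-black BST.
class LlrbCheck {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static class Node {
        int key;
        Node left, right;
        boolean color;  // color of the link from the parent
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Red links lean left, and no node has two red links connected to it.
    static boolean is23(Node x) {
        if (x == null) return true;
        if (isRed(x.right)) return false;               // right-leaning red link
        if (isRed(x) && isRed(x.left)) return false;    // two red links in a row
        return is23(x.left) && is23(x.right);
    }

    // Every path from the root to a null link has the same number of black links.
    static boolean isBalanced(Node root) {
        int black = 0;
        for (Node x = root; x != null; x = x.left)
            if (!isRed(x)) black++;                     // count along the left spine
        return isBalanced(root, black);
    }

    private static boolean isBalanced(Node x, int black) {
        if (x == null) return black == 0;
        if (!isRed(x)) black--;
        return isBalanced(x.left, black) && isBalanced(x.right, black);
    }
}
```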

search implementation for red-black BSTs

Observation: Search is the same as for elementary BST (ignore color), but runs faster because of better balance

public Value get(Key key) {
    Node x = root;
    while(x != null) {
        int cmp = key.compareTo(x.key);
        if     (cmp < 0) x = x.left;
        else if(cmp > 0) x = x.right;
        else             return x.val;
    }
    return null;
}

Remark: Most other ops (e.g., floor, iteration, selection) are also identical
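
To illustrate the remark, here is a minimal sketch of floor that never looks at link colors, so it works unchanged on an elementary BST or a red-black BST. The class name `FloorSketch` and the use of `int` keys are assumptions for this example.

```java
// Hypothetical sketch: floor on a BST, ignoring link colors entirely.
class FloorSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Largest key <= the given key, or null if no such key exists.
    static Integer floor(Node x, int key) {
        Integer best = null;
        while (x != null) {
            if (key == x.key) return x.key;
            if (key < x.key) {
                x = x.left;             // floor must be in the left subtree
            } else {
                best = x.key;           // candidate; a larger one may be to the right
                x = x.right;
            }
        }
        return best;
    }
}
```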

red-black bst representation

Each node is pointed to by precisely one link (from its parent); can encode color of links in nodes

private static final boolean RED   = true;
private static final boolean BLACK = false;

private class Node {
    Key key;
    Value val;
    Node left, right;
    boolean color;  // color of parent link
}

private boolean isRed(Node x) {
    if(x == null) return false; // null links are black
    return x.color == RED;
}

red-black bst representation

root.left.color  == RED
root.right.color == BLACK

insertion into an LLRB tree: overview

Basic strategy: Maintain 1-1 correspondence with 2-3 trees

During internal operations, maintain: symmetric order and perfect black balance (the color invariants may be violated temporarily)

How? Apply elementary red-black BST operations: rotation and color flip

elementary red-black BST operations

Left rotation: Orient a (temporarily) right-leaning red link to lean left

private Node rotateLeft(Node h) {
    assert isRed(h.right);
    Node x = h.right;
    h.right = x.left;
    x.left = h;
    x.color = h.color;
    h.color = RED;
    return x;
}

Invariants: Maintains symmetric order and perfect black balance

elementary red-black BST operations

Right rotation: Orient a left-leaning red link to (temporarily) lean right

private Node rotateRight(Node h) {
    assert isRed(h.left);
    Node x = h.left;
    h.left = x.right;
    x.right = h;
    x.color = h.color;
    h.color = RED;
    return x;
}

Invariants: Maintains symmetric order and perfect black balance
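
A small self-contained demo can make the invariant concrete: rotating left and then right returns the original shape, and the inorder key sequence never changes. The class name `RotateDemo`, the `char` keys, and the `inorder` helper are assumptions for this sketch. The rotation bodies follow the lecture code without the assertions.

```java
// Illustrative sketch: RotateDemo and inorder() are hypothetical names.
class RotateDemo {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static class Node {
        char key;
        Node left, right;
        boolean color;  // color of the link from the parent
        Node(char key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Turn a right-leaning red link into a left-leaning one.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Turn a left-leaning red link into a right-leaning one.
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Keys in symmetric (inorder) order, concatenated into a string.
    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }
}
```

Only two links and two colors change per rotation, which is why each rotation is a constant-time local operation.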

elementary red-black BST operations

Color flip: Recolor to split a (temporary) 4-node

private void flipColors(Node h) {
    assert !isRed(h);
    assert isRed(h.left);
    assert isRed(h.right);
    h.color = RED;
    h.left.color = BLACK;
    h.right.color = BLACK;
}

Invariants: Maintains symmetric order and perfect black balance
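
The black-balance claim for a color flip can be checked directly: the number of black links on every path through the node is the same before and after the flip. This is a sketch under assumptions. The class name `FlipDemo` and the helper `blackHeight` are hypothetical, and `blackHeight` counts the node's own parent link.

```java
// Illustrative sketch: FlipDemo and blackHeight() are hypothetical names.
class FlipDemo {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static class Node {
        char key;
        Node left, right;
        boolean color;  // color of the link from the parent
        Node(char key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Split a temporary 4-node: pass the red link up to the parent.
    static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    // Black links on every path from x's parent link down to a null link,
    // or -1 if two paths disagree.
    static int blackHeight(Node x) {
        if (x == null) return 0;
        int l = blackHeight(x.left);
        int r = blackHeight(x.right);
        if (l < 0 || l != r) return -1;
        return l + (isRed(x) ? 0 : 1);
    }
}
```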

insertion into an LLRB tree

Warmup 1: Insert into a tree with exactly 1 node

Case 1: Insert into a 2-node at the bottom

Warmup 2: Insert into a tree with exactly 2 nodes

Case 2: Insert into a 3-node at the bottom

llrb bst construction demo

Insert E

Insert A

Insert R

Insert C

Insert H

Insert X

Insert M

Insert P

Insert L

insertion into LLRB: java implementation

Same code for all cases

private Node put(Node h, Key key, Value val) {
    if(h == null) {
        // insert at bottom and color it red
        return new Node(key, val, RED);
    }

    int cmp = key.compareTo(h.key);
    if     (cmp < 0) h.left = put(h.left, key, val);
    else if(cmp > 0) h.right = put(h.right, key, val);
    else             h.val = val;

    // only a few extra LoC provides near-perfect balance
    // lean left
    if(isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);
    // balance 4-node
    if(isRed(h.left)  && isRed(h.left.left)) h = rotateRight(h);
    // split 4-node
    if(isRed(h.left)  && isRed(h.right))     flipColors(h);

    return h;
}
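
For context, the recursive method can be wrapped into a small runnable class. The sketch below is one possible arrangement, not the lecture's full implementation. The class name `LlrbSketch`, the `int`/`String` key and value types, the public `put` wrapper that recolors the root black, and the `height` helper are all assumptions added for this example.

```java
// A minimal sketch; LlrbSketch, height(), and the int/String types are illustrative.
class LlrbSketch {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private static class Node {
        int key; String val; Node left, right; boolean color;
        Node(int key, String val, boolean color) {
            this.key = key; this.val = val; this.color = color;
        }
    }

    private Node root;

    private static boolean isRed(Node x) { return x != null && x.color == RED; }

    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private static void flipColors(Node h) {
        h.color = RED; h.left.color = BLACK; h.right.color = BLACK;
    }

    // Public entry point (assumed): insert, then keep the root link black.
    public void put(int key, String val) {
        root = put(root, key, val);
        root.color = BLACK;
    }

    private static Node put(Node h, int key, String val) {
        if (h == null) return new Node(key, val, RED);   // new node hangs on a red link
        if      (key < h.key) h.left  = put(h.left, key, val);
        else if (key > h.key) h.right = put(h.right, key, val);
        else                  h.val = val;
        if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);   // lean left
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);  // balance 4-node
        if (isRed(h.left) && isRed(h.right))     flipColors(h);       // split 4-node
        return h;
    }

    public String get(int key) {
        Node x = root;
        while (x != null) {
            if      (key < x.key) x = x.left;
            else if (key > x.key) x = x.right;
            else return x.val;
        }
        return null;
    }

    // Number of nodes on the longest root-to-null path (0 for an empty tree).
    public int height() { return height(root); }

    private static int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }
}
```

Inserting keys 1 through 255 in ascending order, which degenerates an elementary BST into a linked list, keeps this tree's height logarithmic.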

insertion into LLRB: visualization

255 insertions in ascending order

255 insertions in descending order

255 random insertions

balanced search trees: quiz 2

What is the height of an LLRB tree with \(N\) keys in the worst case?

A. \(\sim \log_3 N\)

B. \(\sim \log_2 N\)

C. \(\sim 2 \log_2 N\)

D. \(\sim N\)

E. I don't know

balance in LLRB trees

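Proposition: Height of tree is \(\leq 2 \lg N\) in the worst case

Pf: Every path from root to null link has the same number of black links, and there are never two red links in a row, so the longest path is at most twice as long as the shortest

Property: Height of tree is \(\sim 1.0 \lg N\) in typical applications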

ST implementation: summary

implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops? | key interface
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | no | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | yes | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | yes | compareTo()
2-3 tree\(^\ddagger\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | yes | compareTo()
LLRB\(^\star\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | yes | compareTo()

\(^*\)guarantee, \(^\dagger\)average

\(^\ddagger\)hidden constant \(c\) is large (depends upon implementation)

\(^\star\)hidden constant \(c\) is small (at most \(2 \lg N\) compares)

war story: why red-black?

Xerox PARC innovations (1970s)

  • Alto
  • GUI
  • Ethernet
  • Smalltalk
  • InterPress
  • Laser printing
  • Bitmapped display
  • WYSIWYG text editor
  • ...

war story: red-black BSTs

Telephone company contracted with database provider to build real-time database to store customer information

Database implementation

Extended telephone service outage

If implemented properly, the height of a red-black BST with \(N\) keys is at most \(2 \lg N\).
—expert witness

balanced search trees

B-trees

file system model

Page: contiguous block of data (e.g., a 4096-byte chunk)

Probe: first access to a page (e.g., from slow disk to fast memory)

Property: time required for a probe is much larger than time to access data within a page

Cost model: number of probes

Goal: access data using minimum number of probes

B-trees (Bayer-McCreight, 1972)

B-tree: Generalize 2-3 trees by allowing up to \(M\) keys per node

search in a b-tree

Insertion in a B-tree

balance in b-tree

Proposition: A search or an insertion in a B-tree of order \(M\) with \(N\) keys requires between \(\sim \log_M N\) and \(\sim \log_{M/2} N\) probes.

Pf: All nodes (except possibly root) have between \(\left\lfloor M/2 \right\rfloor\) and \(M\) keys

In practice: Number of probes is at most \(4\) (when \(M=1024\), \(N = 62 \text{ billion}\), then \(\log_{M/2} N \leq 4\))
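
The "at most 4" figure can be checked with integer arithmetic: \(512^4 = 68{,}719{,}476{,}736\), which exceeds 62 billion, while \(512^3 = 134{,}217{,}728\) does not. The sketch below computes the smallest \(h\) with \(\text{base}^h \geq n\). The class name `BTreeProbes` and the method `ceilLog` are hypothetical names for this example.

```java
// Hypothetical helper: smallest h such that base^h >= n, using exact long arithmetic.
class BTreeProbes {
    static int ceilLog(long base, long n) {
        int h = 0;
        long reach = 1;  // base^h
        while (reach < n) {
            // If the next power would overflow a long, it also exceeds n.
            if (reach > Long.MAX_VALUE / base) return h + 1;
            reach *= base;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long n = 62_000_000_000L;
        System.out.println("log_{M/2} bound for M = 1024: " + ceilLog(512, n));
        System.out.println("log_M bound for M = 1024:     " + ceilLog(1024, n));
    }
}
```

Exact integer powers are used instead of `Math.log` to avoid floating-point rounding near the boundary.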

balanced search trees: quiz 3

Which of the following does the B in B-tree not mean?

A. Bayer

B. Balanced

C. Binary

D. Boeing

E. I don't know

The more you think about what the B in B-trees could mean, the more you learn about B-trees, and that is good.
–Ed McCreight

balanced trees in the wild

Red-Black trees are widely used as system symbol tables

B-tree cousins: B+ tree, B* tree, B# tree, ...

B-trees (and cousins) are widely used for file systems and DBs
