implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops | ops on keys
---|---|---|---|---|---|---|---|---
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | X | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | X | compareTo()
goal | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | X | compareTo()
\(^*\)guarantee, \(^\dagger\)average
Challenge: Guarantee performance
This lecture: 2-3 trees, left-leaning red-black BSTs, B-trees
Allow 1 or 2 keys per node: a 2-node has one key and two children; a 3-node has two keys and three children
Symmetric order: Inorder traversal yields keys in ascending order
Perfect balance: Every path from root to null link has same length (how to maintain?)
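Search: Compare search key against keys in node; find the interval containing the search key; follow the associated link (recursively)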
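A minimal sketch of 2-3 tree search on int keys. The `Node23` class (one key for a 2-node, two keys for a 3-node) and the hand-built example tree are hypothetical, chosen only to show the interval-then-follow-link loop.

```java
// Hypothetical node type: a 2-node has key2 == null and an unused middle link.
class Node23 {
    int key1;
    Integer key2;                // null for a 2-node
    Node23 left, middle, right;  // middle is unused in a 2-node

    Node23(int key1, Node23 left, Node23 right) {                         // 2-node
        this.key1 = key1; this.left = left; this.right = right;
    }

    Node23(int key1, int key2, Node23 left, Node23 middle, Node23 right) { // 3-node
        this.key1 = key1; this.key2 = key2;
        this.left = left; this.middle = middle; this.right = right;
    }
}

class TwoThreeSearch {
    // Compare against the keys in the node, pick the interval containing the
    // search key, and follow that link until a hit or a null link.
    static boolean contains(Node23 x, int key) {
        while (x != null) {
            if (key == x.key1) return true;
            if (x.key2 == null) {                    // 2-node: two intervals
                x = (key < x.key1) ? x.left : x.right;
            } else {                                 // 3-node: three intervals
                if (key == x.key2) return true;
                if (key < x.key1)      x = x.left;
                else if (key < x.key2) x = x.middle;
                else                   x = x.right;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tree:        [ 5 | 9 ]
        //            /    |     \
        //         [2]    [7]   [12 | 15]
        Node23 root = new Node23(5, 9,
                new Node23(2, null, null),
                new Node23(7, null, null),
                new Node23(12, 15, null, null, null));
        System.out.println(contains(root, 7));   // true
        System.out.println(contains(root, 15));  // true
        System.out.println(contains(root, 10));  // false
    }
}
```

Each node visited costs at most two compares, and perfect balance bounds the number of nodes visited by the tree height.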
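Insertion into a 2-node at bottom: Add new key to the 2-node to create a 3-node
Insertion into a 3-node at bottom: Add new key to the 3-node to create a temporary 4-node; move the middle key into the parent; repeat up the tree as necessary; if the root becomes a 4-node, split it into three 2-nodes (height grows by 1)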
Invariants: Maintains symmetric order and perfect balance
Pf: Each transformation maintains symmetric order and perfect balance
Splitting a 4-node is a local transformation: constant number of operations
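A minimal sketch of 2-3 tree insertion by splitting, assuming int keys and a hypothetical list-based node that may hold 3 keys (a temporary 4-node) until the level above splits it. It is not a tuned implementation; the `isValid` check is included only to confirm symmetric order and perfect balance after many insertions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2-3 tree of int keys.
class TwoThreeTree {
    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> kids = new ArrayList<>(); // empty for a node at the bottom
        boolean isLeaf() { return kids.isEmpty(); }
    }

    Node root;

    void insert(int key) {
        if (root == null) {
            root = new Node();
            root.keys.add(key);
            return;
        }
        insert(root, key);
        if (root.keys.size() == 3) {     // root became a 4-node:
            Node newRoot = new Node();   // split it into three 2-nodes,
            newRoot.kids.add(root);      // so the height grows by one
            split(newRoot, 0);
            root = newRoot;
        }
    }

    // Insert into the subtree rooted at x; x may be left as a temporary 4-node.
    private void insert(Node x, int key) {
        int i = 0;
        while (i < x.keys.size() && key > x.keys.get(i)) i++;
        if (i < x.keys.size() && key == x.keys.get(i)) return; // key already present
        if (x.isLeaf()) {
            x.keys.add(i, key);          // 2-node -> 3-node, or 3-node -> 4-node
            return;
        }
        insert(x.kids.get(i), key);
        if (x.kids.get(i).keys.size() == 3) split(x, i);      // pass middle key up
    }

    // Split the 4-node child i of p: a constant number of operations,
    // and the rest of the tree is untouched.
    private void split(Node p, int i) {
        Node c = p.kids.get(i);
        Node left = new Node(), right = new Node();
        left.keys.add(c.keys.get(0));
        right.keys.add(c.keys.get(2));
        if (!c.isLeaf()) {
            left.kids.addAll(c.kids.subList(0, 2));
            right.kids.addAll(c.kids.subList(2, 4));
        }
        p.keys.add(i, c.keys.get(1));    // middle key moves into the parent
        p.kids.set(i, left);
        p.kids.add(i + 1, right);
    }

    // Appends the keys in the subtree rooted at x in symmetric order.
    void inorder(Node x, List<Integer> out) {
        if (x == null) return;
        for (int i = 0; i < x.keys.size(); i++) {
            if (!x.isLeaf()) inorder(x.kids.get(i), out);
            out.add(x.keys.get(i));
        }
        if (!x.isLeaf()) inorder(x.kids.get(x.keys.size()), out);
    }

    // Number of node levels on the leftmost path.
    int levels() {
        int h = 0;
        for (Node x = root; x != null; x = x.isLeaf() ? null : x.kids.get(0)) h++;
        return h;
    }

    // True if every node has 1 or 2 keys, one more child than keys (if internal),
    // and all bottom nodes are at the same depth (perfect balance).
    boolean isValid() { return root == null || check(root) > 0; }

    private int check(Node x) {
        if (x.keys.size() < 1 || x.keys.size() > 2) return -1;
        if (x.isLeaf()) return 1;
        if (x.kids.size() != x.keys.size() + 1) return -1;
        int d = check(x.kids.get(0));
        if (d < 0) return -1;
        for (int i = 1; i < x.kids.size(); i++) if (check(x.kids.get(i)) != d) return -1;
        return d + 1;
    }

    public static void main(String[] args) {
        TwoThreeTree t = new TwoThreeTree();
        for (int k = 1; k <= 1000; k++) t.insert(k); // worst case for an elementary BST
        System.out.println("valid: " + t.isValid() + ", levels: " + t.levels());
    }
}
```

Even for ascending keys every split is local, so the tree stays perfectly balanced: with 1000 keys it must have between 7 and 9 levels.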
What is the range of heights of a 2-3 tree with \(N\) keys (best / worst case)?
A. \(\texttilde \log_4 N\) / \(\texttilde \log_3 N\)
B. \(\texttilde \log_3 N\) / \(\texttilde \log_2 N\)
C. \(\texttilde \log_3 N\) / \(\texttilde 2 \log_2 N\)
D. \(\texttilde \log_3 N\) / \(\texttilde N\)
E. I don't know
Perfect balance: Every path from root to null link has same length
Tree height:
Worst case: \(\lg N\) (all 2-nodes)
Best case: \(\log_3 N \approx 0.631 \lg N\) (all 3-nodes)
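To put numbers on the two extremes (all 2-nodes gives height about \(\lg N\), all 3-nodes about \(\log_3 N\)), a small sketch; the class and method names are hypothetical.

```java
// Evaluates the 2-3 tree height extremes for a given number of keys.
class TwoThreeHeight {
    static double log(double base, double x) { return Math.log(x) / Math.log(base); }

    static double bestCase(long n)  { return log(3, n); } // every node a 3-node
    static double worstCase(long n) { return log(2, n); } // every node a 2-node

    public static void main(String[] args) {
        for (long n : new long[] {1_000_000L, 1_000_000_000L}) {
            System.out.printf("N = %,d: best ~%.1f, worst ~%.1f%n",
                    n, bestCase(n), worstCase(n));
        }
    }
}
```

For a million keys this gives roughly 12.6 to 19.9 levels; for a billion keys, roughly 18.9 to 29.9.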
Bottom line: Guaranteed logarithmic performance for search and insert
implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops | ops on keys
---|---|---|---|---|---|---|---|---
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | X | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | X | compareTo()
2-3 tree\(^\ddagger\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | X | compareTo()
\(^*\)guarantee, \(^\dagger\)average
\(^\ddagger\)but hidden constant \(c\) is large (depends upon implementation)
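Direct implementation is complicated, because maintaining multiple node types is cumbersome, moving down the tree takes multiple compares per node, and splitting involves a large number of cases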

    // fantasy code
    public void put(Key key, Value val)
    {
        Node x = root;
        while (x.getTheCorrectChild(key) != null)
        {
            x = x.getTheCorrectChild(key);
            if (x.is4Node()) x.split();
        }
        if      (x.is2Node()) x.make3Node(key, val);
        else if (x.is3Node()) x.make4Node(key, val);
    }

Bottom line: Could do it, but there's a better way
Challenge: How to represent a 3-node?
Approach 1: Regular BST (a 3-node becomes two 2-nodes). Drawback: nothing marks which pairs of nodes form a 3-node, so we cannot map back from the BST to the 2-3 tree
Approach 2: Regular BST with red "glue" nodes. Drawback: wastes space on the extra nodes and links
Approach 3: Regular BST with red "glue" links (the approach used here; red links lean left)
A 2-3 tree and corresponding red-black BST
Key property: 1-1 correspondence between 2-3 and LLRB
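Left-leaning red-black BST (LLRB): A BST such that no node has two red links connected to it, every path from root to null link has the same number of black links ("perfect black balance"), and red links lean left
Observation: Search is the same as for elementary BST (ignore color), but runs faster because of better balance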

    public Value get(Key key)
    {
        Node x = root;
        while (x != null)
        {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;
        }
        return null;
    }

Remark: Most other ops (e.g., floor, iteration, selection) are also identical
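Because search ignores color, correctness rests entirely on the LLRB conditions defined above. A minimal checker for them on int keys; the `LLRBCheck` class, its `Node` type, and the example tree are hypothetical illustrations, not part of the lecture code.

```java
// Hypothetical checker for the LLRB conditions on int keys.
class LLRBCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key; Node left, right;
        boolean color; // color of the link from the parent
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; } // null links are black

    // Symmetric order: every key lies strictly between lo and hi (null = no bound).
    static boolean isBST(Node x, Integer lo, Integer hi) {
        if (x == null) return true;
        if (lo != null && x.key <= lo) return false;
        if (hi != null && x.key >= hi) return false;
        return isBST(x.left, lo, x.key) && isBST(x.right, x.key, hi);
    }

    // Red links lean left, and no node has two red links connected to it.
    static boolean colorsOK(Node x) {
        if (x == null) return true;
        if (isRed(x.right)) return false;            // right-leaning red link
        if (isRed(x) && isRed(x.left)) return false; // two red links in a row
        return colorsOK(x.left) && colorsOK(x.right);
    }

    // Perfect black balance: number of black nodes (black parent links) on every
    // path from x to a null link, or -1 if two paths disagree.
    static int blackHeight(Node x) {
        if (x == null) return 0;
        int l = blackHeight(x.left), r = blackHeight(x.right);
        if (l < 0 || l != r) return -1;
        return l + (isRed(x) ? 0 : 1);
    }

    static boolean isLLRB(Node root) {
        return isBST(root, null, null) && colorsOK(root) && blackHeight(root) >= 0;
    }

    // LLRB for the 2-3 tree [5 9] with children [2], [7], [12 15]:
    // each 3-node becomes a black node with a red left child.
    static Node example() {
        return new Node(9, BLACK,
                new Node(5, RED, new Node(2, BLACK, null, null), new Node(7, BLACK, null, null)),
                new Node(15, BLACK, new Node(12, RED, null, null), null));
    }

    public static void main(String[] args) {
        System.out.println(isLLRB(example()));    // true
        Node rightLeaning = new Node(12, BLACK, null, new Node(15, RED, null, null));
        System.out.println(isLLRB(rightLeaning)); // false
    }
}
```

The example shows the 1-1 correspondence directly: flattening each red link recovers the 3-nodes of the 2-3 tree.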
Each node is pointed to by precisely one link (from its parent), so we can encode the color of a link in the node it points to

    private static final boolean RED   = true;
    private static final boolean BLACK = false;

    private class Node
    {
        Key key;          // key
        Value val;        // associated data
        Node left, right; // subtrees
        boolean color;    // color of parent link

        Node(Key key, Value val, boolean color)
        {
            this.key = key;
            this.val = val;
            this.color = color;
        }
    }

    private boolean isRed(Node x)
    {
        if (x == null) return false; // null links are black
        return x.color == RED;
    }

root.left.color == RED
root.right.color == BLACK
Basic strategy: Maintain 1-1 correspondence with 2-3 trees
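During internal operations, maintain symmetric order and perfect black balance (but not necessarily the color constraints: a red link may temporarily lean right, and two red links may temporarily be adjacent)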
How? Apply elementary red-black BST operations: rotation and color flip
Left rotation: Orient a (temporarily) right-leaning red link to lean left

    private Node rotateLeft(Node h)
    {
        assert isRed(h.right);
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

Invariants: Maintains symmetric order and perfect black balance
Right rotation: Orient a left-leaning red link to (temporarily) lean right

    private Node rotateRight(Node h)
    {
        assert isRed(h.left);
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

Invariants: Maintains symmetric order and perfect black balance
Color flip: Recolor to split a (temporary) 4-node

    private void flipColors(Node h)
    {
        assert !isRed(h);
        assert isRed(h.left);
        assert isRed(h.right);
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

Invariants: Maintains symmetric order and perfect black balance
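To see the stated invariants concretely, a small sketch applies the three operations (copied from the code above with the asserts omitted) to hand-built trees with hypothetical single-character keys, and compares the in-order key sequence and the black height before and after.

```java
// Hypothetical char-keyed nodes for checking rotations and color flips.
class RotationDemo {
    static final boolean RED = true, BLACK = false;

    static class Node {
        char key; Node left, right;
        boolean color; // color of link from parent
        Node(char key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static Node leaf(char key) { return new Node(key, BLACK, null, null); }

    static Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    static Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    static void flipColors(Node h) {
        h.color = RED; h.left.color = BLACK; h.right.color = BLACK;
    }

    // Keys in symmetric order, as a string.
    static String keys(Node x) {
        return x == null ? "" : keys(x.left) + x.key + keys(x.right);
    }

    // Black nodes on every path from x down to a null link, or -1 if paths disagree.
    static int blackHeight(Node x) {
        if (x == null) return 0;
        int l = blackHeight(x.left), r = blackHeight(x.right);
        if (l < 0 || l != r) return -1;
        return l + (x.color == BLACK ? 1 : 0);
    }

    // Black E with black left child A and a right-leaning red link to S.
    static Node rightLeaning() {
        return new Node('E', BLACK, leaf('A'), new Node('S', RED, leaf('R'), leaf('X')));
    }

    // Temporary 4-node: black M with two red children E and S.
    static Node fourNode() {
        return new Node('M', BLACK,
                new Node('E', RED, leaf('A'), leaf('H')),
                new Node('S', RED, leaf('R'), leaf('X')));
    }

    public static void main(String[] args) {
        Node h = rightLeaning();
        System.out.println(keys(h) + " " + blackHeight(h));                     // AERSX 2
        h = rotateLeft(h);
        System.out.println(keys(h) + " " + blackHeight(h) + " root=" + h.key);  // AERSX 2 root=S
        Node m = fourNode();
        flipColors(m);
        System.out.println(keys(m) + " " + blackHeight(m) + " red=" + (m.color == RED)); // AEHMRSX 2 red=true
    }
}
```

The rotation changes the shape but not the key order or black height; the color flip turns the middle node red, which in 2-3 terms moves the middle key up into the parent.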
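Warmup 1: Insert into a tree with exactly 1 node
New key smaller than the root: search ends at the left null link of the root; attaching the new node with a red link converts the 2-node to a 3-node
New key larger than the root: search ends at the right null link of the root; the new red link is right-leaning, so rotate left to make a legal 3-node
Case 1: Insert into a 2-node at the bottom
Do standard BST insert; color new link red
If new red link is a right link, rotate left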
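Warmup 2: Insert into a tree with exactly 2 nodes
New key larger than both keys: search ends at the right null link of the root; both links from the root are red, so flip colors
New key smaller than both keys: two left-leaning red links in a row, so rotate right, then flip colors
New key between the two keys: the new red link leans right, so rotate left, then rotate right, then flip colors
Case 2: Insert into a 3-node at the bottom
Do standard BST insert; color new link red
Rotate to balance the 4-node (if needed)
Flip colors to pass red link up one level
Rotate to make lean left (if needed)
Repeat case 1 or case 2 up the tree (if needed)
Insert E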
Insert A
Insert R
Insert C
Insert H
Insert X
Insert M
Insert P
Insert L
Same code for all cases

    public void put(Key key, Value val)
    {
        root = put(root, key, val);
        root.color = BLACK; // keep the root black
    }

    private Node put(Node h, Key key, Value val)
    {
        // insert at bottom and color it red
        if (h == null) return new Node(key, val, RED);

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // only a few extra LoC provides near-perfect balance
        if (isRed(h.right) && !isRed(h.left))      h = rotateLeft(h);  // lean left
        if (isRed(h.left)  &&  isRed(h.left.left)) h = rotateRight(h); // balance 4-node
        if (isRed(h.left)  &&  isRed(h.right))     flipColors(h);      // split 4-node

        return h;
    }

255 insertions in ascending order
255 insertions in descending order
255 random insertions
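A self-contained sketch that assembles the get/put/rotate/flip code from this section into one runnable class and repeats the three 255-insertion experiments. The class name `LLRB`, the `height`/`heightAfter` helpers, and the shuffle seed are hypothetical; the public `put` that recolors the root black is assumed so that `flipColors` is only ever applied below a black node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal LLRB symbol table built from this section's code (asserts omitted).
class LLRB<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private class Node {
        Key key; Value val; Node left, right;
        boolean color; // color of parent link
        Node(Key key, Value val, boolean color) { this.key = key; this.val = val; this.color = color; }
    }

    private Node root;

    private boolean isRed(Node x) { return x != null && x.color == RED; } // null links are black

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.val;
        }
        return null;
    }

    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK; // keep the root black
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED); // insert at bottom, red link
        int cmp = key.compareTo(h.key);
        if (cmp < 0) h.left = put(h.left, key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else h.val = val;
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);     // lean left
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h); // balance 4-node
        if (isRed(h.left) && isRed(h.right)) flipColors(h);          // split 4-node
        return h;
    }

    private Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private void flipColors(Node h) {
        h.color = RED; h.left.color = BLACK; h.right.color = BLACK;
    }

    // Height in links from the root to its deepest node (a single node has height 0).
    public int height() { return height(root); }

    private int height(Node x) {
        return x == null ? -1 : 1 + Math.max(height(x.left), height(x.right));
    }

    // Inserts the keys in the given order and reports the resulting height.
    static int heightAfter(List<Integer> keys) {
        LLRB<Integer, Integer> st = new LLRB<>();
        for (int k : keys) st.put(k, k);
        return st.height();
    }

    public static void main(String[] args) {
        int n = 255;
        List<Integer> ascending = new ArrayList<>();
        for (int i = 0; i < n; i++) ascending.add(i);
        List<Integer> descending = new ArrayList<>(ascending);
        Collections.reverse(descending);
        List<Integer> shuffled = new ArrayList<>(ascending);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("ascending:  " + heightAfter(ascending));
        System.out.println("descending: " + heightAfter(descending));
        System.out.println("random:     " + heightAfter(shuffled));
        System.out.printf("2 lg N:     %.1f%n", 2 * Math.log(n) / Math.log(2));
    }
}
```

All three orderings stay within the \(2 \lg N\) bound; sorted input, the worst case for an elementary BST, no longer produces a linear-height tree.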
What is the height of an LLRB tree with \(N\) keys in the worst case?
A. \(\texttilde \log_3 N\)
B. \(\texttilde \log_2 N\)
C. \(\texttilde 2 \log_2 N\)
D. \(\texttilde N\)
E. I don't know
Proposition: Height of tree is \(\leq 2 \lg N\) in the worst case
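Pf: Every path from the root to a null link has the same number of black links, at most \(\lg(N+1)\) (the height of the corresponding 2-3 tree); there are never two red links in a row, so each path has at most as many red links as black links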
Property: Height of tree is \(\texttilde 1.0 \lg N\) in typical applications
implementation | search\(^*\) | insert\(^*\) | delete\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | delete\(^\dagger\) | ordered ops | ops on keys
---|---|---|---|---|---|---|---|---
seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | \(N\) | | equals()
binary search (ordered array) | \(\log N\) | \(N\) | \(N\) | \(\log N\) | \(N\) | \(N\) | X | compareTo()
BST | \(N\) | \(N\) | \(N\) | \(\log N\) | \(\log N\) | \(\sqrt{N}\) | X | compareTo()
2-3 tree\(^\ddagger\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | X | compareTo()
LLRB\(^\star\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | \(\log N\) | X | compareTo()
\(^*\)guarantee, \(^\dagger\)average
\(^\ddagger\)hidden constant \(c\) is large (depends upon implementation)
\(^\star\)hidden constant \(c\) is small (at most \(2 \lg N\) compares)
Xerox PARC innovations (1970s)
Telephone company contracted with database provider to build real-time database to store customer information
Database implementation
Extended telephone service outage
“If implemented properly, the height of a red-black BST with \(N\) keys is at most \(2 \lg N\).”
—expert witness
Property: Time required for a probe (first access to a page, e.g., from disk to memory) is much larger than time to access data within a page
Cost model: number of probes
Goal: access data using minimum number of probes
B-tree: Generalize 2-3 trees by allowing up to \(M\) keys per node
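A toy sketch of the probe cost model: each node stands for one page, visiting it counts as one probe, and the search within a page (here `Arrays.binarySearch`) is treated as free. The `Page` class, the `probes` counter, and the tiny example are hypothetical; real pages hold far more keys. This follows the 2-3-tree-style layout described here, with keys in every node; variants such as the B+ tree keep all keys in the bottom level instead.

```java
import java.util.Arrays;

// Hypothetical in-memory model of B-tree pages: sorted keys, and for an
// internal page one more child than keys.
class BTreeSearch {
    static class Page {
        final int[] keys;
        final Page[] kids; // length 0 for a bottom page
        Page(int[] keys, Page... kids) { this.keys = keys; this.kids = kids; }
    }

    static int probes; // pages accessed by the most recent search

    static boolean contains(Page root, int key) {
        probes = 0;
        for (Page p = root; p != null; ) {
            probes++;                                 // first access to this page
            int i = Arrays.binarySearch(p.keys, key); // cheap search within the page
            if (i >= 0) return true;
            if (p.kids.length == 0) return false;
            p = p.kids[-i - 1];                       // child for the key's interval
        }
        return false;
    }

    static Page example() {
        return new Page(new int[] {10, 20},
                new Page(new int[] {2, 5}),
                new Page(new int[] {12, 15}),
                new Page(new int[] {25, 30}));
    }

    public static void main(String[] args) {
        Page root = example();
        System.out.println(contains(root, 15) + " after " + probes + " probes"); // true after 2 probes
        System.out.println(contains(root, 7) + " after " + probes + " probes");  // false after 2 probes
    }
}
```

`Arrays.binarySearch` returns `-(insertion point) - 1` on a miss, and the insertion point is exactly the index of the child whose interval contains the key.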
Proposition: A search or an insertion in a B-tree of order \(M\) with \(N\) keys requires between \(\texttilde \log_M N\) and \(\texttilde \log_{M/2} N\) probes.
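Pf: All nodes (except possibly root) have between \(\left\lfloor M/2 \right\rfloor\) and \(M\) keys, so the branching factor at each level is between about \(M/2\) and \(M\), and each probe descends one level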
In practice: Number of probes is at most \(4\) (when \(M=1024\), \(N = 62 \text{ billion}\), then \(\log_{M/2} N \leq 4\))
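A quick check of the "at most 4 probes" arithmetic, assuming the figures in the text (\(M = 1024\), \(N\) = 62 billion); the class name is hypothetical.

```java
// Evaluates the probe bounds log_M N and log_{M/2} N from the proposition.
class BTreeProbes {
    static double log(double base, double x) { return Math.log(x) / Math.log(base); }

    public static void main(String[] args) {
        int m = 1024;
        double n = 62e9; // 62 billion keys
        System.out.printf("log_M N     = %.2f%n", log(m, n));
        System.out.printf("log_{M/2} N = %.2f%n", log(m / 2, n));
    }
}
```

It prints about 3.59 and 3.98. Equivalently, \(512^4 = 2^{36} \approx 68.7\) billion, which exceeds 62 billion, so four levels with branching factor at least 512 suffice.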
Which of the following does the B in B-tree not mean?
A. Bayer
B. Balanced
C. Binary
D. Boeing
E. I don't know
“the more you think about what the B in B-trees could mean, the more you learn about B-trees and that is good.”
–Ed McCreight
Red-black trees are widely used as system symbol tables
Java: java.util.TreeMap, java.util.TreeSet
Linux kernel: linux/rbtree.h
B-tree cousins: B+ tree, B* tree, B# tree, ...
B-trees (and cousins) are widely used for file systems and DBs