A collection is a data type that stores groups of items
data type | key operations | data structure |
---|---|---|
stack | push, pop | linked list, resizing array |
queue | enqueue, dequeue | linked list, resizing array |
priority queue | insert, delete-max | binary heap |
symbol table | put, get, delete | BST, hash table |
set | add, contains, delete | BST, hash table |
“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious.”
—Fred Brooks
Collections: Insert and delete items. Which item to delete?
Stack: Remove the item most recently added
Queue: Remove the item least recently added
Randomized queue: Remove a random item
Priority queue: Remove the largest (or smallest) item
    insert(P)
    insert(Q)
    insert(E)
    remove-max()   => Q
    insert(X)
    insert(A)
    insert(M)
    remove-max()   => X
    insert(P)
    insert(L)
    insert(E)
    remove-max()   => P
Requirement: Generic items are Comparable
Note: Duplicate keys allowed; delMax() and max() each pick any maximum key
Challenge: Find the largest \(M\) items in a stream of \(N\) items, where \(N\) is huge and \(M\) is large
Constraint: not enough memory to store \(N\) items
    $ more transactions.txt
    Turing      6/17/1990    644.08
    vonNeumann  3/26/2002   4121.85
    Dijkstra    8/22/2007   2678.40
    vonNeumann  1/11/1999   4409.74
    Dijkstra    11/18/1995   837.42
    Hoare       5/10/1993   3229.27
    vonNeumann  2/12/1994   4732.35
    Hoare       8/18/1992   4381.21
    Turing      1/11/2002     66.10
    Thompson    2/27/2000   4747.08
    Turing      2/11/1991   2156.86
    Hoare       8/12/2003   1025.70
    vonNeumann  10/13/1993  2520.97
    Dijkstra    9/10/2000    708.95
    Turing      10/12/1993  3532.36
    Hoare       2/10/2005   4050.20

    $ java TopM 5 < transactions.txt      # sort key = last col
    Thompson    2/27/2000   4747.08
    vonNeumann  2/12/1994   4732.35
    vonNeumann  1/11/1999   4409.74
    Hoare       8/18/1992   4381.21
    vonNeumann  3/26/2002   4121.85
    // Transaction data type is Comparable (ordered by $)
    // use a min-oriented priority queue
    MinPQ<Transaction> pq = new MinPQ<Transaction>();

    while(StdIn.hasNextLine()) {
        String line = StdIn.readLine();
        Transaction transaction = new Transaction(line);
        pq.insert(transaction);
        if(pq.size() > M)
            pq.delMin();   // pq now contains largest M items
    }
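The same idea can be tried without the course libraries. The following is a minimal sketch that assumes `java.util.PriorityQueue` (min-oriented under natural ordering) in place of `MinPQ`, and plain `double` amounts in place of `Transaction` objects; the names `TopMSketch` and `largest` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopMSketch {
    // Returns the m largest values seen in the stream, in descending order.
    // The queue never holds more than m + 1 items, whatever the stream length.
    static List<Double> largest(Iterable<Double> stream, int m) {
        PriorityQueue<Double> pq = new PriorityQueue<>(); // min-oriented
        for (double x : stream) {
            pq.add(x);
            if (pq.size() > m) pq.poll(); // drop the smallest of the m + 1 items
        }
        List<Double> result = new ArrayList<>(pq);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> amounts = List.of(644.08, 4121.85, 2678.40, 4409.74,
                                       837.42, 3229.27, 4732.35, 4381.21);
        System.out.println(largest(amounts, 3)); // three largest amounts
    }
}
```

Each item costs one insert and at most one delete-min on a queue of size at most M + 1, so the total is on the order of N log M compares with M + 1 slots of memory.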
operation | return value | size | contents (unordered) | contents (ordered) |
---|---|---|---|---|
insert(P) | | 1 | P | P |
insert(Q) | | 2 | P Q | P Q |
insert(E) | | 3 | P Q E | E P Q |
delMax() | Q | 2 | P E | E P |
insert(X) | | 3 | P E X | E P X |
insert(A) | | 4 | P E X A | A E P X |
insert(M) | | 5 | P E X A M | A E M P X |
delMax() | X | 4 | P E A M | A E M P |
insert(P) | | 5 | P E A M P | A E M P P |
insert(L) | | 6 | P E A M P L | A E L M P P |
insert(E) | | 7 | P E A M P L E | A E E L M P P |
delMax() | P | 6 | E A M P L E | A E E L M P |

A sequence of operations on a priority queue implemented using an unordered array and an ordered array
implementation | insert | delMax | max |
---|---|---|---|
unordered array | \(1\) | \(N\) | \(N\) |
ordered array | \(N\) | \(1\) | \(1\) |
goal for today | \(\log N\) | \(\log N\) | \(\log N\) |
Order of growth of running time for priority queue with \(N\) items
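To make the unordered-array row concrete, here is a minimal sketch of that elementary implementation. The class name `UnorderedArrayMaxPQ` and the fixed capacity are assumptions for illustration; insert appends in constant time, while delMax scans all items.

```java
import java.util.NoSuchElementException;

class UnorderedArrayMaxPQ<Key extends Comparable<Key>> {
    private final Key[] pq; // items in insertion order
    private int n;          // number of items

    @SuppressWarnings("unchecked")
    UnorderedArrayMaxPQ(int capacity) {
        pq = (Key[]) new Comparable[capacity];
    }

    boolean isEmpty() { return n == 0; }
    int size()        { return n; }

    // Constant time: append at the end.
    void insert(Key key) { pq[n++] = key; }

    // Linear time: scan for a maximum, move the last item into its slot.
    Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("priority queue underflow");
        int max = 0;
        for (int i = 1; i < n; i++)
            if (pq[max].compareTo(pq[i]) < 0) max = i;
        Key result = pq[max];
        pq[max] = pq[n - 1];
        pq[--n] = null; // prevent loitering
        return result;
    }
}
```

The ordered-array variant makes the opposite trade: insert shifts larger items to keep the array sorted, and delMax simply removes the last entry.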
Binary tree: Empty or node with links to left and right binary trees
Complete tree: Perfectly balanced, except for bottom level
complete binary tree with \(N=16 \text{ nodes}\) (\(\text{height} = 4\))
Property: Height of complete binary tree with \(N\) nodes is \(\lfloor \lg N \rfloor\).
Pf: Height increases only when \(N\) is a power of \(2\).
Array representation
    i      0  1  2  3  4  5  6  7  8  9 10 11
    a[i]   .  T  S  R  P  N  O  A  E  I  H  G
What is the index of the parent of the item at index \(k\) in a binary heap?
A. \(k/2 - 1\)

    i      0  1  2  3  4  5  6  7  8  9 10 11
    a[i]   .  T  S  R  P  N  O  A  E  I  H  G
Array representation
Max-Heap ordering
Binary heap: Array representation of a heap-ordered complete binary tree
"Just enough" ordering to support efficient priority queue operations.
Largest key is a[1], which is the root of the binary tree
Children of the node at k are at locations 2*k and 2*k+1
Parent of the node at k is at k/2 (integer division)
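The index arithmetic can be checked directly on the example array. This is a small sketch; the class `HeapIndexSketch` and its helper names are hypothetical, and the array follows the same convention as above (entry 0 unused, keys in positions 1 to n).

```java
class HeapIndexSketch {
    static int parent(int k) { return k / 2; }     // integer division
    static int left(int k)   { return 2 * k; }
    static int right(int k)  { return 2 * k + 1; }

    // True if a[1..n] is in max-heap order: no key is larger than its parent's key.
    static boolean isMaxHeap(String[] a) {
        int n = a.length - 1; // a[0] is unused
        for (int k = 2; k <= n; k++)
            if (a[parent(k)].compareTo(a[k]) < 0) return false;
        return true;
    }
}
```

For the array `. T S R P N O A E I H G`, the node at index 5 (N) has parent at index 2 (S) and children at indices 10 (H) and 11 (G).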
insert() and delMax() violate heap order, but it is easy to fix up
Insert: Add node at end, then swim it up
Remove the maximum: Exchange root with node at end, then sink it down
Scenario: A key becomes larger than its parent's key
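To eliminate the violation: exchange the key in the child with the key in its parent, and repeat until heap order is restored

    private void swim(int k) {
        while(k > 1 && less(k/2, k)) {   // parent of node at k is at k/2
            exch(k, k/2);
            k = k/2;
        }
    }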
Insert: Add node at end, then swim it up
Cost: At most \(1+\lg N\) compares
    public void insert(Key k) {
        pq[++N] = k;
        swim(N);
    }
Scenario: A key becomes smaller than one (or both) of its children's
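To eliminate the violation: exchange the key in the parent with the key in its larger child, and repeat until heap order is restored

    private void sink(int k) {
        while(2*k <= N) {
            int j = 2*k;                        // first child
            if(j < N && less(j, j+1)) j++;      // second child is larger
            if(!less(k, j)) break;              // parent >= larger child: done
            exch(k, j);
            k = j;
        }
    }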
Delete max: Exchange root with node at end, then sink it down
Cost: At most \(2 \lg N\) compares
    public Key delMax() {
        Key max = pq[1];
        exch(1, N);
        pq[N--] = null;   // prevent loitering!
        sink(1);
        return max;
    }
    public class MaxPQ<Key extends Comparable<Key>> {
        private Key[] pq;
        private int N;

        public MaxPQ(int capacity) {
            pq = (Key[]) new Comparable[capacity+1];
        }

        public boolean isEmpty()     { return N == 0; }
        public void insert(Key key)  { /* see prev code */ }
        public Key delMax()          { /* see prev code */ }

        private void swim(int k)     { /* see prev code */ }
        private void sink(int k)     { /* see prev code */ }

        private boolean less(int i, int j) {
            return pq[i].compareTo(pq[j]) < 0;
        }

        private void exch(int i, int j) {
            Key t = pq[i];
            pq[i] = pq[j];
            pq[j] = t;
        }
    }
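For readers who want to run the pieces together, the fragments above can be assembled as in the following sketch. The class name `MaxPQSketch`, the underflow check in `delMax()`, and the `main` driver are additions for illustration and are not part of the original slides; the driver replays the insert and remove-max sequence shown earlier.

```java
import java.util.NoSuchElementException;

class MaxPQSketch<Key extends Comparable<Key>> {
    private final Key[] pq; // heap-ordered complete binary tree in pq[1..n]; pq[0] unused
    private int n;

    @SuppressWarnings("unchecked")
    MaxPQSketch(int capacity) { pq = (Key[]) new Comparable[capacity + 1]; }

    boolean isEmpty() { return n == 0; }

    void insert(Key key) { pq[++n] = key; swim(n); }

    Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("priority queue underflow"); // added check
        Key max = pq[1];
        exch(1, n);
        pq[n--] = null; // prevent loitering
        sink(1);
        return max;
    }

    private void swim(int k) {
        while (k > 1 && less(k / 2, k)) { exch(k, k / 2); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;                      // first child
            if (j < n && less(j, j + 1)) j++;   // pick the larger child
            if (!less(k, j)) break;             // heap order restored
            exch(k, j);
            k = j;
        }
    }

    private boolean less(int i, int j) { return pq[i].compareTo(pq[j]) < 0; }
    private void exch(int i, int j) { Key t = pq[i]; pq[i] = pq[j]; pq[j] = t; }

    public static void main(String[] args) {
        MaxPQSketch<String> pq = new MaxPQSketch<>(16);
        StringBuilder removed = new StringBuilder();
        for (String s : new String[] { "P", "Q", "E" }) pq.insert(s);
        removed.append(pq.delMax()).append(' ');
        for (String s : new String[] { "X", "A", "M" }) pq.insert(s);
        removed.append(pq.delMax()).append(' ');
        for (String s : new String[] { "P", "L", "E" }) pq.insert(s);
        removed.append(pq.delMax());
        System.out.println(removed); // prints "Q X P"
    }
}
```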
implementation | insert | delMax | max |
---|---|---|---|
unordered array | \(1\) | \(N\) | \(N\) |
ordered array | \(N\) | \(1\) | \(1\) |
binary heap | \(\log N\) | \(\log N\) | \(1\) |
order-of-growth of running time for priority queue with \(N\) items
Challenge: Delete a random key from a binary heap in logarithmic time
Do "half-exchanges" in sink or swim
Multiway heaps
Fact: Height of complete \(d\)-way tree on \(N\) nodes is \(\sim \log_d N\)
How many compares (in the worst case) to insert in a \(d\)-way heap?
A. \(\sim \log_2 N\)
B. \(\sim \log_d N\)
C. \(\sim d \log_2 N\)
D. \(\sim d \log_d N\)
E. I don't know
How many compares (in the worst case) to delete-max in a \(d\)-way heap?
A. \(\sim \log_2 N\)
B. \(\sim \log_d N\)
C. \(\sim d \log_2 N\)
D. \(\sim d \log_d N\)
E. I don't know
implementation | insert | delMax | max |
---|---|---|---|
unordered array | \(1\) | \(N\) | \(N\) |
ordered array | \(N\) | \(1\) | \(1\) |
binary heap | \(\log N\) | \(\log N\) | \(1\) |
\(d\)-ary heap | \(\log_d N\) | \(d \log_d N\) | \(1\) |
Fibonacci | \(1\) | \(\log N\) \(^*\) | \(1\) |
Brodal queue | \(1\) | \(\log N\) | \(1\) |
impossible | \(1\) | \(1\) | \(1\) |
\(^*\) amortized
sweet spot for \(d\) is \(d=4\)
why is last line impossible?
order-of-growth of running time for priority queue with \(N\) items
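The \(d\)-ary heap row can be explored with a small sketch. It assumes 0-based indexing (parent of `k` at `(k-1)/d`, children at `d*k+1` through `d*k+d`) and `int` keys; the class name `DaryMaxHeapSketch` is hypothetical. Swimming makes one compare per level, while sinking scans up to `d` children per level, which is where the two different costs in the table come from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class DaryMaxHeapSketch {
    private final int d;
    private final List<Integer> a = new ArrayList<>(); // 0-based heap array

    DaryMaxHeapSketch(int d) {
        if (d < 2) throw new IllegalArgumentException("d must be at least 2");
        this.d = d;
    }

    int size() { return a.size(); }

    void insert(int key) {
        a.add(key);
        int k = a.size() - 1;
        // swim: one compare per level
        while (k > 0 && a.get((k - 1) / d) < a.get(k)) {
            swap(k, (k - 1) / d);
            k = (k - 1) / d;
        }
    }

    int delMax() {
        if (a.isEmpty()) throw new NoSuchElementException("priority queue underflow");
        int max = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int k = 0;
            // sink: up to d compares per level to find the largest child
            while (d * k + 1 < a.size()) {
                int j = d * k + 1;                    // first child
                int end = Math.min(j + d, a.size());  // one past the last child
                for (int c = j + 1; c < end; c++)
                    if (a.get(j) < a.get(c)) j = c;
                if (a.get(k) >= a.get(j)) break;      // heap order restored
                swap(k, j);
                k = j;
            }
        }
        return max;
    }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }
}
```

With `d = 2` this behaves like the binary heap; larger `d` gives a shallower tree, trading cheaper inserts for more work per level in delete-max.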
Underflow and overflow
Minimum-oriented priority queue: replace less() with greater(), and implement greater()
Binary heap is not cache friendly (ex: page size = 8 nodes)
Other operations: can be implemented efficiently with sink() and swim() (stay tuned for Prim/Dijkstra)
Immutability of keys
Data type: set of values and operations on those values
Immutable data type: cannot change the data type value once created
    public final class Vector {               // final = can't override instance methods
        private final int N;                  // instance vars are private and final
        private final double[] data;

        public Vector(double[] data) {
            this.N = data.length;
            this.data = new double[N];
            for(int i = 0; i < N; i++)        // defensive copy of mutable instance vars
                this.data[i] = data[i];
        }

        /* ... */                             // instance methods don't change instance vars
    }
Immutable: String, Integer, Double, Color, Vector, Transaction, Point2D
Mutable: StringBuilder, Stack, Counter, Java array
Advantages of immutability:
Disadvantage: Must create new object for each data type value
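A short runnable sketch shows both the defensive copy and the cost noted above. The class `ImmutableVectorSketch` and its `get` and `plus` methods are hypothetical additions in the spirit of the `Vector` fragment, not the original class.

```java
final class ImmutableVectorSketch {
    private final int n;
    private final double[] data;

    ImmutableVectorSketch(double[] data) {
        this.n = data.length;
        this.data = data.clone(); // defensive copy: later changes to the caller's array are not seen
    }

    int length()      { return n; }
    double get(int i) { return data[i]; }

    // Returns a new vector; neither operand is modified.
    ImmutableVectorSketch plus(ImmutableVectorSketch that) {
        if (that.n != n) throw new IllegalArgumentException("dimension mismatch");
        double[] sum = new double[n];
        for (int i = 0; i < n; i++) sum[i] = data[i] + that.data[i];
        return new ImmutableVectorSketch(sum);
    }
}
```

Because a value can never change after construction, such an object is safe to use as a key on a priority queue; the price is that every `plus` allocates a fresh object.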
“Classes should be immutable unless there's a very good reason to make them mutable. [...] If a class cannot be made immutable, you should still limit its mutability as much as possible.”
—Joshua Bloch (Java architect)
What is the name of this sorting algorithm?
    public void sort(String[] a) {
        int N = a.length;
        MaxPQ<String> pq = new MaxPQ<String>();
        for(int i = 0; i < N; i++)
            pq.insert(a[i]);
        for(int i = N-1; i >= 0; i--)
            a[i] = pq.delMax();
    }
A. insertion sort
B. mergesort
C. quicksort
D. None of the above
E. I don't know
What are its properties?
    public void sort(String[] a) {
        int N = a.length;
        MaxPQ<String> pq = new MaxPQ<String>();
        for(int i = 0; i < N; i++)
            pq.insert(a[i]);
        for(int i = N-1; i >= 0; i--)
            a[i] = pq.delMax();
    }
A. \(N \lg N\) compares in the worst case
B. in-place sorting
C. stable sorting
D. All of the above
E. I don't know
Basic plan for in-place sort
Heap construction: build max heap using bottom-up method (we assume array entries are indexed 1 to N)
Sortdown: Repeatedly delete the largest remaining item
Heap construction trace (figure): sink(5), sink(4), sink(3), sink(2), sink(1); the result is a max-heap
Sortdown trace (figure): exchange a[1] with a[11], then sink(1); repeat with a[10], a[9], ..., a[2]; done sorting
Heap construction (first pass):
for(int k = N/2; k >= 1; k--) sink(a, k, N);
Sortdown (second pass):
while(N > 1) { exch(a, 1, N--); sink(a, 1, N); }
    public class Heap {
        public static void sort(Comparable[] a) {
            int N = a.length;
            for(int k = N/2; k >= 1; k--)    // heap construction
                sink(a, k, N);
            while(N > 1) {                   // sortdown
                exch(a, 1, N);
                sink(a, 1, --N);
            }
        }

        private static void sink(Comparable[] a, int k, int N) {
            /* as before, but make static and pass arguments */
        }

        private static boolean less(Comparable[] a, int i, int j) {
            /* as before, but convert from 1-based indexing to 0-based */
        }

        private static void exch(Object[] a, int i, int j) {
            /* as before, but convert from 1-based indexing to 0-based */
        }
    }
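The elided method bodies can be filled in as follows. This is one possible completion, assuming the index conversion is done by subtracting 1 inside `less()` and `exch()`; the class name `HeapSortSketch` and the generic signature (used here to avoid raw-type warnings) are choices made for this sketch.

```java
class HeapSortSketch {
    static <T extends Comparable<T>> void sort(T[] a) {
        int n = a.length;
        for (int k = n / 2; k >= 1; k--)   // heap construction, bottom-up
            sink(a, k, n);
        while (n > 1) {                    // sortdown
            exch(a, 1, n);                 // move the current maximum to its final position
            sink(a, 1, --n);               // restore heap order in the shrunken heap
        }
    }

    // Heap positions k and n are 1-based, as in the slides.
    private static <T extends Comparable<T>> void sink(T[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && less(a, j, j + 1)) j++;
            if (!less(a, k, j)) break;
            exch(a, k, j);
            k = j;
        }
    }

    // Convert from 1-based heap positions to 0-based array indices.
    private static <T extends Comparable<T>> boolean less(T[] a, int i, int j) {
        return a[i - 1].compareTo(a[j - 1]) < 0;
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i - 1];
        a[i - 1] = a[j - 1];
        a[j - 1] = t;
    }
}
```

Calling `HeapSortSketch.sort` on the array `S O R T E X A M P L E` rearranges it in place into `A E E L M O P R S T X`, with no auxiliary array.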
     N   k     0  1  2  3  4  5  6  7  8  9 10 11
                  S  O  R  T  E  X  A  M  P  L  E    initial values
    11   5        .  .  .  .  L  .  .  .  .  E  E
    11   4        .  .  .  T  .  .  .  M  P  .  .
    11   3        .  .  X  .  .  R  A  .  .  .  .
    11   2        .  T  .  P  L  .  .  M  O  .  .
    11   1        X  T  S  .  .  R  A  .  .  .  .
                  X  T  S  P  L  R  A  M  O  E  E    heap-ordered
    10   1        T  P  S  O  L  .  .  M  E  .  X
     9   1        S  P  R  .  .  E  A  .  .  T  .
     8   1        R  P  E  .  .  E  A  .  S  .  .
     7   1        P  O  E  M  L  .  .  R  .  .  .
     6   1        O  M  E  A  L  .  P  .  .  .  .
     5   1        M  L  .  A  .  O  .  .  .  .  .
     4   1        L  E  E  A  M  .  .  .  .  .  .
     3   1        E  A  E  L  .  .  .  .  .  .  .
     2   1        E  A  E  .  .  .  .  .  .  .  .
     1   1        A  E  .  .  .  .  .  .  .  .  .
                  A  E  E  L  M  O  P  R  S  T  X    sorted result
Black values are sorted
Gray values are unsorted
Red triangle marks algorithm position
Proposition: Heap construction uses \(\leq 2N\) compares and \(\leq N\) exchanges
Pf sketch (assume \(N = 2^{h+1}-1\)):
\[\begin{array}{rcl} h + 2(h-1) + 4(h-2) + 8(h-3) + \ldots + 2^h(0) & \leq & 2^{h+1}-1 \\ & = & N \end{array}\]
note: left side of \(\leq\) is a tricky sum (see Discrete Math)
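The bounds can also be checked empirically. The sketch below instruments bottom-up construction on an `int` array and returns the number of compares and exchanges; the class `HeapConstructionCount` and its `build` method are hypothetical names, and running it on a few inputs illustrates the proposition without proving it.

```java
class HeapConstructionCount {
    // Builds a max-heap in place (heap position k lives at a[k-1])
    // and returns { number of compares, number of exchanges }.
    static long[] build(int[] a) {
        int n = a.length;
        long compares = 0, exchanges = 0;
        for (int start = n / 2; start >= 1; start--) {
            int k = start;
            while (2 * k <= n) {
                int j = 2 * k;
                if (j < n) {                      // two children: find the larger
                    compares++;
                    if (a[j - 1] < a[j]) j++;
                }
                compares++;                       // parent versus larger child
                if (a[k - 1] >= a[j - 1]) break;
                int t = a[k - 1]; a[k - 1] = a[j - 1]; a[j - 1] = t;
                exchanges++;
                k = j;
            }
        }
        return new long[] { compares, exchanges };
    }
}
```

An array in ascending order forces every sink to travel all the way down, so it is a natural input for stressing both counts.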
Proposition: Heapsort uses \(\leq 2N \lg N\) compares and exchanges, though the algorithm can be improved to \(\sim N \lg N\) (but no such variant is known to be practical)
Significance: In-place sorting algorithm with \(N \log N\) worst-case
Bottom line: Heapsort is optimal for both time and space, but...
Goal: as fast as quicksort in practice; \(N \log N\) worst case, in place
Introsort
In the wild: C++ STL, Microsoft .NET Framework
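A minimal sketch of the introsort idea follows, under stated assumptions: run quicksort, switch to heapsort once the recursion depth exceeds about \(2 \lg N\), and use insertion sort for small subarrays. The cutoff of 16, the middle-element pivot, the `int[]` input, and the class name `IntroSortSketch` are choices made for this illustration, not details taken from any particular library.

```java
class IntroSortSketch {
    private static final int CUTOFF = 16; // assumed small-subarray cutoff

    static void sort(int[] a) {
        if (a.length < 2) return;
        int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(a.length)); // 2 * floor(lg N)
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        if (hi - lo + 1 <= CUTOFF) { insertionSort(a, lo, hi); return; }
        if (depth == 0) { heapSort(a, lo, hi); return; } // quicksort is degenerating
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depth - 1);
        sort(a, p + 1, hi, depth - 1);
    }

    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo, lo + (hi - lo) / 2); // use the middle element as pivot
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j] < a[j - 1]; j--)
                swap(a, j, j - 1);
    }

    // Heapsort on a[lo..hi]; heap position k lives at a[lo + k - 1].
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int k = n / 2; k >= 1; k--) sink(a, lo, k, n);
        while (n > 1) {
            swap(a, lo, lo + n - 1);
            sink(a, lo, 1, --n);
        }
    }

    private static void sink(int[] a, int lo, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && a[lo + j - 1] < a[lo + j]) j++;
            if (a[lo + k - 1] >= a[lo + j - 1]) break;
            swap(a, lo + k - 1, lo + j - 1);
            k = j;
        }
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

The depth limit caps the total quicksort work at order \(N \log N\), and the heapsort fallback keeps the worst case at \(N \log N\) while staying in place.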
algorithm | in place? | stable? | best | average | worst | remarks |
---|---|---|---|---|---|---|
selection | X | | \(\tfrac{1}{2} N^2\) | \(\tfrac{1}{2} N^2\) | \(\tfrac{1}{2} N^2\) | \(N\) exchanges |
insertion | X | X | \(N\) | \(\tfrac{1}{4} N^2\) | \(\tfrac{1}{2} N^2\) | use for small \(N\) or partially ordered |
shell | X | | \(N \log_3 N\) | ? | \(c N^a\) | tight code; subquadratic |
merge | | X | \(\tfrac{1}{2} N \lg N\) | \(N \lg N\) | \(N \lg N\) | \(N \log N\) guarantee; stable |
timsort | | X | \(N\) | \(N \lg N\) | \(N \lg N\) | improves mergesort when preexisting order |
quick | X | | \(N \lg N\) | \(2 N \ln N\) | \(\tfrac{1}{2} N^2\) | \(N \log N\) probabilistic guarantee; fastest in practice |
3-way qs | X | | \(N\) | \(2 N \ln N\) | \(\tfrac{1}{2} N^2\) | improves quicksort when duplicate keys |
heap | X | | \(N\) | \(2 N \lg N\) | \(2 N \lg N\) | \(N \log N\) guarantee; in-place |
? | X | X | \(N\) | \(N \lg N\) | \(N \lg N\) | holy grail of sorting |