Critical components in the world's computational infrastructure
Merge sort: last lecture
Quicksort: this lecture
Basic plan
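Shuffle the array.
Partition so that, for some j:
a[j] is in place,
no larger entry lies to the left of j,
no smaller entry lies to the right of j.
Sort each piece recursively.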
    input:           Q U I C K S O R T E X A M P L E
    shuffle:         K R A T E L E P U I M Q C X O S
    partition item:  ^---------v
    partition:       E C A I E K L P U T M Q R X O S
                     all <= K | | all >= K
    sort left:       A C E E I . . . . . . . . . . .
    sort right:      . . . . . . L M O P Q R S T U X
    result:          A C E E I K L M O P Q R S T U X
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
“I call it my billion-dollar mistake. It was the invention of the null reference in 1965... This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”
Repeat until the i and j pointers cross:
Scan i from left to right so long as a[i] < a[lo].
Scan j from right to left so long as a[j] > a[lo].
Exchange a[i] with a[j].
When the pointers cross:
Exchange a[lo] with a[j].
Partitioned!
    private static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo]))   // find item on left to swap
                if (i == hi) break;
            while (less(a[lo], a[--j]))   // find item on right to swap
                if (j == lo) break;
            if (i >= j) break;            // check if pointers cross
            exch(a, i, j);                // swap
        }
        exch(a, lo, j);                   // swap with partition item
        return j;                         // return index of item now known to be in place
    }
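The partitioning code above relies on `less` and `exch` helpers that are not shown. A minimal self-contained sketch follows, assuming the helpers simply wrap `compareTo` and an array swap; the class name `PartitionDemo` and the generic signatures are illustrative choices, not part of the lecture code. It replays the partitioning step from the basic-plan trace.

```java
class PartitionDemo {
    // Assumed helper: is v strictly smaller than w?
    static <T extends Comparable<T>> boolean less(T v, T w) {
        return v.compareTo(w) < 0;
    }

    // Assumed helper: swap a[i] and a[j].
    static <T> void exch(T[] a, int i, int j) {
        T t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Same scanning logic as the partition method above (requires lo < hi).
    static <T extends Comparable<T>> int partition(T[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;  // scan right
            while (less(a[lo], a[--j])) if (j == lo) break;  // scan left
            if (i >= j) break;                               // pointers crossed
            exch(a, i, j);
        }
        exch(a, lo, j);  // put the partitioning item into its final position
        return j;
    }

    public static void main(String[] args) {
        String[] a = "K R A T E L E P U I M Q C X O S".split(" ");
        int j = partition(a, 0, a.length - 1);
        // prints "5: E C A I E K L P U T M Q R X O S"
        System.out.println(j + ": " + String.join(" ", a));
    }
}
```

The output agrees with the partition row of the trace: K ends up at index 5 with no larger key to its left and no smaller key to its right.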
Q. How many compares (in the worst case) to partition an array of length \(N\)?
A. \(\sim \frac{1}{4} N\)
B. \(\sim \frac{1}{2} N\)
C. \(\sim N\)
D. \(\sim N \lg N\)
E. I don't know.
    public class Quick {
        private static int partition(Comparable[] a, int lo, int hi) { /* as before */ }

        public static void sort(Comparable[] a) {
            // shuffle needed for performance guarantee (stay tuned...)
            StdRandom.shuffle(a);
            sort(a, 0, a.length - 1);
        }

        private static void sort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;
            int j = partition(a, lo, hi);
            sort(a, lo, j - 1);
            sort(a, j + 1, hi);
        }
    }
    lo  j hi  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
              Q U I C K S O R T E X A M P L E   <- initial values
              K R A T E L E P U I M Q C X O S   <- random shuffle
     0  5 15  E C A I E|K|L P U T M Q R X O S
     0  3  4  E C A|E|I . . . . . . . . . . .
     0  2  2  A C|E|. . . . . . . . . . . . .
     0  0  1 |A|C . . . . . . . . . . . . . .
     1     1  .|C|. . . . . . . . . . . . . .   <- no partition for
     4     4  . . . .|I|. . . . . . . . . . .   <- subarrays of size 1
     6  6 15  . . . . . .|L|P U T M Q R X O S
     7  9 15  . . . . . . . M O|P|T Q R X U S
     7  7  8  . . . . . . .|M|O . . . . . . .
     8     8  . . . . . . . .|O|. . . . . . .   <- (same)
    10 13 15  . . . . . . . . . . S Q R|T|U X
    10 12 12  . . . . . . . . . . R Q|S|. . .
    10 11 11  . . . . . . . . . . Q|R|. . . .
    10    10  . . . . . . . . . .|Q|. . . . .   <- (same)
    14 14 15  . . . . . . . . . . . . . .|U|X
    15    15  . . . . . . . . . . . . . . .|X|  <- (same)
              A C E E I K L M O P Q R S T U X   <- result
Black values are sorted
Gray values are unsorted
Red triangle marks algorithm position
Dark gray values denote the current subarray
Partition in-place: Using an extra array makes partitioning easier (and stable), but is not worth the cost
Terminating the loop: Testing whether the pointers cross is trickier than it might seem
Equal keys: When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key (stay tuned)
Preserving randomness: Shuffling is needed for the performance guarantee
Equivalent alternative: Pick a random partitioning item in each subarray
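The random-partitioning-item alternative can be sketched as follows. This is one plausible reading, not the lecture's code: instead of shuffling once up front, each recursive call swaps a uniformly random entry of the current subarray into position lo before partitioning. The class name `RandomPivotQuick` and the use of `java.util.Random` (in place of `StdRandom`) are assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Random;

class RandomPivotQuick {
    private static final Random RNG = new Random();

    static <T extends Comparable<T>> void sort(T[] a) {
        sort(a, 0, a.length - 1);  // no up-front shuffle
    }

    private static <T extends Comparable<T>> void sort(T[] a, int lo, int hi) {
        if (hi <= lo) return;
        // Pick a random index in [lo, hi] and move that item to a[lo],
        // so the unchanged partition method uses it as the partitioning item.
        int r = lo + RNG.nextInt(hi - lo + 1);
        exch(a, lo, r);
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static <T extends Comparable<T>> int partition(T[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static <T extends Comparable<T>> boolean less(T v, T w) {
        return v.compareTo(w) < 0;
    }

    private static <T> void exch(T[] a, int i, int j) {
        T t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        Integer[] a = {5, 3, 9, 1, 5, 7};
        sort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 3, 5, 5, 7, 9]
    }
}
```

Either approach makes the running time depend on the random choices rather than on the input order.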
Running time estimates
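computer | insertion sort (thousand) | insertion sort (million) | insertion sort (billion) | merge sort (thousand) | merge sort (million) | merge sort (billion) | quicksort (thousand) | quicksort (million) | quicksort (billion) |
---|---|---|---|---|---|---|---|---|---|
home | instant | 2.8 hours | 317 years | instant | 1 second | 18 min | instant | 0.6 seconds | 12 min |
super | instant | 1 second | 1 week | instant | instant | instant | instant | instant | instant |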
Lesson 1: Good algorithms are better than supercomputers
Lesson 2: Great algorithms are better than good ones
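The insertion sort entries for the home computer can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes roughly \(N^2\) compares for insertion sort and a home computer executing about \(10^8\) compares per second. Neither figure is stated in the text above; both are assumptions chosen because they match the table. The class and method names are hypothetical.

```java
class RunningTimeEstimate {
    // Assumed rate (not stated in the text): compares per second on the home computer.
    static final double HOME_COMPARES_PER_SEC = 1e8;

    // Rough model: insertion sort performs about n^2 compares.
    static double insertionSortSeconds(double n, double comparesPerSecond) {
        return n * n / comparesPerSecond;
    }

    public static void main(String[] args) {
        double million = insertionSortSeconds(1e6, HOME_COMPARES_PER_SEC);
        double billion = insertionSortSeconds(1e9, HOME_COMPARES_PER_SEC);
        System.out.printf("1 million: %.1f hours%n", million / 3600.0);
        System.out.printf("1 billion: %.0f years%n", billion / (365.25 * 24 * 3600));
    }
}
```

Under these assumptions a million items take about 2.8 hours and a billion items about 317 years, in line with the table.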
Best case: Number of compares is \(\sim N \lg N\)
    lo  j hi  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
              A B C D E F G H I J K L M N O   <- initial values
              H A C B F E G D L I K J N M O   <- random shuffle
     0  7 14  D A C B F E G|H|L I K J N M O
     0  3  6  B A C|D|F E G . . . . . . . .
     0  1  2  A|B|C . . . . . . . . . . . .
     0     0 |A|. . . . . . . . . . . . . .
     2     2  . .|C|. . . . . . . . . . . .
     4  5  6  . . . . E|F|G . . . . . . . .
     4     4  . . . .|E|. . . . . . . . . .
     6     6  . . . . . .|G|. . . . . . . .
     8 11 14  . . . . . . . . J I K|L|N M O
     8  9 10  . . . . . . . . I|J|K . . . .
     8     8  . . . . . . . .|I|. . . . . .
    10    10  . . . . . . . . . .|K|. . . .
    12 13 14  . . . . . . . . . . . . M|N|O
    12    12  . . . . . . . . . . . .|M|. .
    14    14  . . . . . . . . . . . . . .|O|
              A B C D E F G H I J K L M N O   <- result
Worst case: Number of compares is \(\sim \frac{1}{2} N^2\)
    lo  j hi  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
              A B C D E F G H I J K L M N O   <- initial values
              A B C D E F G H I J K L M N O   <- random shuffle
     0  0 14 |A|B C D E F G H I J K L M N O
     1  1 14  .|B|C D E F G H I J K L M N O
     2  2 14  . .|C|D E F G H I J K L M N O
     3  3 14  . . .|D|E F G H I J K L M N O
     4  4 14  . . . .|E|F G H I J K L M N O
     5  5 14  . . . . .|F|G H I J K L M N O
     6  6 14  . . . . . .|G|H I J K L M N O
     7  7 14  . . . . . . .|H|I J K L M N O
     8  8 14  . . . . . . . .|I|J K L M N O
     9  9 14  . . . . . . . . .|J|K L M N O
    10 10 14  . . . . . . . . . .|K|L M N O
    11 11 14  . . . . . . . . . . .|L|M N O
    12 12 14  . . . . . . . . . . . .|M|N O
    13 13 14  . . . . . . . . . . . . .|N|O
    14 14 14  . . . . . . . . . . . . . .|O|
              A B C D E F G H I J K L M N O   <- result
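To see the quadratic worst case concretely, the sketch below counts compares when the same partitioning scheme runs on an already sorted array with the shuffle deliberately omitted. The class name `WorstCaseCompares`, the `int` keys, and the global counter are illustrative assumptions.

```java
class WorstCaseCompares {
    static long compares = 0;  // number of calls to less()

    static boolean less(int v, int w) { compares++; return v < w; }

    static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], a[lo])) if (i == hi) break;
            while (less(a[lo], a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    // Quicksort without the shuffle, so sorted input triggers the worst case.
    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    static long countOnSortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;  // already in ascending order
        compares = 0;
        sort(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        System.out.println(countOnSortedInput(15));    // prints 133
        System.out.println(countOnSortedInput(1000));  // prints 501498
    }
}
```

Each partition of a sorted subarray of length \(n \geq 2\) uses \(n + 1\) compares and peels off a single item, so the total is \(3 + 4 + \ldots + (N+1)\), which grows as \(\frac{1}{2} N^2\).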
Proposition: The average number of compares \(C_N\) to quicksort an array of \(N\) distinct keys is \(\sim 2 N \ln N\) (and the number of exchanges is \(\sim \frac{1}{3} N \ln N\)).
Pf: \(C_N\) satisfies the recurrence \(C_0 = C_1 = 0\) and for \(N \geq 2\):
\[\scriptsize C_N = \underbrace{(N+1)}_\text{partitioning} + \frac{C_0+C_{N-1}}{N} + \underbrace{\frac{\overbrace{C_1}^\text{left}+\overbrace{C_{N-2}}^\text{right}}{N}}_\text{partitioning probability} + \ldots + \frac{C_{N-1} + C_0}{N}\]
\[\scriptsize C_N = (N+1) + \frac{C_0+C_{N-1}}{N} + \frac{C_1+C_{N-2}}{N} + \ldots + \frac{C_{N-1} + C_0}{N}\]
Multiply both sides by \(N\) and collect terms:
\[N C_N = N(N+1) + 2(C_0 + C_1 + \ldots + C_{N-1})\]
Subtract the same equation for \(N-1\):
\[N C_N - (N-1)C_{N-1} = 2N + 2C_{N-1}\]
Rearrange terms and divide by \(N(N+1)\):
\[\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}\]
Apply this equation repeatedly:
\[\begin{array}{rcl} \frac{C_N}{N+1} & = & \frac{C_{N-1}}{N} + \frac{2}{N+1} \\ & = & \frac{C_{N-2}}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\ & = & \frac{C_{N-3}}{N-2} + \frac{2}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\ & = & \frac{2}{3} + \frac{2}{4} + \frac{2}{5} + \ldots + \frac{2}{N+1} \end{array}\]
Approximate the sum by an integral:
\[\begin{array}{rcl} C_N & = & 2(N+1) \left(\frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \ldots + \frac{1}{N+1}\right) \\ & \sim & 2(N+1) \int_3^{N+1} \frac{1}{x} dx \end{array}\]
\[C_N \sim 2(N+1) \ln N \approx 1.39 N \lg N\]
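As a numerical sanity check, the recurrence for \(C_N\) can be evaluated directly and compared with \(2 N \ln N\). The sketch below is an illustration only; the class name `QuicksortRecurrence` and the method names are hypothetical. It keeps a running sum of \(C_0 + \ldots + C_{k-1}\), so each step costs constant time.

```java
class QuicksortRecurrence {
    // Evaluates C_n from C_0 = C_1 = 0 and, for k >= 2,
    // C_k = (k + 1) + (2 / k) * (C_0 + C_1 + ... + C_{k-1}).
    static double averageCompares(int n) {
        double sum = 0.0;  // C_0 + ... + C_{k-1}
        double c = 0.0;    // C_k for the most recent k
        for (int k = 2; k <= n; k++) {
            c = (k + 1) + 2.0 * sum / k;
            sum += c;
        }
        return c;
    }

    // Ratio of the recurrence value to the leading term 2 n ln n.
    static double ratioToLeadingTerm(int n) {
        return averageCompares(n) / (2.0 * n * Math.log(n));
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            System.out.printf("N = %d  C_N / (2 N ln N) = %.4f%n", n, ratioToLeadingTerm(n));
        }
    }
}
```

The ratio climbs toward 1 only slowly, because the terms dropped by the \(\sim\) notation are linear in \(N\) while the leading term carries just a logarithmic factor more.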
Quicksort is a (Las Vegas) randomized algorithm
Average case: Expected number of compares is \(\sim 1.39 N \lg N\)
Best case: Number of compares is \(\sim N \lg N\)
Worst case: Number of compares is \(\sim \frac{1}{2} N^2\) (but it is more likely that a lightning bolt strikes the computer during execution!)
Proposition: Quicksort is an in-place sorting algorithm
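Pf: Partitioning uses constant extra space, and the depth of recursion is logarithmic with high probability (given the random shuffle).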
Proposition: Quicksort is not stable.
Pf (by counterexample):
     i  j   0  1  2  3
            B1 C1 C2 A1   <- input with partition = B1
     1  3   B1 C1 C2 A1   <- found first inversion for partition
     1  3   B1 A1 C2 C1   <- swap (oh no!)
        1   A1 B1 C2 C1   <- swap partition in place
Insertion sort small subarrays
    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo + CUTOFF - 1) {
            Insertion.sort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }
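The cutoff version above refers to a `CUTOFF` constant and a subarray form of `Insertion.sort` that are not shown. A minimal sketch of what they might look like follows. The value 10 is an assumption for illustration (the text does not give one), and `CutoffHelpers` and `insertionSort` are hypothetical names standing in for `Insertion.sort(a, lo, hi)`.

```java
class CutoffHelpers {
    // Assumed cutoff: subarrays with at most this many items go to insertion sort.
    static final int CUTOFF = 10;

    // Insertion sort restricted to a[lo..hi]; entries outside the range are untouched.
    static <T extends Comparable<T>> void insertionSort(T[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            // Move a[i] left until it is no smaller than its left neighbor.
            for (int j = i; j > lo && a[j].compareTo(a[j - 1]) < 0; j--) {
                T t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        Integer[] a = {9, 4, 3, 2, 1, 0};
        insertionSort(a, 1, 4);  // sort only the middle four entries
        System.out.println(java.util.Arrays.toString(a));  // prints [9, 1, 2, 3, 4, 0]
    }
}
```

With these definitions the test `hi <= lo + CUTOFF - 1` sends every subarray of `CUTOFF` or fewer items to insertion sort, which also covers the empty and single-item cases.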
Median of sample
    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int median = medianOf3(a, lo, lo + (hi - lo) / 2, hi);
        exch(a, lo, median);   // swap median into lo position
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }
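The `medianOf3` helper is not defined in the text. One plausible sketch, assuming it returns the index (not the value) of the median of the three sampled entries, is shown below. The class name `MedianOf3` and the generic signature are illustrative choices.

```java
class MedianOf3 {
    static <T extends Comparable<T>> boolean less(T v, T w) {
        return v.compareTo(w) < 0;
    }

    // Returns whichever of i, j, k indexes the median of a[i], a[j], a[k].
    static <T extends Comparable<T>> int medianOf3(T[] a, int i, int j, int k) {
        if (less(a[i], a[j])) {
            if (less(a[j], a[k])) return j;       // a[i] <  a[j] <  a[k]
            else if (less(a[i], a[k])) return k;  // a[i] <  a[k] <= a[j]
            else return i;                        // a[k] <= a[i] <  a[j]
        } else {
            if (less(a[k], a[j])) return j;       // a[k] <  a[j] <= a[i]
            else if (less(a[k], a[i])) return k;  // a[j] <= a[k] <  a[i]
            else return i;                        // a[j] <= a[i] <= a[k]
        }
    }

    public static void main(String[] args) {
        Integer[] a = {9, 1, 5};
        System.out.println(medianOf3(a, 0, 1, 2));  // prints 2, the index of 5
    }
}
```

Using the median of the first, middle, and last entries as the partitioning item costs at most three compares per call and tends to split the subarray more evenly than an arbitrary choice.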