Ex: Student records in a university
Chen | 3 | A | 991-878-4944 | 308 Blair |
Rohde | 2 | A | 232-343-5555 | 343 Forbes |
Gazsi | 4 | B | 766-093-9873 | 101 Brown |
Furia | 1 | A | 766-093-9873 | 101 Brown |
Kanaga | 3 | B | 898-122-9643 | 22 Brown |
Andrews | 3 | A | 664-480-0023 | 97 Little |
Battle | 4 | C | 874-088-1212 | 121 Whitman |
Item: a row in the table
Furia | 1 | A | 766-093-9873 | 101 Brown |
Key: a specific entry of an item that may or may not be unique
Furia | 1 | A | 766-093-9873 | 101 Brown |
Sort: Rearrange array of \(N\) items into ascending order
Chen | 3 | A | 991-878-4944 | 308 Blair |
Rohde | 2 | A | 232-343-5555 | 343 Forbes |
Gazsi | 4 | B | 766-093-9873 | 101 Brown |
Furia | 1 | A | 766-093-9873 | 101 Brown |
Kanaga | 3 | B | 898-122-9643 | 22 Brown |
Andrews | 3 | A | 664-480-0023 | 97 Little |
Battle | 4 | C | 874-088-1212 | 121 Whitman |
Sort: Rearrange array of \(N\) items into ascending order (result, sorted by name)
Andrews | 3 | A | 664-480-0023 | 97 Little |
Battle | 4 | C | 874-088-1212 | 121 Whitman |
Chen | 3 | A | 991-878-4944 | 308 Blair |
Furia | 1 | A | 766-093-9873 | 101 Brown |
Gazsi | 4 | B | 766-093-9873 | 101 Brown |
Kanaga | 3 | B | 898-122-9643 | 22 Brown |
Rohde | 2 | A | 232-343-5555 | 343 Forbes |
Goal: Sort any type of data
Ex 1: Sort random real numbers in ascending order (seems artificial... stay tuned for an application)
public class Experiment {
    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);
        Double[] a = new Double[N];
        for(int i = 0; i < N; i++)
            a[i] = StdRandom.uniform();
        Insertion.sort(a);
        for(int i = 0; i < N; i++)
            StdOut.println(a[i]);
    }
}

$ java Experiment 8
0.08614716385210452
0.10708746304898642
0.21166190071646818
0.363292849257276
0.460954145685913
0.5340026311350087
0.7216129793703496
0.9293994908845686
Ex 2: Sort strings in alphabetical order
public class StringSorter {
    public static void main(String[] args) {
        String[] a = StdIn.readAllStrings();
        Insertion.sort(a);
        for(int i = 0; i < a.length; i++)
            StdOut.println(a[i]);
    }
}

$ more words3.txt
bed bug dad yet zoo [...] all bad yes
$ java StringSorter < words3.txt
all bad bed bug dad [...] yes yet zoo
[suppressing newlines]
Ex 3: Sort the files in a given directory by filename
import java.io.File;

public class FileSorter {
    public static void main(String[] args) {
        File directory = new File(args[0]);
        File[] files = directory.listFiles();
        Insertion.sort(files);
        for(int i = 0; i < files.length; i++)
            StdOut.println(files[i].getName());
    }
}

$ java FileSorter .
FileSorter.class
FileSorter.java
Insertion.class
Insertion.java
Selection.class
Selection.java
Goal: Sort any type of data (for which sorting is well defined)
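A total order is a binary relation \(\leq\) that satisfies
Antisymmetry: if both \(v \leq w\) and \(w \leq v\), then \(v = w\)
Transitivity: if both \(v \leq w\) and \(w \leq x\), then \(v \leq x\)
Totality: either \(v \leq w\) or \(w \leq v\) or both
Ex (relations that are not total orders):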
No transitivity: Rock-Paper-Scissors
No totality: CSE course prerequisites
Rock-Paper-Scissors violates transitivity (if both \(v \leq w\) and \(w \leq x\), then \(v \leq x\))
scissors \(\leq\) rock, and rock \(\leq\) paper, but scissors \(\cancel{\leq}\) paper
CSE course prerequisites violate totality (either \(v \leq w\) or \(w \leq v\) or both)
cannot compare cos382 and cos424!
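The two violations above can be checked by brute force on a small set. The following is a minimal sketch, not part of the lecture code: the class `TotalOrderCheck`, its methods, and the encoding of Rock-Paper-Scissors as a relation (v ≤ w when w beats v, or v equals w) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper: brute-force checks of total-order properties on a finite set.
class TotalOrderCheck {
    // Transitivity: v <= w and w <= x must imply v <= x.
    static <T> boolean isTransitive(List<T> items, BiPredicate<T, T> leq) {
        for (T v : items)
            for (T w : items)
                for (T x : items)
                    if (leq.test(v, w) && leq.test(w, x) && !leq.test(v, x))
                        return false;
        return true;
    }

    // Totality: for every pair, v <= w or w <= v (or both).
    static <T> boolean isTotal(List<T> items, BiPredicate<T, T> leq) {
        for (T v : items)
            for (T w : items)
                if (!leq.test(v, w) && !leq.test(w, v))
                    return false;
        return true;
    }

    // Assumed encoding of Rock-Paper-Scissors: v <= w when w beats v, or v equals w.
    static boolean rpsLeq(String v, String w) {
        return v.equals(w)
            || (v.equals("scissors") && w.equals("rock"))
            || (v.equals("rock") && w.equals("paper"))
            || (v.equals("paper") && w.equals("scissors"));
    }

    public static void main(String[] args) {
        List<String> moves = List.of("rock", "paper", "scissors");
        System.out.println(isTotal(moves, TotalOrderCheck::rpsLeq));      // true
        System.out.println(isTransitive(moves, TotalOrderCheck::rpsLeq)); // false
    }
}
```

Under this encoding, every pair of moves is comparable, but the cycle scissors ≤ rock ≤ paper ≤ scissors breaks transitivity, so no sorted order exists.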
Q: How can sort() know how to compare data of type Double, String, and java.io.File without any information about the type of an item's key?
Callback = Reference to executable code
The client passes an array of objects to the sort() function
The sort() function calls back each object's compareTo() method as needed
Implementing callbacks (in Java, via interfaces)
// client code
public class StringSorter {
    public static void main(String[] args) {
        String[] a = StdIn.readAllStrings();
        Insertion.sort(a);  // <- defined below
        for(int i = 0; i < a.length; i++)
            StdOut.println(a[i]);
    }
}

// Comparable interface (built in to Java)
public interface Comparable<Item> {
    public int compareTo(Item that);
}

// data-type implementation
public class String implements Comparable<String> {
    /* ... */
    public int compareTo(String b) {
        /* ... */
        return -1;
        /* ... */
        return +1;
        /* ... */
        return 0;
    }
}

// sort implementation
public class Insertion {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++)
            for(int j = i; j > 0; j--)
                // key point: the compareTo() call has no dependence on the String data type!
                if(a[j].compareTo(a[j-1]) < 0)
                    exch(a, j, j-1);
                else break;
    }
}
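Implement compareTo() so that v.compareTo(w) defines a total order and returns a negative integer, zero, or a positive integer if v is less than, equal to, or greater than w, respectively. It throws an exception if the types are incompatible (or either is null).
if(v < w) return -1;
if(v == w) return 0;
if(v > w) return +1;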
Built-in comparable types: Integer, Double, String, Date, File, ...
User-defined comparable types: implement the Comparable interface
Comparable interface: Date data type (simplified version of java.util.Date)
// Comparable<Date>: only compares dates to other dates
public class Date implements Comparable<Date> {
    private final int month, day, year;

    public Date(int m, int d, int y) {
        month = m;
        day = d;
        year = y;
    }

    public int compareTo(Date that) {
        if(this.year  < that.year ) return -1;
        if(this.year  > that.year ) return +1;
        if(this.month < that.month) return -1;
        if(this.month > that.month) return +1;
        if(this.day   < that.day  ) return -1;
        if(this.day   > that.day  ) return +1;
        return 0;
    }
}
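The same pattern applies to the student records from the opening example. The sketch below is illustrative only: the class name `Student`, its field names, and the reading of the second and third columns as a section number and a grade are assumptions, and it uses the library method `java.util.Arrays.sort` in place of the lecture's `Insertion.sort`.

```java
import java.util.Arrays;

// Hypothetical record type for the student table; field meanings are assumed.
class Student implements Comparable<Student> {
    final String name;    // the key used for sorting
    final int section;    // assumed meaning of the second column
    final String grade;   // assumed meaning of the third column

    Student(String name, int section, String grade) {
        this.name = name;
        this.section = section;
        this.grade = grade;
    }

    // Order students alphabetically by name; String.compareTo does the work.
    @Override
    public int compareTo(Student that) {
        return this.name.compareTo(that.name);
    }

    @Override
    public String toString() {
        return name + " | " + section + " | " + grade;
    }

    public static void main(String[] args) {
        Student[] a = {
            new Student("Chen", 3, "A"),
            new Student("Rohde", 2, "A"),
            new Student("Gazsi", 4, "B"),
            new Student("Furia", 1, "A"),
        };
        Arrays.sort(a);  // the sort calls back Student.compareTo() as needed
        for (Student s : a)
            System.out.println(s);  // Chen, Furia, Gazsi, Rohde (one per line)
    }
}
```

Changing the key (say, to sort by section) would only require changing `compareTo()`; the sort code stays the same.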
Selection sort: In iteration i, find index min of smallest remaining entry, then swap a[i] and a[min]
Algorithm: ↑ scans from left to right
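Invariants: Entries to the left of ↑ (including ↑) are fixed and in ascending order. No entry to the right of ↑ is smaller than any entry to the left of ↑.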
[Animation: selection sort in progress on a 16-entry array; ↑ marks the current position]
1 | 1 | 2 | 2 | 2 | 3 | 4 | 4 | 7 | 6 | 6 | 7 | 5 | 5 | 4 | 7 |
Helper functions: Refer to data through compares and exchanges
Less: is item v less than w?
private static boolean less(Comparable v, Comparable w) {
    return v.compareTo(w) < 0;
}
Exchange: swap item in array a[] at index i with one at index j
private static void exch(Comparable[] a, int i, int j) {
    Comparable swap = a[i];
    a[i] = a[j];
    a[j] = swap;
}
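To maintain algorithm invariants: move the pointer to the right (step 1), find the index of the minimum entry on the right (step 2), and exchange it into position (step 3).
// 1
i++;
// 2
int min = i;
for(int j = i+1; j < N; j++)
    if(less(a[j], a[min])) min = j;
// 3
exch(a, i, min);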
public class Selection {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++) {
            int min = i;
            for(int j = i+1; j < N; j++)
                if(less(a[j], a[min])) min = j;
            exch(a, i, min);
        }
    }

    private static boolean less(Comparable v, Comparable w) { /* as before */ }
    private static void exch(Comparable[] a, int i, int j) { /* as before */ }
}
Black values are sorted
Gray values are unsorted
Red triangle marks algorithm position
Proposition: Selection sort uses \((N-1)+(N-2)+\ldots+1+0 \sim N^2/2\) compares and \(N\) exchanges.
Running time insensitive to input: Quadratic time, even if input is sorted
Data movement is minimal: Linear number of exchanges
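The counts in the proposition can be observed directly by instrumenting the sort. This is a minimal sketch under stated assumptions: the class `SelectionCount` and its counters are hypothetical additions, and the sort is inlined rather than using the `less()` and `exch()` helpers.

```java
// Hypothetical instrumented selection sort: counts compares and exchanges.
class SelectionCount {
    // Sorts a in place and returns {compares, exchanges}.
    static <T extends Comparable<T>> long[] sort(T[] a) {
        int n = a.length;
        long compares = 0, exchanges = 0;
        for (int i = 0; i < n; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++) {
                compares++;                       // one compare per inner-loop step
                if (a[j].compareTo(a[min]) < 0) min = j;
            }
            T swap = a[i]; a[i] = a[min]; a[min] = swap;
            exchanges++;                          // one exchange per outer-loop step
        }
        return new long[] { compares, exchanges };
    }

    public static void main(String[] args) {
        Integer[] sorted   = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Integer[] reversed = { 8, 7, 6, 5, 4, 3, 2, 1 };
        long[] c1 = sort(sorted);
        long[] c2 = sort(reversed);
        System.out.println(c1[0] + " compares, " + c1[1] + " exchanges"); // 28 compares, 8 exchanges
        System.out.println(c2[0] + " compares, " + c2[1] + " exchanges"); // 28 compares, 8 exchanges
    }
}
```

For \(N = 8\) the count is \(7+6+\ldots+1+0 = 28\) compares regardless of the input order, which is the insensitivity to input noted above.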
Insertion sort: In iteration i, swap a[i] with each larger entry to its left
Algorithm: ↑ scans from left to right
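Invariants: Entries to the left of ↑ (including ↑) are in ascending order. Entries to the right of ↑ have not yet been seen.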
[Animation: insertion sort in progress on a 16-entry array; ↑ marks the current position]
1 | 1 | 2 | 4 | 5 | 7 | 3 | 6 | 1 | 4 | 6 | 7 | 2 | 5 | 4 | 2 |
To maintain algorithm invariants: move the pointer to the right (step 1), then, moving from right to left, exchange a[i] with each larger entry to its left (step 2).
// 1
i++;
// 2
for(int j = i; j > 0; j--)
    if(less(a[j], a[j-1])) exch(a, j, j-1);
    else break;
public class Insertion {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++)
            for(int j = i; j > 0; j--)
                if(less(a[j], a[j-1])) exch(a, j, j-1);
                else break;
    }

    private static boolean less(Comparable v, Comparable w) { /* as before */ }
    private static void exch(Comparable[] a, int i, int j) { /* as before */ }
}
Black values are sorted
Gray values are unsorted
Red triangle marks algorithm position
Proposition: To sort a randomly-ordered array with distinct keys, insertion sort uses \(\sim \frac{1}{4}N^2\) compares and \(\sim \frac{1}{4} N^2\) exchanges on average.
Pf: Expect each entry to move halfway back.
Best case: If the array is in ascending order, insertion sort makes \(N-1\) compares and \(0\) exchanges.
A E E L M O P R S T X
Worst case: If the array is in descending order (and has no duplicates), insertion sort makes \(\sim \frac{1}{2} N^2\) compares and \(\sim \frac{1}{2} N^2\) exchanges.
X T S R P O M L F E A
Def: An inversion is a pair of keys that are out of order
A E E L M O T R X P S
Above has 6 inversions: T-R, T-P, T-S, R-P, X-P, X-S
Def: An array is partially sorted if the number of inversions is \(\leq cN\) for some constant \(c\)
Proposition: For partially-sorted arrays, insertion sort runs in linear time
Pf: The number of exchanges equals the number of inversions, and the number of compares is at most exchanges + \((N-1)\)
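The link between inversions and exchanges can be checked on the example array above. This is a minimal sketch: the class `InversionCount` and both method names are hypothetical, and inversions are counted by a brute-force quadratic scan chosen for clarity rather than speed.

```java
// Hypothetical check that insertion sort's exchanges equal the number of inversions.
class InversionCount {
    // Brute-force count of pairs (i, j) with i < j and a[i] > a[j].
    static <T extends Comparable<T>> int inversions(T[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i].compareTo(a[j]) > 0) count++;
        return count;
    }

    // Insertion sort in place; returns {compares, exchanges}.
    static <T extends Comparable<T>> int[] insertionSort(T[] a) {
        int compares = 0, exchanges = 0;
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0; j--) {
                compares++;
                if (a[j].compareTo(a[j - 1]) < 0) {
                    T swap = a[j]; a[j] = a[j - 1]; a[j - 1] = swap;
                    exchanges++;   // each exchange removes exactly one inversion
                } else break;
            }
        }
        return new int[] { compares, exchanges };
    }

    public static void main(String[] args) {
        String[] a = "A E E L M O T R X P S".split(" ");
        System.out.println("inversions = " + inversions(a));  // inversions = 6
        int[] counts = insertionSort(a);
        System.out.println("exchanges  = " + counts[1]);      // exchanges  = 6
        System.out.println("compares   = " + counts[0]);      // compares   = 16
    }
}
```

Here \(N = 11\), so the 16 compares match exchanges + \((N-1) = 6 + 10\).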
Half exchanges: Shift items over (instead of exchanging). The code then no longer uses only less() and exch() to access data.
before: A C H H I M N N P Q X Y K B I N A R Y   (inserting K: M N N P Q X Y each shift one position right)
after:  A C H H I K M N N P Q X Y B I N A R Y
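A minimal sketch of the half-exchange idea, assuming a hypothetical class `InsertionShift`; it is one way to write the variant, not necessarily the lecture's version. The item being inserted is saved once, larger entries are shifted right, and the item is written into the gap.

```java
// Hypothetical insertion sort variant that shifts entries instead of exchanging them.
class InsertionShift {
    static <T extends Comparable<T>> void sort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            T item = a[i];                 // save the item being inserted
            int j = i;
            while (j > 0 && item.compareTo(a[j - 1]) < 0) {
                a[j] = a[j - 1];           // shift the larger entry one position right
                j--;
            }
            a[j] = item;                   // drop the saved item into the gap
        }
    }

    public static void main(String[] args) {
        String[] a = "A C H H I M N N P Q X Y K B I N A R Y".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

Each shift is a single array write, where an exchange needs two, at the cost of touching the array directly.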
Binary insertion sort: Use binary search to find insertion point
A C H H I M N N P Q X Y K B I N A R Y
          ^ binary search for first key > K
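A minimal sketch of binary insertion sort, assuming a hypothetical class `BinaryInsertion`. The binary search looks in the sorted prefix for the first key strictly greater than the item, as in the diagram above, and `System.arraycopy` shifts the tail of the prefix one position right.

```java
// Hypothetical binary insertion sort: binary search for the insertion point, then shift.
class BinaryInsertion {
    static <T extends Comparable<T>> void sort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            T item = a[i];
            // Find the first index in a[0..i) whose key is strictly greater than item.
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (item.compareTo(a[mid]) < 0) hi = mid;
                else lo = mid + 1;
            }
            // Shift a[lo..i) one position right, then place item at lo.
            System.arraycopy(a, lo, a, lo + 1, i - lo);
            a[lo] = item;
        }
    }

    public static void main(String[] args) {
        String[] a = "A C H H I M N N P Q X Y K B I N A R Y".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

The search reduces the number of compares per insertion to about \(\lg i\), but the shift still moves up to \(i\) entries, so data movement remains quadratic in the worst case.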
Idea: Move entries more than one position at a time by \(h\)-sorting the array
An \(h\)-sorted array is \(h\) interleaved sorted subsequences
h = 4
full: L E E A M H L E P S O L T S X R
grp0: L ----- M ----- P ----- T
grp1:   E ----- H ----- S ----- S
grp2:     E ----- L ----- O ----- X
grp3:       A ----- E ----- L ----- R
Shellsort [Shell 1959]: \(h\)-sort array for decreasing sequence of values of \(h\)
input:    S H E L L S O R T E X A M P L E
13-sort:  P H E L L S O R T E X A M S L E
4-sort:   L E E A M H L E P S O L T S X R
1-sort:   A E E E H L L L M O P R S S T X
\(h\)-sorting: In iteration i, swap a[i] with each larger entry h positions to its left
[Figure: an \(h\)-sorted array, where \(h=3\)]
How to \(h\)-sort an array? Insertion sort, with stride length \(h\)
Why insertion sort?
input:         S O R T E X A M P L E
after 7-sort:  M O L E E X A S P R T
after 3-sort:  A E L E O P M S X R T
after 1-sort:  A E E L M O P R S T X  (sorted)
public class Shell {
    public static void sort(Comparable[] a) {
        int N = a.length;

        int h = 1;
        while(h < N/3) h = 3*h + 1;  // 1, 4, 13, 40, 121, 364, ...

        while(h >= 1) {
            // h-sort the array (insertion sort with stride h)
            for(int i = h; i < N; i++) {
                for(int j = i; j >= h && less(a[j], a[j-h]); j -= h)
                    exch(a, j, j-h);
            }
            h = h / 3;  // move to next increment
        }
    }

    private static boolean less(Comparable v, Comparable w) { /* as before */ }
    private static void exch(Comparable[] a, int i, int j) { /* as before */ }
}
Black values are sorted
Gray values are unsorted
Dark gray values show current sub-array that is being sorted
Red triangle marks algorithm position
Powers of two: 1, 2, 4, 8, 16, 32, ... → No
Powers of two minus one: 1, 3, 7, 15, 31, 63, ... → Maybe
\(3x + 1\): 1, 4, 13, 40, 121, 364, ... → OK; easy to compute
Sedgewick: 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, ... (merging of \(9 \cdot 4^i - 9 \cdot 2^i + 1\) and \(4^i - 3 \cdot 2^i + 1\)) → Good; tough to beat in empirical studies
Proposition: An \(h\)-sorted array remains \(h\)-sorted after \(g\)-sorting it.
7-sort:  S O R T E X A M P L E  →  M O L E E X A S P R T   (7-sorted)
3-sort:  M O L E E X A S P R T  →  A E L E O P M S X R T   (still 7-sorted!)
Challenge: Prove this fact—it's more subtle than you'd think!
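The proposition can at least be checked on the example, even though that is not a proof. This is a minimal sketch with a hypothetical class `HSortCheck`: `hSort` is insertion sort with stride h, and `isHSorted` tests whether every entry is no smaller than the entry h positions to its left.

```java
// Hypothetical check that an h-sorted array stays h-sorted after g-sorting it.
class HSortCheck {
    // Insertion sort with stride h.
    static <T extends Comparable<T>> void hSort(T[] a, int h) {
        for (int i = h; i < a.length; i++)
            for (int j = i; j >= h && a[j].compareTo(a[j - h]) < 0; j -= h) {
                T swap = a[j]; a[j] = a[j - h]; a[j - h] = swap;
            }
    }

    // True if a[i-h] <= a[i] for every valid i.
    static <T extends Comparable<T>> boolean isHSorted(T[] a, int h) {
        for (int i = h; i < a.length; i++)
            if (a[i].compareTo(a[i - h]) < 0) return false;
        return true;
    }

    public static void main(String[] args) {
        String[] a = "S O R T E X A M P L E".split(" ");
        hSort(a, 7);
        System.out.println(String.join(" ", a));  // M O L E E X A S P R T
        hSort(a, 3);
        System.out.println(String.join(" ", a));  // A E L E O P M S X R T
        System.out.println(isHSorted(a, 7));      // true: still 7-sorted
        System.out.println(isHSorted(a, 3));      // true
    }
}
```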
Proposition: The order of growth of the worst-case number of compares used by shellsort with the \(3x+1\) increments is \(N^{3/2}\).
Property: The expected number of compares to shellsort a randomly-ordered array using \(3x+1\) increment is...
\(N\) | compares | \(2.5 N \ln N\) | \(0.25 N \ln^2 N\) | \(N^a\) |
---|---|---|---|---|
5k | 93k | 106k | 91k | 64k |
10k | 209k | 230k | 213k | 158k |
20k | 467k | 495k | 490k | 390k |
40k | 1022k | 1059k | 1122k | 960k |
80k | 2266k | 2258k | 2549k | 2366k |
where \(a = 1.3\)
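The three model columns are plain formulas and can be recomputed for any \(N\). The sketch below evaluates them with the natural logarithm and \(a = 1.3\); the class `ShellModels` and its method names are hypothetical, and the measured compare counts come from experiments that this code does not reproduce.

```java
// Hypothetical evaluation of the three candidate models for shellsort compares.
class ShellModels {
    static double nLogN(double n) {           // 2.5 N ln N
        return 2.5 * n * Math.log(n);
    }

    static double nLogSquaredN(double n) {    // 0.25 N ln^2 N
        double ln = Math.log(n);
        return 0.25 * n * ln * ln;
    }

    static double nPower(double n) {          // N^a with a = 1.3
        return Math.pow(n, 1.3);
    }

    public static void main(String[] args) {
        int[] sizes = { 5000, 10000, 20000, 40000, 80000 };
        System.out.println("     N   2.5NlnN  .25Nln2N     N^1.3   (values in thousands)");
        for (int n : sizes)
            System.out.printf("%6d  %8.0f  %8.0f  %8.0f%n",
                n, nLogN(n) / 1000, nLogSquaredN(n) / 1000, nPower(n) / 1000);
    }
}
```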
Remark: Accurate model has not yet been discovered (!)
Example of simple idea leading to substantial performance gains
Useful in practice
Simple algorithm, nontrivial performance, interesting questions
Lesson: Some good algorithms are still awaiting discovery
This section: Elementary sorting algorithms
Order of growth of running time to sort an array of \(N\) items
algorithm | best | average | worst |
---|---|---|---|
selection sort | \(N^2\) | \(N^2\) | \(N^2\) |
insertion sort | \(N\) | \(N^2\) | \(N^2\) |
Shellsort (\(3x+1\)) | \(N \log N\) | ? | \(N^{3/2}\) |
goal | \(N\) | \(N \log N\) | \(N \log N\) |
Next section: \(N \log N\) sorting algorithms (in worst case)
Goal: Rearrange array so that result is a uniformly random permutation (uniformly → all permutations are equally likely)
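Shuffle sort: Generate a random real number for each array entry, then sort the array by those numbers
Proposition: Shuffle sort produces a uniformly random permutation, assuming real numbers uniformly at random (and no ties)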
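A minimal sketch of shuffle sort, read here as: attach an independent random real key to each entry and sort by key. The class `ShuffleSort` is hypothetical, and it uses `java.util.Random` and `java.util.Arrays.sort` as stand-ins for the lecture's library classes.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical shuffle sort: sort the entries by independent random keys.
class ShuffleSort {
    static <T> void shuffle(T[] a, Random rng) {
        int n = a.length;
        double[] key = new double[n];      // key[i] is the random key for a[i]
        Integer[] order = new Integer[n];  // indices 0..n-1, to be sorted by key
        for (int i = 0; i < n; i++) {
            key[i] = rng.nextDouble();
            order[i] = i;
        }
        // Sort the indices by their random keys (a consistent total order on keys).
        Arrays.sort(order, (i, j) -> Double.compare(key[i], key[j]));
        T[] copy = a.clone();
        for (int i = 0; i < n; i++)
            a[i] = copy[order[i]];
    }

    public static void main(String[] args) {
        String[] a = { "A", "B", "C", "D", "E" };
        shuffle(a, new Random());
        System.out.println(String.join(" ", a));
    }
}
```

Each key is drawn once and then compared consistently, so the sort sees a genuine total order. The cost is that of a sort, \(N \log N\) with the library sort assumed here.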
Microsoft antitrust probe by EU: Microsoft agreed to provide a randomized ballot screen for users to select browser in Windows 7.
However, IE8 appeared last 50% of the time!
Solution? Implement shuffle sort by making comparator always return a random answer
// browser comparator (should implement a total order!)
public int compareTo(Browser that) {
    double r = Math.random();
    if(r < 0.5) return -1;
    if(r > 0.5) return +1;
    return 0;
}
Knuth shuffle: In iteration i, pick integer r between 0 and i uniformly at random, then swap a[i] and a[r]
Proposition [Fisher-Yates 1938]: Knuth shuffling algorithm produces a uniformly random permutation of the input array in linear time, assuming integers uniformly at random.
Common bug: picking r between 0 and N-1.
Correct variant: between i and N-1
public class StdRandom {
    /* ... */
    public static void shuffle(Object[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++) {
            int r = StdRandom.uniform(i+1);  // between 0 and i
            exch(a, i, r);
        }
    }
}
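For readers without the course library, the same loop can be written against the JDK alone. This is a minimal sketch: the class `KnuthShuffle` is hypothetical, and `java.util.Random.nextInt(i + 1)` stands in for `StdRandom.uniform(i+1)`.

```java
import java.util.Random;

// Hypothetical standalone Knuth shuffle using only the JDK.
class KnuthShuffle {
    static void shuffle(Object[] a, Random rng) {
        for (int i = 0; i < a.length; i++) {
            int r = rng.nextInt(i + 1);   // uniform integer between 0 and i, inclusive
            Object swap = a[i];
            a[i] = a[r];
            a[r] = swap;
        }
    }

    public static void main(String[] args) {
        String[] deck = { "A", "B", "C", "D", "E", "F" };
        shuffle(deck, new Random());
        System.out.println(String.join(" ", deck));
    }
}
```

Passing the `Random` instance as a parameter is a design choice for this sketch; it lets a caller fix the seed when testing.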
Q. What happens if integer is chosen between 0 and N-1?
A. Not uniformly random!
Probability of each result when shuffling A B C
permutation | Knuth shuffle | broken shuffle |
---|---|---|
A B C | \(1/6\) | \(4/27\) |
A C B | \(1/6\) | \(5/27\) |
B A C | \(1/6\) | \(5/27\) |
B C A | \(1/6\) | \(5/27\) |
C A B | \(1/6\) | \(4/27\) |
C B A | \(1/6\) | \(4/27\) |
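The broken-shuffle column can be reproduced by enumerating every possible sequence of random choices. With \(N = 3\) there are \(3^3 = 27\) equally likely sequences, which cannot divide evenly among \(3! = 6\) permutations. The class `BrokenShuffleCount` and its methods below are hypothetical names for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical exhaustive count of broken-shuffle outcomes (r chosen between 0 and N-1).
class BrokenShuffleCount {
    // Runs the shuffle loop with a fixed sequence of choices r[0..N-1].
    static String apply(char[] start, int[] r) {
        char[] a = start.clone();
        for (int i = 0; i < a.length; i++) {
            char swap = a[i];
            a[i] = a[r[i]];
            a[r[i]] = swap;
        }
        return new String(a);
    }

    // Counts how many of the N^N choice sequences lead to each permutation.
    static Map<String, Integer> brokenCounts(String s) {
        int n = s.length();
        int total = 1;
        for (int i = 0; i < n; i++) total *= n;
        Map<String, Integer> counts = new TreeMap<>();
        for (int code = 0; code < total; code++) {
            int[] r = new int[n];
            int c = code;
            for (int i = 0; i < n; i++) {   // decode code as n base-n digits
                r[i] = c % n;
                c /= n;
            }
            counts.merge(apply(s.toCharArray(), r), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {ABC=4, ACB=5, BAC=5, BCA=5, CAB=4, CBA=4}
        System.out.println(brokenCounts("ABC"));
    }
}
```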
Texas hold'em poker: Software must shuffle electronic cards
// shuffling algorithm in FAQ at www.planetpoker.com
for i := 1 to 52 do begin
    r := random(51) + 1;    // between 1 and 51
    swap := card[r];
    card[r] := card[i];
    card[i] := swap;
end;
Bug 1: Random number r never 52 ⇒ 52nd card cannot end up in 52nd place
Bug 2: Shuffle not uniform (should be between 1 and i)
Bug 3: random() uses 32-bit seed ⇒ \(2^{32}\) possible shuffles
Bug 4: Seed = milliseconds since midnight ⇒ 86.4 million shuffles
“The generation of random numbers is too important to be left to chance.”
—Robert R. Coveyou
Best practices for shuffling (if your business depends on it)
Bottom line: Shuffling a deck of cards is hard!