Elementary Sorts

COS 265 - Data Structures & Algorithms

Elementary Sorts

rules of the game

sorting problem

Ex: Student records in a university

Chen 3 A 991-878-4944 308 Blair
Rohde 2 A 232-343-5555 343 Forbes
Gazsi 4 B 766-093-9873 101 Brown
Furia 1 A 766-093-9873 101 Brown
Kanaga 3 B 898-122-9643 22 Brown
Andrews 3 A 664-480-0023 97 Little
Battle 4 C 874-088-1212 121 Whitman

Item: a row in the table

Furia 1 A 766-093-9873 101 Brown

Key: a specific entry of an item that may or may not be unique

Furia 1 A 766-093-9873 101 Brown

sorting problem

Sort: Rearrange array of \(N\) items into ascending order

Chen 3 A 991-878-4944 308 Blair
Rohde 2 A 232-343-5555 343 Forbes
Gazsi 4 B 766-093-9873 101 Brown
Furia 1 A 766-093-9873 101 Brown
Kanaga 3 B 898-122-9643 22 Brown
Andrews 3 A 664-480-0023 97 Little
Battle 4 C 874-088-1212 121 Whitman

sorting problem

Sort: Rearrange array of \(N\) items into ascending order

Andrews 3 A 664-480-0023 97 Little
Battle 4 C 874-088-1212 121 Whitman
Chen 3 A 991-878-4944 308 Blair
Furia 1 A 766-093-9873 101 Brown
Gazsi 4 B 766-093-9873 101 Brown
Kanaga 3 B 898-122-9643 22 Brown
Rohde 2 A 232-343-5555 343 Forbes

sorting applications

Library of Congress numbers
playing cards

sorting applications

FedEx packages
contacts

sorting applications

Hogwarts houses

sample sort client 1

Goal: Sort any type of data
Ex 1: Sort random real numbers in ascending order (seems artificial... stay tuned for an application)

public class Experiment {
    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);
        Double[] a = new Double[N];
        for(int i = 0; i < N; i++) a[i] = StdRandom.uniform();
        Insertion.sort(a);
        for(int i = 0; i < N; i++) StdOut.println(a[i]);
    }
}
$ java Experiment 8
0.08614716385210452
0.10708746304898642
0.21166190071646818
0.363292849257276
0.460954145685913
0.5340026311350087
0.7216129793703496
0.9293994908845686

sample sort client 2

Goal: Sort any type of data
Ex 2: Sort strings in alphabetical order

public class StringSorter {
    public static void main(String[] args) {
        String[] a = StdIn.readAllStrings();
        Insertion.sort(a);
        for(int i = 0; i < a.length; i++) StdOut.println(a[i]);
    }
}
$ more words3.txt
bed bug dad yet zoo [...] all bad yes

$ java StringSorter < words3.txt
all bad bed bug dad [...] yes yet zoo
[suppressing newlines]

sample sort client 3

Goal: Sort any type of data
Ex 3: Sort the files in a given directory by filename

import java.io.File;

public class FileSorter {
    public static void main(String[] args) {
        File directory = new File(args[0]);
        File[] files = directory.listFiles();
        Insertion.sort(files);
        for(int i = 0; i < files.length; i++)
            StdOut.println(files[i].getName());
    }
}
$ java FileSorter .
FileSorter.class
FileSorter.java
Insertion.class
Insertion.java
Selection.class
Selection.java

total order

Goal: Sort any type of data (for which sorting is well defined)

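A total order is a binary relation \(\leq\) that satisfies antisymmetry (if both \(v \leq w\) and \(w \leq v\), then \(v = w\)), transitivity (if both \(v \leq w\) and \(w \leq x\), then \(v \leq x\)), and totality (either \(v \leq w\) or \(w \leq v\) or both)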

Ex:

No transitivity: Rock-Paper-Scissors

No totality: CSE course prerequisites

total order

Rock-Paper-Scissors violates transitivity (if both \(v \leq w\) and \(w \leq x\), then \(v \leq x\))

scissors \(\leq\) rock, and rock \(\leq\) paper, but scissors \(\not\leq\) paper

total order

CSE course prerequisites violate totality (either \(v \leq w\) or \(w \leq v\) or both)

cannot compare cos382 and cos424!
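Both properties can be checked by brute force on a small set. The sketch below is illustrative only: the class `TotalOrderCheck`, its method names, and the encoding of Rock-Paper-Scissors as a boolean matrix (`leq[v][w]` meaning \(v \leq w\), with \(v \leq w\) when \(w\) beats \(v\)) are assumptions, not part of the course code.

```java
// Brute-force check of totality and transitivity for a relation on a small set.
// leq[v][w] == true means "v <= w". Indices: 0 = rock, 1 = paper, 2 = scissors.
class TotalOrderCheck {
    // Totality: for every pair, v <= w or w <= v (or both).
    static boolean isTotal(boolean[][] leq) {
        int n = leq.length;
        for (int v = 0; v < n; v++)
            for (int w = 0; w < n; w++)
                if (!leq[v][w] && !leq[w][v]) return false;
        return true;
    }

    // Transitivity: v <= w and w <= x must imply v <= x.
    static boolean isTransitive(boolean[][] leq) {
        int n = leq.length;
        for (int v = 0; v < n; v++)
            for (int w = 0; w < n; w++)
                for (int x = 0; x < n; x++)
                    if (leq[v][w] && leq[w][x] && !leq[v][x]) return false;
        return true;
    }

    // Rock-Paper-Scissors: v <= w when w beats v, or when v equals w.
    static boolean[][] rockPaperScissors() {
        boolean[][] leq = new boolean[3][3];
        for (int v = 0; v < 3; v++) leq[v][v] = true;
        leq[0][1] = true;  // rock <= paper
        leq[1][2] = true;  // paper <= scissors
        leq[2][0] = true;  // scissors <= rock
        return leq;
    }

    public static void main(String[] args) {
        boolean[][] rps = rockPaperScissors();
        System.out.println("total: " + isTotal(rps));            // total: true
        System.out.println("transitive: " + isTransitive(rps));  // transitive: false
    }
}
```

Every pair of moves is comparable, so totality holds; the cycle is what breaks transitivity.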

callbacks

Goal: Sort any type of data (for which sorting is well defined)

Q: How can sort() know how to compare data of type Double, String, and java.io.File without any information about the type of an item's key?

Callback = Reference to executable code

callbacks

Implementing callbacks

callbacks: roadmap

// client code
public class StringSorter {
    public static void main(String[] args) {
        String[] a = StdIn.readAllStrings();
        Insertion.sort(a);  // <- defined below
        for(int i = 0; i < a.length; i++) StdOut.println(a[i]);
    }
}


// Comparable interface (built in to Java)
public interface Comparable<Item> {
    public int compareTo(Item that);
}


// data-type implementation
public class String implements Comparable<String> {
    /* ... */

    public int compareTo(String b) {
        /* ... */
        return -1;
        /* ... */
        return +1;
        /* ... */
        return 0;
    }
}


// sort implementation
public class Insertion {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++)
            for(int j = i; j > 0; j--)
                if(a[j].compareTo(a[j-1]) < 0) exch(a, j, j-1);
                //  ^^^^^^^^^^^^ key point: no dependence on
                //                          String data type!
                else break;
    }
}

comparable api

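Implement compareTo() so that v.compareTo(w) defines a total order and returns a negative integer, zero, or a positive integer when v is less than, equal to, or greater than w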

if(v  < w) return -1;
if(v == w) return  0;
if(v  > w) return +1;

Built-in comparable types: Integer, Double, String, Date, File, ...

User-defined comparable types: implement the Comparable interface

implementing the Comparable interface

Date data type (simplified version of java.util.Date)

public class Date implements Comparable<Date> {
    //                                  ^^^^^
    // only compares dates to other dates

    private final int month, day, year;

    public Date(int m, int d, int y) {
        month = m;
        day = d;
        year = y;
    }

    public int compareTo(Date that) {
        if(this.year  < that.year ) return -1;
        if(this.year  > that.year ) return +1;
        if(this.month < that.month) return -1;
        if(this.month > that.month) return +1;
        if(this.day   < that.day  ) return -1;
        if(this.day   > that.day  ) return +1;
        return 0;
    }
}

Elementary Sorts

selection sort

selection sort demo

selection sort

Algorithm: ↑ scans from left to right

Invariants

X X X
X X X X X
X X X X X X X
X X X X X X X X X X
X X X X X X X X X X X
X X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X
1 1 2 2 2 3 4 4 7 6 6 7 5 5 4 7

two useful sorting abstractions

Helper functions: Refer to data through compares and exchanges


Less: is item v less than w?

private static boolean less(Comparable v, Comparable w) {
    return v.compareTo(w) < 0;
}

Exchange: swap item in array a[] at index i with one at index j

private static void exch(Comparable[] a, int i, int j) {
    Comparable swap = a[i];
    a[i] = a[j];
    a[j] = swap;
}
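One use of the less() abstraction is testing whether an array is sorted. A minimal sketch, assuming a hypothetical class `SortCheck`; it uses a generic signature rather than the raw `Comparable` type used on the slides, only to avoid unchecked warnings.

```java
class SortCheck {
    // Same idea as less() above: is v strictly less than w?
    private static <T extends Comparable<T>> boolean less(T v, T w) {
        return v.compareTo(w) < 0;
    }

    // An array is sorted if no entry is less than its predecessor.
    static <T extends Comparable<T>> boolean isSorted(T[] a) {
        for (int i = 1; i < a.length; i++)
            if (less(a[i], a[i - 1])) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSorted(new Integer[] {1, 2, 2, 3}));    // true
        System.out.println(isSorted(new String[] {"bed", "all"}));   // false
    }
}
```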

selection sort inner loop

To maintain algorithm invariants:

  1. Move ↑ to the right
  2. Identify index of minimum entry on right
  3. Exchange into position
// 1
i++;

// 2
int min = i;
for(int j = i+1; j < N; j++)
    if(less(a[j], a[min])) min = j;

// 3
exch(a, i, min);

selection sort: java implementation

public class Selection {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++) {
            int min = i;
            for(int j = i+1; j < N; j++)
                if(less(a[j], a[min])) min = j;
            exch(a, i, min);
        }
    }

    private static boolean less(Comparable v, Comparable w)
    { /* as before */ }

    private static void exch(Comparable[] a, int i, int j)
    { /* as before */ }
}

selection sort: animations

random
nearly sorted
reversed
few unique

Black values are sorted

Gray values are unsorted

Red triangle marks algorithm position

selection sort: mathematical analysis

Proposition: Selection sort uses \((N-1)+(N-2)+\ldots+1+0 \sim N^2/2\) compares and \(N\) exchanges.

Running time insensitive to input: Quadratic time, even if input is sorted

Data movement is minimal: Linear number of exchanges
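The counts in the proposition can be observed directly. The sketch below instruments selection sort with two counters; the class `SelectionCount` and its fields are hypothetical names, and it sorts `Integer[]` only to keep the example short.

```java
class SelectionCount {
    static long compares, exchanges;   // reset on every call to sort()

    static void sort(Integer[] a) {
        compares = 0;
        exchanges = 0;
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++) {
                compares++;
                if (a[j].compareTo(a[min]) < 0) min = j;
            }
            Integer swap = a[i]; a[i] = a[min]; a[min] = swap;
            exchanges++;
        }
    }

    public static void main(String[] args) {
        Integer[] sorted = {1, 2, 3, 4, 5, 6, 7, 8};
        Integer[] reversed = {8, 7, 6, 5, 4, 3, 2, 1};
        sort(sorted);
        System.out.println(compares + " compares, " + exchanges + " exchanges");  // 28 compares, 8 exchanges
        sort(reversed);
        System.out.println(compares + " compares, " + exchanges + " exchanges");  // 28 compares, 8 exchanges
    }
}
```

With \(N = 8\) both inputs cost \(7+6+\ldots+0 = 28\) compares and \(8\) exchanges, which is the input insensitivity stated above.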

Elementary Sorts

insertion sort

insertion sort demo

insertion sort

Algorithm: ↑ scans from left to right

Invariants

X X
X X X X
X X X X X X
X X X X X X X X X
X X X X X X X X X X
X X X X X X X X X X X X X
X X X X X X X X X X X X X X X X
1 1 2 4 5 7 3 6 1 4 6 7 2 5 4 2

insertion sort inner loop

To maintain algorithm invariants:

  1. Move the pointer to the right
  2. Moving from right to left, exchange a[i] with each larger entry to its left
// 1
i++;

// 2
for(int j = i; j > 0; j--)
    if(less(a[j], a[j-1])) exch(a, j, j-1);
    else break;

insertion sort: java implementation

public class Insertion {
    public static void sort(Comparable[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++)
            for(int j = i; j > 0; j--)
                if(less(a[j], a[j-1])) exch(a, j, j-1);
                else break;
    }

    private static boolean less(Comparable v, Comparable w)
    { /* as before */ }

    private static void exch(Comparable[] a, int i, int j)
    { /* as before */ }
}

insertion sort: animations

random
nearly sorted
reversed
few unique

Black values are sorted

Gray values are unsorted

Red triangle marks algorithm position

insertion sort: mathematical analysis

Proposition: To sort a randomly-ordered array with distinct keys, insertion sort uses \(\sim \frac{1}{4} N^2\) compares and \(\sim \frac{1}{4} N^2\) exchanges on average.

Pf: Expect each entry to move halfway back.

insertion sort: analysis

Best case: If the array is in ascending order, insertion sort makes \(N-1\) compares and \(0\) exchanges.

A E E L M O P R S T X


Worst case: If the array is in descending order (and no duplicates), insertion sort makes \(\sim \frac{1}{2} N^2\) compares and \(\sim \frac{1}{2} N^2\) exchanges.

X T S R P O M L F E A

insertion sort: partially-sorted arrays

Def: An inversion is a pair of keys that are out of order

A E E L M O T R X P S

Above has 6 inversions: T-R, T-P, T-S, R-P, X-P, X-S

Def: An array is partially sorted if the number of inversions is \(\leq cN\) for some constant \(c\)

Proposition: For partially-sorted arrays, insertion sort runs in linear time

Pf: Number of exchanges equals the number of inversions, and the number of compares is at most exchanges + \((N-1)\)
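The link between inversions and exchanges can be checked on the three example arrays from these slides. This is a sketch under stated assumptions: the class `InsertionCount` and its counters are hypothetical, and it works on `char[]` instead of `Comparable[]` to stay short.

```java
class InsertionCount {
    static int compares, exchanges;   // reset on every call to sort()

    static void sort(char[] a) {
        compares = 0;
        exchanges = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i; j > 0; j--) {
                compares++;
                if (a[j] < a[j - 1]) {
                    char swap = a[j]; a[j] = a[j - 1]; a[j - 1] = swap;
                    exchanges++;
                } else break;
            }
    }

    // Brute-force count of pairs i < j with a[i] > a[j].
    static int inversions(char[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) count++;
        return count;
    }

    public static void main(String[] args) {
        // ascending, descending, and partially sorted examples
        for (String s : new String[] {"AEELMOPRSTX", "XTSRPOMLFEA", "AEELMOTRXPS"}) {
            char[] a = s.toCharArray();
            int inv = inversions(a);
            sort(a);
            System.out.println(s + ": " + inv + " inversions, "
                    + exchanges + " exchanges, " + compares + " compares");
        }
    }
}
```

In each case the exchange count matches the inversion count computed before sorting.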

insertion sort: practical improvements

Half exchanges: Shift items over (instead of exchanging)

                       \ /
A C H H I M N N P Q X Y K B I N A R Y
          > > > > > > > ^
A C H H I K M N N P Q X Y B I N A R Y


Binary insertion sort: Use binary search to find insertion point

A C H H I M N N P Q X Y K B I N A R Y
|  binary search for  |
     first key > K
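The half-exchange idea can be sketched as follows: save the item being inserted, shift larger entries one position right, then write the saved item once. The class name `InsertionShift` is hypothetical and the generic signature is a choice made here, not the slides' API. Binary insertion sort is not shown.

```java
class InsertionShift {
    // Insertion sort that shifts instead of exchanging.
    static <T extends Comparable<T>> void sort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            T item = a[i];                 // item to insert
            int j = i;
            while (j > 0 && item.compareTo(a[j - 1]) < 0) {
                a[j] = a[j - 1];           // one array write per step, not a full swap
                j--;
            }
            a[j] = item;                   // drop the item into its slot
        }
    }

    public static void main(String[] args) {
        String[] a = "A C H H I M N N P Q X Y K B I N A R Y".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

Each step of the inner loop does one array write where an exchange would do two, at the cost of holding the item in a local variable.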

Elementary Sorts

shellsort

shellsort overview

Idea: Move entries more than one position at a time by \(h\)-sorting the array

An \(h\)-sorted array is \(h\) interleaved sorted subsequences

h = 4
full: L E E A M H L E P S O L T S X R
grp0: L ----- M ----- P ----- T
grp1:   E ----- H ----- S ----- S
grp2:    E ----- L ----- O ----- X
grp3:      A ----- E ----- L ----- R

Shellsort [Shell 1959]: \(h\)-sort array for decreasing sequence of values of \(h\)

  input: S H E L L S O R T E X A M P L E
13-sort: P H E L L S O R T E X A M S L E
 4-sort: L E E A M H L E P S O L T S X R
 1-sort: A E E E H L L L M O P R S S T X

\(h\)-sorting demo

\(h\)-sorted, where \(h=3\)

\(h\)-sorting

How to \(h\)-sort an array? Insertion sort, with stride length \(h\)
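A minimal sketch of \(h\)-sorting as insertion sort with stride \(h\), applied to the 13-sorted array from the overview trace. The class `HSort` and the helper `isHSorted` are hypothetical names, and `char[]` stands in for `Comparable[]`.

```java
class HSort {
    // h-sort: insertion sort with stride h (h = 1 is ordinary insertion sort).
    static void hSort(char[] a, int h) {
        for (int i = h; i < a.length; i++)
            for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                char swap = a[j]; a[j] = a[j - h]; a[j - h] = swap;
            }
    }

    // An array is h-sorted if a[i-h] <= a[i] for every i >= h.
    static boolean isHSorted(char[] a, int h) {
        for (int i = h; i < a.length; i++)
            if (a[i] < a[i - h]) return false;
        return true;
    }

    public static void main(String[] args) {
        char[] a = "PHELLSORTEXAMSLE".toCharArray();   // result of the 13-sort
        hSort(a, 4);
        System.out.println(new String(a));             // LEEAMHLEPSOLTSXR
        System.out.println(isHSorted(a, 4));           // true
    }
}
```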

Why insertion sort?

Shellsort example: increments 7, 3, 1

input:  S O R T E X A M P L E

7-sort: S O R T E X A M P L E
        M             S
          .             P
            L             R
              E             T
        M O L E E X A S P R T

3-sort: M O L E E X A S P R T
        E     M
          E     O
            .     X
        A     E     M
          .     .     S
            .     P     X
        .     .     .     R
          .     .     .     T
        A E L E O P M S X R T

1-sort: A E L E O P M S X R T
          E
            L
            E L
                O
                  P
                M O P
                      S
                        X
                      R S X
                          T X
        A E E L M O P R S T X

sorted: A E E L M O P R S T X

shellsort: java implementation

public class Shell {
    public static void sort(Comparable[] a) {
        int N = a.length;

        int h = 1;
        while(h < N/3) h = 3*h + 1; // 1, 4, 13, 40, 121, 364, ...

        while(h >= 1) {
            for(int i = h; i < N; i++) {
                for(int j = i; j >= h && less(a[j], a[j-h]); j -= h)
                    exch(a, j, j-h);
            }

            h = h / 3;
        }
    }

    private static boolean less(Comparable v, Comparable w)
    { /* as before */ }

    private static void exch(Comparable[] a, int i, int j)
    { /* as before */ }
}

shellsort: animations

random
nearly sorted
reversed
few unique

Black values are sorted

Gray values are unsorted

Dark gray values show current sub-array that is being sorted

Red triangle marks algorithm position

shellsort: which increment sequence to use?

Powers of two: 1, 2, 4, 8, 16, 32, ...
No

Powers of two minus one: 1, 3, 7, 15, 31, 63, ...
Maybe

\(3x + 1\): 1, 4, 13, 40, 121, 364, ...
OK; Easy to compute

Sedgewick: 1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, ...
(merging of \(9 \cdot 4^i - 9 \cdot 2^i + 1\) and \(4^i - 3 \cdot 2^i + 1\))
Good; tough to beat in empirical studies
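Both sequences are easy to generate. In the sketch below the class `Increments` is hypothetical, and the index ranges (\(i \geq 0\) for the first Sedgewick formula, \(i \geq 2\) for the second) are assumptions inferred from the values listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Increments {
    // 3x + 1 increments below limit: 1, 4, 13, 40, ...
    static List<Long> threeXPlusOne(long limit) {
        List<Long> seq = new ArrayList<>();
        for (long h = 1; h < limit; h = 3 * h + 1) seq.add(h);
        return seq;
    }

    // Sedgewick increments below limit: merge 9*4^i - 9*2^i + 1 (i >= 0)
    // with 4^i - 3*2^i + 1 (i >= 2).
    static List<Long> sedgewick(long limit) {
        List<Long> seq = new ArrayList<>();
        for (int i = 0; ; i++) {
            long h = 9 * (1L << (2 * i)) - 9 * (1L << i) + 1;
            if (h >= limit) break;
            seq.add(h);
        }
        for (int i = 2; ; i++) {
            long h = (1L << (2 * i)) - 3 * (1L << i) + 1;
            if (h >= limit) break;
            seq.add(h);
        }
        Collections.sort(seq);   // interleave the two families in ascending order
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(threeXPlusOne(1000));  // [1, 4, 13, 40, 121, 364]
        System.out.println(sedgewick(4000));      // [1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905]
    }
}
```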

shellsort: intuition

Proposition: An \(h\)-sorted array remains \(h\)-sorted after \(g\)-sorting it.

7-sort: S O R T E X A M P L E   /-> 3-sort: M O L E E X A S P R T
        M             S         |           E     M
          .             P       |             E     O
            L             R     |               .     X
              E             T   |           A     E     M
        M O L E E X A S P R T ->/             .     .     S
            ^             ^                     .     P     X
            |             |                 .     .     .     R
                7-sorted                      .     .     .     T
                                            A E L E O P M S X R T
                                                ^             ^
                                                |             |
                                                still 7-sorted!

Challenge: Prove this fact—it's more subtle than you'd think!
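Short of a proof, the proposition can at least be tested empirically. The sketch below \(h\)-sorts random arrays, \(g\)-sorts them, and checks that they are still \(h\)-sorted; the class `HSortedInvariant` and the method `holds` are hypothetical names, and a passing run is evidence, not a proof.

```java
import java.util.Random;

class HSortedInvariant {
    // Insertion sort with stride h.
    static void hSort(int[] a, int h) {
        for (int i = h; i < a.length; i++)
            for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                int swap = a[j]; a[j] = a[j - h]; a[j - h] = swap;
            }
    }

    static boolean isHSorted(int[] a, int h) {
        for (int i = h; i < a.length; i++)
            if (a[i] < a[i - h]) return false;
        return true;
    }

    // True if every random trial stays h-sorted after being g-sorted.
    static boolean holds(int h, int g, int n, int trials, long seed) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rnd.nextInt(100);
            hSort(a, h);
            hSort(a, g);
            if (!isHSorted(a, h) || !isHSorted(a, g)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(7, 3, 50, 1000, 42L));   // true
    }
}
```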

shellsort: analysis

Proposition: The order of growth of the worst-case number of compares used by shellsort with the \(3x+1\) increments is \(N^{3/2}\).

Property: The expected number of compares to shellsort a randomly-ordered array using \(3x+1\) increment is...

\(N\)   compares   \(2.5 N \ln N\)   \(0.25 N \ln^2 N\)   \(N^a\)
5k      93k        106k              91k                  64k
10k     209k       230k              213k                 158k
20k     467k       495k              490k                 390k
40k     1022k      1059k             1122k                960k
80k     2266k      2258k             2549k                2366k

where \(a = 1.3\)

Remark: Accurate model has not yet been discovered (!)
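The three model columns are plain formulas in \(N\) and can be recomputed. A small sketch, with the class `ShellModels` and its method names as hypothetical choices and values rounded to thousands; the measured compare counts are empirical and are not reproduced here.

```java
class ShellModels {
    // 2.5 N ln N
    static long nLogN(long n) {
        return Math.round(2.5 * n * Math.log(n));
    }

    // 0.25 N ln^2 N
    static long nLogSquaredN(long n) {
        double ln = Math.log(n);
        return Math.round(0.25 * n * ln * ln);
    }

    // N^a
    static long power(long n, double a) {
        return Math.round(Math.pow(n, a));
    }

    // Round a count to the nearest thousand.
    static long thousands(long x) {
        return Math.round(x / 1000.0);
    }

    public static void main(String[] args) {
        for (long n : new long[] {5000, 10000, 20000, 40000, 80000})
            System.out.printf("%6d %6dk %6dk %6dk%n", n,
                    thousands(nLogN(n)),
                    thousands(nLogSquaredN(n)),
                    thousands(power(n, 1.3)));
    }
}
```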

why are we interested in shellsort?

Example of simple idea leading to substantial performance gains

Useful in practice

why are we interested in shellsort?

Simple algorithm, nontrivial performance, interesting questions

Lesson: Some good algorithms are still awaiting discovery

Elementary sorts summary

This section: Elementary sorting algorithms

Order of growth of running time to sort an array of \(N\) items

algorithm               best           average        worst
selection sort          \(N^2\)        \(N^2\)        \(N^2\)
insertion sort          \(N\)          \(N^2\)        \(N^2\)
Shellsort (\(3x+1\))    \(N \log N\)   ?              \(N^{3/2}\)
goal                    \(N\)          \(N \log N\)   \(N \log N\)

Next section: \(N \log N\) sorting algorithms (in worst case)

Elementary Sorts

shuffling

how to shuffle an array

Goal: Rearrange array so that result is a uniformly random permutation (uniformly → all permutations are equally likely)

shuffle sort

Proposition: Shuffle sort produces a uniformly random permutation, assuming real numbers uniformly at random (and no ties)
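A minimal sketch of shuffle sort, assuming it means: give each entry a random real key, then sort by those keys. The class `ShuffleSort` is a hypothetical name, and `java.util.Random` stands in for whatever random source the course library provides.

```java
import java.util.Arrays;
import java.util.Random;

class ShuffleSort {
    // Rearranges a[] by sorting its indices on independent random real keys.
    static void shuffle(Object[] a, Random rnd) {
        int n = a.length;
        double[] key = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            key[i] = rnd.nextDouble();   // random real in [0, 1)
            order[i] = i;
        }
        Arrays.sort(order, (i, j) -> Double.compare(key[i], key[j]));
        Object[] copy = a.clone();
        for (int i = 0; i < n; i++) a[i] = copy[order[i]];
    }

    public static void main(String[] args) {
        String[] cards = {"A", "B", "C", "D", "E"};
        shuffle(cards, new Random());
        System.out.println(String.join(" ", cards));
    }
}
```

The cost is that of a sort, which is more than shuffling needs.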

war story (Microsoft)

Microsoft antitrust probe by EU: Microsoft agreed to provide a randomized ballot screen for users to select browser in Windows 7.

Select your web browser(s)

However, IE8 appeared last 50% of the time!

Solution? Implement shuffle sort by making comparator always return a random answer

// browser comparator (should implement a total order!)
public int compareTo(Browser that) {
    double r = Math.random();
    if(r < 0.5) return -1;
    if(r > 0.5) return +1;
    return 0;
}

knuth shuffle demo

Proposition [Fisher-Yates 1938]: Knuth shuffling algorithm produces a uniformly random permutation of the input array in linear time, assuming integers uniformly at random.

knuth shuffle

Common bug: picking r between 0 and N-1.
Correct variant: between i and N-1

public class StdRandom {
    /* ... */
    public static void shuffle(Object[] a) {
        int N = a.length;
        for(int i = 0; i < N; i++) {
            int r = StdRandom.uniform(i+1); // between 0 and i
            exch(a, i, r);
        }
    }
}

Broken knuth shuffle

Q. What happens if integer is chosen between 0 and N-1?
A. Not uniformly random!

Probability of each result when shuffling A B C

permutation   Knuth shuffle   broken shuffle
A B C         \(1/6\)         \(4/27\)
A C B         \(1/6\)         \(5/27\)
B A C         \(1/6\)         \(5/27\)
B C A         \(1/6\)         \(5/27\)
C A B         \(1/6\)         \(4/27\)
C B A         \(1/6\)         \(4/27\)
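These probabilities can be reproduced by trying every sequence of random choices once and counting the results. In the sketch below the class `ShuffleEnumeration` is a hypothetical name; the Knuth variant draws r from 0..i as in the shuffle() code, and the broken variant draws r from 0..N-1.

```java
import java.util.Map;
import java.util.TreeMap;

class ShuffleEnumeration {
    // Counts how often each permutation of "ABC" results when every
    // sequence of choices for r is tried exactly once.
    static Map<String, Integer> counts(boolean broken) {
        Map<String, Integer> counts = new TreeMap<>();
        enumerate("ABC".toCharArray(), 0, broken, counts);
        return counts;
    }

    private static void enumerate(char[] a, int i, boolean broken,
                                  Map<String, Integer> counts) {
        if (i == a.length) {
            counts.merge(new String(a), 1, Integer::sum);
            return;
        }
        int bound = broken ? a.length : i + 1;   // number of possible values of r
        for (int r = 0; r < bound; r++) {
            char[] b = a.clone();
            char swap = b[i]; b[i] = b[r]; b[r] = swap;
            enumerate(b, i + 1, broken, counts);
        }
    }

    public static void main(String[] args) {
        System.out.println(counts(false));  // {ABC=1, ACB=1, BAC=1, BCA=1, CAB=1, CBA=1}
        System.out.println(counts(true));   // {ABC=4, ACB=5, BAC=5, BCA=5, CAB=4, CBA=4}
    }
}
```

The Knuth shuffle has \(3! = 6\) equally likely choice sequences, one per permutation. The broken shuffle has \(3^3 = 27\), and 27 is not divisible by 6, so it cannot be uniform.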

War story (online poker)

Texas hold'em poker: Software must shuffle electronic cards

War story (online poker)

// shuffling algorithm in FAQ at www.planetpoker.com
for i := 1 to 52 do begin
    r := random(51) + 1; // between 1 and 51
    swap := card[r];
    card[r] := card[i];
    card[i] := swap;
end;

Bug 1: Random number r never 52 ⇒ 52nd card cannot end up in 52nd place

Bug 2: Shuffle not uniform (should be between 1 and i)

Bug 3: random() uses 32-bit seed ⇒ \(2^{32}\) possible shuffles

Bug 4: Seed = milliseconds since midnight ⇒ 86.4 million shuffles
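A hedged sketch of how the loop could be repaired in Java. The class `DeckShuffle` is a hypothetical name. Drawing r from 0..i addresses Bugs 1 and 2. Passing a `java.security.SecureRandom` instead of a time-seeded generator is one way to address Bugs 3 and 4, and is an assumption here, not a recommendation taken from the slides.

```java
import java.security.SecureRandom;
import java.util.Random;

class DeckShuffle {
    // Knuth shuffle of the card numbers 1..52 using the supplied generator.
    static int[] shuffledDeck(Random rnd) {
        int[] card = new int[52];
        for (int i = 0; i < 52; i++) card[i] = i + 1;
        for (int i = 0; i < 52; i++) {
            int r = rnd.nextInt(i + 1);   // uniform in 0..i, so the last slot is reachable
            int swap = card[r];
            card[r] = card[i];
            card[i] = swap;
        }
        return card;
    }

    public static void main(String[] args) {
        int[] deck = shuffledDeck(new SecureRandom());
        StringBuilder sb = new StringBuilder();
        for (int c : deck) sb.append(c).append(' ');
        System.out.println(sb.toString().trim());
    }
}
```

A deck has \(52! \approx 2^{226}\) orderings, so a generator with a 32-bit seed can reach only a tiny fraction of them no matter how the loop is written.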

War story (online poker)

The generation of random numbers is too important to be left to chance.
—Robert R. Coveyou

War story (online poker)

Best practices for shuffling (if your business depends on it)

Bottom line: Shuffling a deck of cards is hard!
