Review of Sorting

Review of Sorting

Perl's built-in sort operator sorts text strings in their natural text order, by default.[*] This is fine if we want to sort text strings:

[*] My friends call that the "ASCIIbetical" ordering. Normally, modern Perl doesn't use ASCII; instead, it uses a default sort order, depending on the current locale and character set. See the perllocale (not perllocal!) manpage.

my @sorted = sort qw(Gilligan Skipper Professor Ginger Mary_Ann);

but gets pretty messy when we want to sort numbers:

my @wrongly_sorted = sort 1, 2, 4, 8, 16, 32;

The resulting list is 1, 16, 2, 32, 4, 8. Why didn't sort order these properly? It treats each item as a string and sorts them in string order. Any string that begins with 3 sorts before any string that begins with 4.

If we don't want the default sorting order, we don't need to write an entire sorting algorithm, which is good news since Perl already has a good one of those. But no matter what sorting algorithm we use, at some point we have to look at item A and item B and decide which one comes first. That's the part we'll write: code to handle just two items. Perl will do the rest.

By default, as Perl orders the items, it uses a string comparison. We can specify a new comparison using a sort block that we place between the sort keyword and the list of things to sort.[[The sort block must return a coded value to indicate the sort order. If $a comes before $b in our desired sorting order, it should return -1; it should return +1 if $b comes before $a; if the order doesn't matter, it should return 0. The order might not matter, for example, if it's a case-insensitive sort comparing "FRED" to "Fred", or if it's a numeric sort comparing 42 to 42.[*]

[*] Actually, we can use any negative or positive number in place of -1 and +1, respectively. Recent Perl versions include a default sorting engine that is stable, so zero returns from the sort block cause the relative ordering of $a and $b to reflect their order in the original list. Older versions of Perl didn't guarantee such stability, and a future version might not use a stable sort, so don't rely on it.

For example, to sort those numbers in their proper order, we can use a sort block comparing $a and $b, like so:

my @numerically_sorted = sort {
  if ($a < $b)    { -1 }
  elsif ($a > $b) { +1 }
  else            {  0 }
} 1, 2, 4, 8, 16, 32;

Now we have a proper numeric comparison, so we have a proper numeric sort. Of course, this is far too much typing, so we can use the spaceship operator instead:

my @numerically_sorted = sort { $a <=> $b } 1, 2, 4, 8, 16, 32;

The spaceship operator returns -1, 0, and +1, according to the rules we laid out. A descending sort is simple in Perl:[[But there is more than one way to do it. The spaceship operator is nearsighted; and can't see which one of its parameters comes from $a and which from $b; it sees only which value is to its left and which is to its right. If we reverse the position of $a and $b, the spaceship will sort everything in the opposite order:

my @numerically_descending =
    sort { $b <=> $a } 1, 2, 4, 8, 16, 32;

In every place the previous sort expression returned -1, this expression returns +1, and vice versa. Thus, the sort is in the opposite order, and so it doesn't need a reverse. It's also easy to remember because if $a is to the left of $b, we get out the lower items first, just like a and b would be in the resulting list.

Which way is better? When should we use a reverse sort, and when should we switch $a and $b? Well, in most cases it shouldn't matter much for efficiency, so it's probably best to optimize for clarity and use reverse. For a more complex comparison, however, a single reverse may not be up to the task.

Like the spaceship operator, we can indicate a string sort with cmp, although this is rarely used alone because it is the default comparison. The cmp operator is most often used in more complex comparisons, as we'll show shortly.

 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows