Sorting

Ascending Order in the Court

By Matt Neuburg

Matt Neuburg is the author of REALbasic: The Definitive Guide, and a member of the editorial board of REALbasic Developer. This article was originally published in REALbasic Developer Issue 1.2 (Oct/Nov 2002).

Probably no algorithm has been the subject of more interest than sorting. Entire books have been written, and entire courses have been taught, on sorting alone. And it’s easy to see why. Rearrangement of data is a common need. For one thing, you might like to present ordered data to your user. Also, ordered data is much faster for your program to search, that is, to see whether a given item is present, and if so, where. (To find an item in unordered data requires examining every item; to find an item in ordered data requires merely going to the spot where it should be, and this can be very fast, as in the binary search example on p. 57 of my book.)

One reason sorting is interesting is that it so heavily taxes the computer’s resources. For example, with the need for sorting comes the need for speed. Intuitively one senses that sorting will take more time the more data we have; if this time increase is linear (twice as much data takes twice as long to sort), it becomes prohibitive to sort large data sets at all. There is also a space problem. Suppose there’s lots of data. It might all fit in the computer’s memory, but what if our sorting method requires making a copy, which won’t? Or it might not all fit in the computer’s memory to start with; what then?

Much ingenuity has been devoted to solving these problems, and their mathematics are an important aspect of computer science. Luckily, there are now enough sorting “recipes” to satisfy the needs of most beginners. The simplest sorting algorithm is the “bubble sort”. It’s also one of the worst, so you wouldn’t normally use it! Nevertheless, it’s worth starting with, just because it’s easy to understand and easy to code.

The idea of a bubble sort is that your data is divided into two clumps: the clump you’ve already sorted, followed by the clump you haven’t. You then take the first item of data from the unsorted clump and put it into place in the sorted clump by swapping it with the preceding item as many times as necessary. With each swap, the item moves upwards towards its proper place in the sorted clump; hence the “bubble” metaphor. Here’s how the procedure goes:

Start with the first item of data. It’s obviously in order relative to itself, so leave it alone. The first item is now sorted.

On to the second item of data. If it’s in order relative to the first item, leave it alone; if not, swap them. The first two items are now sorted.

On to the third item of data. If it’s in order relative to the second item, leave it alone; if not, swap them, and now it’s the second item of data. If it’s in order relative to the first item, leave it alone; if not, swap them. The first three items are now sorted.

On to the fourth item…

But it is now clear what the algorithm is: Run through all the items in their current order. For each, if it is in order with the item preceding it, do nothing; otherwise, swap it with the item preceding it, and keep doing that until it is in order with the item preceding it (or until it is first).

We’re ready to code! Assume what we want to sort are numbers in an array. Here we go:

Bubble sort for an array of integers:

Sub bubblesort(data() as integer)
   dim i,u,n as integer
   dim temp as integer
   u = ubound(data)
   for i = 0 to u
      n = i
      while n  > 0 and data(n) < data(n-1)
         temp = data(n-1)
         data(n-1) = data(n)
         data(n) = temp
         n = n-1
      wend
   next
End Sub

The code is obvious, and corresponds neatly to our verbal description of the algorithm. Arrays are passed to subroutines in REALbasic by reference, so our routine sorts the actual array handed to it, with no need to return a value. The variable n is introduced to keep track of the current item as we swap it upwards into place. The variable temp is introduced because swapping the value of two variables requires three variables. That fact surprises beginners; on paper, this “feels” more like swapping:

Not swapping:

         data(n-1) = data(n)
         data(n) = data(n-1)

But it isn’t, because after the first line the two variables have the same value, and the second line causes no change at all! The trouble is that the first line destroys the original value of the variable assigned to. The variable temp prevents this destruction.

To see the main weakness of Bubblesort, imagine starting with our data in reverse sorted order. Bubblesort will have to swap every item all the way to the top — for n items, that’s n+(n-1)+(n-2)… swaps, which is nearly n-squared swaps. The time required for Bubblesort to process worst-case data doesn’t increase linearly, it increases exponentially! That’s appalling. In next issue’s column we’ll present some better sorting algorithms.

For now, let’s change gears and talk about how to use REALbasic to apply this or any sorting algorithm to your data in a more general way.

At present, our Bubblesort method has a huge limitation — it can only sort an array of numbers. That’s pointless, since REALbasic can already do this. The time when you might need a sorting algorithm is to sort something else, such as an array of objects. REALbasic doesn’t know how to sort that. But neither does Bubblesort! The algorithm is sound (though slow), but the less-than operator in Bubblesort’s while-test is meaningless for objects. There is no general useful rule as to whether one object is “less than” another; it all depends what type of object we’re talking about. By the principles of object orientation, therefore, it shouldn’t be Bubblesort’s job to compare two objects; it should be the object’s job!

Suppose, then, that we have a class ComparableThing, having a method ShouldSwapWith, defined like this:

Definition for ComparableThing.ShouldSwapWith:

Function shouldSwapWith(o As comparableThing) As Boolean

The point of sending an object the ShouldSwapWith message is that the object should compare itself (somehow) with the object passed as parameter, and report whether the two objects are incorrectly ordered. We can now rewrite Bubblesort like this:

Bubble sort for an array of ComparableThings:

Sub bubblesort(data() as comparableThing)
   dim i,u,n as integer
   dim temp as variant // *
   u = ubound(data)
   for i = 0 to u
      n = i
      while n  > 0 and data(n).shouldSwapWith(data(n-1)) // *
         temp = data(n-1)
         data(n-1) = data(n)
         data(n) = temp
         n = n-1
      wend
   next
End Sub

Apart from the declaration, which now expects an array of ComparableThings, not integers, the only changes are the two starred lines. REALbasic’s Variant type can be assigned a value of any object, and can then be assigned back to a variable of that same object type, without our having to specify (or know) what that object type is. The while-test, instead of comparing two values directly, now asks one object whether it should be swapped with the other.

Now, you might say: This isn’t general at all! Previously, Bubblesort could accept only one type of data, an array of integers. Now, it still can accept only one type of data, an array of ComparableThings. But that’s not so, thanks to REALbasic’s class interface mechanism. If ComparableThing is a class interface, not a class, then any class can implement it. So Bubblesort can now sort an array of any type of object, without even knowing what type of object it is! The only thing Bubblesort needs to be sure of is that this is a class with a ShouldSwapWith method — and it can be sure of this, because that is exactly what it means for a class to implement the ComparableThing interface.

To illustrate, let’s create a class NumberPointer, having an integer property TheNumber. NumberPointer implements ComparableThing, and therefore has a ShouldSwapWith method, which goes like this:

NumberPointer.ShouldSwapWith:

Function shouldSwapWith(o As comparableThing) As Boolean
   return (numberPointer(o).theNumber  > self.theNumber)
End Function

It works! Bubblesort can now sort an array of NumberPointers.

Okay, let’s try another example. Let’s create a class SortableDate, a subclass of Date. SortableDate implements ComparableThing, and therefore has a ShouldSwapWith method, which goes like this:

SortableDate.ShouldSwapWith:

Function shouldSwapWith(o As comparableThing) As Boolean
   return (sortableDate(o).totalseconds  > self.totalSeconds)
End Function

It works! Bubblesort can now sort an array of SortableDates.

You can see that we are creating class after class, and Bubblesort can sort an array of any of them — without any alteration to Bubblesort. The only requirement is the class should implement the ComparableThing class interface, which is no great hardship. We have thus clearly achieved a far greater measure of generality.

How general can we get? One answer, suggested on p. 123 of my book, is that we might rewrite Bubblesort to escape from the restriction that the thing to be sorted is an array. In this scenario, we are handed a single object, a Sortable, where Sortable is a class interface. We will have no idea what the object really is — it might be an array of objects, it might be a ListBox subclass, it might be anything at all, provided it implements the Sortable class interface. The object must be able to think in terms of index numbers for items (even if it doesn’t internally maintain items in an array), and it must have two methods declared as follows:

Definition for ComparableThing.ShouldSwapWith:

Function shouldSwapWith(ix1 as integer, ix2 as integer) As Boolean

Definition for ComparableThing.Swap:

Subroutine swap(ix1 as integer, ix2 as integer)

Since Bubblesort doesn’t know how the Sortable object is implemented internally, the job of swapping two items, as well as that of comparing them, is left to the Sortable object. The job of creating an example is left as an exercise for the reader.