Saturday, 15 December 2012

Sorting an immutable list (in Scala)

[2013-01-07 Update: Results, Discussion and Conclusion updated in light of new benchmarks obtained using a different set-up.]

This post presents a method that does not resort to using arrays, as the built-in methods do, but instead uses scala.collection.immutable.List itself.

Motivation

Finding myself writing Java code making heavy use of arrays, one minute, and then Scala code in which everything should be immutable, the next, I decided to try to rewrite some of the former in the manner of the latter.

The code in question examines a set of 2-d points, and finds all those points which lie on hidden straight lines (with at least four points per line); the object of the exercise was to demonstrate a clever algorithm which employs sorting to reduce the time taken, so that it becomes proportional to N*log(N) rather than N*N (for N points).

Having eventually succeeded in rewriting this in Scala, using immutable lists instead of arrays, I began to wonder how the sort method I was blithely calling was actually implemented.

The answer is that the list is first copied to an array, which is then sorted (using one of Java's built-in sort methods), before being copied to a new immutable list.

Well, well!

No wonder the immutable list version was taking longer than the array one, with all that copying going on.

Why not do the sorting by creating new immutable lists directly?

So I set out to find out.

All been done before?

Searching the web, the best algorithm I could find was one for sorting mutable linked lists, with a sample implementation written in C. Therefore, using this one as a starting point, I began to devise one of my own, using Scala.

This could easily have been one of the FPPinS assignments, but without the skeleton solution!

After a great deal of perseverance, writing out of lists using pencil and paper, and an exercise in divining recursive functions from sets of similar ones, the following algorithm and implementation was arrived at.

Algorithm for sorting an immutable list

Sorting is done by first recognising that the number of elements in the list may be reduced to a sum of factors which are decreasing powers of 2; e.g. 15 = 8 + 4 + 2 + 1. The list to be sorted is
then considered to be the concatenation of subsequences of these lengths, and is sorted in two
stages:
  1. each subsequence, starting with the longest, is sorted and prepended to a list;
  2. each element of the tail of that list is then merged with the head.
Each subsequence in stage 1. is sorted by repeatedly considering it to be the concatenation of pairs of sorted subsequences, initially of length, 1, which are then merged.

In this way, it was found possible to use only constant-time list-operations (head, tail, ::) throughout.

Cases

Each time a pair of subsequences is merged, heads must be prepended to the result first, so that the order of the result is the reverse of those of the subsequences being merged. Therefore, it is important to keep track of the order of each subsequence, and ensure that the order of the result of the very first merge is chosen so that very last one will be from low to high. This was achieved by recognising that there are three cases which can arise, based on the number of stage 1. subsequences, each one with its own sequence of orders (where, in the following, '<' indicates low-to-high, '>', high-to-low):
  1. Single subsequence: <
  2. Even number of subsequences: >(<>...<>)>
  3. Odd number of subsequences (> 1): >(<>...<>)<<
Note that the the number of subsequences may be derived from the number of elements using Java's Integer.bitCount method.

The cases can be neatly implemented using case-classes, of a trait supplying a method that returns the order as a function of both the position of the subsequence and the previous order.

Algorithm analysis

The number (strictly, order of growth) of comparisons of elements (for a list of length, N) appears to be optimal for sorting:
  • stage 1. N*log(N)
  • stage 2. N

Stability

If possible, the order of two elements considered equal by the sort criterion should be preserved (in case it might be significant), in which case the sorting algorithm is termed 'stable'. Here, stability was achieved by being more careful when selecting the particular inequality to test for when comparing elements.

Results

The implementation, complete with tests of performance as well as for correctness, is to be found in a GitHub project called list-sort.

Performance

Performance testing was carried out using the rather useful ScalaMeter framework (which I learned of here).

As shown in the figure, the answer to the above question soon became apparent - it takes about 7 times longer to sort an immutable list of 30,000 integers, compared with using the built-in sorted method (which first copies the elements to an array, and sorts that), falling to about 4 times longer for 150,000 integers.
[2013-01-07 The above results were obtained on a dual-core MacBook Pro, using--as pointed out by Eugene Platonov (see comments)--input arrays/lists which were already sorted. So, new results have been obtained, this time using input arrays/lists in reverse-sorted order (thereby representing the worst case), and on two different set-ups (thanks to results submitted by Eugene in the comments).

Note that the 'AMD PC' is a quad-core set-up, running Java 6, versus Java 7 on the MacBook Pro (see comments). 2013-01-07]

Lines of code

Given that it is possible to sort a list in just a few lines of code if the restriction to constant-time list operations is removed, at about 100 lines of code excluding doc-comments, this particular algorithm and implementation are relatively complex and lengthy.

Discussion

One observation which goes some way towards accounting for the relatively poor performance is
that the recursive function to sort the first (largest) power-of-2 elements is not tail-recursive.


Note that this should not however (in itself) result in excessive use of memory, since only log2(len) frames will accumulate (which is only 17 in the largest test case).

[2013-01-07 The new results show a significant difference in performance characteristics between the two set-ups.

With Java 7 on a dual-core (Intel) Mac, the built-in sorting methods perform much better than the current algorithm, and this does not appear to be affected by whether or not the input is already sorted.

With Java 6 on a PC with a quad-core AMD CPU, the current algorithm's performance is on par with that of the built-in method for sorting an array, and exceeds that of the build-in method for sorting a list. 2013-01-07]

Conclusion

A stable algorithm for sorting an immutable list, using only constant-time list-operations, was devised and implemented in Scala.

One of the reasons for going to this trouble was to try to find out if it might be more efficient to sort such a list using functional programming, rather than by first copying it to an array. The above performance testing suggests that it definitely is not, and would appear to be in keeping with advice from Martin Odersky (skip to 1:07:30) about when to 'drop down' to using imperative arrays:
  • where functional programming really matters is [programming] 'in the large'
  • okay provided access is controlled and interface is purely functional.
[2013-01-07 The contributed results suggest that, on the contrary, using functional programming to sort a list can indeed be more efficient than using imperative arrays, depending on the particular set-up.

They also suggest that the built-in sorting methods of Java 7 and available on (Intel) Macs are highly optimised in comparison to those of Java 6 and available on AMD hardware. 2013-01-07]
Comments welcome.

11 comments:

  1. Tysvm for taking the time to explore this space sp explicitly and then to document the details.

    ReplyDelete
    Replies
    1. Glad to hear it's appreciated. Writing things up also actually helps!

      Delete
  2. Scalameter generators always produce same data and in your case they generate already sorted arrays and lists. SeqLike's sorted method uses java.util.Arrays.sort internally which takes advantage of already sorted arrays.

    So I fist run your benchmarks on my laptop as is (sbt test)

    [info] ::Benchmark list.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 1.74525
    [info] Parameters(size -> 60000): 4.127519
    [info] Parameters(size -> 90000): 6.065323
    [info] Parameters(size -> 120000): 8.383576
    [info] Parameters(size -> 150000): 10.409659
    [info]
    [info] ::Benchmark array.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 2.350784
    [info] Parameters(size -> 60000): 4.684734
    [info] Parameters(size -> 90000): 7.369618
    [info] Parameters(size -> 120000): 10.222242
    [info] Parameters(size -> 150000): 12.574138
    [info]
    [info] ::Benchmark sort(list)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 4.02182
    [info] Parameters(size -> 60000): 8.782457
    [info] Parameters(size -> 90000): 14.214572
    [info] Parameters(size -> 120000): 18.521795
    [info] Parameters(size -> 150000): 24.256433
    [info]
    [info] ::Benchmark sort(list, len)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 3.644443
    [info] Parameters(size -> 60000): 7.533142
    [info] Parameters(size -> 90000): 11.753961
    [info] Parameters(size -> 120000): 16.136787
    [info] Parameters(size -> 150000): 20.715183

    ReplyDelete
    Replies
    1. Hm.. your results are quite a bit different, aren't they: the algorithm compares much more favourably on your machine!

      Btw, my results were obtained on a dual-core MacBook Pro (which I should have mentioned).

      Delete
  3. then I shuffled generated using scala.util.Random.shuffle

    [info] ::Benchmark list.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 9.098993
    [info] Parameters(size -> 60000): 21.171567
    [info] Parameters(size -> 90000): 37.549006
    [info] Parameters(size -> 120000): 52.337578
    [info] Parameters(size -> 150000): 69.088432
    [info]
    [info] ::Benchmark array.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 9.730438
    [info] Parameters(size -> 60000): 20.343916
    [info] Parameters(size -> 90000): 34.258645
    [info] Parameters(size -> 120000): 50.509151
    [info] Parameters(size -> 150000): 65.711689
    [info]
    [info] ::Benchmark sort(list)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 9.083168
    [info] Parameters(size -> 60000): 23.138869
    [info] Parameters(size -> 90000): 39.177328
    [info] Parameters(size -> 120000): 52.281516
    [info] Parameters(size -> 150000): 75.24673
    [info]
    [info] ::Benchmark sort(list, len)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 6.814306
    [info] Parameters(size -> 60000): 17.37955
    [info] Parameters(size -> 90000): 29.994623
    [info] Parameters(size -> 120000): 40.230976
    [info] Parameters(size -> 150000): 57.358517

    ReplyDelete
    Replies
    1. And now the algorithm wins!

      Okay. But what if we just generate the array/list in reverse order (instead of randomising it) - surely the algorithm should win here as well?

      Well, I've just tried this (and pushed the changes to GitHub), but unfortunately I'm getting a similar result to the one I got before...

      Delete
  4. That's what I get with reversed lists and arrays

    [info] ::Benchmark list.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 5.089015
    [info] Parameters(size -> 60000): 11.629481
    [info] Parameters(size -> 90000): 18.653214
    [info] Parameters(size -> 120000): 24.230846
    [info] Parameters(size -> 150000): 30.833749
    [info]
    [info] ::Benchmark array.sorted::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 5.915962
    [info] Parameters(size -> 60000): 7.962077
    [info] Parameters(size -> 90000): 19.421093
    [info] Parameters(size -> 120000): 16.569323
    [info] Parameters(size -> 150000): 21.729165
    [info]
    [info] ::Benchmark sort(list)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 4.38155
    [info] Parameters(size -> 60000): 8.693128
    [info] Parameters(size -> 90000): 14.379568
    [info] Parameters(size -> 120000): 19.524849
    [info] Parameters(size -> 150000): 25.148961
    [info]
    [info] ::Benchmark sort(list, len)::
    [info] cores: 4
    [info] hostname: laptop
    [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
    [info] jvm-vendor: Sun Microsystems Inc.
    [info] jvm-version: 20.0-b11
    [info] os-arch: amd64
    [info] os-name: Linux
    [info] Parameters(size -> 30000): 4.162253
    [info] Parameters(size -> 60000): 8.373021
    [info] Parameters(size -> 90000): 12.985315
    [info] Parameters(size -> 120000): 18.704217
    [info] Parameters(size -> 150000): 22.457074

    scalameter from master on scala 2.10.0 and java 6

    ReplyDelete
    Replies
    1. Very nice results!

      Compared with your first set (for a pre-sorted array/list), the times for array.sorted and list.sorted have indeed increased, so much so that the algorithm is actually on-par with array.sorted(!) and, more crucially, is faster than list.sorted, presumably on account of all that copying that list.sorted has to do!

      Here are my my results, obtained using the current master branch of the list-sort project on GitHub (Scala 2.10.0-RC2, ScalaMeter 0.2), and the latest Java 7 Update 10 for Mac, which, as I said, are similar to those I obtained before (for the pre-sorted array/list):

      [info] cores: 2
      [info] jvm-name: Java HotSpot(TM) 64-Bit Server VM
      [info] jvm-vendor: Oracle Corporation
      [info] jvm-version: 23.6-b04
      [info] os-arch: x86_64
      [info] os-name: Mac OS X

      [info] ::Benchmark array.sorted::
      [info] Parameters(size -> 30000): 1.449
      [info] Parameters(size -> 60000): 2.794
      [info] Parameters(size -> 90000): 4.313
      [info] Parameters(size -> 120000): 5.438
      [info] Parameters(size -> 150000): 7.186

      [info] ::Benchmark list.sorted::
      [info] Parameters(size -> 30000): 1.611
      [info] Parameters(size -> 60000): 3.402
      [info] Parameters(size -> 90000): 5.89
      [info] Parameters(size -> 120000): 7.347
      [info] Parameters(size -> 150000): 9.776

      [info] ::Benchmark sort(list)::
      [info] Parameters(size -> 30000): 7.852
      [info] Parameters(size -> 60000): 18.907
      [info] Parameters(size -> 90000): 32.389
      [info] Parameters(size -> 120000): 37.33
      [info] Parameters(size -> 150000): 49.966

      [info] ::Benchmark sort(list, len)::
      [info] Parameters(size -> 30000): 7.807
      [info] Parameters(size -> 60000): 17.19
      [info] Parameters(size -> 90000): 28.615
      [info] Parameters(size -> 120000): 39.292
      [info] Parameters(size -> 150000): 51.908

      Delete
    2. Entry just updated in light of these new results - thanks a lot, Eugene.

      Delete
  5. Pleasant site. Thankq for the presentation and helpful subtle elements for amateurs and freshers as well... For all the more inside and out points of interest on scala Training visit our siteRead more... . Free demo accessible

    Check this site mindmajix for indepth scala blogs.
    Go here if you’re looking for information on scala training.

    ReplyDelete