March 27, 2012

Second attempt to shoot myself in the foot; Scala


In this blogpost I will use Scala to solve the exact same problem I described in my previous blogpost:
write a program that given G produces sequences of numbers x, x+1, x+2..., x+n such that G = sum(x, x+1...x+n).

Scala currently is at the center of attention and rightly so. Scala allows for massively parallel and fault-tolerant computing. It supports both object-orientation and functional programming and produces code for Java's Virtual Machine, a highly optimized and robust runtime. No surprise more and more companies adopt Scala.


Looking at Scala one thing becomes obvious. Martin Odersky and his team are clearly heading in the right direction. Previous attempts at tools for massively parallel computing —Occam and Erlang among others— never became generally adopted. Chances are that Scala will become a mainstream language, as it is already adopted for production by companies like Twitter, LinkedIn, UBS, CSC and others.

So here's my attempt using Scala:



import scala.annotation.tailrec
object FindRangeWithSum { 
  def find(goal:Int):List[(Int,Int)] = { 

    @tailrec    def scan(tail:Int, head:Int, sum:Int, goal:Int, 
             accu: List[(Int, Int)]):List[(Int, Int)] = { 
      if (tail + head > goal) 
        accu 
      else if (sum < goal) 
        scan(tail, head+1, sum+head+1, goal, accu) 
      else if (sum > goal) 
        scan(tail+1, head, sum-tail, goal, accu) 
      else /*(sum == goal)*/ 
        scan(tail+1, head, sum-tail, goal, (tail, head):: accu) 
    } 
    scan(1, 2, 3, goal, List()) 
  } 
}

import Profiling._
object RangeFinder extends App { 
  val data = timed(printTime("Execution took: ")) {
    val list = FindRangeWithSum.find(10000)
    println(list)
  }
}

scala.annotation.tailrec provides an annotation that checks wether a method allows for tailrec elimination. A method annotated with @tailrec is checked to be tail recursive. If not, the compiler issues a warning. Compared to Erlang that's a better. While this problem is simple there are a lot of situations where tail-recursion is a lot less obvious.

By the way, if you're interested in experimenting with this code you need to pickup the profiling code written by Matthias Mann.

Now to the performance of this code. This will run within the JVM which uses JIT (Just In Time compilation). JIT is known to have a startup delay. I've been tinkering around firing the code with different values for goal. For most values for goal the Erlang example outperformed the Scala implementation by a wide margin where the Scala code never came in the range of micro seconds. Only for goal = 1.000.000.000 the Scala code outperformed the Erlang implementation.

To do justice to the JVM and the effort a lot of bright minds put into JIT, I modified the code:

object RangeFinder extends App {
  for (runs <- 0 to 1) {
    for (x <- 1 to 9) {
      val data = timed(printTime(" Execution took: ")) {
        val list = FindRangeWithSum.find(math.pow(10, x).toInt)
        print("10^" + x)
      }
    }
  }
}

This lead to remarkable results that definitely confirm the startup delay of JIT.  When warming up JIT by repeating the code, Scala outperforms Erlang for almost every value for goal.


10^1 Execution took: 12ms 858us 0ns
10^2 Execution took: 33us 0ns
10^3 Execution took: 59us 0ns
10^4 Execution took: 356us 0ns
10^5 Execution took: 4ms 261us 0ns
10^6 Execution took: 3ms 79us 0ns
10^7 Execution took: 32ms 903us 0ns
10^8 Execution took: 307ms 669us 0ns
10^9 Execution took: 2s 969ms 420us 0ns
10^1 Execution took: 35us 0ns
10^2 Execution took: 21us 0ns
10^3 Execution took: 23us 0ns
10^4 Execution took: 49us 0ns
10^5 Execution took: 316us 0ns
10^6 Execution took: 2ms 977us 0ns
10^7 Execution took: 29ms 556us 0ns
10^8 Execution took: 298ms 727us 0ns
10^9 Execution took: 2s 977ms 928us 0ns

This shows Scala clearly outperforms the Erlang implementation. And by a wide margin. Note the 3 seconds for goal = 1,000,000,000. It took the Erlang version 2 minutes to complete.

Let me try to put these results into perspective. With 'one-shot' execution, Erlang outperforms Scala by a wide margin. But when code executes repeatedly, Scala really shines and Erlang is -let's be fair- no match. Now try to imagine what will happen in a situation where an application is comprised of hundreds or possibly thousands of SCP's (Sequential Communicating Processes). Each of these processes complete a distinct and hopefully well defined and simple task.

The larger the load on such a system, the more Scala and JIT will shine and the more it will outperform an equivalent implementation in Erlang. Until now Erlang used to be in a class of it's own when it came to applications under heavy load. In a shoot out between Apache and YAWS (a web server implemented in Erlang), YAWS was still functioning at over 80,000 parallel connections. Apache died when subject to a load of 4,000 parallel connections. Check the Apache vs Yaws report for further details.

In a similar setup for Erlang vs. Scala my money definitely would be on Scala. I didn't dive into Akka yet, but the information I've seen until now gives me confidence that Scala, accompanied by frameworks like Akka and Play will be a winning combination for those who have the guts to dare. Especially Play is worth noting since it combines the strengths of Node.js (asynchronous I/O) with those of Erlang (massively parallel processing).

March 21, 2012

First attempt to shoot myself in the foot; Erlang

Acquiring knowledge is one of the joys in my every day life. Greater joys do exist and while important, are beyond the scope of this blog.

I recently installed Erlang and Scala and started experimenting with both of them. To get acquainted with a new programming language, I first read a little. After a couple of chapters I write some code and play around with it. If there's an interactive shell available, I play with it to get a feel for the syntax and semantics. I play around until I'm satisfied I've got a firm grip on the subjects presented in the first chapters. I then repeat this until I'm satisfied with the knowledge gathered.

So here's my first experiment with Erlang. Please take into account that these are my first lines of Erlang code. Comments are explicitly encouraged, especially from seasoned Erlang programmers.

Let's write a program that given G produces sequences of numbers x, x+1, x+2..., x+n such that G = sum(x, x+1...x+n).

Here's my first attempt using Erlang

seqwithsum(Goal) ->
  Max = Goal + 1 div 2,
  [ {Tail, Head} ||
    Tail <- lists:seq(1, Max-1),
    Head <- lists:seq(Tail+1, Max),
    lists:sum(lists:seq(Tail,Head)) =:= Goal
  ].


Nice. Compact, elegant and inefficient. Execution time will be some third degree function of G. Nonetheless these 7 lines of code perform fairly well for smaller values for G:

1> timer:tc(mycalc, seqwithsum, [100]).
{2207,[{9,16},{18,22}]}
2> 

2.2 milliseconds to compute the answer. However, for larger values of G performance quickly degrades. G = 1000 takes 1.6 seconds. Let's have a look at the numbers:

Gtime(µs)
  10     12
  20     46
  30    109
  40    230
  50    382
  60    727
  …      …
 100   2396
 200  16600
 300  50000
 400 112000
 500 212000
   …      …
10001619000

This clearly does not scale well with regards to G. There's two generators and one filter computing the sum of a sequence (and thus iterating over the sequence at hand). Execution time is some third degree function of G. Using polyfit from GNU Octave here's an approximation for t as a function of g:

      t  =  0.01*g^3 + 6.4*g^2 - 3854*g + 161950

Let's try to get rid of these three nested loops. To do so I'll use recursion, which is said to be a natural fit for Erlang. So let's have a go:

seqwithsum(Goal) ->
  seqwithsum(1, 2, 3, Goal, []).

seqwithsum(Tail, Head, Sum, Goal, L) ->
  if
    Tail + Head > Goal ->
      L;
    Sum < Goal ->
%     advance head
      seqwithsum(Tail, Head+1, Sum+Head+1, Goal, L);
    Sum > Goal ->
%     advance tail
      seqwithsum(Tail+1, Head, Sum-Tail, Goal, L);
    true ->
%     Prepend matching sequence to list and advance tail
      seqwithsum(Tail+1,Head,Sum-Tail,Goal,[{Tail, Head}] ++ L)
  end.

Execution time now will be a linear function of G. Let's have a look at how this performs:

1> timer:tc(mycalc, seqwithsum, [100]).
{17,[{18,22},{9,16}]}

17 µs? That's better than I expected. A 140 fold improvement over the previous approach. That looks promising indeed. The few extra lines of code were definitely worth their while. Let's experiment some more with this version to see where that takes us:

2> timer:tc(mycalc, seqwithsum, [10000]).
{1450,[{1998,2002},{388,412},{297,328},{18,142}]}

Looking even better compared to the previous algorithm where G=1000 took 1.6 seconds to compute the result. Literally a thousand times better than the first approach. Let's try to comprehend the consequences of this experiment. Some extra tinkering led to a thousand fold performance gain!

Back to the code. This is a recursive solution to the problem. Experience tells me I must be able to break this. It's a recursive approach so it must be easy to force a stack overflow. A recursive implementation of fac in Pascal caused a stacktrace with fac(5000), so my recursive seqwithsum(1000000)will break for sure. No greater joy for a programmer than break his own programs. Let's give this another try:

3> timer:tc(mycalc, seqwithsum, [1000000]).
{131456,
 [{199998,200002},
  {39988,40012},
  {7938,8062},
  {7749,7876},
  {1288,1912},
  {1243,1882}]}

This time the answer consists of 6 sequences and it took BEAM 131 ms to compute. These figures support the conclusion that execution time is a linear function of G. Determined to bring the beast to it's knees I type:

> timer:tc(mycalc, seqwithsum2, [1000000000]).

Hah, that seems to have knocked out BEAM. I stare at the screen convinced that victory is mine. Find sequences of integers where the sum of the sequence is 1 billion. That wil break things for sure. Since the algorithm is recursive this will result in a stack overflow. No doubt about that.

The cursor blinks and two of the four cores of my Mac OS X system consume somewhere around 50% cpu. I threw a brick at Erlang and it will shatter its windows to pieces, that's for sure.

But after 132 seconds disappointment kicks in as BEAM responds with the following:

{131827022,
 [{199999998,200000002},
  {39999988,40000012},
  {7999938,8000062},
  {1599688,1600312},
  {976051,977074},
  {318438,321562},
  {192753,197872},
  {56188,71812},
  {26263,51862}]}

Puzzling this didn't result in a stack-overflow? Head and Tail will advance until the end condition Head + Tail > GOAL is reached. Each advancement of Head and each advancement of Tail will add one to the recursion-depth, resulting in GOAL-1 recursive calls to seqwithsum/5. For GOAL = 1 billion, this ought to create 1 billion -1 stack frames. But no paging occurred. Memory allocated to BEAM is steady at 10MB during this calculation.

Clearly there's something else going on here: tail-recursion. Tail recursion allows the compiler to replace the recursive function calls with a simple jump to the entry point of the function.

My next blogpost will be about solving this problem in Scala and comparing the results of the Scala solutions with the results of the solutions In Erlang.

Erlang specialists are explicitly invited to comment on this article.