April 28, 2012

Study, choices and focus

Choices, choices, choices.

In the past few weeks I've been looking into a number of things I would like to study in the coming months. On the database side I had a look at NoSQL solutions, notably CouchDB, Couchbase and MongoDB. I installed all of these on my Mac and experimented for a couple of hours with each of them.

Programming languages.

I had a look at Scala and Erlang. Languages I never used before and that both present me with a few new concepts. I implemented one chore in Scala and multiple chores in Erlang. I published my Scala chore and the Erlang equivalent on this blog (Erlang chore here, Scala chore here).

On Rosetta Code I completed three simple tasks in Erlang: Count occurrences of a substring, HQ9+ and Caesar cipher. These are, from a computer science point of view, simple chores. The idea behind implementing them was not that they would present me with an intellectual challenge. The goal was to get familiar with the syntax. To get a feel for the beauty or ugliness and the possibilities and limitations of these languages.

Limitations.

Now it's about time to make up my mind about what to study in the coming months. I can't study all these things in parallel. Apart from the intellectual challenge this would present, I've got a day-time job to attend to. Besides that, the missus and our children, while very patient, also deserve attention. Luckily my Mac is in the living room. That gives me, the missus and the kids a sense of togetherness while each of us is doing our own thing. That's a difference from my previous life, where a subject would totally "suck me in" and leave me completely unable to react or respond to events in my environment. Now I happily chat with the missus or the kids while figuring out map/reduce in CouchDB or hammering out some code in Erlang or Scala.

Choice of language.

Back to the choice of language to study. The fact that I implemented more chores using Erlang already gives a hint. Perhaps I didn't do justice to Scala. Perhaps I should have implemented more chores to get more proficient in Scala. But the truth is that Erlang's syntax is really, really simple. Scala's syntax is more elaborate, but the advantages of this more complex syntax over Erlang remain unclear.

Martin Odersky was quite upset when someone stated Scala was about to become the next C++. In my view Martin and his team have already taken the first steps in making Scala the next C++. It's difficult to top the ugliness of C++'s syntax. And let's be fair, Scala currently doesn't come close. But for me, Scala comes close enough to keep away from it for the time being. That does not mean Scala isn't worth studying. It is. Its capabilities and the fact that it produces code for the world's most popular runtime will make it a popular language. No doubt about that.

Erlang's syntax is very simple and elegant. Some minor points aside (e.g. the ugly "if" construct, the use of which is by the way not mandatory), Erlang's syntax is the ultimate application of "Occam's razor" and stands in stark contrast with Scala's elaborate and sometimes incomprehensible language constructs.

So in the coming months I'll be studying Erlang and especially OTP. I'm excited about Erlang as a language and OTP as a framework for designing massively parallel and fault-tolerant systems. The elegance simply grabbed me, and implementing simple chores like the Caesar cipher gave me a feel of true elegance. It convinced me that Erlang is a language that allows me to express my ideas in a very elegant manner. Not because of my programming abilities or my proficiency in Erlang but because of the elegance of the language.

Perhaps I didn't do justice to Scala by implementing only a single chore. But the beauty and elegance of Erlang simply grabbed my attention while Scala's syntax repelled me. While Erlang's language constructs feel natural, elegant and simple, Scala's constructs sometimes feel artificial.

My point of view is that Martin Odersky is desperately in need of Occam's razor and is well advised to take the remark that Scala will be the next C++ as kind and very relevant advice.

Nagging aspects in language choice.

I see two nagging aspects in my choice for Erlang vs Scala. One is Scala's Akka framework which is relevant without any doubt and really looks like a thing of beauty. The other aspect is the Play framework which looks very promising.

Akka.

The fact that a small team is able to implement capabilities that rival Erlang's OTP framework is an accomplishment in itself. The relevant metrics in comparing Akka and OTP would be simplicity of use, performance and lines of code spent using Akka or OTP.

The concurrency provided by Akka and Erlang follows the actor model: lightweight processes that share no state and communicate by asynchronous message passing. In Erlang this is built into the language and its runtime, while Akka is a framework written in Scala.
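
To give an idea of what message passing between Erlang processes looks like, here is a minimal sketch. The module name echo_sketch, the function names and the shape of the messages are made up for illustration; only spawn, ! (send), receive and self() are part of the language.

```erlang
-module(echo_sketch).
-export([start/0, ping/1]).

%% Start a new process running loop/0 and return its pid.
start() ->
    spawn(fun loop/0).

%% The process waits for messages in its mailbox.
%% It shares no state with its caller; all it sees are messages.
loop() ->
    receive
        {ping, From} ->
            From ! {pong, self()},   % reply to the sender
            loop();                  % wait for the next message
        stop ->
            ok                       % returning ends the process
    end.

%% Send a ping and wait at most one second for the matching pong.
ping(Pid) ->
    Pid ! {ping, self()},
    receive
        {pong, Pid} -> pong
    after 1000 ->
        timeout
    end.
```

In the shell, Pid = echo_sketch:start(). followed by echo_sketch:ping(Pid). should return pong for as long as the process is alive. Pid ! stop. ends it.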

Erlang/OTP has a proven track record with respect to reliability and uptime. Many Erlang/OTP based systems have shown uptimes of five nines (99.999%) or even better. In this respect Erlang is in a class of its own. It remains to be seen whether Scala/Akka will be able to deliver the same at equal or comparable cost in hardware or development effort.


Play framework.

Another compelling aspect of Scala is the "Play" framework. The value of such a framework is beyond imagination. Frameworks like Play and Akka will propel Scala, that's for sure. Just like Ruby on Rails and ActiveRecord propelled Ruby.

Truth is that the guys that implemented Play borrowed about 99% of their ideas from David Heinemeier Hansson's work on Ruby on Rails without adding significant new ideas. Play to me looks and feels like Rails. Been there, done that. In fact still doing it. So no real benefit for me in studying Play. That's not to say Play isn't worth studying. It is. Play's performance gain over frameworks like Ruby on Rails is tremendous.

Choice of database.

To be fair I must admit I have no experience with NoSQL databases like CouchDB, Couchbase, MongoDB or BigTable. A colleague of mine successfully used CouchDB in one of his projects, saving him a ton of work.

My choice of NoSQL database has no foundation and is completely random. I picked MongoDB combined with Ruby to study for the coming months. Simply because the combination of MongoDB and Ruby could become relevant in my day-time job.

April 26, 2012

Making things more Erlangy

In a previous blogpost I wrote a little Erlang to find sequences n, n+1, ..., n+m that have a given sum:

seqwithsum(Goal) ->
  seqwithsum(1, 2, 3, Goal, []).

seqwithsum(Tail, Head, Sum, Goal, L) ->
  if
    Tail + Head > Goal ->
      L;
    Sum < Goal ->
%     advance head
      seqwithsum(Tail, Head+1, Sum+Head+1, Goal, L);
    Sum > Goal ->
%     advance tail
      seqwithsum(Tail+1, Head, Sum-Tail, Goal, L);
    true ->
%     Prepend matching sequence to list and advance tail
      seqwithsum(Tail+1,Head,Sum-Tail,Goal,[{Tail, Head}] ++ L)
end.


I now dislike the code I wrote because of the ugly if-construct. Now that I've read some more and experimented a little, I've come to the conclusion that this construct, while efficient, is not Erlangy. "If" looks like a language design afterthought. Guards simply have a better feel to them. So here we go:

seqwithsum(Goal) ->
  seqwithsum(1, 2, 3, Goal, []).

seqwithsum(Tail, Head, Sum, Goal, Results) when Sum < Goal ->
  % advance Head
  seqwithsum(Tail, Head+1, Sum+Head+1, Goal, Results);

seqwithsum(Tail, Head, _Sum, Goal, Results) when Tail + Head > Goal ->
  Results;

seqwithsum(Tail, Head, Sum, Goal, Results) when Sum > Goal ->
  %advance Tail
  seqwithsum(Tail+1, Head, Sum-Tail, Goal, Results);

seqwithsum(Tail, Head, Sum, Goal, Results) ->
  % Sum equals Goal: prepend matching sequence and advance Tail
  seqwithsum(Tail+1, Head, Sum-Tail, Goal, [{Tail, Head} | Results]).


That's more the way things are supposed to be in Erlang. At least that's my point of view. To me, this looks more elegant than the code using the if-construct.

Be aware that the clauses, and with them their guards, are tried in the order in which they occur in the source. Moving the clause guarded by Tail + Head > Goal further down can cause an infinite loop.
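
The effect of clause order is easier to see in a smaller example. The sketch below is not part of the seqwithsum code; the module clause_order and both functions are made up for illustration. The two functions contain the same three clauses, only in a different order.

```erlang
-module(clause_order).
-export([size_a/1, size_b/1]).

%% Most specific guard first: every clause can be selected.
size_a(N) when N > 100 -> huge;
size_a(N) when N > 10  -> big;
size_a(_)              -> small.

%% Broad guard first: any N > 100 also satisfies N > 10,
%% so the second clause is never selected.
size_b(N) when N > 10  -> big;
size_b(N) when N > 100 -> huge;
size_b(_)              -> small.
```

clause_order:size_a(500) returns huge, while clause_order:size_b(500) returns big, because the first clause whose guard succeeds wins.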

April 04, 2012

Erlang performance revisited; HiPE to the rescue

In my two previous articles I solved a trivial problem in Scala and Erlang: write a program that, given G, produces sequences of numbers x, x+1, x+2, ..., x+n such that G = sum(x, x+1, ..., x+n). The challenge was not to solve this trivial problem. My quest was to compare the performance of an Erlang solution to a Scala implementation.

I stated beforehand these were my first lines of code in both Erlang and Scala. For smaller values for G, the Erlang implementation outperformed 'single-shot' execution of the Scala equivalent by a wide margin. I then modified the Scala implementation in a direction where I knew JIT would shine. And, lo and behold, in this setup Scala outperformed Erlang. For larger values for G even by a wide margin.

And then I picked up the book "Erlang and OTP in Action" by Martin Logan et al. The introduction mentions the HiPE project (High Performance Erlang), a native code compiler for Erlang. This whetted my appetite for further experiments.

So I checked my Erlang installation and was happy to find it contained HiPE support. I recompiled my little Erlang program using HiPE with maximum optimization. This dramatically improved performance.

In my article on Scala I assumed that the larger the load on a system, the more JIT would shine and the more Scala would outperform an equivalent Erlang implementation. But this was no more than an assumption. One I'm not too sure about anymore. In the Scala setup I explicitly picked a situation that would clearly benefit JIT; a small piece of code repeatedly executed to warm up JIT in order to let it shine.

Both Scala and Erlang are directed towards large scale projects containing million+ lines of code. Erlang has already proved able to hold its ground in such situations. For Scala this remains to be seen. In a large scale system a lot of the code is executed and a lot of it is executed in parallel. But when it's different code that gets executed all the time, JIT would suffer from its warm up delay, causing effects that resemble my first attempts with Scala where I used "one shot" execution. A situation where Erlang outperformed Scala by a factor of 1000.

My experiments so far show Erlang clearly does not suffer from any startup delay, on the contrary. So I now doubt whether my conclusion that the larger the load on a large scale system, the more JIT will shine is correct. Especially since it is probable that in such a large scale system major parts of the code will only execute incidentally. A small part of the code will be executed almost continually, giving JIT an opportunity to show its strength. But most of the time the system will be busy executing incidental code and will, in the Scala case, suffer from the warm up delay which severely degrades performance.

Erlang turns out not to suffer from such a warm up delay, and its performance is quite consistent. So in the Erlang case, performance becomes predictable. Predicting performance of Scala/JIT is much more difficult because the warm up delay plays a major role. My guess is that in a large scale system most of the code is executed only some of the time, while some of the code is executed most of the time. That bodes ill for JIT because in such a situation most of the code will suffer from warm up delay.
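
One way to check how consistent run times are is to time the same call a number of times and compare the first run with the later ones. Below is a minimal sketch of such a check. The module timing_sketch and the workload function are stand-ins I made up for illustration, not the seqwithsum code from my earlier posts; timer:tc/1 is the standard library call that returns the elapsed time in microseconds together with the result.

```erlang
-module(timing_sketch).
-export([run_times/2, workload/1]).

%% Hypothetical workload: sum of 1..N with a tail-recursive loop.
workload(N) -> workload(N, 0).

workload(0, Acc) -> Acc;
workload(N, Acc) when N > 0 -> workload(N - 1, Acc + N).

%% Run workload(N) Runs times and return the elapsed time of each
%% run in microseconds, in the order in which the runs took place.
run_times(N, Runs) ->
    [begin
         {Micros, _Result} = timer:tc(fun() -> workload(N) end),
         Micros
     end || _ <- lists:seq(1, Runs)].
```

Calling timing_sketch:run_times(1000000, 10). returns a list of ten timings. If the first entry is in the same range as the rest, there is no noticeable warm up effect for this piece of code.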

Best case, Scala will outperform Erlang by a factor of 3. But worst case, Erlang will outperform Scala by a factor of better than 1000. In my Scala post I stated my money would be on Scala. I'm glad no one took the bet.