Friday, July 1, 2016

The Value of pass-by-name Semantics

One of the nice features of the Scala language is the ability to use pass-by-name semantics. This isn't something that was invented for Scala. I remember first encountering it while studying for the CS subject GRE back in the mid-1990s. However, it is a feature that isn't very common in modern languages. For those who aren't familiar with this passing style, an argument that is passed-by-name is not evaluated when the function/method is called. Instead, a "thunk" is passed into the function/method. This "thunk" is basically a closure over the code with no arguments. It is evaluated each time that the value is used in the function/method. If it helps, you could think of it as being a lazy argument that doesn't cache its value, so it is re-evaluated each time. This might not sound like much, but it enables a lot of the powerful features of the Scala language, especially in regards to creating DSLs.

Today I was asked about an ideal way of doing something in Java. The problem involves reading through a file and adding values to a Map. After some testing, it was discovered that these files include duplicate keys. Since the Java libraries don't include a multimap, the solution is to use a Map<String, List<Integer>>. The person working on this code was trying to find a succinct way of writing the code that will add values to this Map. Of course, the first time a key is found, you have to create a new List and add the one value. Subsequent matches on that key need to add the value to the List that is already there. Here is the straightforward approach to doing this.

  if(map.containsKey(key) {
    map.get(key).add(value);
  } else {
    map.put(key, value);
  }

It's not really all that long or complex, but the question was, can it be improved upon, especially since Java 8 added some new, more expressive, functional features. If I were doing this in Scala, I would use get on the Map to get back an Option and work with that, or I could use getOrElse on the Map directly. It happens that Java 8 added Optional as well as a getOrDefault method on their Map type. As a result, we can write the following code in Java 8.

  map.put(key, map.getOrDefault(key, new ArrayList<String>()).add(value));

This is perfectly fine code in many ways. It compiles. It runs. It does what you are expecting. People familiar with this library will even find it to be easy to read. However, there is a detail in here that isn't immediately obvious that makes this code rather wasteful with memory. (That really just means that your garbage collector is going to be invoked most often than it really needs to be.) The problem is that all arguments in Java are evaluated then passed either by value or by reference depending on whether they are primitives or object types. The key is that the new ArrayList<String>() is going to be evaluated every time this line is executed, even if the value is in the Map. On the other hand, in Scala, the getOrElse method passes the second argument by name. As a result, it will never be evaluated unless needed. So when you do something like this, you are not doing extra work to calculate a default unless you actually need it. Granted, the library writers for Java 8 could have opted to make the second argument of getOrDefault a function or something like it that could be passed as a lambda expression. However, they didn't. Probably because doing so would have made the calling syntax uglier. Pass-by-name does not have that drawback.

Of course, pass-by-name has a lot of other handy uses in Scala. It is probably the most important feature when it comes to Scala's ability to implement Domain Specific Languages and create libraries that function like language extensions. This was the first time though that I could recall really feeling the lack of this feature in some other language. Other new languages, that I have looked at some, Rust and Kotlin come to mind, seem to have also skipped over this feature. Perhaps it is time other languages began to bring it back into standard usage.