Thursday, December 26, 2013

Duck Typing vs. Structural Typing

This post is in response to the post Duck Typing in Scala: Structural Typing, which was intended to introduce readers to structural types in Scala. At the beginning the author describes how structural types aren't exactly duck-typing, but I believe he draws the wrong distinction. His statement is that structural types aren't duck typing because they are checked at compile time. However, if you look at the Wikipedia entry for duck-typing, which he links to, you will see that static checking isn't what is significant here. Indeed, there is a section on duck-typing in statically typed languages. So if the static typing isn't what causes Scala's structural types to not be duck-typing, what does? I am going to argue here that it is the explicit interface of the structural types that distinguishes them from duck-typing.

When a language is duck-typed, the type correctness is determined by whether or not the value passed into a function/method supports the operations that are used on it. The author of the post is correct that most languages that use duck-typing are dynamically typed languages like Python or Scheme. In these languages, the correctness of the program is determined while it is running. Any value can be passed into a segment of code, but if that code does something the value doesn't support, it will cause a runtime error. There are no correctness checks at a "compile" phase that make sure the values passed in actually support the operations needed.

Duck-Typing in C++

We can contrast this to another language that includes duck-typing but does have static checks on the type correctness. While Wikipedia mentions Boo, C#, and F#, I would point to templated code in C++. The first example used in the original article shows Scala code that includes a structural type that requires the quack method. I will write a similar example here using templates in C++.
template
void quacker(const Q &q) {
    cout << q.quack("Quack") << endl;
}

class BigDuck {
public:
    string quack(const string &value) const;
};

class SmallDuck {
public:
    string quack(const string &value) const;
};

class NotReallyADuck {
public:
    string quack(const string &value) const;
};

string BigDuck::quack(const string &value) const {
    string ret(value);
    for(auto &c:ret) c = toupper(c);
    return ret;
}

string SmallDuck::quack(const string &value) const {
    string ret(value);
    for(auto &c:ret) c = tolower(c);
    return ret;
}

string NotReallyADuck::quack(const string &value) const {
    return "prrrrrrp";
}

int main() {
    quacker(BigDuck());
    quacker(SmallDuck());
    quacker(NotReallyADuck());

    return 0;
}
The quacker function is perfectly happy taking all three of the defined classes as arguments, despite they fact that they have no common supertype. However, if you try to call quacker with a type that doesn't have a quack method with an appropriate signature, you will get a compile error.

So how is this different from the Scala example shown in that post and why do I feel that this is duck-typing when structural types aren't? To illustrate this, we will focus in on just the quacker functions. Here is the Scala version from that post.
def quacker(duck: {def quack(value: String): String}) {
  println (duck.quack("Quack"))
}
Now compare that to the C++ code shown above. Note how the Scala code explicitly states that the argument passed in needs to have a quack method. This doesn't appear in the C++ code. In C++, the requirement for a quack method is implicit, not explicit. It is not directly stated, but instead it is inferred by the compiler when it tries to take in an argument and use the type of that argument as type Q. This is the hallmark of duck-typing. In true instances of duck-typing, you never say whether the type can walk, swim, and quack, it just tries it out and creates an error when it can't. The question of whether that check is done at compile time or during runtime is orthogonal to whether it is using duck-typing.

Structural Typing

The structural types in Scala aren't duck-typing simply because the required interface is explicit, not implicit. So if structural types aren't duck-typing, what can we say they are? Well, they are a form of structural type system. Surprising, huh? You can find information comparing structural types to nominative type systems, but the basic difference is that in a structurally typed language (or system that is part of a langauge), it is the structure of the types that matters, not the point of declaration of the name given to them. Most programmers are more used to the nominative approach, but there are languages that use structural typing. In this approach, types like class Point3D(x:Double, y:Double, z:Double) and class Vect3D(x:Double, y:Double, z:Double) would be equivalent. So you could use one in place of the other. I have to admit that this makes me cringe a bit because I would really like these to be distinct types, but there are places where this style of type equivalence has value and it is for those situations that the structural types in Scala exist.

You can effectively define subtypes in a structural type system as well. Adding members/operations to a type effectively creates as a subtype. For example Particle(x:Double, y:Double, z:Double, mass:Double) could work as a subtype of the Point3D or Vect3D types above. Any code that uses a Point3D would only access x, y, and z and those are all safe to access on Particle so it is safe to treat Particle as a subtype in a structurally typed system.

Relative Merits

I think at this point it is worth comparing the relative merits of duck-typing to structural typing and I will focus on the comparison between templates in C++ and structural types in Scala so that the issue of static checking versus dynamic checking isn't in play. (Holy wars continue to wage over that particular distinction that would only muddy the waters here.) I'm also going to ignore the question of performance, because that is really an implementation detail. Templates in C++ are generally fast because the compiler generates distinct machine code for each distinct template usage. The only potential problem with that is the code bloat that can occur if a large piece of code has multiple template arguments and is called with a lot of different types. Structural types in Scala are implemented with reflection, making them much slower than normal calls. For that reason, structural types in Scala should generally be avoided. Use them when you need them and in situations where the performance impact won't be too significant. The real question I want to consider here is, which is better, having an explicit interface that types must adhere to in order to be used or having an implicit interface that comes about simply based on the usage of that type?

While I'm not addressing the static vs. dynamic type issue here, I have a funny feeling that similar lines will be drawn in the sand for this question as well. After all, those who like dynamic typing are probably more comfortable with the "freedom" of an implicit interface. Those who prefer the "safety" of static type checking are probably more prone to like to actually see the explicit interface. I fall generally into the latter camp. The implicit interfaces of templates in C++ can, in my opinion, lead to all types of challenges. How do you know what you can pass into a templated function? Well, you could go through all the code and find every required operation, or you could just try something and see if it works. Granted if it doesn't work, the error messages you get are likely to be both long and cryptic. While C++ compilers aren't generally known for producing easy to interpret error messages, few can argue that deeply nested template functions/methods can produce wickedly long and obfuscated error messages. (Did we really need 100+ lines of error output because the type I called this with doesn't have operator< defined?) The challenges of these implicit interfaces are also seen in the documentation for C++. For example, what is the interface for an iterator? It is documented as part of the standard libraries, but it isn't really anywhere in the code. There is nothing that says that a class must have these operations to be an iterator, it is just a convention that people follow. The addition of concepts in a future release of C++ might fix this, but for now it is one of the challenges of using the language.

I have to admit though that there are some freedoms to these implicit interfaces. Going back to iterators, the list of things that are required for the iterator types in C++ is fairly long. Writing the structural type in Scala that would be equivalent to that would be a real pain. So if the number of required members/operations is long, having to explicitly declare them makes for unhappy code. Of course, if it is something that is used frequently, the proper solution would be to create a trait/interface for that type and pass that around. Doing so would also get around the performance problems in Scala. However, that approach does remove some of the freedom you get from duck-typing or structral types. Going back to the iterator example, most code won't really need all of the methods/operations of an iterator. Most of the time you are only going to use a few key ones. While the standard library code has everything in it, if you write your own for a private library, it is really nice if you can get away with only implementing those things you really need and not everything else that would normally go there. I had my students writing lots of C++ iterators in a data structures class this last semester. I never forced them to write absolutely complete iterators. There was no point. They learned what they needed from a subset of the methods and it was sufficient for the purposes of what we were doing. It is true that in Scala, you can often write a trait that has a minimal set of abstract methods and defines other things in terms of these, but I have to admit that there is still a certain benefit to having to only write the code that is absolutely needed and there are even times when not having to write all the required members explicitly is also very handy.

Sunday, December 1, 2013

More Benefits of Scala in CS1

In my last post I went through most of the aspects of Scala that I feel make it a nice fit for teaching CS1. Since that post was getting a bit long I stopped it before I had completely finished. In this post I will hit the last few points.

Total Access to Java

I probably should have included this in the first post because it really is very significant for a number of reasons. The one that I think matters most for educators is that most educators have years of experience with Java and have built up a large knowledge base related to the Java API and other Java libraries. Moving completely away from that can be scary. Scala doesn't just run on the JVM, you can call Java code and libraries in a way that is completely seamless. Indeed, if it weren't for the java. at the front of the complete names of classes (or import statements) students wouldn't even realize that they are calling code written in a different language.

While there really isn't much need for java.util.Scanner in Scala, it is a library that every CS1 teacher who has used Java is likely familiar with. For that reason, I will use it as an example.
val sc = new java.util.Scanner(System.in)
val str = sc.nextLine()
val word = sc.next()
val fiveInts = Array.fill(5)(sc.nextInt())
Making objects and calling methods in Scala can use the same syntax as in Java. The Scala libraries include things for which they believed there was a benefit to creating something new using Scala style. For some functionality, you will use the Java libraries because there wasn't an advantage seen to creating a new one in Scala. For educators though, this means that if you aren't certain how something is done in Scala you can fall back on Java. Granted, over time you will want to learn the proper Scala style for doing whatever it is, but you can make the learning curve a bit less steep early on.

From the student perspective, the reality of the world is that they need to learn Java at some point. Maybe that won't be true in a decade, but it is part of the reality of things right now. Given that, the close tie between Scala and Java can make it easier for students to learn Java on their own. That will be especially true if they learn some other C-family language along the way. I would argue that they need to learn some unmanaged language, so C/C++ should appear at some point in the curriculum. That gives them the main syntactic differences between Java and Scala and they should have a reasonable knowledge of some of the more significant elements of the Java API from having learned Scala. Their Scala background will also serve them well for understanding the Java memory model other than the peculiarities of primitives, which are more similar to C/C++ behavior.

Strings as Collections

I described some of the benefits of collections in the last post. The Array and List types aren't the only types that those methods can be called on. A String can also be treated as a sequence of characters. This is made possible by an implicit conversion. The details of implicits are part of the higher levels of Scala programming and not something I deal with in CS1. In fact, I generally don't even have to tell students anything about implicits in CS1. After introducing the basic collections I can simply tell students that all the methods we just learned for working with Array and List can be called on strings and they get it.

To help illustrate this point let's look at some examples.
val upper = str.map(_.toUpper)
val vowelCount = str.count(c => "aeiou".contains(c))
val letterCount = for(c <- 'a' to 'z') yield 
  str.count(_.toLower == c)
The last two examples aren't ideally efficient, but the point is that they are very clear and simple to express and they make things very tractable for CS1 students. That allows you to push to bigger examples/exercises than you might do otherwise.

It isn't just the higher order methods that are available either. You can look at the API page for scala.colection.immutable.StringOps to see the various options. Not only does this improve expressivity over plain Java strings, it goes back to making things uniform. When students get comfortable calling methods with certain names to do various things on the collection types, it is remarkably handy that they can call the same methods to do the same things on Strings as well.

Files as Collections

One of the topics that I find really opens up options for what I can have students do in CS1 is text files. Without files you are limited to information that exists for a single invocation of the program, and you are limited with how you can handle larger data sets. I really like having my students play with big files. After all, if they could do something by hand with a calculator it is challenging to convince them that there was really a need to program a computer to do that task. However, any student who looks at a file with over 10,000 lines of data in it knows immediately that answering any question I ask about that file is a task better left to a computer.

I mentioned java.util.Scanner above. In a Java course that would be the primary class used to read text files. As the code shows, there is no problem at all using Scanner in Scala as well. However, the Scala API provides another class called scala.io.Source that can be used to read text data as well. The advantage of Source over Scanner is that Source fits in with the Scala collections library, which means all the methods on collections that students learned earlier in the semester work here as well. Probably the best way to illustrate the power this gives is with an example. If I have a file that has a matrix of numeric values in it separated by spaces, the following code will read it in.
def parseLine(line:String):Array[Double] =
  line.split(" +").map(_.toDouble)

val source = io.Source.fromFile("data.txt")
val data = source.getLines.map(parseLine).toArray
source.close()
This code starts with a function called parseLine that convers a line of text into the data that we want. The body of this function is simple enough that we could have done it inline below. However, I find this to be more readable and it is a pattern that students will repeat for lots of different file formats, including those where the logic for parsing one line is more complex. After this we open a file using scala.io.Source. Due to the way that packages and imports properly nest in Scala and the fact that the scala package is imported by default, we can use the shorter name of io.Source.

The Source type is an Iterator[Char]. The Iterator type does what you would expect from Java or other languages in that it has methods called next() and hasNext(). It is also part of the collections library and has most of the other methods that one expects to have on a sequence of values. The only difference is that for an Iterator, the values are consumed as you move through them and not all stored in memory.

Most of the time you don't really want to work with individual characters. The method getLines returns an Iterator[String] that does what the name implies. In this example, we call the map method on the result of getLines and pass it the parsing function we had written above. This would give us an Iterator[Array[Double]], which would be consumed as you go through it. For most applications we would want to be able to go through the data multiple times, hence the call to toArray

What should really jump out to you in this example, in terms of the impact on CS1 instruction, is how consistent the problem solving approach is from the collections to the file handling. The fact that the Source acts just like other collections in terms of the available functionality allows students to begin using files more rapidly than if the code were completely different from anything they had seen before.

Now I can see someone who has taught Java pointing out here that files aren't something completely new because in Java they use Scanner for both standard input and files. There is uniformity there in Scala as well. If I wanted my data to come from standard input, the previous example would have simplified to the following.
val data = Array.fill(numLines)(readLine()).map(parseLine)
This uses the same parseLine function as above, but now works on an Array[String] that we get from combining Array.fill with the basic readLine function that has been used since the first week of class.

case classes are Like Records/Structs

What if my the data in my file wasn't just a bunch of numbers? CS1 needs to include the grouping of heterogeneous data elements together. Early in the semester I do this with the tuple mechanism in Scala. I haven't gone into that in these posts, but tuples are also very handy in CS1, especially if you ever run into functions that need to return more than one value. Around the same time that I cover files, I also cover case classes. Remember that in CS1 I am not intending to teach full object-orientation. I don't want to go into a lot of details about classes and methods. I just want a simple way to group together data in a manner where the elements have meaningful names. In my mind, I want something similar to a struct in C.

One of the examples I frequently do that involves files and case classes is processing of historical weather data. The Oak Ridge National Laboratory has a nice site where you can pull down historical data from across the country. I generally download a file and include a few key fields like min, max, and average temperatures, as well as precipitation. The data also includes other fields for the date and location the data comes from. This data has some fields that are integer values only, some that have decimals, and some that aren't simple numbers. I want my students to create their own type that stores some of these values. Consider what this code might look like in Java.
public class TempData {
  public final int month;
  public final int year;
  public final double precip;
  public final int minTemp;
  public final int maxTemp;
  public final int aveTemp;
  public TempData(int m,int y,double p,int min,int max,int ave) {
    month = m;
    year = y;
    precip = p;
    minTemp = min;
    maxTemp = max;
    aveTemp = ave;
  }
}
I made all the values final so that it is safer to make them public. Had we wanted them private and mutable, this code would get a lot longer with the required getters and setters. In this case I'm really happy with having the type be immutable. I'm just not happy with the amount of code or the fact that I'm definitely going to have to cover constructors for students to be able to write this.

Now compare this to the equivalent case class in Scala.
case class TempData(month:Int,year:Int,precip:Double,minTemp:Int,maxTemp:Int,aveTemp:Int)
Yes, that's it. The syntax for classes in Scala was designed to reduce boiler-plate. The case keyword adds a number of nice things in here as well. The only one that really matters for CS1 is that the arguments to the class all become public vals. Recall that a val is the same as a final variable in Java so we really are creating similar things here. It turns out, the Scala code comes with a lot of bonuses as well because case classes get their own definition of equals and hashCode. They also come with the ability to pattern match, a feature that some instructors could make use of in CS1, depending on how they teach their course.

To help illustrate how this might be used, let's look at code I would write to read in the contents of a data files into this case class and then to calculate the averages for the min, max, and average temperatures across all the data.
case class TempData(month:Int,year:Int,precip:Double,minTemp:Int,maxTemp:Int,aveTemp:Int)

def parseLine(line:String):TempData = {
  val parts = line.split(",")
  TempData(parts(2).toInt,parts(4).toInt,parts(5).toDouble,parts(6).toInt,parts(7).toInt,parts(8).toInt)
}

val source = io.Source.fromFile("temps.csv")
val data = source.getLines.drop(2).map(parseLine).toArray
source.close()
val aveMin = data.map(_.minTemp).sum.toDouble/data.length
val aveMax = data.map(_.maxTemp).sum.toDouble/data.length
val aveAve = data.map(_.aveTemp).sum.toDouble/data.length
The call to drop(2) is in there because the data file starts off with two lines of header information. Everything else should be fairly self-explanitory at this point. As an exercise for the reader, write this in the language you use for CS1 and compare the code length and the difficulty of construction.

GUIs and Graphics Without Explicit Inheritance

I remember when I started college and most students in CS1 would actually be impressed by having written their Hello World program or writing something that counted to ten in a text interface. The standards for what students expect from their computers has changed a lot since then, and as most faculty teaching early CS courses realize, it really helps to have a graphical interface if you want to get students involved. This is why I was very happy when I found that I could get students to write GUIs and even do Java 2D based graphics without them having to know about inheritance.

The first time I taught Scala in CS1 I was still pulling on a lot of my knowledge of Java. In Java, I felt that I really couldn't do GUIs without students understanding inheritance so that they could make subtypes of things like listeners. I really didn't want to cover inheritance during CS1, but both the students and I really wanted to do something graphical. After thinking about it for a while, I realized that using the scala.swing package I would be able to do GUIs easily with no explicit inheritance and using a syntax that I felt students would understand without knowing inheritance.

To illustrate this, here is a short script that brings up a GUI with three elements, a test field, a list, and a button. Text in the text field is added to the list when the user hits enter. Selected items in the list are removed when the button is clicked.
import swing._
import event._
import BorderPanel.Position._

val list = new ListView[String]
val field = new TextField
field.listenTo(field)
field.reactions += {
  case ed:EditDone => 
    if(field.text.nonEmpty) {
      list.listData = list.listData :+ field.text
      field.text = ""
    }
}
val frame = new MainFrame 
frame.title = "Simple GUI"
val bp = new BorderPanel
bp.layout += field -> North
bp.layout += list -> Center
bp.layout += Button("Remove") {
  list.listData = list.listData.diff(list.selection.items)
} -> South
frame.contents = bp
frame.size = new Dimension(300,500)
frame.centerOnScreen
frame.open
This particular solution has no inheritance at all. I show it to make it clear how this can be done, though I have to admit this isn't my normal style. My normal style for this code would use anonymous classes where the inheritance is implicit and code related to many of the GUI elements is nested in those GUI elements.
val list = new ListView[String]
val field = new TextField {
  listenTo(this)
  reactions += {
    case ed:EditDone =>
      if(text.nonEmpty) {
        list.listData = list.listData :+ text
        text = ""
      }
  }
}
val frame = new MainFrame {
  title = "Simple GUI"
  contents = new BorderPanel {
    layout += field -> North
    layout += list -> Center
    layout += Button("Remove") {
      list.listData = list.listData.diff(list.selection.items)
    } -> South
  }
  size = new Dimension(300,500)
  centerOnScreen
}
frame.open
Since students are used to seeing curly braces being used to signify the body on things like function, it isn't a significant cognitive leap for them to see how the curly braces after something like new TextField define a scope and body associated with a text field. Whichever approach you prefer, the bottom line is that you can build GUIs and make them interactive without students understanding inheritance.

This example code hopefully also illustrates again the benefit of Scala's close ties to Java. While there are differences between the details of how things are done in javax.swing and scala.swing, the approaches are similar enough that anyone with a working knowledge of Swing and some knowledge of the Scala syntax can quickly figure out how to make things work.

I also like to have my students use graphics, and here again I stick with Java libraries and use Java 2D. It is possible that at some point I might switch to JavaFX and the wrapper classes in ScalaFX, but since I prefer to work with libraries that are part of the default install, I expect that even if I move to JavaFX, I will use the Java interface.

To get graphics into a GUI I wind up needing to do a little bit of inheritance and I do have to introduce the override keyword. The following code gives a simple demonstration where a dot follows the mouse around on the window.
import java.awt.Graphics2D
import java.awt.Color
import swing._
import event._
import java.awt.geom._

var mx = 0
var my = 0
val panel = new Panel {
  override def paint(g:Graphics2D) {
    g.setPaint(Color.white)
    g.fill(new Rectangle2D.Double(0,0,size.width,size.height))
    g.setPaint(Color.blue)
    g.fill(new Ellipse2D.Double(mx-5,my-5,10,10))
  }
  listenTo(mouse.moves)
  reactions += {
    case mm:MouseMoved =>
      mx = mm.point.x
      my = mm.point.y
      repaint()
  }
  preferredSize = new Dimension(400,400)
}
val frame = new MainFrame {
  title = "Follow the Mouse"
  contents = panel
  centerOnScreen
}
frame.open
It is not too hard to go from this to simple games. The scala.swing package includes a Swing object that makes it easy to create ActionListeners that can be used with a javax.swing.Timer to create regular updates in a game.

Parallelism

This list wouldn't be complete without some mention of parallelism. I hold off on most of my coverage of parallel until CS2, but I like to introduce it in CS1 to get the concept into their heads early. This can be done nicely with parallel collections. You can get a parallel collection from a regular collection with the par method. You might not have many things in CS1 that benefit from being done in parallel, but you can fairly easily demonstrate some of the challenges and why making things immutable is beneficial in a parallel world with the following simple example.
var cnt = 0
for(i <- (1 to 1000000).par) cnt += 1
println(cnt)
Clearly this code should print out one million at the end. Of course, it doesn't because the increments are happening to a mutable value across multiple threads and race conditions are causing some of those operations to be lost. If you happen to have larger problems where they actually take a while to accomplish (something that is fairly challenging on modern processors), you could easily use the parallel collections to enhance the speed of that workload.

This finishes off my list for CS1. A lot of these items are improvements on Java that put Scala more on-par with Python or other scripting languages for the purposes of CS1. In my next post I'll hit on why I like using Scala in CS2. That is where I think that Python falls a little flat.