Thursday, December 26, 2013

Duck Typing vs. Structural Typing

This post is in response to the post Duck Typing in Scala: Structural Typing, which was intended to introduce readers to structural types in Scala. At the beginning the author describes how structural types aren't exactly duck-typing, but I believe he draws the wrong distinction. His statement is that structural types aren't duck typing because they are checked at compile time. However, if you look at the Wikipedia entry for duck-typing, which he links to, you will see that static checking isn't what is significant here. Indeed, there is a section on duck-typing in statically typed languages. So if the static typing isn't what causes Scala's structural types to not be duck-typing, what does? I am going to argue here that it is the explicit interface of the structural types that distinguishes them from duck-typing.

When a language is duck-typed, the type correctness is determined by whether or not the value passed into a function/method supports the operations that are used on it. The author of the post is correct that most languages that use duck-typing are dynamically typed languages like Python or Scheme. In these languages, the correctness of the program is determined while it is running. Any value can be passed into a segment of code, but if that code does something the value doesn't support, it will cause a runtime error. There are no correctness checks at a "compile" phase that make sure the values passed in actually support the operations needed.

Duck-Typing in C++

We can contrast this to another language that includes duck-typing but does have static checks on the type correctness. While Wikipedia mentions Boo, C#, and F#, I would point to templated code in C++. The first example used in the original article shows Scala code that includes a structural type that requires the quack method. I will write a similar example here using templates in C++.
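```cpp
#include <cctype>
#include <iostream>
#include <string>
using namespace std;

// Q is never constrained explicitly: any type with a suitable
// quack(const string&) const member is accepted.
template <typename Q>
void quacker(const Q &q) {
    cout << q.quack("Quack") << endl;
}

class BigDuck {
public:
    string quack(const string &value) const;
};

class SmallDuck {
public:
    string quack(const string &value) const;
};

class NotReallyADuck {
public:
    string quack(const string &value) const;
};

// Shouts: converts every character to upper case.
string BigDuck::quack(const string &value) const {
    string ret(value);
    for (auto &c : ret) c = static_cast<char>(toupper(static_cast<unsigned char>(c)));
    return ret;
}

// Whispers: converts every character to lower case.
string SmallDuck::quack(const string &value) const {
    string ret(value);
    for (auto &c : ret) c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
    return ret;
}

// Has the right signature, so it is accepted, but ignores its argument.
string NotReallyADuck::quack(const string &) const {
    return "prrrrrrp";
}

int main() {
    quacker(BigDuck());        // prints QUACK
    quacker(SmallDuck());      // prints quack
    quacker(NotReallyADuck()); // prints prrrrrrp

    return 0;
}
```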
The quacker function is perfectly happy taking all three of the defined classes as arguments, despite the fact that they have no common supertype. However, if you try to call quacker with a type that doesn't have a quack method with an appropriate signature, you will get a compile error.

So how is this different from the Scala example in that post, and why do I feel that this is duck-typing when structural types aren't? To illustrate, we will focus on just the quacker functions. Here is the Scala version from that post.
```scala
def quacker(duck: {def quack(value: String): String}) {
  println (duck.quack("Quack"))
}
```
Now compare that to the C++ code shown above. Note how the Scala code explicitly states that the argument passed in must have a quack method. That statement doesn't appear in the C++ code. In C++, the requirement for a quack method is implicit, not explicit: it is never directly stated, but is instead inferred by the compiler when it deduces Q from the argument's type and checks the body of the template with that type. This is the hallmark of duck-typing. In true instances of duck-typing, you never declare whether the type can walk, swim, and quack; the language just tries those operations and reports an error when they aren't supported. Whether that check is done at compile time or at runtime is orthogonal to whether the language is using duck-typing.

Structural Typing

The structural types in Scala aren't duck-typing simply because the required interface is explicit, not implicit. So if structural types aren't duck-typing, what are they? Well, they are a form of structural type system. Surprising, huh? You can find information comparing structural type systems to nominative type systems, but the basic difference is that in a structurally typed language (or a system that is part of a language), it is the structure of the types that matters, not the name given to them or where that name was declared. Most programmers are more used to the nominative approach, but there are languages that use structural typing. In this approach, types like class Point3D(x:Double, y:Double, z:Double) and class Vect3D(x:Double, y:Double, z:Double) would be equivalent, so you could use one in place of the other. I have to admit that this makes me cringe a bit, because I would really like these to be distinct types, but there are places where this style of type equivalence has value, and it is for those situations that the structural types in Scala exist.

You can effectively define subtypes in a structural type system as well: adding members/operations to a type creates a subtype. For example, Particle(x:Double, y:Double, z:Double, mass:Double) could work as a subtype of the Point3D or Vect3D types above. Any code that uses a Point3D would only access x, y, and z, and those are all safe to access on a Particle, so it is safe to treat Particle as a subtype in a structurally typed system.

Relative Merits

I think at this point it is worth comparing the relative merits of duck-typing and structural typing. I will focus on the comparison between templates in C++ and structural types in Scala so that the issue of static versus dynamic checking isn't in play. (Holy wars continue to rage over that particular distinction, and they would only muddy the waters here.) I'm also going to set aside the question of performance, because that is really an implementation detail. Templates in C++ are generally fast because the compiler generates distinct machine code for each distinct instantiation of a template. The only potential problem with that is code bloat, which can occur if a large piece of code has multiple template arguments and is called with many different types. Structural types in Scala are implemented with reflection, making calls through them much slower than normal method calls. For that reason, structural types in Scala should generally be avoided: use them only when you need them, and only in situations where the performance impact won't be significant. The real question I want to consider here is which is better: having an explicit interface that types must adhere to in order to be used, or having an implicit interface that comes about simply from how the type is used?

While I'm not addressing the static vs. dynamic typing issue here, I have a funny feeling that similar lines will be drawn in the sand for this question as well. After all, those who like dynamic typing are probably more comfortable with the "freedom" of an implicit interface, while those who prefer the "safety" of static type checking are probably more inclined to want to see the explicit interface. I generally fall into the latter camp. The implicit interfaces of templates in C++ can, in my opinion, lead to all kinds of challenges. How do you know what you can pass into a templated function? Well, you could go through all the code and find every required operation, or you could just try something and see if it works. Granted, if it doesn't work, the error messages you get are likely to be both long and cryptic. C++ compilers aren't generally known for producing easy-to-interpret error messages, and few would dispute that deeply nested template functions/methods can produce wickedly long and obfuscated ones. (Did we really need 100+ lines of error output because the type I called this with doesn't have operator< defined?) The challenges of these implicit interfaces also show up in the documentation for C++. For example, what is the interface for an iterator? It is documented as part of the standard library, but it isn't really anywhere in the code. Nothing says that a class must have these operations to be an iterator; it is just a convention that people follow. The addition of concepts in a future release of C++ might fix this, but for now it is one of the challenges of using the language.

I have to admit, though, that there are some freedoms to these implicit interfaces. Going back to iterators, the list of things required for the iterator types in C++ is fairly long. Writing the equivalent structural type in Scala would be a real pain, so if the number of required members/operations is long, having to declare them explicitly makes for unhappy code. Of course, if it is something that is used frequently, the proper solution would be to create a trait/interface for that type and pass that around. Doing so would also get around the performance problems in Scala. However, that approach does remove some of the freedom you get from duck-typing or structural types. Going back to the iterator example, most code won't really need all of the methods/operations of an iterator; most of the time you are only going to use a few key ones. While the standard library code has everything in it, if you write your own iterators for a private library, it is really nice if you can get away with implementing only the things you really need and nothing else that would normally go there. I had my students write lots of C++ iterators in a data structures class this last semester. I never forced them to write absolutely complete iterators. There was no point: they learned what they needed from a subset of the methods, and it was sufficient for what we were doing. It is true that in Scala you can often write a trait that has a minimal set of abstract methods and defines everything else in terms of them, but there is still a certain benefit to writing only the code that is absolutely needed, and there are even times when not having to write out all the required members explicitly is very handy.
