Haskell is beautiful in practice

05 Nov 2009

I’ve been writing a bit of Haskell lately (see my GitHub page), and it’s an absolutely beautiful experience. And I mean that in every way: The language itself of course, but also the community, the documentation, the package system and the Haskell platform ultimately help make the practice of writing Haskell more pleasing than any other I’ve had any experience with.

I find that Haskell lets me translate so many ideas and abstractions naturally and without the fuzz of too much indirection. My hope is to give you a small taste of what I’m talking about. Please note that this is going to contain some deliberate (but mostly inconsequential) inaccuracies and I will also leave out details here and there. A lot of the code is also “paraphrased” to highlight the ideas behind it.

I recently wrote an implementation of BERT in Haskell (github). I had the need for an RPC mechanism, and it seemed to fit my requirements rather well. “BERTs” are a subset of Erlang terms, which are composable and have a straightforward external representation (albeit not as space efficient as something like Google’s protocol buffers). So I figured this might be a good starting point for demonstrating some of the substantial ways in which Haskell provides facilities to create programs that are not only succinct but also readable and robust. At the same time, I hope to convince you that Haskell provides the kind of abstraction & encapsulation that is often much more natural than other approaches.

Types

We need to be able to represent BERT terms in Haskell. Thus we introduce an algebraic type representing a Term (github).

-- | A single BERT term.
data Term
  -- Simple (erlang) terms:
  = IntTerm        Int
  | FloatTerm      Float
  | AtomTerm       String
  | TupleTerm      [Term]
  | BytelistTerm   ByteString
  | ListTerm       [Term]
  | BinaryTerm     ByteString
  | BigintTerm     Integer
  | BigbigintTerm  Integer

This is pretty straightforward to read: a Term can be either an IntTerm, a FloatTerm and so forth. The right column specifies the data for the type. The IntTerm carries an integer, etc. An important detail here is that these are not different types: they are all type constructors for the type Term. That is, these are different ways to create a Term type. Algebraic types may also be recursively defined. The ListTerm constructor represents a list of other Terms. Already with this simple declaration, we’ve expressed most of the semantics of BERT terms. We can now express composite BERT terms. For example BERT defines dictionaries to be represented as {bert, dict, [{key, value}]. In Haskell:

TupleTerm [ AtomTerm "bert", AtomTerm "dict"
          , ListTerm [TupleTerm [AtomTerm "key", AtomTerm "value"]]]

Typeclasses

Terms encapsulate the BERT type representation, and we will write code that can encode Term values to a binary representation (as per the BERT spec). However, these values are not the most convenient to work with in other Haskell code. Furthermore, many Terms have a natural “more primitive” Haskell type (eg. IntTerm vs. Int, ListTerm vs. []).

We would like to introduce Terms from these types and vice-versa. The most primitive way to do this is via pattern matching (this is ubiquitous in Haskell and other functional programming languages. Also see my poor attempt at implementing pattern matching in Python).

listify (ListTerm l)      = listify' l
listify' []               = []
listify' ((IntTerm x):xs) = x:listify' xs

This code introduces a list of integers ([Int]) from a ListTerm containing IntTerms. Clearly going around creating little unpackers like this for everything is going to be quite cumbersome. Haskell provides a very nice solution. Typeclasses let you declare common traits to a set of types. For example, I can define a typeclass that declares the ability to translate a value of that a given type into or out of a Term (github).

class BERT a where
  -- | Introduce a 'Term' from a Haskell value.
  showBERT :: a -> Term
  -- | Introduce a Haskell value from a 'Term'.
  readBERT :: Term -> a

This introduces two new functions. One to create a term from a Haskell value, and one to do the inverse. So what kinds of types can introduce Haskell values from BERT or vice-versa? A few are quite simple. The most trivial is for Term itself.

instance BERT Term where
  showBERT = id
  readBERT = id

To convert a Term to a Term is just the identity function. We need to cover a few more primitive Haskell types. The BERT definition for lists is:

instance (BERT a) => BERT [a] where
  showBERT xs            = ListTerm (map showBERT xs)
  readBERT (ListTerm xs) = map readBERT xs
  readBERT _             = error "Invalid list type"

Unlike Haskell lists, BERT lists can be heterogeneous: they are lists of Terms which, being an algebraic type, may eventually contain different types of data. Thus, to introduce a BERT list, we return a ListTerm that applies showBERT to every element in the list. This also explains the typeclass restriction on a: we require that the list we are encoding contains a type that also has a typeclass instance of BERT. Similarly, to decode a list, we call readBERT on each element, introducing a list with the decoded Haskell types. But what about the other way? Certainly we couldn’t introduce a Haskell list from a heterogeneous BERT list. The code above looks deceiving in this way, but type inference is going on in the background here. The type of readBERT is Term -> [a]. This inference is propagated when we pass readBERT to map as well; it’s equivalent to the following code:

  readBERT :: Term -> [a]
  readBERT (ListTerm xs) = map (readBERT :: Term -> a) xs

Note that you can also create your own typeclass instances that suits your needs. For example if your application keeps some internal representation of some well defined value, you could create a typeclass instance to convert this representation to and from Terms.

Lazyness

The RPC specification for BERT provides a rather opaque notion of a transport. The transport, in essence, is a channel through which you can send and receive BERT terms. The terms are wrapped in a 4-byte header specifying its length.

Haskell has a Handle type that is an opaque representation of a file-like object. Sockets can also be fronted by handles. A popular module, Data.ByteString provides a lazy bytestring type that can be backed by a handle. It’s a type that looks like a bytestring, smells like a bytestring, and acts like a bytestring, except that when it needs more data, it requests it from the handle it’s been constructed with. Similarly, our binary term decoder reads from such bytestrings, so it does not take much imagination to produce a list that represents the infinite stream of packets coming from the BERT peer (github).

packets :: ByteString -> [Packet]
packets b
  | null b = []
  | otherwise = p:packets b' 
  where (p, b') = parsePacket b

(In Haskell, : is the list constructor: it prepends an item to a list, eg. 1:[2,3] == [1,2,3].) packets reads like a declaration: if our bytestring, b is null, it is the empty list, otherwise, it is the first parsed packet from b, plus the list of packets represented by the remainder of the string. This works in Haskell because everything is evaluated lazily. This in effect means that, unless you specify otherwise, values are not computed (code is not evaluated) until needed. In practice, the Haskell runtime leaves a “promise” of a computation when the value is not needed immediately. The list constructor : is no different: its arguments are not evaluated until needed. In our example this means that the list defined by packets is lazily generated. So if we create such a value with packet b, not until we examine the resulting list do we even begin parsing (or generating the list, or reading from the socket). This is an example by which we use lazyness to provide an abstraction. It allows us to operate on the familiar list type while not being terribly concerned about how the list is being constructed. It also relieves us of having to create any further abstraction: we can translate our model of a transport almost perfectly into a primitive Haskell concept without having to go about creating unfamiliar interfaces.

Monadic parsing

We created an algebraic type to represent terms, but their construction (if you need to work with the term type) can be a little cumbersome. Our representation

TupleTerm [ AtomTerm "bert", AtomTerm "dict"
          , ListTerm [TupleTerm [AtomTerm "key", AtomTerm "value"]]]

would be represented more succinctly in the erlang grammar as:

{bert, dict, [(key, value)]}

If just writing code, you probably don’t care too much about the difference: you typically don’t make big static constructs, but rather create types programmatically anyway. However, with the BERT implementation I wanted to have a command line tool that allowed for testing & inspection of results. For example:

$ bert call localhost:8181 mod proc "{1, test, [5,6,7]}"

Luckily, Haskell provides an excellent facility for parsing: Parsec. Parsec uses monadic actions to simultaneously lex & parse its input. I won’t go into monads here, but for our purposes here it is essentially a means by which you can run code in a certain context, and manipulate that context. It can provide a pure functional construct with which you can achieve the appearance of imperative programming. In the context of Parsec, its monad maintains the parser state (eg. where in the input are we, which rules have failed, which rules are yet to be tried, what is the output, etc.), and provides a set of functions that can operate within the context of the monad it provides. Here’s our parser for a term.

p_term :: Parser Term
p_term = t <* spaces    
  where 
    t =     IntTerm               <$> p_num (readSigned readDec)
        <|> FloatTerm             <$> p_num (readSigned readFloat)
        <|> AtomTerm              <$> p_atom
        <|> TupleTerm             <$> p_tuple
        <|> BytelistTerm . C.pack <$> p_string
        <|> ListTerm              <$> p_list
        <|> BinaryTerm   . B.pack <$> p_binary

Again, reading very naturally: a term is t possibly followed by some whitespace. (a <* b means roughly “run a, then b, but the value of the expression is the value of a”). t in turn can be either a p_num wrapped with an IntTerm, and so forth. (In this context, <|> can very roughly be thought of as “otherwise, try..” and <$> as “wrap the results of the argument to the right with the thing on the left”.)

The various p_*s are other parsers. For example the one for tuples is

p_tuple =
  between (char '{' >> spaces) (spaces >> char '}') $
    p_term `sepBy` (spaces >> char ',' >> spaces)

between is one of the Parsec functions that means: between the parses of the following arguments, try to parse the third. The third argument states: “try and parse a p_term that is separated by maybe some whitespace, a comma and maybe some more whitespace.”

This is really just the tip of the ice berg. Most of my descriptions, especially those regarding the use of monads belie the rigor and generality of these constructs. If you’d like to explore further, the Real World Haskell book is a great start point. The Haskell community is simply fantastic. Much conversation happens on IRC, where there is typically an armada of people just waiting to discuss the finer points of this wonderful language with you.