Pattern matching in Python

Marius Eriksen (marius@monkey.org)
11 May 2009

One of my favorite things about various functional programming languages is pattern matching. It often allows for very succinct and elegant declarative expressions, and in the dynamic variants it allows for easy in-line lightweight type checking.

So, naturally I wanted the same in Python, the programming language we use at work.

Pattern matching is most powerful when it enjoys first-class support in a language. In addition to succinct syntax, this affords you the ability to integrate pattern matchers with control constructs, allowing conditional execution of code based on various patterns. It also may give you a degree of composability not possible otherwise. For example, Erlang has not only a case statement, but allows for clauses in functions, so that pattern matching is done on the arguments of functions with the same name and arity. For instance in Erlang, you could implement a simple REST-style HTTP request handlers like so:

%% handle(Path, Method)
handle("/", _) ->
  not_a_resource;
handle(Path, 'PUT') ->
  create_new_resource(Path);
handle(Path, 'POST') ->
  update_resource(Path);
handle(Path, 'GET') ->
  retrieve_resource(Path);
handle(_, _) ->
  invalid_request.

Now I can simply call handle(Path, Method) to deal with my request. Notice also how powerful ordering is here: I exclude the resource "/" entirely by matching on it first. Note also that some arguments here are binding, while others are not. For example in the first clause, nothing is bound, in the second, we bind Path if the method matches 'PUT'.

How do we graft something like this onto a language like Python? It’s especially tricky because we’d like to maintain the ever elusive “Pythonics.” While I’m quite sure Guido would never even touch this stuff, we can at least maintain the spirit! Start off by importing the match primitives:

>>> from match import M, A, A_

M creates a destructuring match expression (a “matcher”), A is the a binding argument, and A_ is the “any” argument. Any A arguments need to have a positional specifier. This is achieved with the division operator. So A/0 names the first value in the returned destructured tuple. With the help of a few operators, we compose a match expression:

>>> M((1, (A_, 3), A/1, A/0))
<match.M object at 0x726f0>

So, these expressions are entirely useless until they are bound. The == operator takes care of that:

>>> M((1, (A_, 3), A/1, A/0)) == (1, (2, 3), 1, 2)
(2, 1)

If you precede the match-expression with the ~ operator, the expression becomes a “pure” matching expression, and does not destructure the second operand, it just returns True or False depending on whether it matched successfully.

>>> ~M((1, (A_, 3), A/1, A/0)) == (1, (2, 3), 1, 2)
True

Now, to make things more interesting, match-expressions can be or-ed together, resulting in the first successful match. For example M([A/0]) | M(A/0) doesn’t care whether the value is a list of length 1 or a literal.

>>> M([A/0]) | M(A/0) == [5]
5
>>> M([A/0]) | M(A/0) == 5
5

Finally, matchers can specify “default” values to be returned they match successfully. This is to help deal with polymorphic return values. For example, the following expression:

>>> M([A/0]) | M([], d=False) == arr[:1]

picks the first element if arr is nonempty, otherwise it returns False.

I’ve put the code up on GitHub. I’m not super happy with the aesthetics of it but it’s interesting to experiment with. Among other things, it definitely needs list and dictionary destructuring.