marius a. eriksen

Hints for writing Unix tools

2014-10-20T00:00:00-07:00

Note: this article has been translated into Japanese

The workaday world of a modern programmer abounds with Unix tools, stitched together in myriad ways. While good tools integrate seamlessly with your own environment, bad ones will constantly frustrate your efforts. Good tools have a seemingly limitless application, constrained only by your own imagination. Bad tools, on the other hand, will often require that you deploy a salvo of brittle hacks to keep them barely working in your own environment.

“One thing well” misses the point: it should be “One thing well AND COMPOSES WELL”
— marius eriksen (@marius) October 10, 2012

I don’t want to attempt to explain what makes for good design; this has been discussed elsewhere. Instead, I want to outline a few established customs that you should take care to follow when writing new tools. While making a truly good tool can be an elusive goal, it isn’t difficult to avoid making a truly bad one. Unix demands good citizenry from its tools: it relies on a set of conventions to make things work and, importantly, to compose well. Here follow a few key customs, often violated. These aren’t absolute requirements, but you should think long and hard before violating them.

Consume input from stdin, produce output to stdout. Put another way, your program should be a filter. Filters are easily integrated into shell pipelines, arguably the most important utility for Unix tools composition.
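
A minimal sketch of such a filter, written as a shell function. The name slower_than and the three-column record layout (name, time, allocation) are my own assumptions for illustration, not part of any real tool.

```shell
# slower_than LIMIT: hypothetical filter (not a real tool).
# Reads whitespace-separated "name time alloc" records on stdin and writes
# the records whose time (column 2) exceeds LIMIT to stdout, unchanged.
slower_than() {
    awk -v limit="$1" '($2 + 0) > (limit + 0)'
}
```

Because it touches only stdin and stdout, it drops into a pipeline without ceremony, for example: ./runbenchmarks | slower_than 12 | sort -n -k2,2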

Output should be free from headers or other decoration. Superfluous output will frustrate users who are trying to parse tool output. Headers and decoration tend to be less regular and more idiosyncratic than the structured data you’re really trying to get at. Don’t do it.

Output should be simple to parse and compose. This usually means representing each record as a single, plain-text formatted line of output whose columns are separated by whitespace. (No JSON, please.) Most venerable Unix tools—grep, sort, and sed among them—assume this. As a simple example, consider the following output from a benchmark suite. It is formatted by starting each record with the benchmark name, followed by a set of key-value pairs associated with the named benchmark. This is a flexible structure to work with as it allows you to add or remove keys at will without violating the output format.

	$ ./runbenchmarks
	Benchmark: fizzbuzz
	Time: 10 ns/op
	Alloc: 32 bytes/op
	Benchmark: fibonacci
	Time: 13 ns/op
	Alloc: 40 bytes/op
	...
	$

While convenient, it is quite clumsy to work with in Unix. Consider a very common thing we might want to do: look up the timing results for a single benchmark. Here’s how you do it.

	$ ./runbenchmarks | awk '/^Benchmark:/ { bench = $2}  bench=="fizzbuzz"'
	Benchmark: fizzbuzz
	Time: 10 ns/op
	Alloc: 32 bytes/op
	$

If instead each line presents exactly one record, where columns are separated by whitespace, this becomes a much simpler task.

	$ ./runbenchmarks 
	fizzbuzz	10	32
	fibonacci	13	40
	...
	$ ./runbenchmarks | grep '^fizzbuzz'
	fizzbuzz	10	32
	$

The advantage becomes even more evident when reordering or aggregating the input. For example, when the output is record-per-line, sorting the results by time spent is a simple matter of invoking sort:

	$ ./runbenchmarks | sort -n -r -k2,2
	fibonacci	13	40
	fizzbuzz	10	32
	...
	$
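
When you are stuck with a tool that emits the multi-line format, a small adapter can put it into record-per-line form. This is only a sketch: flatten_benchmarks is a hypothetical name, and it assumes exactly the Benchmark:, Time: and Alloc: keys from the first listing, with the value of interest in the second field.

```shell
# flatten_benchmarks: hypothetical adapter (an assumption for illustration).
# Turns "Benchmark:/Time:/Alloc:" key-value lines on stdin into one
# tab-separated record per benchmark on stdout: name, time, alloc.
flatten_benchmarks() {
    awk '
        # A new benchmark starts: emit the previous record, if any.
        /^Benchmark:/ {
            if (name != "") print name "\t" time "\t" alloc
            name = $2; time = ""; alloc = ""
        }
        /^Time:/  { time = $2 }
        /^Alloc:/ { alloc = $2 }
        # Emit the final record at end of input.
        END { if (name != "") print name "\t" time "\t" alloc }
    '
}
```

Note that the units (ns/op, bytes/op) are dropped; a real adapter would have to decide whether they belong in the output contract.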

Treat a tool’s output as an API. Your tool will be used in contexts beyond your own imagination. If a tool’s output format is changed, other tools that compose or otherwise build on its output will invariably break—you have broken the API contract.

Place diagnostics output on stderr. Diagnostics output includes anything that is not the primary data output of your tool. Among these are: progress indicators, debugging output, log messages, error messages, and usage information. When diagnostics output is intermingled with data, it is very difficult to parse, and thus compose, the tool’s output. What’s more, stderr makes diagnostics output more useful since, even if stdout is being filtered or redirected, stderr keeps printing to the user’s terminal—the ultimate target of diagnostics output.
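
As a rough sketch of the split, here is a shell function that writes its result to stdout and its progress message to stderr. Both names, log and count_lines, are hypothetical and exist only for this illustration.

```shell
# log: hypothetical helper that writes a diagnostic message to stderr.
log() {
    printf '%s\n' "$*" >&2
}

# count_lines: hypothetical tool. The line count (the data) goes to stdout;
# the progress message (a diagnostic) goes to stderr.
count_lines() {
    log "counting lines..."
    # Some wc implementations pad the count with spaces; strip them.
    wc -l | tr -d ' '
}
```

A caller can then capture or pipe the count while the progress message still reaches the terminal, or silence the diagnostics alone with 2>/dev/null.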

Signal failure with an exit status. If your tool fails, exit with a status other than 0. This allows for simple integration with shells, and also simpler error handling in scripts. Consider the difference between two tools that build binaries. We’d like to build upon these tools to execute the built binary only if the build succeeds. Badbuild prints the word ‘FAILED’ as the last line when it fails.

	$ ./badbuild binary
	...
	FAILED
	$ echo $?
	0
	$ # Run binary on successful build.
	$ test "$(./badbuild binary | tail -1)" != "FAILED" && ./binary
	$

Goodbuild sets its exit status appropriately.

	$ ./goodbuild binary
	$ echo $?
	1
	$ # Run binary on successful build.
	$ ./goodbuild binary && ./binary
	$
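
How a tool like goodbuild might propagate failure is sketched below. The helper name run_step is an assumption of mine, and the commands it wraps stand in for whatever the real build does.

```shell
# run_step CMD [ARG...]: hypothetical helper (an assumption for illustration).
# Runs one build step. On failure it explains itself on stderr and returns
# the step's own exit status, so callers can use && and || directly.
run_step() {
    "$@" && return 0
    status=$?
    echo "run_step: '$*' failed with status $status" >&2
    return "$status"
}
```

With this in place, a composition such as: run_step cc -o binary x.c && ./binary behaves like the goodbuild example above: the binary runs only when the build step succeeded.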

Make a tool’s output portable. Put another way, a tool’s output should stand on its own, requiring as little context as possible to parse and interpret. For example, you should use absolute paths to represent files, and fully qualified hostnames to name internet hosts. Portable output is directly usable by other tools without further context. Build tools are frequent violators of this. For example, both the GCC and Clang compilers try to be clever by reporting paths that are relative to your working directory. In this example, the source file paths are presented relative to the current working directory when the compiler was invoked.

	$ cc tmp/bad/x.c
	tmp/bad/x.c:1:1: error: unknown type name 'INVALID_C'
	INVALID_C
	^
	tmp/bad/x.c:1:10: error: expected identifier or '('
	INVALID_C
	         ^
	2 errors generated.
	$

This cleverness breaks down quickly, for example when I use make(1) with the -C flag.

	$ cat tmp/bad/Makefile
	all:
		cc x.c
	$ make -C tmp/bad
	cc x.c
	x.c:1:1: error: unknown type name 'INVALID_C'
	INVALID_C
	^
	x.c:1:10: error: expected identifier or '('
	INVALID_C
	         ^
	2 errors generated.
	make: *** [all] Error 1
	$

Now the output is less useful: to which file does “x.c” refer? Other tools that build on this need additional context, the -C argument, in order to interpret the compiler’s output—the output does not stand on its own.
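
To make the cost concrete, here is roughly what a downstream tool has to do to repair such output. It is a sketch under assumptions: absolutize is a hypothetical name, it only recognizes diagnostics of the form path:line:column:, and the caller has to supply the directory, which is exactly the context the output failed to carry.

```shell
# absolutize DIR: hypothetical repair filter (an assumption for illustration).
# Reads compiler diagnostics on stdin and prefixes DIR to any line that
# starts with a relative "path:line:column:" location. Other lines pass
# through unchanged.
# Caveat: awk -v interprets backslash escapes in DIR.
absolutize() {
    awk -v dir="$1" '
        $0 ~ "^[^/:][^:]*:[0-9]+:[0-9]+:" { print dir "/" $0; next }
        { print }
    '
}
```

Every consumer of the compiler’s output would need a workaround like this, and each one must somehow learn the right directory; a tool that printed absolute paths in the first place would need none of it.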

Omit needless diagnostics. Resist the temptation to inform the user of everything that is being done. (But if you must, do it on stderr.) A good tool is quiet when all is well, but produces useful diagnostics output when things go wrong. Excessive diagnostics conditions users to ignore all diagnostics; useful diagnostics output does not require the user to grub around in endless log files to discern what went wrong, and where. There’s nothing wrong with having a verbose mode (typically enabled by a ‘-v’ flag) in order to aid development and debugging, but do not make this the default.

Avoid making interactive programs. Tools should be usable without user interaction beyond what’s provided by the user’s shell. Unix programs are expected to run without user input: it allows programs to be run non-interactively by cron, or to be easily distributed for execution by a remote machine. Even a single interaction forfeits this very useful capability. Interactivity also makes composition more difficult. Since Unix’s program composition model does not distinguish the output of the various programs involved, it isn’t always clear which program a user is even interacting with. A common use of interactive programs is to ask the user to confirm some dangerous action. This is easily avoided by asking the user instead to supply a flag on the command line to the appropriate tool.
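
A sketch of the flag-instead-of-prompt approach follows. The function name purge and its -f flag are hypothetical choices of mine; the point is only that the dangerous action requires an explicit flag and never reads from the terminal.

```shell
# purge [-f] FILE...: hypothetical destructive tool (an assumption for
# illustration). Instead of asking "are you sure?", it refuses to act
# unless -f is given, and says so on stderr with a non-zero exit status.
purge() {
    force=0
    OPTIND=1    # reset so the function can be called more than once
    while getopts f opt; do
        case $opt in
            f) force=1 ;;
            *) echo "usage: purge [-f] file..." >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    if [ "$force" -ne 1 ]; then
        echo "purge: refusing to remove $# file(s) without -f" >&2
        return 1
    fi
    rm -f -- "$@"
}
```

The same invocation works from a terminal, from cron, or on a remote machine, because nothing waits for input.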

I wrote this because I find myself continually frustrated by attempting to use and compose bad tools—bad tools that waste time and limit their own usefulness. Most of these tools could be made a lot better by following the above advice.

For a more general discussion of Unix tools design, I encourage you to read Kernighan and Pike’s “The Unix Programming Environment.”

Discussion on Hacker News.

SOSP 2013 Trip report, and a note on systems research

2013-11-09T00:00:00-08:00

The twenty-fourth ACM Symposium on Operating Systems Principles (SOSP) was held at the Nemacolin Woodlands resort in Western Pennsylvania. (Nemacolin is a luxury resort, and it wants you to know it. It’s sort of like a Disneyland rendition of the palace of Versailles.) SOSPs are usually held in slightly isolated settings. It forces everyone to stay together for the entire affair: an all-inclusive conference. This works: your time there is consumed by SOSP activities. There is one track of sessions during the day, immediately followed by receptions, poster sessions, BoFs, and WIPs (open bars abound), followed by banquet dinners. This is Woodstock for systems researchers.

On the Sunday before the main conference I attended the PLOS’13 workshop where I presented a paper, “Your Server as a Function,” about some of the core abstractions and software systems that we use to construct our server software, i.e. our systems software stack. (Slides are available; the talk was not recorded.) Russ Cox gave the workshop keynote, on Go. Russ did a great job motivating the philosophy behind Go’s type system, concurrency constructs, and its package system. I’ve had an eye on Go for a while: its focus on simplicity is a healthy response to the complexity creep that seems to plague just about every other “industry” language. I’m curious to see how well Go’s approach will hold up to time, and whether they truly manage to keep it simple in the face of larger code bases, greater demand for generic code, and a changing hardware and software landscape. Go’s designers have a healthy attitude, though. Anil Madhavapeddy maintained a liveblog of the entire event.

Monday through Wednesday featured the main conference. SOSP is a single track conference, and the talks are very well attended. There are lots of good questions—some are asked in anger!—and the whole affair is quite engaging. The talks were all well-rehearsed; there were few awkward pauses or speakers who stumbled.

My 3 favorite talks were also the best paper award winners.

“The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors” describes a tool, commuter, which uses symbolic execution on models (effectively, “pseudo-code” Python implementations of the underlying operations) to determine what operations and argument combinations commute—that is, which pairs of operations may be executed independently of order? commuter allows you to statically analyze systems—more accurately: models of systems—and determine where operations may execute concurrently. The authors implement a filesystem in order to show the applicability of commuter.

“Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior” describes a tool, stack, which sniffs out where applications rely on behavior that is undefined in C. (Did you know that pointer arithmetic overflow is undefined? Me neither.) The authors created a model for understanding such unstable code, and developed a static checker to find such code. Finally they ran stack on a number of open source projects (Kerberos, Postgres, the Linux kernel) finding a large amount of unstable code. This was a fun presentation and paper.

Perhaps my favorite paper was “Naiad: a timely dataflow system.” Naiad is a low-latency dataflow system that admits cycles, the upshot of which is support for dataflow iteration. For example, you can mix computing pagerank with more standard mapreduce-like data processing. It presents a single, unified model for doing so. Naiad uses a vertex-communication model similar to Pregel but without synchronous iteration: incoming messages are computed incrementally. Naiad maintains a versioning scheme and distinguishes loop ingress- and egress-nodes. The story for fault recovery is a bit weak (currently a synchronous recovery mechanism is employed) and may be its Achilles’ heel; I’m curious to see it develop.

I also enjoyed Gernot Heiser’s microkernel retrospective. It’s great to see such long and impactful lines of research, and better still to attempt to draw lessons from the process.

The first day of the conference was liveblogged.

A note on systems research

As an industry participant, it’s disappointing to see a number of very important problems given short shrift.

The distributed systems tooling gap. We have good tools for understanding the behavior of single-system processes: debugging and diagnostics tools like DTrace and perf are among these. Equivalent tooling for distributed systems is weak to nonexistent. Google’s Dapper is one of the only published systems that comes to mind which attempts to tackle parts of this problem. In reality, almost all organizations that deploy sizable distributed systems have their own set of tools and systems (aggregation and display of system statistics, RPC tracing, alerting, etc.), but these are rather primitive and unwieldy compared to tools like DTrace and perf. I think there are a lot of important (and interesting!) problems to be solved here, but it seems to be beneath the radar of the academy.

Cost control and isolation in “the cloud”. Cost control is paramount in large systems deployments. It’s important to understand the cost structure of your system, and the trade-offs involved in optimizing. This is especially important in large multi-tenant systems. Service-oriented architectures, where different uses (“features”) share common systems, exacerbate this further. We have good, coarse-grained, isolation in operating systems—processes, virtual machines, containers, etc.—but vertical isolation in distributed systems is poor to nonexistent.

Distributed systems modularity. We reach for many tools when structuring software (classes, mixins, the ML module system, Go packages and interfaces, etc.), but in distributed systems we have far fewer, far more primitive ones. The state of the art is pretty much using some form of interface definition language (e.g. thrift, protobufs) to define module boundaries. These are then manually glued together. This is like having only C header files and dlopen(3). Worse, because services tend to follow organizational boundaries, and because it’s tedious to maintain a large number of systems, these interfaces tend to become kitchen-sinks comprising everything a particular team is responsible for, whether or not these functionalities are even related. It’s considered good programming practice to focus on compositionality: build software out of small, well-defined modules that combine to give rise to other modules with different behaviors. This is simply too difficult to do in distributed systems. Why?

Authentication. Kerberos showed us how to do authentication in distributed systems. It’s still a good model, but a lot has changed since 1988. In particular, service-oriented architectures make session authentication less useful: usually a request is performed on someone else’s behalf. Additionally, because central scheduling systems (e.g. Mesos, Omega) are actually responsible for provisioning and managing processes, there needs to be a chain of trust. Above all, such a system must be simple to operate and easy to understand and audit.

You should participate in CUFP this year

2013-06-02T00:00:00-07:00

The Workshop for Commercial Users of Functional Programming 2013 (CUFP) has been meeting annually for nearly 10 years. These same years have seen a staggering growth in the commercial application of functional programming languages and techniques — their influence is felt in even the most established languages, libraries, and communities. Attendance has grown nearly ten-fold.

The deadline for submitting a proposal to present is June 29. See the call for presentations for details.

Many of the largest internet properties are using a functional language for their primary service development, Twitter, Tumblr, and Foursquare among them. Functional languages have always had a toe-hold in the financial industry, and their influence there is continuing to grow: in 2011 Lennart Augustsson from Standard Chartered gave the CUFP keynote address about Mu, a dialect of Haskell used there for trading and analysis; Yaron Minsky highlighted Jane Street’s use of OCaml with a talk in 2012.

The CUFP workshop, which I am chairing together with Michael Sperber this year, comprises a day of talks and two days of tutorials. The talks focus on the use of functional languages in a practical setting: lessons learned, novel applications and techniques, organizational adaptation and, as well, what didn’t work.

I recommend perusing past workshop schedules and videos on the CUFP website. Some of my recorded favorites are Bryan O’Sullivan’s 2009 keynote, “Real World Haskell” — where Bryan talks about the value of community — and from 2011, Gregory Wright’s “Fourteen Days of Haskell” describes a heroic effort to implement antenna control software for low power devices in a mere 2 weeks — a problem domain that turned out to be a great fit for Haskell.

The CUFP tutorials are high quality affairs, often taught by the people that literally wrote the book, and aim to instill much knowledge in a small amount of time. This year, we have a Haskell tutorial taught by luminaries Andres Löh and Simon Marlow; Yaron Minsky and Anil Madhavapeddy are taking a break from writing Real World OCaml to give a tutorial about the very same subject; Oleg Kiselyov, a long-time pillar of the FP community, is giving a tutorial on MetaOCaml. Erlang gets double treatment: Simon Thompson opens with an Erlang tutorial focusing on concurrency and multi-core programming; Steve Vinoski, of Basho, finishes by teaching us Erlang web programming. Clojure is also represented, beginning with a half-day language tutorial from Luke VanderHart, followed by Leonardo Borges teaching one on Clojure macros and DSLs. Finally, Dean Wampler is giving a double feature on Scala, beginning with a language tutorial focusing on functional programming and concurrency; Dean will then hold court on Scalding, a powerful Scala-based tool for doing analytics over large datasets.

No workshop would be complete without a keynote: “SmallTalk” Dave Thomas is giving our keynote this year on an as-yet-unrevealed topic relating to functional programming.

As well as the main ICFP conference, there are other affiliated events that are well worth attending. The Haskell Symposium is held the day after the CUFP talks, and the OCaml workshop the day after that.

Submissions for the main workshop (see the CFP) are due by 29th of June, 2013. Please consider making a submission if you have experience you want to share, technologies or techniques you want to talk about, or just about anything else you think would be compelling to the audience. To get a feel for the nature of the talks, I recommend reading the scribe report from 2011.

Follow @cufpconference on Twitter for more updates on the workshop, and please do not hesitate to contact Mike or me if you have any questions:

marius(at)monkey(dot)org
sperber(at)deinprogramm(dot)de

See you in Boston!

Futures aren't ersatz threads

2013-04-02T00:00:00-07:00

(Originally published at aboutwhichmorelater.tumblr.com.)

Concurrent programming is an increasingly important topic in the context of building modern distributed systems. Most such systems have a large amount of inherent concurrency. For example: to accommodate large corpus sizes, a search engine splits its index into many small pieces. In order to efficiently satisfy queries, it must issue requests to each of these shards concurrently.

Threads (or processes) are a common abstraction for programming concurrent systems: there are multiple threads of execution, each of which has its own stack. The system manages how these threads are mapped onto physical hardware, and how they are preempted and interleaved. This allows the operating system to suspend execution of a thread while it is waiting for I/O operations, allowing other threads to proceed. Traditionally threads have had both high fixed costs (stacks must be allocated in increments of page sizes) as well as high context switching costs. The emergence of event-based programming addressed these issues: there is only one thread, with a run-loop, and I/O events are explicitly dispatched (e.g. data is ready to be read, a write completed, etc.). While this reduces the cost of threads (you have only one stack, you don’t need the OS to interleave your executions), the programming model is awkward: the programmer has to split the program up into pieces delimited by I/O operations. In C, in particular, you have to declare separate functions for each of these pieces. The introduction of the libevent book has more details. This is also the model of node.js, though because JavaScript has first-class closures the problem is ameliorated somewhat: by nesting callbacks, you can easily share context. This improves brevity but not modularity: each sequence of actions is fixed in place, and errors must be handled very carefully. Nontrivial applications become difficult to reason about very quickly.

The assumptions upon which event-based programming is based have largely withered: context switches aren’t very expensive anymore, and memory has become plentiful. However, the degree to which our systems need to be concurrent has also increased. It is not uncommon to handle hundreds of thousands, if not millions, of operations simultaneously. The archetypical example of this sort of application is the so-called “long-polling” web server.

Languages like Haskell and Go implement lightweight threads in their runtimes, allowing for cheaply maintaining millions of threads of execution, with the runtime itself multiplexing I/O operations. Go manages this by using segmented stacks: it allocates stack space as needed. This requires both the complicity of the compiler and a different ABI, but that’s the beauty of being able to start over.

So, clearly, “events vs. threads” is a false dichotomy; the statement doesn’t even make much sense, and conflates two separate concerns: They denote two concrete programming models, differing in (1) how context is encoded (heap vs. stack), and (2) where multiplexing is done (in a library, or runtime/OS).

In Finagle and elsewhere, we use composable futures as the basis of our concurrent programming model (these are quite different than the venerable java.util.Future and its brethren). Futures, it is often argued, make up for the deficiencies of traditional “event programming”: callbacks are tedious, compromise modularity, and make for spaghetti-like code that is difficult to understand. Futures correct this by introducing constructs that make callbacks manageable.

This misses the point.

Futures offer a concurrent programming model that translates naturally to the types of concerns that arise when building distributed systems. Take for instance remote procedure calls; RPCs are inherently asynchronous. This means that each component operates at its own speed, and the components fail independently of each other. There is no shared clock, and no shared bus. RPCs usually have dramatically different latency characteristics than a typical local operation. In fact, this holds for any sort of I/O, not just RPCs.

Futures model the real world truthfully. A Future[T] represents a T-typed result, which may or may not be delivered at some time in the future, or which may fail altogether. Like real-world asynchronous operations, a future is always in one of 3 states: incomplete, completed successfully, or completed with a failure.

Futures provide a firewall between synchronous and asynchronous operations. Because futures are typed differently than regular values (a value furnished by a future has type Future[T], not T), you can easily, at a glance, discern which operations imply asynchronous semantics. This allows you to reason more easily about the runtime semantics of your code: not only are such operations slower, but they have very different failure semantics. Furthermore, futures are “tainting”: if you want to incorporate the result of a future, you have to “lift” other values into them. That is, if you are computing anything that requires an asynchronous operation, the computation itself becomes asynchronous: its type must be Future[T]. The upshot is that semantics are witnessed by types; we can use the type system to model and enforce behavior. Put another way: we can use the compiler to guarantee certain behavior.

Futures compose well. Futures, like physically asynchronous operations, are closed under composition. This is important: the composition of two asynchronous operations can in turn be described as an asynchronous operation. Futures work the same way: I might compose two Futures sequentially (e.g. because there is a data dependency) or concurrently (because there is not), both of which produce a new future. When composing Futures, failure handling is baked in: when composed sequentially, the sequence of operations short-circuits at the first failure; when composed in a concurrent configuration, any underlying failure also fails the composed operation.

Futures are persistent. A Future[T] always means the same thing, refers to the same underlying operation, no matter how it is later used. To string operations together (concurrently or sequentially), new futures are produced, with new semantics. This makes reasoning about Futures very simple; you needn’t worry about how they might be used or modified elsewhere.

Futures liberate semantics from mechanics. By stringing futures together via the various combinators, the programmer is expressing the semantics of an operation, not its execution mechanics. In this example, adapted from the Finagle User’s Guide, we fetch all images in a web page:

	val allImages: Future[Seq[Image]] =
	  fetchUrl(url) flatMap { bytes =>
	    val fetches = findImageUrls(bytes) map { url =>
	      fetchUrl(url)
	    }
	    Future.collect(fetches)
	  }

This can be read thus:

allImages is the operation that first fetches url and then, for each image URL found in the body of the fetched page, fetches that URL in turn; its result is the sequence of all the fetched images.

The code is fairly close to the English description of what we want to do. We have described the operation, but not how to perform it. In the description, there is inherent concurrency: the subsequent image URLs may all be fetched at the same time; this translates naturally into the concurrent combinators used with Futures, in this case Future.collect. Failure handling is omitted from the description, and is also omitted from the code. In this case, it is clear that we cannot proceed if any operation fails, and those are also the semantics of the composite operation (allImages). The data dependencies, inherent to the problem being solved, are all-revealing.

This separation enhances the ease of reasoning: the meaning of the operation isn’t intertwined with instructions for how to go about executing it. No threads are spun up, no errors explicitly handled, no explicit coordination mechanism is required to gather the result of the image fetches. Instead, there are only data dependencies: to fetch all of the images, we need to fetch their URLs; to get the URLs we need to fetch the web page.

This property also allows us to separately consider the runtime behavior of such operations. The Future implementation might choose to limit the number of concurrent operations, for example, or to exploit inherent locality (like biasing connection pools so that a given connection is always handled by one underlying thread). A library might thread through tracing data to diagnose operations. This is not hypothetical: Finagle does all of this, and it’s done entirely as a library without any special runtime support.

Q.E.D.?

Futures present a concurrent programming model that is appealing on its own. They should not be seen as a poor-man’s version of threads.

Ultimately such abstractions exist in the service of writing robust, safe, performant, and modular software. Futures have served us very well in this regard at Twitter, and I think they’ll have a long shelf-life as a go-to abstraction for constructing concurrent systems. We’re in the Cambrian period of concurrency, and I predict futures will emerge as one of the survivors.

This isn’t to say that other models aren’t also good. I’m personally a huge fan of Go/Haskell’s cheap threads model: because they are built into the language and runtime, their usage is enforced. This is perhaps the main sticking point of futures. Because they are implemented as 3rd-party libraries, their usage can be spotty, and you may find yourself always writing little bridges or adapters. At Twitter, we’re lucky in that we have a common system upon which our systems are built, so this is less of a problem. But this isn’t inherent to the model: C# and F# have built it into the language and runtime. Scala’s SIP-14 also promises to unify that ecosystem.

Implementing Python-style generators with delimited continuations

2010-09-12T00:00:00-07:00

Jake’s article on implementing LWT-compatible fibers with Oleg’s delimited continuation library got me thinking about one of my favorite Python features: generators.

Delimited continuations

Delimited continuations have a very simple API (note: I’m going to consider only the high level API provided by delimcc here consistent with the abstract view of delimited continuations. Jake describes the low-level API in his article). The core API is:

val new_prompt   : unit -> 'a prompt
val push_prompt  : 'a prompt -> (unit -> 'a) -> 'a
val shift0       : 'a prompt -> (('b -> 'a) -> 'a) -> 'b

new_prompt simply creates the “handle”, push_prompt (traditionally called reset) sets the limit of the continuation (how far up the stack we will go), shift0 captures the continuation and gives it to the passed function. Crucially, the continuation captured by shift0 is delimited by the outer push_prompt. The “region” of the continuation delimited by push_prompt and shift0 consists of the stack frames between the two (this also means you cannot call shift0 anywhere outside the scope of push_prompt).

An example should clarify. Let’s first do something illegal.

# open Delimcc;;
# let p = new_prompt ();;
value p : Delimcc.prompt '_a = <abstr>
# push_prompt p identity;;
- : unit = ()
# shift0 p (fun k -> k ());;
Exception: Failure "No prompt was set".

Here, we tried to shift0 outside of the push_prompt. Nope, can’t do that.

# push_prompt p (fun () -> shift0 p (fun k -> k 123));;
- : int = 123

Here, the inner function (passed to shift0) was given the captured continuation k (as delimited by push_prompt), calling it with the argument 123 which was subsequently returned.

The neat thing about delimited continuations is that they capture the state of the entire delimited stack. That is, calling k ARG from within shift0 is equivalent to replacing the shift0 statement with ARG, returning the result of push_prompt. Furthermore, it’s a first class continuation, so we may use it multiple times. To wit:

# push_prompt p (fun () -> 10 * shift0 p (fun k -> k 1 + k 3));;
- : int = 40

Here, k is equivalent to the function ( * ) 10. The argument & return types needn’t be uniform:

# push_prompt p (fun () -> string_of_int (shift0 p (fun k ->  "hey " ^ k 123)));;
- : string = "hey 123"

Leaking continuations

What makes delimited continuations yet more interesting is that the continuations themselves can escape the scope of shift0, and be restarted from outside.

Because of the type signatures of shift0, we need to unify the types, so let’s first define an ADT to do so:

# type t = Done | More of (unit -> t);;

Note that we can’t just use option here, it seems, because we need a recursive type.

# let count = ref 0;;
# let More k = push_prompt p (fun () -> 
  while true do 
    count := !count + 1; 
    shift0 p (fun k -> More k)
  done; 
  Done);;

Here we are using our unified type trick to return the continuation itself (note how push_prompt and shift0 have the same return type). Let’s play!

# !count;;
- : int = 1
# k ();;
- : t = More <fun>
# k ();;
- : t = More <fun>
# !count;;
- : int = 3

Generators

We now have a pretty good idea of what’s needed to implement generators with an API similar to:

let producer () =
  let state = ... in
  gen begin fun yield ->
    yield state; ...; yield state; ...; yield state
  end

So we need a function gen that hands the producer a “yielder” (in Python terminology). The yielder needs to capture the continuation, communicate the value to the consumer, and resume execution when the consumer asks for the next value:

let consumer () = 
  let g = producer () in
  let v0 = g () in
  ...
  let vN = g ()

It turns out that this is a very natural fit, and we can implement full duplex generators in just a few lines, using the ideas developed above. Here is the full source code:

type ('a, 'b) t = Done | More of 'a * ('b -> ('a, 'b) t)

let gen f =
  (*
   * Note: the first value to yield gets thrown away as the generator
   * has not yet started.
   *)
  let start _ =
    let p = Delimcc.new_prompt () in
    Delimcc.push_prompt p begin fun () ->
      f (fun x -> Delimcc.shift0 p (fun k -> More (x, k))); Done
    end in
  let next = ref start in

  fun rv ->
    match !next rv with
      | More (x, k) -> next := k; Some x
      | Done        -> None

And sample use:

open Printf

let rec take_all t count =
  match t count with
    | Some i -> printf "took: %d\n" i; take_all t (count + 1)
    | None   -> ()

let () =
  let take = gen begin fun yield ->
    for i = 0 to 10 do
      let rv = yield i in
      printf "got: %d\n" rv
    done
  end in
  take_all take 100

The only additional work we have to do is to type it for bi-directional communication, and to store the next continuation in a ref cell for the next invocation. Given that the generators developed here are full duplex, we can implement co-routines as outlined in PEP 342.
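For comparison, here is roughly how the same sample program looks with the native PEP 342 generators that the OCaml version imitates. This is only an illustrative analogue: producer and drive are names invented for the sketch, and output is collected in a list instead of being printed so that it is easy to inspect.

```python
def producer(log):
    """Yield 0..10; each yield also receives a value from the consumer."""
    for i in range(11):
        rv = yield i                 # value out, value in (full duplex)
        log.append(f"got: {rv}")


def drive():
    """Counterpart of take_all: feed an increasing counter to the producer."""
    log = []
    g = producer(log)
    count = 100
    try:
        # A fresh generator cannot receive a value, so the first exchange
        # is a plain next(); this mirrors the discarded first value in gen.
        value = next(g)
        while True:
            log.append(f"took: {value}")
            count += 1
            value = g.send(count)    # resume the producer with a value
    except StopIteration:
        pass                         # producer finished: the Done case
    return log
```

Calling drive() returns the interleaved "took"/"got" lines in the order the two sides exchanged values.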

You can also get it as a gist here. I also highly recommend Jake’s article which explores delimcc from a lower level of abstraction.

Self-contained emacs

2010-02-21T00:00:00-08:00

One annoying thing about using emacs on remote systems is the absence of your own initialization & configuration code. It’s of course possible to push your own, but often it’s annoying. Configuration often gets rather involved: my own .emacs.d directory contains many packages and libraries that are not shipped with standard emacs distributions. Another common scenario is that you use shared accounts, making editor configurations rather intrusive.

So, I sought to simplify the situation.

make-emacs creates for you a simple, self-contained and relocatable script that allows you to invoke emacs with your own configuration anywhere. For example:

    $ make-emacs ~/.emacs.d /tmp/e
    $ scp /tmp/e remoteserver:
    $ ssh remoteserver
    $ ./e MYFILE
    extracting emacs.d..
    <emacs bliss>

It uses shar to create a self-extracting archive and wraps that extraction code to invoke emacs properly. It also caches the extracted configuration files, so that it only has to perform a potentially costly shar extraction once.
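The caching idea can be sketched in a few lines of Python. This is not how make-emacs itself is written (it is a shell tool built on shar); the cache layout, the init.el file name and both function names are assumptions made purely for illustration. The only real interface relied on is emacs’s -q (skip the user init file) and -l (load a file) options.

```python
import os


def ensure_extracted(cache_dir, extract):
    """Run extract(cache_dir) only if the cache directory is missing."""
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
        extract(cache_dir)           # the potentially costly step, done once
    return cache_dir


def emacs_command(cache_dir, files):
    """Build an emacs invocation that loads the cached configuration."""
    init = os.path.join(cache_dir, "init.el")    # hypothetical entry point
    return ["emacs", "-q", "-l", init, *files]
```

A wrapper script would call ensure_extracted first and then exec the command returned by emacs_command.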

Get it on GitHub here.

Emacs as a tiling window manager

2010-01-26T00:00:00-08:00

I’ve been using emacs in full-screen mode lately. It provides a nice, distraction-free environment. If you’re using carbon emacs it’s quite easy:

(defun mac-toggle-max-window ()
  (interactive)
  (set-frame-parameter nil 'fullscreen
                       (if (frame-parameter nil 'fullscreen)
                           nil
                         'fullboth)))
(global-set-key (kbd "<C-M-return>") 'mac-toggle-max-window)

I also really enjoy the efficacy of navigation provided by tiling window managers such as xmonad. So, why not make emacs behave like it?

emacsd-tile.el is a really tiny configuration snippet to provide just this.

It provides xmonad-inspired keyboard shortcuts:

`M-{j,k,h,l}`	→	move to window (down, up, left, right)
`M-S-{j,k,h,l}`	→	enlarge/shrink horizontally/vertically
`M-C-{j,k,h,l}`	→	swap with (down, up, left, right)

This mostly uses just stuff that was already in emacs! The only part I had to implement myself was window swapping.

Another really handy feature of emacs is the ability to save window configuration in registers. With this, you can “stash” away an entire window configuration and recall it later.

Here’s a screenshot, though it’s not really descriptive because it doesn’t show the keybindings in action, but it may convince you that emacs can at least look like a tiling window manager :-)

Any other improvements or suggestions?

Beautiful fixed-width fonts for OSX

2010-01-10T00:00:00-08:00

(Or, “an ode to 9x15”)

I’ve always had great appreciation for the fixed-width fonts distributed with X11 (“misc-fixed-*”): 6x13, 7x14, and especially 9x15 (my all-time favorite).

I never really found any type so satisfying. It is extremely crisp and legible, is not dull, and doesn’t go all fuzzy in its bold variant. Its glyphs are extraordinarily well-suited to the kinds of strange things we tend to do with the ASCII character set in code.

Charlie Cheever previously converted 9x15 to an OSX dfont, so I followed suit to fill in with the rest of the sizes using FontForge.

http://monkey.org/~marius/x11-misc-fixed.tar.gz

Included are 6x13, 7x14, 9x15 and 10x20. Make sure to use them only at their respective sizes (i.e. 13pt for 6x13, 15pt for 9x15 and so on) and with anti-aliasing turned off (otherwise the bolds won’t render correctly).

Enjoy!

Haskell is beautiful in practice

2009-11-05T00:00:00-08:00

I’ve been writing a bit of Haskell lately (see my GitHub page), and it’s an absolutely beautiful experience. And I mean that in every way: The language itself of course, but also the community, the documentation, the package system and the Haskell platform ultimately help make the practice of writing Haskell more pleasing than any other I’ve had any experience with.

I find that Haskell lets me translate so many ideas and abstractions naturally and without the fuss of too much indirection. My hope is to give you a small taste of what I’m talking about. Please note that this is going to contain some deliberate (but mostly inconsequential) inaccuracies and I will also leave out details here and there. A lot of the code is also “paraphrased” to highlight the ideas behind it.

I recently wrote an implementation of BERT in Haskell (github). I had the need for an RPC mechanism, and it seemed to fit my requirements rather well. “BERTs” are a subset of Erlang terms, which are composable and have a straightforward external representation (albeit not as space efficient as something like Google’s protocol buffers). So I figured this might be a good starting point for demonstrating some of the substantial ways in which Haskell provides facilities to create programs that are not only succinct but also readable and robust. At the same time, I hope to convince you that Haskell provides the kind of abstraction & encapsulation that is often much more natural than other approaches.

Types

We need to be able to represent BERT terms in Haskell. Thus we introduce an algebraic type representing a Term (github).

-- | A single BERT term.
data Term
  -- Simple (erlang) terms:
  = IntTerm        Int
  | FloatTerm      Float
  | AtomTerm       String
  | TupleTerm      [Term]
  | BytelistTerm   ByteString
  | ListTerm       [Term]
  | BinaryTerm     ByteString
  | BigintTerm     Integer
  | BigbigintTerm  Integer

This is pretty straightforward to read: a Term can be either an IntTerm, a FloatTerm and so forth. The right column specifies the data carried by each constructor: the IntTerm carries an integer, etc. An important detail here is that these are not different types: they are all data constructors for the type Term. That is, these are different ways to create a value of type Term. Algebraic types may also be recursively defined: the ListTerm constructor holds a list of other Terms. Already with this simple declaration, we’ve expressed most of the semantics of BERT terms. We can now express composite BERT terms. For example, BERT defines dictionaries to be represented as {bert, dict, [{key, value}]}. In Haskell:

TupleTerm [ AtomTerm "bert", AtomTerm "dict"
          , ListTerm [TupleTerm [AtomTerm "key", AtomTerm "value"]]]

Typeclasses

Terms encapsulate the BERT type representation, and we will write code that can encode Term values to a binary representation (as per the BERT spec). However, these values are not the most convenient to work with in other Haskell code. Furthermore, many Terms have a natural “more primitive” Haskell type (eg. IntTerm vs. Int, ListTerm vs. []).

We would like to introduce Terms from these types and vice-versa. The most primitive way to do this is via pattern matching (this is ubiquitous in Haskell and other functional programming languages. Also see my poor attempt at implementing pattern matching in Python).

listify (ListTerm l)      = listify' l
listify' []               = []
listify' ((IntTerm x):xs) = x:listify' xs

This code introduces a list of integers ([Int]) from a ListTerm containing IntTerms. Clearly, going around creating little unpackers like this for everything is going to be quite cumbersome. Haskell provides a very nice solution. Typeclasses let you declare traits common to a set of types. For example, I can define a typeclass that declares the ability to translate a value of a given type into or out of a Term (github).

class BERT a where
  -- | Introduce a 'Term' from a Haskell value.
  showBERT :: a -> Term
  -- | Introduce a Haskell value from a 'Term'.
  readBERT :: Term -> a

This introduces two new functions. One to create a term from a Haskell value, and one to do the inverse. So what kinds of types can introduce Haskell values from BERT or vice-versa? A few are quite simple. The most trivial is for Term itself.

instance BERT Term where
  showBERT = id
  readBERT = id

To convert a Term to a Term is just the identity function. We need to cover a few more primitive Haskell types. The BERT definition for lists is:

instance (BERT a) => BERT [a] where
  showBERT xs            = ListTerm (map showBERT xs)
  readBERT (ListTerm xs) = map readBERT xs
  readBERT _             = error "Invalid list type"

Unlike Haskell lists, BERT lists can be heterogeneous: they are lists of Terms which, being an algebraic type, may eventually contain different types of data. Thus, to introduce a BERT list, we return a ListTerm that applies showBERT to every element in the list. This also explains the typeclass restriction on a: we require that the list we are encoding contains a type that also has a typeclass instance of BERT. Similarly, to decode a list, we call readBERT on each element, introducing a list with the decoded Haskell types. But what about the other way? Certainly we couldn’t introduce a Haskell list from a heterogeneous BERT list. The code above is deceptive in this respect, but type inference is going on in the background here. The type of readBERT is Term -> [a]. This inference is propagated when we pass readBERT to map as well; it’s equivalent to the following code:

readBERT :: Term -> [a]
  readBERT (ListTerm xs) = map (readBERT :: Term -> a) xs

Note that you can also create your own typeclass instances that suit your needs. For example, if your application keeps some internal representation of some well-defined value, you could create a typeclass instance to convert this representation to and from Terms.

Laziness

The RPC specification for BERT provides a rather opaque notion of a transport. The transport, in essence, is a channel through which you can send and receive BERT terms. Each term is prefixed with a 4-byte header specifying its length.

Haskell has a Handle type that is an opaque representation of a file-like object. Sockets can also be fronted by handles. A popular module, Data.ByteString.Lazy, provides a lazy bytestring type that can be backed by a handle. It’s a type that looks like a bytestring, smells like a bytestring, and acts like a bytestring, except that when it needs more data, it requests it from the handle it’s been constructed with. Similarly, our binary term decoder reads from such bytestrings, so it does not take much imagination to produce a list that represents the infinite stream of packets coming from the BERT peer (github).

packets :: ByteString -> [Packet]
packets b
  | null b = []
  | otherwise = p:packets b' 
  where (p, b') = parsePacket b

(In Haskell, : is the list constructor: it prepends an item to a list, eg. 1:[2,3] == [1,2,3].) packets reads like a declaration: if our bytestring b is null, it is the empty list; otherwise, it is the first parsed packet from b, followed by the list of packets represented by the remainder of the string. This works in Haskell because everything is evaluated lazily. This in effect means that, unless you specify otherwise, values are not computed (code is not evaluated) until needed. In practice, the Haskell runtime leaves a “promise” of a computation when the value is not needed immediately. The list constructor : is no different: its arguments are not evaluated until needed. In our example this means that the list defined by packets is lazily generated. So if we create such a value with packets b, not until we examine the resulting list do we even begin parsing (or generating the list, or reading from the socket). This is an example of using laziness to provide an abstraction. It allows us to operate on the familiar list type while not being terribly concerned about how the list is being constructed. It also relieves us of having to create any further abstraction: we can translate our model of a transport almost perfectly into a primitive Haskell concept without having to go about creating unfamiliar interfaces.
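For readers more at home in a strict language, the closest analogue of this lazily built list is a generator. The sketch below is Python, not the Haskell implementation: it assumes the 4-byte length header is an unsigned big-endian integer and yields the raw payload bytes without decoding them into terms. The function name mirrors packets above, but everything else is illustrative.

```python
import io
import struct


def packets(stream):
    """Lazily yield length-prefixed payloads from a binary file-like object."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return                               # end of stream
        (length,) = struct.unpack(">I", header)  # assumed big-endian length
        payload = stream.read(length)
        if len(payload) < length:
            return                               # truncated final packet
        yield payload
```

As with the Haskell list, nothing is read from the stream until the consumer asks for the first packet.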

Monadic parsing

We created an algebraic type to represent terms, but their construction (if you need to work with the term type) can be a little cumbersome. Our representation

TupleTerm [ AtomTerm "bert", AtomTerm "dict"
          , ListTerm [TupleTerm [AtomTerm "key", AtomTerm "value"]]]

would be represented more succinctly in the erlang grammar as:

{bert, dict, [{key, value}]}

If just writing code, you probably don’t care too much about the difference: you typically don’t make big static constructs, but rather create types programmatically anyway. However, with the BERT implementation I wanted to have a command line tool that allowed for testing & inspection of results. For example:

$ bert call localhost:8181 mod proc "{1, test, [5,6,7]}"

Luckily, Haskell provides an excellent facility for parsing: Parsec. Parsec uses monadic actions to simultaneously lex & parse its input. I won’t go into monads here, but for our purposes here it is essentially a means by which you can run code in a certain context, and manipulate that context. It can provide a pure functional construct with which you can achieve the appearance of imperative programming. In the context of Parsec, its monad maintains the parser state (eg. where in the input are we, which rules have failed, which rules are yet to be tried, what is the output, etc.), and provides a set of functions that can operate within the context of the monad it provides. Here’s our parser for a term.

p_term :: Parser Term
p_term = t <* spaces    
  where 
    t =     IntTerm               <$> p_num (readSigned readDec)
        <|> FloatTerm             <$> p_num (readSigned readFloat)
        <|> AtomTerm              <$> p_atom
        <|> TupleTerm             <$> p_tuple
        <|> BytelistTerm . C.pack <$> p_string
        <|> ListTerm              <$> p_list
        <|> BinaryTerm   . B.pack <$> p_binary

Again, reading very naturally: a term is t possibly followed by some whitespace. (a <* b means roughly “run a, then b, but the value of the expression is the value of a”). t in turn can be either a p_num wrapped with an IntTerm, and so forth. (In this context, <|> can very roughly be thought of as “otherwise, try..” and <$> as “wrap the results of the argument to the right with the thing on the left”.)

The various p_*s are other parsers. For example the one for tuples is

p_tuple =
  between (char '{' >> spaces) (spaces >> char '}') $
    p_term `sepBy` (spaces >> char ',' >> spaces)

between is one of the Parsec functions; it means: between the parses of the first two arguments, parse the third. The third argument states: “try and parse zero or more p_terms, separated by maybe some whitespace, a comma and maybe some more whitespace.”
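To show what the grammar accepts without requiring any Parsec knowledge, here is a small hand-written recursive-descent parser in Python for a subset of the same term syntax (integers, floats, atoms, tuples and lists). It is not a port of the Parsec code and uses no combinators; the tagged-tuple output and all function names are choices made for this sketch only.

```python
import re

_NUMBER = re.compile(r"-?\d+(\.\d+)?")
_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*")


def _skip(s, i):
    """Advance past whitespace."""
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def _sequence(s, i, close):
    """Parse comma-separated terms up to the closing delimiter."""
    items = []
    i = _skip(s, i)
    if i < len(s) and s[i] == close:
        return items, i + 1
    while True:
        item, i = _term(s, i)
        items.append(item)
        i = _skip(s, i)
        if i < len(s) and s[i] == ",":
            i = _skip(s, i + 1)
        elif i < len(s) and s[i] == close:
            return items, i + 1
        else:
            raise ValueError(f"expected ',' or {close!r} at offset {i}")


def _term(s, i):
    """Parse one term starting at offset i; return (term, next offset)."""
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "{":
        items, i = _sequence(s, i + 1, "}")
        return ("tuple", items), i
    if s[i] == "[":
        items, i = _sequence(s, i + 1, "]")
        return ("list", items), i
    m = _NUMBER.match(s, i)
    if m:
        if m.group(1):                       # has a fractional part
            return ("float", float(m.group())), m.end()
        return ("int", int(m.group())), m.end()
    m = _ATOM.match(s, i)
    if m:
        return ("atom", m.group()), m.end()
    raise ValueError(f"unexpected character {s[i]!r} at offset {i}")


def parse_term(text):
    """Parse a complete term, allowing surrounding whitespace."""
    term, i = _term(text, _skip(text, 0))
    if _skip(text, i) != len(text):
        raise ValueError(f"trailing input at offset {i}")
    return term
```

Compared with the Parsec version, the bookkeeping that the parser monad hides (the current offset, skipping whitespace, reporting where a parse failed) has to be threaded through every function by hand.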

…

This is really just the tip of the iceberg. Most of my descriptions, especially those regarding the use of monads, belie the rigor and generality of these constructs. If you’d like to explore further, the Real World Haskell book is a great starting point. The Haskell community is simply fantastic. Much conversation happens on IRC, where there is typically an armada of people just waiting to discuss the finer points of this wonderful language with you.

Pattern matching in Python

2009-05-11T00:00:00-07:00

One of my favorite things about various functional programming languages is pattern matching. It often allows for very succinct and elegant declarative expressions, and in the dynamic variants it allows for easy in-line lightweight type checking.

So, naturally I wanted the same in Python, the programming language we use at work.

Pattern matching is most powerful when it enjoys first-class support in a language. In addition to succinct syntax, this affords you the ability to integrate pattern matchers with control constructs, allowing conditional execution of code based on various patterns. It also may give you a degree of composability not possible otherwise. For example, Erlang has not only a case statement, but allows for clauses in functions, so that pattern matching is done on the arguments of functions with the same name and arity. For instance in Erlang, you could implement a simple REST-style HTTP request handler like so:

%% handle(Path, Method)
handle("/", _) ->
  not_a_resource;
handle(Path, 'PUT') ->
  create_new_resource(Path);
handle(Path, 'POST') ->
  update_resource(Path);
handle(Path, 'GET') ->
  retrieve_resource(Path);
handle(_, _) ->
  invalid_request.

Now I can simply call handle(Path, Method) to deal with my request. Notice also how powerful ordering is here: I exclude the resource "/" entirely by matching on it first. Note also that some arguments here are binding, while others are not. For example in the first clause, nothing is bound, in the second, we bind Path if the method matches 'PUT'.

How do we graft something like this onto a language like Python? It’s especially tricky because we’d like to maintain the ever elusive “Pythonics.” While I’m quite sure Guido would never even touch this stuff, we can at least maintain the spirit! Start off by importing the match primitives:

>>> from match import M, A, A_

M creates a destructuring match expression (a “matcher”), A is a binding argument, and A_ is the “any” argument. Any A arguments need to have a positional specifier. This is achieved with the division operator. So A/0 names the first value in the returned destructured tuple. With the help of a few operators, we compose a match expression:

>>> M((1, (A_, 3), A/1, A/0))
<match.M object at 0x726f0>

So, these expressions are entirely useless until they are bound. The == operator takes care of that:

>>> M((1, (A_, 3), A/1, A/0)) == (1, (2, 3), 1, 2)
(2, 1)

If you precede the match-expression with the ~ operator, the expression becomes a “pure” matching expression: it does not destructure the second operand, but just returns True or False depending on whether it matched successfully.

>>> ~M((1, (A_, 3), A/1, A/0)) == (1, (2, 3), 1, 2)
True

Now, to make things more interesting, match-expressions can be or-ed together, resulting in the first successful match. For example M([A/0]) | M(A/0) doesn’t care whether the value is a list of length 1 or a literal.

>>> M([A/0]) | M(A/0) == [5]
5
>>> M([A/0]) | M(A/0) == 5
5

Finally, matchers can specify “default” values to be returned when they match successfully. This is to help deal with polymorphic return values. For example, the following expression:

>>> M([A/0]) | M([], d=False) == arr[:1]

picks the first element if arr is nonempty, otherwise it returns False.

I’ve put the code up on GitHub. I’m not super happy with the aesthetics of it but it’s interesting to experiment with. Among other things, it definitely needs list and dictionary destructuring.
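Independently of the code in that repository, the operator overloading behind the interface can be sketched in a few dozen lines. What follows is a hypothetical Python 3 reimplementation written only to reproduce the examples above, not the actual match module: the internals are guesses, raising ValueError when nothing matches is an assumption, and `/` is implemented with `__truediv__`.

```python
_MISSING = object()          # sentinel: "no default given"


class _Any:
    """Wildcard (A_): matches anything, binds nothing."""


class _Bind:
    """Binding argument (A); A/n stores the matched value at position n."""
    def __init__(self, index=None):
        self.index = index

    def __truediv__(self, index):
        return _Bind(index)


A, A_ = _Bind(), _Any()


def _match(pattern, value, bound):
    """Structurally match value against pattern, recording bindings."""
    if isinstance(pattern, _Any):
        return True
    if isinstance(pattern, _Bind):
        bound[pattern.index] = value
        return True
    if isinstance(pattern, (tuple, list)):
        return (type(pattern) is type(value)
                and len(pattern) == len(value)
                and all(_match(p, v, bound) for p, v in zip(pattern, value)))
    return pattern == value


class M:
    def __init__(self, pattern, d=_MISSING):
        self.alternatives = [(pattern, d)]
        self.pure = False

    def __or__(self, other):            # m1 | m2: try m1, then m2
        combined = M(None)
        combined.alternatives = self.alternatives + other.alternatives
        return combined

    def __invert__(self):               # ~m: report success only
        pure = M(None)
        pure.alternatives = self.alternatives
        pure.pure = True
        return pure

    def __eq__(self, value):            # m == value: perform the match
        for pattern, default in self.alternatives:
            bound = {}
            if _match(pattern, value, bound):
                if self.pure:
                    return True
                if default is not _MISSING:
                    return default
                values = tuple(bound[i] for i in sorted(bound))
                return values[0] if len(values) == 1 else values
        if self.pure:
            return False
        raise ValueError("no pattern matched")   # assumed failure behaviour
```

The expressions need no extra parentheses because of Python’s operator precedence: ~ and | both bind more tightly than ==, so the whole matcher is built before the comparison triggers the match.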