ezyang's blog

the arc of software bends towards understanding

Posts

So long America!

For tomorrow I get on a plane traversing three thousand miles from my home to a little place named Cambridge, United Kingdom. I’ll be studying abroad for the 2010-2011 academic year at Cambridge University (specifically, I’ll be at Fitzwilliam college). While I know Baltimore is a fashionable place to be these days, if you’re in the area, drop me a line: I’d love to meet up after I get over my jet lag. :-)

Read more...

High performance monads

Continuations are well known for being notoriously tricky to use: they are the “gotos” of the functional programming world. They can make a mess or do amazing things (after all, what are exceptions but a well structured nonlocal goto). This post is intended for readers with a passing familiarity with continuations but a disbelief that they could be useful for their day-to-day programming tasks: I’d like to show how continuations let us define high performance monads ala the Logic monad in a fairly methodical way. A (possibly) related post is The Mother of all Monads. :

Read more...

Data is Code

Yesterday I had the pleasure of attending a colloquium given by Chung-Chieh Shan on Embedding Probabilistic Languages. A full account for the talk can be found in this paper, so I want to focus in on one specific big idea: the idea that data is code.


Lispers are well acquainted with the mantra, “code is data,” the notion that behind every source code listing there is a data structure of cons-cells and tags representing the code that can constructed, modified and evaluated. With this framework, a very small set of data is code: '(cons 1 (cons 2 ())) is code but '((.5 ((.5 #t) (.5 #f))) (.5 ((.5 #t)))) isn’t.

Read more...

Tension of patch and tree version control

This post is not meant as a rag on Darcs, just a observation of the difference between two philosophies of version control. Also, I’m a bit new to Darcs, so there might be some factual inaccuracies. Please let me know about them!

At some point, I would like to write a Darcs for Git users guide, distilling my experiences as an advanced Git user wrangling with Darcs. But perhaps the most important take away point is this: don’t try to superimpose Git’s underlying storage model on Darcs! Once I realized this point, I found Darcs fit rather nicely with my preferred Git development style—constant rebasing of local patches until they hit the official repository.

Read more...

Session types, subtyping and dependent types

While I was studying session type encodings, I noticed something interesting: the fact that session types, in their desire to capture protocol control flow, find themselves implementing something strongly reminiscent of dependent types.

Any reasonable session type encoding requires the ability to denote choice: in Simon Gay’s paper this is the T-Case rule, in Neubauer and Thiemann’s work it is the ALT operator, in Pucella and Tov’s implementation it is the :+: type operator, with the offer, sel1 and sel2 functions. There is usually some note that a binary alternation scheme is—in terms of user interface—inferior to some name-based alternation between an arbitrary number of cases, but that the latter is much harder to implement.

Read more...

Evolution of a Shared Web Host

Edward continues his spree of systems posts. Must be something in the Boston air.

Yesterday, I gave a SIPB cluedump on the use and implementation of scripts.mit.edu, the shared host service that SIPB provides to the MIT community. I derive essentially all of my sysadmin experience points from helping maintain this service.

Scripts is SIPB’s shared hosting service for the MIT community. However, it does quite a bit more than your usual $10 host: what shared hosting services integrate directly with your Athena account, replicate your website on a cluster of servers managed by Linux-HA, let you request hostnames on *.mit.edu, or offer automatic installs of common web software, let you customize it, and still upgrade it for you? Scripts is a flourishing development platform, with over 2600 users and many interesting technical problems.

Read more...

Keyword arguments in Haskell

Keyword arguments are generally considered a good thing by language designers: positional arguments are prone to errors of transposition, and it’s absolutely no fun trying to guess what the 37 that is the third argument of a function actually means. Python is one language that makes extensive use of keyword arguments; they have the following properties:

  1. Functions are permitted to be a mix of positional and keyword arguments (a nod to the compactness of positional arguments),
  2. Keywords are local to any given function; you can reuse a named function argument for another function,
  3. In Python 3.0, you can force certain arguments to only be specifiable with a keyword.

Does Haskell have keyword arguments? In many ways, they’re much less necessary due to the static type system: if you accidentally interpose an Int and Bool your compiler will let you know about it. The type signature guides you!

Read more...

Towards platform-agnostic interruptibility

Last post, I talked about some of the notable difficulties in emulating pthread_cancel on Windows. Today, I want to talk about what a platform agnostic compiler like GHC actually ought to do. Recall our three design goals:

  1. GHC would like to be able to put blocking IO calls on a worker thread but cancel them later; it can currently do this on Linux but not on Windows,
  2. Users would like to write interrupt friendly C libraries and have them integrate seamlessly with Haskell’s exception mechanism, and
  3. We’d like to have the golden touch of the IO world, instantly turning blocking IO code into nice, well-behaved, non-blocking, interruptible code.

I am going to discuss these three situations, described concisely as blocking system calls, cooperative libraries and blocking libraries. I propose that, due to the lack of a cross-platform interruption mechanism, the correct interruptibility interface is to permit user defined handlers for asynchronous exceptions.

Read more...

pthread_cancel on Windows

Edward, I’m afraid I have some bad news. Your interruptible GHC patch; it was involved in a terrible accident on the way to Windows portability. I hope you understand: we’re doing our best to patch it up, but there have been some complications…

Pop quiz! What does this pthreads code do? :

#include <pthread.h>
#include <stdio.h>

void *thread1(void *arg) { sleep(10000); }
void *thread2(void *arg) { while (1) {} }

void *psycho_killer(void *arg) {
    pthread_t *id = (pthread_t*)arg;
    pthread_cancel(*id);
    printf("[%p] Psycho killer...\n", id);
    pthread_join(*id, NULL);
    printf("[%p] ...qu'est-ce que c'est.\n", id);
}

int main(char* argv, int argc) {
    pthread_t t1, t2, k1, k2;
    pthread_create(&t1, NULL, thread1, NULL);
    printf("[%p] I can't sleep 'cause my bed's on fire\n", &t1);
    pthread_create(&t2, NULL, thread2, NULL);
    printf("[%p] Don't touch me I'm a real live wire\n", &t2);
    pthread_create(&k1, NULL, psycho_killer, &t1);
    pthread_create(&k2, NULL, psycho_killer, &t2);
    pthread_join(k1, NULL);
    pthread_join(k2, NULL);
    printf("Run run run away!\n");
    return 0;
}

It never manages to terminate the second thread…

Read more...

Embracing Windows

Some things come round full circle.

As a high schooler, I was a real Windows enthusiast. A budding programmer, I accumulated a complete development environment out of necessity, a mix of Cygwin, handwritten batch scripts, PuTTY, LogMeIn, a homegrown set of PHP build scripts and Notepad++. I was so devoted to the cause I even got a single patch into Git, for the purpose of making Git play nicely with plink on Windows. The setup worked, but it always felt like a patchwork of different components, all not quite seeing eye-to-eye with each other. When I discovered that Linux was able to offer me an unbelievably coherent development environment, I jumped ship and said goodbye to Windows.

Read more...