ezyang's blog

Towards platform-agnostic interruptibility

September 10, 2010

Last post, I talked about some of the notable difficulties in emulating pthread_cancel on Windows. Today, I want to talk about what a platform agnostic compiler like GHC actually ought to do. Recall our three design goals:

GHC would like to be able to put blocking IO calls on a worker thread but cancel them later; it can currently do this on Linux but not on Windows,
Users would like to write interrupt friendly C libraries and have them integrate seamlessly with Haskell’s exception mechanism, and
We’d like to have the golden touch of the IO world, instantly turning blocking IO code into nice, well-behaved, non-blocking, interruptible code.

I am going to discuss these three situations, described concisely as blocking system calls, cooperative libraries and blocking libraries. I propose that, due to the lack of a cross-platform interruption mechanism, the correct interruptibility interface is to permit user defined handlers for asynchronous exceptions.

Interruptible blocking system calls. In the past, GHC has had some bugs in which a foreign call to a blocking IO system call caused Windows to stop being interruptible. This is a long standing difference in asynchronous IO philosophy between POSIX and Windows: POSIX believes in functions that look blocking but can be interrupted by signals, while Windows believes in callbacks. Thus, calls that seem harmless enough actually break interruptibility and have to be rewritten manually into a form amenable to both the POSIX and Windows models of asynchrony.

While it is theoretically and practically possible to manually convert every blocking call (which, by the way, works perfectly fine on Linux, because you can just send it a signal) into the asynchronous version, but this is very annoying and subverts the idea that we can simply ship blocking calls onto another thread to pretend that they’re nonblocking.

Since Windows Vista, we can interrupt blocking IO calls using a handy new function CancelSynchronousIo. Notice that cancelling IO is not the same as cancelling a thread: in particular, the synchronous operation merely returns with a failure with the last error set to ERROR_OPERATION_ABORTED, so the system call needs to have been performed directly by GHC (which can then notice the aborted operation and propagate the interrupt further) or occur in C code that can handle this error condition. Unfortunately, this function does not exist on earlier versions of Windows.

Aside. Is there anything we can do for pre-Vista Windows? Not obviously: the under-the-hood changes that were made to Windows Vista were partially to make a function like CancelSynchronousIo possible. If we were to enforce extremely strong invariants on when we call TerminateThread; that is, we’d have to manually vet every function we consider for termination, and at that point, you might as well rewrite it asynchronous style.

Interruptible cooperative libraries. This is the situation where we have a C library that we have a high level of control over: it may be our own library or we may be writing a intermediate C layer between GHC and an expressive, asynchronous underlying library. What we would like to do is have GHC seamlessly convert its asynchronous exceptions into some for that our C can notice and act gracefully upon.

As you may have realized by now, there are a lot of ways to achieve this:

Signals. POSIX only, signals can be temporarily blocked with sigprocmask or pthread_sigmask and a signal handler can be installed with sigaction to cleanup and possible exit the thread or long jump.
Pthread cancellation. POSIX only, cancellation can be temporarily blocked with pthread_setcanceltype and a cancellation handler can be installed with pthread_cleanup_push.
Select/poll loop. Cancellation requests are sent over a socket which is being polled, handler can choose to ignore them.
Event objects. Windows only, threads can receive cancellation requests from the handle from OpenEvent but choose to ignore them.
IO cancellation. Windows Vista only, as described above.
Completion queue. Windows only, similar to select/poll loop.

It doesn’t make much sense to try to implement all of these mechanisms natively. Therefore, my proposal: have GHC call a user-defined function in a different thread upon receipt of an asynchronous function and let the user figure out what to do. In many ways, this is not much of a decision at all: in particular, we’ve asked the programmer to figure it out for themselves. Libraries that only worked with POSIX will still only work with POSIX. However, this is still an advance from the current state, which is that asynchronous exceptions necessarily behave differently for Haskell and FFI code.

Interruptible blocking libraries. Because blocking IO is much easier to program than non-blocking IO, blocking interfaces tend to be more prevalent and better tested. (A friend of mine who spent the summer working on Chromium had no end of complaints about the bugginess of the non-blocking interface to NSS.) It might be practical to rewrite a few system calls into asynchronous style, but when you have a blob of existing C code that you want to interface with, the maintenance cost of such a rewrite quickly becomes untenable. What is one to do?

Alas, there is no magic bullet: if the library never had any consideration for interruptibility, forcibly terminating it is more than likely to leave your program in a corrupted state. For those who would like to play it fast and loose, however, the user-defined function approach would still let you call TerminateThread if you really wanted to.

In conclusion, I propose that the interruptibility patch be extended beyond just a simple interruptible keyword to allow a user-defined asynchronous exception handler, compiled against the RTS, as well as providing a few built-in handlers which provide sensible default behaviors (both platform specific and non-platform specific, though I expect the latter to give much weaker guarantees).

pthread_cancel on Windows

September 8, 2010

Edward, I’m afraid I have some bad news. Your interruptible GHC patch; it was involved in a terrible accident on the way to Windows portability. I hope you understand: we’re doing our best to patch it up, but there have been some complications…

Pop quiz! What does this pthreads code do? :

#include <pthread.h>
#include <stdio.h>

void *thread1(void *arg) { sleep(10000); }
void *thread2(void *arg) { while (1) {} }

void *psycho_killer(void *arg) {
    pthread_t *id = (pthread_t*)arg;
    pthread_cancel(*id);
    printf("[%p] Psycho killer...\n", id);
    pthread_join(*id, NULL);
    printf("[%p] ...qu'est-ce que c'est.\n", id);
}

int main(char* argv, int argc) {
    pthread_t t1, t2, k1, k2;
    pthread_create(&t1, NULL, thread1, NULL);
    printf("[%p] I can't sleep 'cause my bed's on fire\n", &t1);
    pthread_create(&t2, NULL, thread2, NULL);
    printf("[%p] Don't touch me I'm a real live wire\n", &t2);
    pthread_create(&k1, NULL, psycho_killer, &t1);
    pthread_create(&k2, NULL, psycho_killer, &t2);
    pthread_join(k1, NULL);
    pthread_join(k2, NULL);
    printf("Run run run away!\n");
    return 0;
}

It never manages to terminate the second thread…

ezyang@javelin:~/Desktop$ ./test 
[0xbf900b4c] I can't sleep 'cause my bed's on fire
[0xbf900b48] Don't touch me I'm a real live wire
[0xbf900b4c] Psycho killer...
[0xbf900b4c] ...qu'est-ce que c'est.
[0xbf900b48] Psycho killer...
^C

If you just had the pthread_cancel and the pthread_setcancelstate manpages, this might seem a little mysterious. The pthreads page, however, makes things clear: sleep is among one-hundred and two “cancellable” functions, which pthread_cancel must terminate within if a thread’s cancellability status is PTHREAD_CANCEL_DEFERRED (there are another two-hundred and forty-two which may or may not be cancelled). If the thread is stuck in userspace, it has to explicitly allow a deferred cancellation with pthread_testcancel. Previous versions of the POSIX spec were a little unclear whether or not cancellation should take place upon entry to the system call, or while the system call was running, but the 2008 spec is fairly clear:

Cancellation points shall occur when a thread is executing the following functions…

The million-dollar question is: “Can we implement the same semantics on Windows?” Actually, since it seems that a lot of people would have wanted pthreads functionality on Windows, you would think that this has been already been implemented by pthreads-win32. We turn to the source! :

if (tp->cancelType == PTHREAD_CANCEL_ASYNCHRONOUS
    && tp->cancelState == PTHREAD_CANCEL_ENABLE
    && tp->state < PThreadStateCanceling)
  {
    /* snip  */
  }
else
  {
    /*
     * Set for deferred cancellation.
     */
    if (tp->state < PThreadStateCancelPending)
  {
    tp->state = PThreadStateCancelPending;
    if (!SetEvent (tp->cancelEvent))
      {
        result = ESRCH;
      }
  }
    else if (tp->state >= PThreadStateCanceling)
  {
    result = ESRCH;
  }

    (void) pthread_mutex_unlock (&tp->cancelLock);
  }

Interestingly enough, pthreads-win32 doesn’t seem to do anything special: when we translate our test program and run it with pthreads-win32, it gets stuck on the Sleep call as well:

C:\Users\ezyang\pthreads-win32\Pre-built.2\lib>test.exe
[0022FF40] I can't sleep 'cause my bed's on fire
[0022FF38] Don't touch me I'm a real live wire
[0022FF40] Psycho killer...
[0022FF38] Psycho killer...
^C

At this point, it’s worth stepping back for a moment and asking, “What are we really trying to do here?” If you were to ask how to terminate threads on, say, Stack Overflow, you’d get a bunch of responses telling you, “Stop that and do it the right way”; namely, by explicitly handling thread termination on the thread itself via another message passing mechanism.

So there are number of different needs for interruptible calls:

GHC would like to be able to put blocking IO calls on a worker thread but cancel them later; it can currently do this on Linux but not on Windows,
Users would like to write interrupt friendly C libraries and have them integrate seamlessly with Haskell’s exception mechanism, and
We’d like to have the golden touch of the IO world, instantly turning blocking IO code into nice, well-behaved non-blocking code.

Next time I’ll talk about what different approaches might be needed for each of these goals.

Embracing Windows

September 6, 2010

Some things come round full circle.

As a high schooler, I was a real Windows enthusiast. A budding programmer, I accumulated a complete development environment out of necessity, a mix of Cygwin, handwritten batch scripts, PuTTY, LogMeIn, a homegrown set of PHP build scripts and Notepad++. I was so devoted to the cause I even got a single patch into Git, for the purpose of making Git play nicely with plink on Windows. The setup worked, but it always felt like a patchwork of different components, all not quite seeing eye-to-eye with each other. When I discovered that Linux was able to offer me an unbelievably coherent development environment, I jumped ship and said goodbye to Windows.

Some things come round full circle. Windows has a way of coming back to you eventually. The product I worked on over the summer at Galois had to support Windows, and I consequently devoted days of effort getting my changes to build properly on Windows. I then went on to hacking GHC, and Simon Marlow asked me to implement the equivalent feature in Windows.

I’ve decided that I should stop shunning Microsoft Windows as the developer’s black sheep of the operating systems. Like it or not, Windows is here to stay; even if I never boot my laptop into Windows, as a developer it is good practice to think about and test my code on Windows. It might even be the case that Windows is a perfectly reasonable underlying platform to develop on.

There seem to be two reasons why developers might find targeting other platforms to be annoying:

They don’t have access to a computer running that operating system, which makes debugging the problems extremely annoying—after all, this is why a reproduceable test-case is the gold standard of bug reporting. We should have easy to access and easy to use build servers setup to let people play in these different environments. This involves putting down some money to buy the appropriate licenses, which open-source authors might be reluctant to do: people at places with site licenses might be able to help by donating boxes for these people to play in (the same way companies and universities donate disk space and bandwidth for mirrors).
They have to learn about another platform, with all of its intricacies and gotchas. On the one hand, this is annoying because “I already know how to do this in Unix, and now I have to spend N minutes to figure out how to do it on Windows, and spend another N minutes figuring out why it doesn’t work in some edge case.” On the other hand, learning a platform that does something you already know how to do can be kind of fun: you get to see different design decisions and develop multiple perspectives on the same problem, which I have found has always helped me out for problem solving.

There remain parts of Windows programming that I continue to have no interest in: for example, I find the vagaries of manifest files to be fairly uninteresting. But then again, I find packaging in Linux distributions to be uninteresting. Stop blaming Windows!

Annotating slides

September 3, 2010

A little trick for your toolbox: after you’ve generated your slide deck and printed it out to PDF, you might want to annotate the slides with comments. These is a good idea for several reasons:

If you’ve constructed your slides to be text light, they might be optimized for presentation but not for reading later on. (“Huh, here is this diagram, I sure wish I knew what the presenter was saying at that point.”)
Writing out a dialog to go along the slides is a nonvocal way of practicing your presentation!

But how do you interleave the slide pages with your annotations? With the power of enscript and pdftk, you can do this entirely automatically, without even having to leave your terminal! Here’s the recipe.

Create an “annotations” text file (we’ll refer to it as annot.txt). This will contain your text commentary to accompany the slides. Write the text explaining your first slide, and then insert a form feed (^L, you can do so by pressing C-l in vim (insert mode) or C-q C-l in emacs.) Write the text for your second slide. Rinse and repeat.
We now want to render this into a PDF file, with the same dimensions as your slide deck. Figure out what the size of your slides are in pixels, and then edit your ~/.enscriptrc to contain the following line:
```
Media: Slide width height llx lly urx ury
```
where ll stands for lower left and ur stands for upper right: these four numbers denote the bounding box for the text. One possible combination for these might be:
```
Media: Slide 576 432 18 17 558 415
```
We can now invoke enscript to generate a nicely formatted PostScript file of our annotations in the right dimensions, with enscript annot.txt -p annot.ps -M Slide -B -f Palatino-Roman14 (pick a different font, if you like.)
Convert the resulting PostScript file into a PDF, with ps2pdf annot.ps.
Now, with pdftk, we will split our annotations PDF and our slides PDF into individual pages, and then merge them back together into one PDF. We can use burst to output the pages, suggestively naming the output files so they interleave correctly:
```
mkdir stage
pdftk slides.pdf burst output stage/%02da.pdf
pdftk annot.pdf burst output stage/%02db.pdf
```
and then we join them back together:
```
pdftk stage/*.pdf cat output annotated-slides.pdf
```

Here’s the full script:

#!/bin/sh
set -e
ANNOT="$1"
SLIDES="$2"
OUTPUT="$3"
if [ -z "$3" ]
then
    echo "usage: $0 annot.txt slides.pdf output.pdf"
    exit 1
fi
TMPDIR="$(mktemp -d)"
enscript "$ANNOT" -p "$ANNOT.ps" -M Slide -B -f Palatino-Roman14
ps2pdf "$ANNOT.ps" "$ANNOT.pdf"
pdftk "$SLIDES" burst output "$TMPDIR/%03da.pdf"
pdftk "$ANNOT.pdf" burst output "$TMPDIR/%03db.pdf"
pdftk "$TMPDIR"/*.pdf cat output "$OUTPUT"
rm -Rf "$TMPDIR"

Don’t forget to define Slide in your .enscriptrc, and happy annotating!

My type signature overfloweth

September 1, 2010

I’ve recently started researching the use of session types for practical coding, a thought that has been in the back of my mind ever since I was part of a team that built a networked collaborative text editor and spent a lot of time closely vetting the server and the client to ensure that they had implemented the correct protocols. The essence of such protocols is often relatively simple, but can quickly become complicated in the presence of error flow (for example, resynchronizing after a disconnection). Error conditions also happen to be difficult to automatically test! Thus, static types seem like an attractive way of tackling this task.

There are three implementations of session types in Haskell: sessions, full-sessions and simple-sessions. If you were feeling particularly naive, you might try going to the Haddock page to get a feel for what the API looks like. Before you continue reading, please inspect that page.

Done gouging your eyes out? Let’s proceed.

In an interview in Coders at Work, Simon Peyton Jones mentioned that one of the notable benefits of types is that it gives a concise, crisp description of what a function might do. That API is anything from concise and crisp, and it’s certainly not something that I could figure out just by looking at the corresponding function definition. Accordingly, one of the key selling points of current encodings of session types is that they do not break type inference: we give up on our user understanding what the gaggle of typeclasses means, and only expect to transfer one bit of information, “Do the protocols match?”

This is not a problem that is fundamental to session types: any functionality that makes extensive use typeclasses can easily fall prey to these long type signatures. I have two (rather half-baked) thoughts on how this complexity might be rendered more nicely to the user, although not eliminated:

A favorite pastime of type system hackers is a type-level encoding of naturals, using Peano numbers Z and S a, attached to something like Vector (S (S Z)). Vector is a type constructor of kind * -> *. However, since there is only one primitive kind in Haskell, we could actually pass any type to Vector, say Vector Int, which would be a nonsensical. One way to prevent this from occurring is to declare our Peano numbers instances of a typeclass Nat, and then declare Nat a => Vector a. But, since a is used precisely once in any such a statement, wouldn’t it be great if instead we could write Vector :: Nat -> *? If you need to specify type equality, you could imagine some sort of type pattern matching concat :: Vector a -> Vector b -> Vector c with c ~ a :+: b. Collapsing types and kinds is an interesting step in this direction.
When mathematicians present proofs, they might explicitly specify “for all F such that F is a field…”, but more frequently, they’ll say something like, “in the following proof, assume the following variable naming conventions.” With this, they get to avoid having to repeatedly explicitly redeclare what all of their variable names mean. An analogous system for type variables would go a long way towards reducing long type signatures.

But actually, that has nothing to do with what I’m currently looking at.

Here’s what I am looking at: session types suffer from another type signature explosion phenomenon: any function in the protocol contains, in its type, a complete specification of the entire protocol continuing from that point in time. As Neubauer and Thiemann admit (PDF), the “session type corresponding to full SMTP is quite unreadable.” The two lines of inquiry I am pursuing are as follows:

Can building exception support into session types (currently an open problem) allow for much simpler session types by allowing most cases to elide the session types corresponding to error cases?
Can we use type to permit a single global specification of the protocol, which individual functions then simply refer to? Do we need something a little more powerful?

At this point, I’ve just been doing thinking and paper reading, but I hope to start hacking on code soon. I’d love to hear your thoughts though.

Defining “Haskelly”

August 30, 2010

At risk of sounding like a broken record, the topic of this post also sprang from abcBridge. John Launchbury asked a question during my presentation that got me thinking about API design in Haskell. (By the way, the video for the talk is out! Unfortunately, the second half had to be cut out due to technical difficulties, but you can still check out the slides.)

His question was this:

You’ve presented this in a very imperative style, where you’ve got this AIG structure in the ABC tool, and what you’ve really done is given me a nicely typed Haskell typed interface that allows you to go in a put a new gate or grab a structure, and I’m left wondering, what is the reason for needing this tight tie-in with what’s going on in that space? Here is a thought experiment: I could imagine myself having a purely functional data structure that is describing the data structure…and you end up with a functional description of what you want your graph to look like, and then you tell ABC to go and build the graph in one go.

I had claimed that abcBridge was a “functional API” for manipulating and-inverter graphs; perhaps I was lying! Is abcBridge—with its close correspondence to the underlying imperative code—truly “functional?” Or, if it’s not functional, does it at least have a “Haskelly” API? (What does it even mean for an API to be Haskelly?) Why does the purely functional interface seem morally better than the imperative interface? It’s a question of philosophical import as well as practical import—why do we prefer the functional interface which might require a more complex underlying implementation?

My conjecture is that the faithfulness of an API to its host language is based on how much it utilizes particular features that a language makes easy to use. Haskell is frequently introduced as a “purely functional, lazy, strongly statically typed programming language.” Looking at each of these terms in turn (informally)…

Purely functional indicates APIs that eschew destructive updates, instead opting for immutability and persistent data structures. Language features that make it easier to write in this style include the final and const keywords, algebraic data types, pattern-matching and a library of persistent data structures to write more persistent data structures with.
Lazy indicates APIs that utilize laziness to build custom control structures and generate infinite data structures. The poster child language feature for laziness is, well, lazy evaluation by default, but explicit laziness annotations in a strict language or even a convenient lambda abstraction encourages lazy style. (Python does not have a convenient lambda, which is why asynchronous frameworks like Twisted are so painful!)
Strongly statically typed indicates APIs that encode invariants of all shapes and sizes into the static type system, so that programmer errors can be caught at compile time, not run time. The type system is the obvious language feature here, with its expressiveness defining what you can easily add to your system.

We associate programs that take advantage of these language features as “Haskelly” because Haskell makes it easy—both syntactically and conceptually—to use them! But at the same time, these are all (mostly) orthogonal language features, and for any given API you might write, you may opt to ditch some of these properties for others: maybe the feature just doesn’t matter in your problem domain, maybe the feature imposes an unacceptable performance penalty or is an insufficiently sealed abstraction.

With abcBridge as our concrete example, here is how you might make such classifications in practice:

The monadic interface for constructing networks is about as far from purely functional as you can get, which was an explicit design choice in the name of performance and control. (Fortunately, we can build a nicer API on top of this one—in fact, I did an experimental implementation of one.) However, when you’re dealing with fully constructed networks the API takes a purely functional style, doing copying and unsafePerformIO behind the scenes to preserve this illusion.
abcBridge does not directly use laziness: in particular the monadic code is very structured and doesn’t have a lot of flow control in it.
The static type system is a huge part of abcBridge, since it is operating so close to the bare metal: it has two monads, with an intricate set of functions for running and converting the monads, and the low level FFI bindings make every attempt to enhance the existing C-based type system. Notice the interesting play between the types and a functional interface: if we had a purely functional interface, we probably could have ditched most of these complicated types! (Imperative code, it seems, needs stronger type system tricks.)

As far as pure Haskell libraries go, abcBridge is very un-Haskelly: I would certainly expect more from an equivalent library implemented in pure Haskell. But it is leaps and bounds better than the C library it sprang from. How far should one push the envelope? It is all about striking the right balance—that is why API design is an art.

Interrupting GHC

August 27, 2010

In my tech talk about abcBridge, one of the “unsolved” problems I had with making FFI code usable as ordinary Haskell code was interrupt handling. Here I describe an experimental solution involving a change to the GHC runtime system as suggested by Simon Marlow. The introductory section may be interesting to practitioners looking for working examples of code that catches signals; the later section is a proof of concept that I hope will turn into a fully fleshed out patch. :

> {-# LANGUAGE ForeignFunctionInterface #-}
> {-# LANGUAGE DeriveDataTypeable #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> 
> import qualified Control.Exception as E
> 
> import Foreign.C.Types (CInt)
> 
> import Control.Monad
> import Control.Concurrent (threadDelay, myThreadId, throwTo, forkIO)
> import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
> 
> import System.IO (hPutStrLn, stderr)
> import System.Posix.Signals (installHandler, sigINT, Handler(..))

In many interactive applications (especially for REPLs), you would like to be able to catch when a user hits ^C and terminate just the current computation, not the entire program. fooHs is some function that may take a long time to run (in this case, fooHs never terminates). :

> fooHs :: Int -> IO Int
> fooHs n = do
>     putStrLn $ "Arf HS " ++ show n
>     threadDelay 1000000
>     fooHs n

By default, GHC generates an asynchronous exception which we can catch using the normal exception handling facilities to say “don’t exit yet”:

> reallySimpleInterruptible :: a -> IO a -> IO a
> reallySimpleInterruptible defaultVal m = do
>     let useDefault action =
>             E.catch action
>                 (\(e :: E.AsyncException) ->
>                     return $ case e of
>                         E.UserInterrupt -> defaultVal
>                         _ -> E.throw e
>                         )
>     useDefault m
> 
> reallySimpleMain = do
>     r <- reallySimpleInterruptible 42 (fooHs 1)
>     putStrLn $ "Finished with " ++ show r

Sometimes, you don’t want an exception generated at all and would like to deliberate on the signal as soon as it arrives. You might be in some critical section of the program that should not be interrupted! In such a case, you can install a signal handler with installHandler from System.Posix.Signals.

> installIntHandler :: Handler -> IO Handler
> installIntHandler h = installHandler sigINT h Nothing

Care should be taken to make sure you restore the original signal handler when you’re done.

If you do decide you want to generate an exception from inside a signal handler, a little care must be taken: if we try to do just a simple throw, our exception will seemingly vanish into the void! This is because the interrupt handler is run on a different thread, and we have to use throwTo from Control.Concurrent to ensure our exception is sent to the right thread. :

> simpleInterruptible :: a -> IO a -> IO a
> simpleInterruptible defaultVal m = do
>     tid <- myThreadId
>     let install = installIntHandler (Catch ctrlc)
>         ctrlc = do
>             -- This runs in a different thread!
>             hPutStrLn stderr "Caught signal"
>             E.throwTo tid E.UserInterrupt
>         cleanup oldHandler = installIntHandler oldHandler >> return ()
>         useDefault action =
>             E.catch action
>                 (\(e :: E.AsyncException) ->
>                     return $ case e of
>                         E.UserInterrupt -> defaultVal
>                         _ -> E.throw e
>                         )
>     useDefault . E.bracket install cleanup $ const m
> 
> simpleMain = do
>     r <- simpleInterruptible 42 (fooHs 1)
>     putStrLn $ "Finished with " ++ show r

This code works fine for pure Haskell work.

However, our question is whether or not we can interrupt Haskell threads that are inside the FFI, not just pure Haskell code. That is, we’d like to replace fooHs with:

> foreign import ccall "foo.h" foo :: CInt -> IO ()

where foo.h contains:

void foo(int);

and foo.c contains:

#include <stdio.h>
#include "foo.h"

void foo(int d) {
    while (1) {
        printf("Arf C %d!\n", d);
        sleep(1);
    }
}

In real practice, foo will be some highly optimized function written in C that may take a long time to run. We also can’t kill functions willy nilly: we should be able to forcibly terminate it at any time without corrupting some global state.

If we try our existing interruptible functions, we find they don’t work:

reallySimpleInterruptible registers the SIGINT, but the foreign call continues. On the second SIGINT, the program terminates. This is the default behavior of the runtime system: the RTS will attempt to gracefully abort the computation, but has no way of killing an FFI call, and forcibly terminates the program when the second SIGINT arrives.
simpleInterruptible fares even worse: without the “exit on the second signal” behavior, we find that we can’t kill the program by pressing ^C! The thread that requested the FFI call is ignoring our exceptions.

Nota bene. Please let the author know of any factual inaccuracies in this section.

Time to dive into the runtime system! The code that manages asynchronous exception lives in RaiseAsync.c in the rts directory. In particular, there is the function:

nat throwToMsg (Capability *cap, MessageThrowTo *msg)

Which is called when a thread invokes throwTo to create an exception in another thread.

It’s instructive to first look at what happens when there is no funny business going along, that is, when the thread is not blocked:

case NotBlocked:
{
    if ((target->flags & TSO_BLOCKEX) == 0) {
        // It's on our run queue and not blocking exceptions
        raiseAsync(cap, target, msg->exception, rtsFalse, NULL);
        return THROWTO_SUCCESS;
    } else {
        blockedThrowTo(cap,target,msg);
        return THROWTO_BLOCKED;
    }
}

If the thread is running normally, we use raiseAsync to raise the exception and we’re done! However, the thread may have called block (from Control.Exception), in which case we add the exception to the target’s blocked exceptions queue, and wait for the target to become unblocked.

Another state that a Haskell thread can be in is this:

case BlockedOnCCall:
case BlockedOnCCall_NoUnblockExc:
{
    blockedThrowTo(cap,target,msg);
    return THROWTO_BLOCKED;
}

The runtime system waits for the thread to stop being blocked on the FFI call before delivering the exception—it will get there eventually! But if the FFI call takes a long time, this will be too late. We could replace this call with raiseAsync, but what we find is that, while the exception gets raised and the Haskell thread resumes normal execution, the FFI computation continues!

If this seems mysterious, it’s useful to review how the multithreaded scheduler in the GHC runtime system works. Haskell threads are light-weight, and don’t have a one-to-one corresponding with OS threads. Instead, Haskell threads, represented with a TSO (thread-state object), are scheduled on a smaller number of OS threads, abstracted in the RTS as Tasks. Each OS thread is associated with a CPU core, abstracted in the RTS as a Capability.

At the very start of execution, the number of OS threads is the same as the number of virtual cores (as specified by the -N RTS option): in terms of Haskell code, we gain parallelism by having multiple capabilities, not multiple tasks! A capability can only belong to one task at a time. However, if a task blocks on the operating system, it may give up it’s capability to another task, which can continue running Haskell code, thus we frequently refer to these tasks as worker threads.

A Task (OS thread) does work by executing InCalls requested by a TSO (Haskell thread) in the run queue, scheduling them in a round-robin fashion. During the course of this execution, it may run across an FFI call. The behavior here diverges depending on whether or not the FFI call is safe or unsafe.

If the call is unsafe, we just make the call, without relinquishing the capability! This means no other Haskell code can run this virtual core, which is bad news if the FFI call takes a long time or blocks, but if it’s really fast, we don’t have to give up the capability only to snatch it back again.
If the call is safe, we release the capability (allowing other Haskell threads to proceed), and the Haskell thread is suspended as waiting on a foreign call. The current OS thread then goes and runs the FFI call.

Thus, if we attempt to directly wake up the original Haskell thread by throwing it an exception, it will end up getting scheduled on a different OS thread (while the original thread continues running the FFI call!)

The trick is to kill the OS thread that is running the FFI call. :

case BlockedOnCCall:
case BlockedOnCCall_NoUnblockExc:
{
#ifdef THREADED_RTS
    Task *task = NULL;
    if (!target->bound) {
        // walk all_tasks to find the correct worker thread
        for (task = all_tasks; task != NULL; task = task->all_link) {
            if (task->incall->suspended_tso == target) {
                break;
            }
        }
        if (task != NULL) {
            raiseAsync(cap, target, msg->exception, rtsFalse, NULL);
            pthread_cancel(task->id);
            task->cap = NULL;
            task->stopped = rtsTrue;
            return THROWTO_SUCCESS;
        }
    }
#endif
    blockedThrowTo(cap,target,msg);
    return THROWTO_BLOCKED;
}

Which OS thread is it, anyhow? It couldn’t possibly be thread attempting to throw the exception and it doesn’t have anything to do with the suspended Haskell thread, which is waiting to be woken up but doesn’t know what it’s waiting to be woken up from. However, the task running the FFI call knows which Haskell thread is waiting on it, so we can just walk the list of all tasks looking for the one that matches up with the target of our exception. Once we find it, we kill the thread with fire (pthread_cancel) and wakeup the orignating Haskell thread with an exception.

There is one subtlety that Marlow pointed out: we do not want to destroy bound threads, because they may contain thread local state. Worker threads are identical and thus expendable, but bound threads cannot be treated so lightly.

We’ve been a bit mean: we haven’t given the library a chance to clean up when it got interrupted. Fortunately, the library can use pthread_setcancelstate and pthread_setcanceltype, to give it a chance to cleanup before exiting.

It turns out that even with the RTS patch, we still aren’t quite able to interrupt FFI calls. If we add in an explicit new Haskell thread, hwoever, things work:

> interruptible :: a -> IO a -> IO a
> interruptible defaultVal m = do
>     mresult <- newEmptyMVar -- transfer exception to caller
>     mtid    <- newEmptyMVar
>     let install = installIntHandler (Catch ctrlc)
>         cleanup oldHandler = installIntHandler oldHandler >> return ()
>         ctrlc = do
>             hPutStrLn stderr "Caught signal"
>             tid <- readMVar mtid
>             throwTo tid E.UserInterrupt
>         bracket = reportBracket . E.bracket install cleanup . const
>         reportBracket action = do
>             putMVar mresult =<< E.catches (liftM Right action)
>                 [ E.Handler (\(e :: E.AsyncException) ->
>                     return $ case e of
>                         E.UserInterrupt -> Right defaultVal
>                         _ -> Left (E.toException e)
>                     )
>                 , E.Handler (\(e :: E.SomeException) -> return (Left e))
>                 ]
>     putMVar mtid =<< forkIO (bracket m)
>     either E.throw return =<< readMVar mresult -- one write only
> 
> main = main' 3
> 
> main' 0 = putStrLn "Quitting"
> main' n = do
>     interruptible () $ do
>         (r :: Either E.AsyncException ()) <- E.try $ foo n
>         putStrLn $ "Thread " ++ show n ++ " was able to catch exception"
>     main' (pred n)

The output of this literate Haskell file, when compiled with -threaded on the patched RTS is as follows:

Arf C 3!
Arf C 3!
^CCaught signal
Thread 3 was able to catch exception
Arf C 2!
Arf C 2!
Arf C 2!
^CCaught signal
Thread 2 was able to catch exception
Arf C 1!
Arf C 1!
^CCaught signal
Thread 1 was able to catch exception
Quitting

Proof of concept accomplished! Now to make it work on Windows…

abcBridge: Functional interfaces for AIGs and SAT solving

August 25, 2010

Yesterday I gave a Galois Tech Talk about abcBridge, a set of bindings in Haskell for ABC that I built over the summer as part of my internship.

There should be a video soon, but until then, you can download my annotated slides. The software’s not public yet, but hopefully it will be soon.

Type Kata: Distinguishing different data with the same underlying representation

August 23, 2010

Punning is the lowest form of humor. And an endless source of bugs.

The imperative. In programming, semantically different data may have the same representation (type). Use of this data requires manually keeping track of what the extra information about the data that may be in a variable. This is dangerous when the alternative interpretation is right most of the time; programmers who do not fully understand all of the extra conditions are lulled into a sense of security and may write code that seems to work, but actually has subtle bugs. Here are some real world examples where it is particularly easy to confuse semantics.

Variables and literals. The following is a space efficient representation of boolean variables (x, y, z) and boolean literals (x or not x). Boolean variables are simply counted up from zero, but boolean literals are shifted left and least significant bit is used to store complement information. :

int Gia_Var2Lit( int Var, int fCompl )  { return Var + Var + fCompl; }
int Gia_Lit2Var( int Lit )              { return Lit >> 1;           }

Consider, then, the following function:

int Gia_ManHashMux( Gia_Man_t * p, int iCtrl, int iData1, int iData0 )

It is not immediately obvious whether or not the iCtrl, iData1 and iData0 arguments correspond to literals or variables: only an understanding of what this function does (it makes no sense to disallow muxes with complemented inputs) or an inspection of the function body is able to resolve the question for certain (the body calls Gia_LitNot). Fortunately, due to the shift misinterpreting a literal as a variable (or vice versa) will usually result in a spectacular error. (Source: ABC)

Pointer bits. It is well known that the lower two bits of a pointer are usually unused: on a 32-bit system, 32-bit integers are the finest granularity of alignment, which force any reasonable memory address to be divisible by four. Space efficient representations may use these two extra bits to store extra information but need to mask out the bits when dereferencing the pointer. Building on our previous example, consider a pointer representation of variables and literals: if a vanilla pointer indicates a variable, then we can use the lowest bit to indicate whether or not the variable is complemented or not, to achieve a literal representation.

Consider the following function:

Gia_Obj_t *  Gia_ObjFanin0( Gia_Obj_t * pObj );

where iDiff0 is an int field in the Gia_Obj_t struct. It is not clear whether or not the input pointer or the output pointer may be complemented or not. In fact, the input pointer must not be complemented and the output pointer will never be complemented.

Misinterpreting the output pointer as possibly complemented may seem harmless at first: all that happens is the lower two bits are masked out, which is a no-op on a normal pointer. However, it is actually a critical logic bug: it assumes that the returned pointer’s LSB says anything about whether or not the fanin was complemented, when in fact the returned bit will always be zero. (Source: ABC)

Physical and virtual memory. One of the steps on the road to building an operating system is memory management. When implementing this, a key distinction is the difference between physical memory (what actually is on the hardware) and virtual memory (which your MMU translates from). The following code comes from a toy operating system skeleton that students build upon:

/* This macro takes a kernel virtual address -- an address that points above
 * KERNBASE, where the machine's maximum 256MB of physical memory is mapped --
 * and returns the corresponding physical address.  It panics if you pass it a
 * non-kernel virtual address.
 */
#define PADDR(kva)                      \
({                              \
        physaddr_t __m_kva = (physaddr_t) (kva);        \
        if (__m_kva < KERNBASE)                 \
                panic("PADDR called with invalid kva %08lx", __m_kva);\
        __m_kva - KERNBASE;                 \
})

/* This macro takes a physical address and returns the corresponding kernel
 * virtual address.  It panics if you pass an invalid physical address. */
#define KADDR(pa)                       \
({                              \
        physaddr_t __m_pa = (pa);               \
        uint32_t __m_ppn = PPN(__m_pa);             \
        if (__m_ppn >= npage)                   \
                panic("KADDR called with invalid pa %08lx", __m_pa);\
        (void*) (__m_pa + KERNBASE);                \
})

Note that though the code distinguishes with a type synonym uintptr_t (virtual addresses) from physaddr_t (physical addresses), the compiler will not stop the student from mixing the two up. (Source: JOS)

String encoding. Given an arbitrary sequence of bytes, there is no canonical interpretation of what the bytes are supposed to mean in human language. A decoder determines what the bytes probably mean (from out-of-band data like HTTP headers, or in-band data like meta tags) and then converts a byte stream into a more structured internal memory representation (in the case of Java, UTF-16). However, in many cases, the original byte sequence was the most efficient representation of the data: consider the space-difference between UTF-8 and UCS-32 for Latin text. This encourages developers to use native bytestrings to pass data around (PHP’s string type is just a bytestring), but has caused endless headaches if the appropriate encoding is not also kept track of. This is further exacerbated by the existence of Unicode normalization forms, which preclude meaningful equality checks between Unicode strings that may not be in the same normalization form (or may be completely un-normalized).

Endianness. Given four bytes corresponding to a 32-bit integer, there is no canonical “number” value that you may assign to the bytes: what number you get out is dependent on the endianness of your system. The sequence of bytes 0A 0B 0C 0D may be interpreted as 0x0A0B0C0D (big endian) or 0x0D0C0B0A (little endian).

Data validation. Given a data structure representing a human, with fields such as “Real name”, “Email address” and “Phone number”, there are two distinct interpretations that you may have of the data: the data is trusted to be correct and may be used to directly perform an operation such as send an email, or the data is unvalidated and cannot be trusted until it is processed. The programmer must remember what status the data has, or force a particular representation to never contain unvalidated data. “Taint” is a language feature that dynamically tracks the validated/unvalidated status of this data.

The kata. Whenever a data structure (whether simple or complex) could be interpreted multiple ways, newtype it once for each interpretation. :

newtype GiaLit = GiaLit { unGiaLit :: CInt }
newtype GiaVar = GiaVar { unGiaVar :: CInt }

-- accessor functions omitted for brevity; they should be included

newtype CoGia_Obj_t = CoGia_Obj_t (Gia_Obj_t)

newtype PhysAddr a = PhysAddr (Ptr a)
newtype VirtualAddr a = VirtualAddr (Ptr a)

newtype RawBytestring = RawBytestring ByteString
-- where e is some Encoding
newtype EncodedBytestring e = EncodedBytestring ByteString
-- where n is some Normalization
newtype UTF8Bytestring n = UTF8Bytestring ByteString
type Text = UTF8Bytestring NFC

-- where e is some endianness
newtype EndianByteStream e = EndianByteStream ByteString

newtype Tainted c = Tainted c
newtype Clean c = Clean c

Identifying when data may have multiple interpretations may not be immediately obvious. If you are dealing with underlying representations you did not create, look carefully at variable naming and functions that appear to interconvert between the same type. If you are designing a high-performance data structure, identify your primitive data types (which are distinct from int, char, bool, the primitives of a general purpose programming language.) Multiple interpretations can creep in over time as new features are added to code: be willing to refactor (possibly breaking API compatibility) or speculatively newtype important user-visible data.

A common complaint about newtypes is the wrapping and unwrapping of the type. While some of this is a necessary evil, it should not be ordinarily necessary for end-users to wrap and unwrap the newtypes: the internal representation should stay hidden! (This is a closely related but orthogonal property that newtypes help enforce.) Try not to export newtype constructors; instead, export smart constructors and destructors that do runtime sanity checks and are prefixed with unsafe.

When an underlying value is wrapped in the newtype, you are indicating to the compiler that you believe that the value has a meaningful interpretation under that newtype: do your homework when you wrap something! Conversely, you should assume that an incoming newtype has the appropriate invariants (it’s a valid UTF-8 string, its least significant bit is zero, etc.) implied by that newtype: let the static type checker do that work for you! Newtypes have no runtime overhead: they are strictly checked at compile time.

Applicability. A newtype is no substitute for an appropriate data structure: don’t attempt to do DOM transformations over a bytestring of HTML. Newtypes can be useful even when there is only one interpretation of the underlying representation—however, the immediate benefit derives primarily from encapsulation. However, newtypes are essential when there are multiple interpretations of a representation: don’t leave home without them!

Day in the life of a Galois intern

August 20, 2010

Vrrmm! Vrrmm! Vrrmm!

It’s 9:00AM, and the cell phone next to my pillow is vibrating ominously. I rise and dismiss the alarm before it starts ringing in earnest and peek out the window of my room.

Portland summer is a fickle thing: the weather of the first month of my internship was marked by mist and rain (a phenomenon, Don tells me, which is highly unusual for Portland), while the weather of the second month was a sleepy gray in the mornings. “Is it summer yet?” was the topic of #galois for most of July. But in the heart of August, summer has finally arrived, and the sun greets my gaze. Shorts and a T-shirt, no sweater necessary! I silently go “Yes!”

I finish getting dressed, say goodbye to Pixie, the white cat who is curled up in my desk chair, skip breakfast, grab my bike, and head off in the direction of downtown Portland. (Warning: The rest of this post is similarly free of any sort of technical details (except at the very end)! Also, I’m a pretty terrible photographer with no idea how to post-process images.)

Commuting. I bike to work every day. The ride is about 30 minutes.

Along the way, I cross the Willamette:

This is a double-decker bridge, and the bottom platform lifts up when boats need to pass through. Which, unfortunately, is occasionally during my morning commute, in which case I have to bike further down the Riverside Esplanade.

At the end of my journey, I am greeted with the familiar face of the Commonwealth Building.

It’s a pretty famous building as far as buildings go: it was one of the first glass skyscrapers ever built. I hear architecture students from universities around the city come by to look at the building.

Galois is on the third floor.

I park my bike in one of the handy bikeracks in the office:

and head off to my desk. (Crazy deskmate included. :-)

Office. Now that we’re at Galois, perhaps it’s time for a quick tour of the office. The Galois office is a single floor, with various rooms of note. Of ever-present importance is the kitchen:

from which coffee can be acquired (Portlanders are very serious about their coffee! It makes me almost wish that I was a coffee-drinker):

The kitchen is where the all-hands meeting takes place (Galois is small enough that you can fit all of the company’s employees in a single room—only Ksplice, the startup I interned at, has also earned this distinction). One of the really good reflections on Galois’ culture is the practice of appreciations, during which Galwegians (our name for Galois employees) appreciate one another for things that happened during the week.

There is a small library (a nice quiet place to chill out if some particularly hard thinking is merited):

A conference room:

And even a little room where you can take a nap!

By twelve, we Galwegians are hungry, so we head out to get lunch.

There is one tremendous advantage to being in downtown Portland: the food carts. I’ve never seen anything quite like it: blocks literally have fleets of carts lined up to serve you, whatever style of food you like.

Portland is also famously vegan friendly. You can get Vegan Bacon Cheeseburgers! (They are quite delicious, speaking as a carnivore.)

Or a fruit smoothie.

After we get our food, it’s back to the office to chow down.

Our chief scientist and the engineer who sits across from me are having a post-lunch game of ping pong! (I’ve played a few rounds: they are quite good—back spin, top spin, it’s more than I can keep track of!)

Offer building. On Tuesdays, instead of converging on the kitchen, many of converge to the conference room: it’s the MOB lunch!

MOB stands for “Merged Offer Building”, though the name itself has a nice flavor: “The MOB makes you an offer you can’t refuse.” Unlike traditional product companies, in which you have an engineering department which makes a product and then a sales department who finds clients and convinces them they want to buy your product, at Galois, for many contracts the engineers are the salespeople: they are the ones responsible for writing the proposal we submit for funding. The MOB meeting coordinates all of the various offer building efforts—though it’s had no direct bearing to my internship, sitting in on MOB lunches has been a fascinating peek into the world of SBIRs, procurements, EC&A and many, many more acronyms.

tl;dr Interning at Galois this summer has been a blast, and I’m very sorry that there is only one week left. I’ll miss all of you! ♥

Postscript. After a summer of Tech Talk writeups, I’ll be giving a Galois Tech Talk myself, this coming Tuesday! It will get into the nitty gritty of abcBridge, the Haskell library I built over the summer. If you’re in the area, come check it out!

« Newer Posts Older Posts »