November 24, 2010

The On-Line Encyclopedia of Integer Sequences is quite a nifty website. Suppose that you’re solving a problem, and you come up with the following sequence of integers: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2... and you wonder to yourself: “huh, what’s that sequence?” Well, just type it in and the answer comes back: A007814, along with all sorts of tasty tidbits like constructions, closed forms, mathematical properties, and more. Even simple sequences like powers of two have a bazillion alternate interpretations and generators.
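For the record, A007814 is the ruler sequence: the nth term is the exponent of the largest power of 2 dividing n. Here is a minimal Haskell generator for it, just one of the many constructions the OEIS entry lists:

-- a(n) = exponent of the largest power of 2 dividing n
ruler :: Int -> Int
ruler n
  | odd n     = 0
  | otherwise = 1 + ruler (n `div` 2)

-- ghci> map ruler [1..12]
-- [0,1,0,2,0,1,0,3,0,1,0,2]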
This got me wondering: what are the integer sequences that every computer scientist should know? That is, the ones they should be able to see the first few terms of and think, “Oh, I know that sequence!” and then rack their brains for a little bit trying to remember the construction, closed form or some crucial property. For example, almost anyone with a basic math background will recognize the sequences 1, 2, 3, 4, 5; 0, 1, 4, 9, 16; or 1, 1, 2, 3, 5. The very first sequence I cited in this article holds a special place in my heart because I accidentally derived it while working on my article Adventures in Three Monads for The Monad.Reader. A little less familiar might be 1, 1, 2, 5, 14, 42 or 3, 7, 31, 127, 8191, 131071, 524287, 2147483647, but they are still quite important for computer scientists.
So, what integer sequences should every computer scientist know? (Alternatively, what’s your favorite integer sequence?)
November 22, 2010

I’ve been wanting to implement a count-min sketch for some time now; it’s a little less widely known than the Bloom filter, a closely related sketch data structure (that is, a probabilistic data structure that approximates answers to certain queries), but it seems like a pretty practical structure and has been used in some interesting ways.
Alas, when you want to implement a data structure that was proposed less than a decade ago and hasn’t found its way into textbooks yet, there are a lot of theoretical vagaries that get in the way. In this particular case, the theoretical vagary was selection of a universal hash family. Having not taken a graduate-level algorithms course yet, I did not know what a universal hash family was, so it was off to the books for me.
From my survey of course notes, papers and textbooks, I noticed two things.
First, there are a lot of different independence guarantees a universal hash family may have, each of which may go under many different names. Assume that our hash family H is a family of functions h : M → N, where M = {0, 1, ..., m−1} and N = {0, 1, ..., n−1} with m ≥ n. M corresponds to our “universe”, the possible values being hashed, while N is the range of the hash function.
A weak universal hash family, also called a weak 2-universal hash family and sometimes stated with the weak elided, is a hash family such that, for a hash function h chosen uniformly at random from H:

∀ x, y ∈ M, x ≠ y. Pr[h(x) = h(y)] ≤ 1/n
A strongly 2-universal hash family, also called a (strongly) 2-independent universal hash family and sometimes stated with 2-universal elided, is one that fulfills this condition:
∀ x,y ∈ M, a,b ∈ N.
Pr[h(x) = a ∧ h(y) = b] ≤ 1/n²
A (strongly) k-independent universal hash family generalizes the above notion, to the following condition:
∀ x₁, x₂, ..., x_k ∈ M, a₁, a₂, ..., a_k ∈ N.
Pr[h(x₁) = a₁ ∧ h(x₂) = a₂ ∧ ... ∧ h(x_k) = a_k] ≤ 1/n^k
Second, the reason why weak is commonly elided from weak universal hash family is that 2-universal hash families tend to also be 2-independent. Randomized Algorithms states “Most known constructions of 2-universal hash families actually yield a strongly 2-universal hash family. For this reason, the two definitions are generally not distinguished from one another” and asks the student to prove that if n = m = p is a prime number, Carter and Wegman’s hash family is strongly 2-universal. (I’ll state what this family is shortly.) So Wikipedia happily adopts the weak criterion and only briefly mentions 2-independence in the last section. (I have not edited the article because I’m not sure what change, if any, should be made.)
So, what’s Carter and Wegman’s universal hash family? Quite simple:
$h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$
given that p ≥ m is prime and $a, b \in \{0, 1, \ldots, p-1\}$. Except, uh, no one actually uses a modulus in practice. Here’s one example from Cormode’s implementation:
#define MOD 2147483647
#define HL 31

long hash31(long long a, long long b, long long x)
{
    long long result;
    long lresult;
    // return a hash of x using a and b mod (2^31 - 1)
    // may need to do another mod afterwards, or drop high bits
    // depending on d, number of bad guys
    // 2^31 - 1 = 2147483647
    result = (a * x) + b;
    result = ((result >> HL) + result) & MOD;
    lresult = (long) result;
    return(lresult);
}
This implementation is clearly correct:

- The multiplication and addition can’t overflow the long long result, and
- The second line takes advantage of our ability to do a fast modulus with Mersenne primes via a few alternate bitwise operations. Of course, in order to do this, we need to be very careful about which prime we pick. Mmm, magic numbers.
OK, so that’s very nice. There is a minor bit of sloppiness in that we haven’t explicitly ensured that n = m = p, so I’m not 100% convinced we preserve strong universality. But I haven’t worked out the Randomized Algorithms exercise so I don’t know how important this property is in practice.
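To make the definitions concrete, here is a minimal Haskell sketch of the family as stated (names mine), using Integer so we can afford a genuine modulus instead of the bit tricks above; I’ve followed the presentations that additionally require a ≠ 0:

import System.Random (randomRIO)

-- h_{a,b}(x) = ((a*x + b) mod p) mod n, where p >= m is prime
cwHash :: Integer -> Integer -> Integer -> Integer -> Integer -> Integer
cwHash p n a b x = ((a * x + b) `mod` p) `mod` n

-- draw a random member of the family
newCWHash :: Integer -> Integer -> IO (Integer -> Integer)
newCWHash p n = do
  a <- randomRIO (1, p - 1)
  b <- randomRIO (0, p - 1)
  return (cwHash p n a b)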
As an aside, this function also claims to implement this very universal hash family, but I have a hard time believing it:
Tools::UniversalHash::value_type Tools::UniversalHash::hash(
    UniversalHash::value_type x
) const
{
    uint64_t r = m_a[0];
    uint64_t xd = 1;

    for (uint16_t i = 1; i < m_k; i++)
    {
        xd = (xd * x) % m_P;
        r += (m_a[i] * xd) % m_P;
        // FIXME: multiplications here might overflow.
    }

    r = (r % m_P) & 0xFFFFFFFF;
    // this is the same as x % 2^32. The modulo operation with powers
    // of 2 (2^n) is a simple bitwise AND with 2^n - 1.

    return static_cast<value_type>(r);
}
We now turn our attention to multiply-carry, which Wikipedia claims is the fastest universal hash family currently known for integers. It’s designed to be easy to implement on computers: (unsigned) (a*x) >> (w-M) (with a odd) is all you need. Well, to be precise, it’s the fastest 2-universal hash family currently known: the relevant paper only gives the weak universality proof about Pr[h(x) = h(y)].
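Transcribed into Haskell as I understand it (a sketch for w = 32, names mine; Word32 multiplication wraps modulo 2^32, which plays the part of the unsigned truncation in the C version):

import Data.Bits (shiftR)
import Data.Word (Word32)

-- hash a 32-bit word down to mBits bits: multiply by a random odd a,
-- let the product wrap, and keep the top mBits bits
multiplyShift :: Word32 -> Int -> Word32 -> Word32
multiplyShift a mBits x = (a * x) `shiftR` (32 - mBits)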
So, my question is thus: is multiply-carry strongly universal? Motwani and Raghavan suggest it probably is, but I couldn’t dig up a proof.
Postscript. Fortunately, for the count-min sketch, we don’t actually need strong universality; I checked with Graham Cormode, and they only use 2-universality in their paper. But the original question still stands… on strictly theoretical grounds, anyway.
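For completeness, here is roughly what the sketch itself looks like once a hash family is in hand. This is a toy Haskell rendition (immutable arrays, names and types mine, not Cormode and Muthukrishnan’s implementation), assuming the hash functions were drawn from a 2-universal family:

import Data.Array

-- d rows of width w, one hash function per row
data CMS = CMS
  { cmsWidth  :: Int
  , cmsHashes :: [Int -> Int]
  , cmsTable  :: Array (Int, Int) Int
  }

emptyCMS :: Int -> [Int -> Int] -> CMS
emptyCMS w hs = CMS w hs (listArray ((0, 0), (length hs - 1, w - 1)) (repeat 0))

-- an occurrence of x bumps one counter in every row
update :: Int -> CMS -> CMS
update x (CMS w hs t) =
  CMS w hs (accum (+) t [((i, h x `mod` w), 1) | (i, h) <- zip [0 ..] hs])

-- the row minimum never underestimates the true count
query :: Int -> CMS -> Int
query x (CMS w hs t) = minimum [t ! (i, h x `mod` w) | (i, h) <- zip [0 ..] hs]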
Non sequitur. Here’s an interesting combinator for combining functions used in folds:
f1 <&> f2 = \(r1, r2) a -> (f1 r1 a, f2 r2 a)
It lets you bundle up two combining functions so that you can apply both of them to a list in one go:
(foldl f1 z1 xs, foldl f2 z2 xs) == foldl (f1 <&> f2) (z1, z2) xs
Flipping the combinator would make it work for right folds. This gives us the following cute implementation of the average function:
average = uncurry (/) . foldl' ((+) <&> (flip (const (+1)))) (0,0)
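For reference, a type signature for the combinator, plus a quick sanity check in GHCi (foldl' is from Data.List; note also that the lazy pair accumulator will build up thunks, so a strict pair type would be wiser for long lists):

(<&>) :: (r1 -> a -> r1) -> (r2 -> a -> r2) -> ((r1, r2) -> a -> (r1, r2))

-- ghci> average [1, 2, 3, 4]
-- 2.5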
Maybe we could write a rewrite rule to do this for us.
November 19, 2010

In which Edward tells some stories about Cambridge and posts lots of pictures.
Apparently, Alyssa B. Hacker (sic) went on the Cambridge-MIT exchange.

This is perhaps a phenomenon of having been around MIT too long, a campus which has a reputation for not being too picturesque (I can probably count the actually pretty spots on campus on one hand), but it’s real easy to stop appreciating how nice the buildings and architecture are around here. Look!



Granted, I don’t live in any of these places; I just pass by them on my way to lecture in central Cambridge. My college, Fitzwilliam, is not so pretty from the outside (pictures omitted), but absolutely gorgeous inside:

Erm, never mind the bags of shredded paper awaiting collection.

A gorgeously illustrated map of Fitzwilliam college (sorry about the glare):

If you squint, you can make out a location on the map called The Grove. There is quite a story behind this part of the college: it was owned by the Darwin family (as in, the Charles Darwin), and the college was built around the grove, which did not become part of the college until Emma Darwin died and the house was incorporated into the college.

It is so tremendously easy to wander into a room and see someone famous. Like Tony Hoare, inventor of Quicksort and Hoare logic. I told people I came to Cambridge University in part to get a taste of its theoretical flavour, and I have not been disappointed. I tremendously enjoyed Marcelo Fiore’s lectures on denotational semantics (maybe I’ll get a blog post or two out of what I learned in the class). Other lectures have been a somewhat mixed bag (but then again, when are they not?), but they’ve managed to fill out areas of my education that I didn’t know I didn’t know about. The section about skolemization from my Logic and Proof course reminds me of this blog post from a while back, bemoaning the fact that no one actually tells you that Skolem constants are how you actually implement a typechecker. Well, apparently, skolemization is a classic technique in the logic world, and it acts precisely the way you might expect it to in a type system (after all, the Curry-Howard isomorphism).
Also, our logic and proof supervisor gives us tea. :-) His room at Trinity college (where we hold our supervisions) is the same room that the brilliant mathematician Ramanujan stayed in while he was a fellow at Trinity. Speaking of which, it’s the hundredth anniversary of Russell and Whitehead’s Principia.
I’ve utterly failed (thus far) at actually doing any research, but I’m enjoying the theoretical talks and soaking in all of this foundational knowledge.

Speaking of which, the computer lab is not in the classical style. I guess you’d have a hard time convincing architects these days to construct old-style buildings. You’ll find oddities like libraries on stilts inside hardcore traditional colleges:

Ah, the march of modernity. If I weren’t taking History and Philosophy of Science, I might have been stuck in the modern buildings of West Cambridge, but fortunately there is a wonderful little bike path between the lab and Fitzwilliam college (which Nidhi mentioned to me, and which I managed to miss, ending up biking down the grass at the back of Churchill college and then alongside the road in the wrong direction because of a fence blocking my way):


The food might be bad in Britain, but you really can’t beat its sweets:

Alas, I have banned myself from purchasing any sweets, lest they mysteriously disappear before I return to my college. Nutella for breakfast has also proved a stunningly bad idea, and I have canned the practice in favor of a pastry and two pieces of fruit. Cambridge has a market square which is open every day (good!) but does not have very much local produce (understandable, but boo!)

Actually, I discovered the library on stilts because I was attempting to determine the location of an ICUSU event and, unlike at MIT, I don’t carry around my laptop everywhere with me while in Cambridge (gasp!) I eventually found the location:

If you say you are punting, it means quite a different thing in Cambridge than at MIT. (I consulted Wikipedia, which agrees with the Cambridge sense of the term.) Oh, did I mention, dogs punt too!

Punting is a bit different from sailing; in particular, you push the punt pole in the direction you want to go (whereas the tiller goes in the opposite direction you want to go.)

This post is getting a bit long, so maybe I’ll save more for later.
Postscript. I noticed a (not particularly insightful) relationship between currying and Church encoding today: namely, if you curry the destructor function (either, maybe, foldr, etc.) with the data, you get a function with the same type as the Church encoding of that data type.
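A minimal illustration with Maybe (helper name mine): maybe has type b -> (a -> b) -> Maybe a -> b, and moving the data argument to the front yields exactly the Church-encoded type forall r. r -> (a -> r) -> r.

toChurch :: Maybe a -> (r -> (a -> r) -> r)
toChurch m = \n j -> maybe n j m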
November 17, 2010

Ever try to go to haskell.org and notice that it was down? Well, now you have someone to complain to: the haskell.org committee has formed and apparently I’m on it. 8-)
One of the first things we’ll be doing is moving haskell.org from a server hosted at Yale (the hosting has been good, but the server will sometimes go down during the weekend and there will be no one around to kick it until Monday) to some dedicated hardware. I must admit, I do feel a bit sheepish for being on the committee but not having any bits (or direct experience) to help do the maintenance work; hopefully that will change soon.
November 15, 2010

And then a signal to the ri-i-i-ight.
One notable wart with readline is that if you ^C during the prompt, nothing happens, and if you do a second ^C (and weren’t buggering around with the signal handlers) your entire program unceremoniously terminates. That’s not very nice! Fortunately, readline appears to be one of the rare C libraries that actually put some work into making sure you can longjmp out of a signal handler without completely breaking the library’s internal state (they do this with liberal masking and unmasking, and with their own signal handler which cleans up and then rethrows the signal).
So I decided I was going to see if I could patch up readline to longjmp out of the signal handler (signal provided by yours truly) and give control back to Haskell. This monstrosity resulted.
static jmp_buf hs_readline_jmp_buf;
static struct sigaction old_sigpipe_action;

void hs_readline_sigaction (int signo, siginfo_t *info, void *data)
{
    sigaction(SIGALRM, &old_sigpipe_action, NULL);
    siglongjmp(hs_readline_jmp_buf, signo);
}

char *hs_readline (const char *s)
{
    struct sigaction action;
    int signo;
    sigset_t mask;
    memset(&action, 0, sizeof(struct sigaction));
    sigemptyset(&mask);
    sigaddset(&mask, SIGALRM);
    action.sa_sigaction = hs_readline_sigaction;
    action.sa_mask = mask;
    action.sa_flags = SA_RESETHAND;
    sigaction(SIGALRM, &action, &old_sigpipe_action);
    if (signo = sigsetjmp(hs_readline_jmp_buf, 1)) {
        return NULL;
    }
    char *r = readline(s);
    sigaction(SIGALRM, &old_sigpipe_action, NULL);
    return r;
}
It actually works pretty wonderfully, despite the somewhat circuitous route the signal takes: the SIGINT first gets handled by readline’s installed signal handler, which cleans up its changes to the terminal and then rethrows the signal to GHC’s signal handler. GHC tells the IO manager that a signal happened, and then goes back to the innards of readline (which reinstates all of its changes to the terminal). Then the IO manager reads out the signal and sends a ThreadKilled exception, which results in the RTS trying to interrupt the foreign call. The SIGALRM (actually, that’s a lie: the code in GHC sends a SIGPIPE, but readline doesn’t think SIGPIPE is a signal it should clean up after, so I changed it; better suggestions welcome) hits readline’s signal handler again, we clean up the terminal, and then we hit our own signal handler, which longjmps to a return NULL that takes us back to Haskell. And then the signal is caught and there is much rejoicing.
Unfortunately, almost all of that code is boilerplate, and I can’t stick it in a nice Haskell combinator, because when Haskell is executing there’s no stack to speak of, and I bet a setjmp FFI call would make the RTS very confused. It’s also not reentrant, although I doubt readline is reentrant either. And of course, nonlocal control transfer from a signal handler is something your Mom always told you not to do. So this approach probably doesn’t generalize. But it’s pretty amusing.
November 12, 2010

About a month ago I decided that it would be cool if I could fix the bug “GHC’s runtime never terminates unused worker threads.” Well, I just got around to looking at it today, and after wandering aimlessly around the twisty maze that is the GHC RTS for an hour or so, I finally found a light at the end of the tunnel, in the form of a heart-warmingly simple patch. I’ve sent mail off to Simon Marlow to make sure the light isn’t actually a train, but it occurred to me that it would be interesting to look at my command history and blog about the process by which I came to the conclusion that line 464 of Capability.c was the correct place to add my change, since this sort of mental journey is not one that ever really gets recorded anywhere, in any shape or form.
Warmups before running the maze. In a shifty maze like GHC, you want to make sure the guided route (i.e. a clean build) is working before trying anything fancy. I use a build tree separate from my source tree, so getting everything up to date involves:
cd ghc-clean
./darcs-all get
./darcs-all pull -a
cd ../ghc-build
lndir ../ghc-clean
perl boot && ./configure && make
inplace/bin/ghc-stage2 --interactive
When this has been resolved in a satisfactory manner (a non-trivial task on platforms like Windows), the code hunting can begin.
Grab your equipment. What? You mean to say you’ve wandered into this maze and you don’t even know how to tell when you’ve gotten to your destination? That’s no good… you’ll need a dowsing rod of some sort… something to tell you when you’ve got it right.
In this particular case, the original bug reporter had written up a small, incomplete test script, so the first thing I did was flesh it out into a script that required no human interaction. The benchmark for the new script was clear: /proc/PID/task should report a number substantially smaller than 200. To see that the current implementation is broken:
ezyang@javelin:~/Dev/ghc-build/testsuite/tests/ghc-regress/ffi/should_run$ ./cleanupThreads
203
Getting your bearings. Ok, so what do we want? We want threads to die instead of hanging around. There are two ways to do this: have the thread commit seppuku when it realizes it isn’t wanted, or have some manager kill the thread as necessary. The latter is generally considered poor form, since you want to make sure the threads aren’t doing anything critical that will get corrupted if they die. So seppuku it is. Here, now, there are two questions:
- When does the thread decide to go into a waiting pool? This is presumably where we’d want it to terminate itself instead.
- How would the thread decide whether or not it should hang around or bug out?
Mapping out the land. GHC has this little runtime flag called -Ds. It’s pretty useful: it dumps out a whole gaggle of debug information concerning threads, which is precisely what we’d like to look for. Our plan of action is to look at what the thread activity looks like in our test script, and identify the points at which threads should be dying instead of hanging around.
The very beginning of the log looks like this:
b75006d0: allocated 1 capabilities
b75006d0: new task (taskCount: 1)
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: starting new worker on capability 0
b75006d0: new worker task (taskCount: 2)
b75006d0: task exiting
b75006d0: new task (taskCount: 2)
b75006d0: returning; I want capability 0
b71ffb70: cap 0: schedule()
b71ffb70: giving up capability 0
Note the number b75006d0; that’s our main thread and it’s going to be quite a busy beaver. Here is the very first thread we spin off to make a foreign call, but it finishes fairly quickly and isn’t the foreign call we are looking for:
b75006d0: cap 0: created thread 1
b75006d0: cap 0: thread 1 appended to run queue
b75006d0: new bound thread (1)
b75006d0: cap 0: schedule()
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: thread 1 stopped (suspended while making a foreign call)
b75006d0: freeing capability 0
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: thread 1 stopped (suspended while making a foreign call)
b75006d0: freeing capability 0
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: created thread 2
b75006d0: cap 0: thread 2 appended to run queue
b75006d0: cap 0: thread 1 stopped (finished)
Before long, we see a veritable avalanche of new threads being created and added to the run queue; these are our threads:
b75006d0: woken up on capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 3 (ThreadRunGHC)
b75006d0: cap 0: created thread 4
b75006d0: cap 0: thread 4 appended to run queue
b75006d0: cap 0: created thread 5
b75006d0: cap 0: thread 5 appended to run queue
b75006d0: cap 0: created thread 6
b75006d0: cap 0: thread 6 appended to run queue
b75006d0: cap 0: created thread 7
b75006d0: cap 0: thread 7 appended to run queue
b75006d0: cap 0: created thread 8
b75006d0: cap 0: thread 8 appended to run queue
b75006d0: cap 0: created thread 9
b75006d0: cap 0: thread 9 appended to run queue
b75006d0: cap 0: created thread 10
b75006d0: cap 0: thread 10 appended to run queue
b75006d0: cap 0: created thread 11
b75006d0: cap 0: thread 11 appended to run queue
b75006d0: cap 0: created thread 12
b75006d0: cap 0: thread 12 appended to run queue
b75006d0: cap 0: created thread 13
The process continues until we’ve spawned them all:
54139b70: starting new worker on capability 0
54139b70: new worker task (taskCount: 201)
53938b70: cap 0: schedule()
53938b70: cap 0: running thread 202 (ThreadRunGHC)
53938b70: cap 0: thread 202 stopped (suspended while making a foreign call)
53938b70: starting new worker on capability 0
53938b70: new worker task (taskCount: 202)
53137b70: cap 0: schedule()
53137b70: cap 0: running thread 203 (ThreadRunGHC)
53137b70: cap 0: thread 203 stopped (suspended while making a foreign call)
53137b70: starting new worker on capability 0
53137b70: new worker task (taskCount: 203)
52936b70: cap 0: schedule()
And then, since there’s nothing to do (all of our threads are in FFI land), we go and run a major GC:
52936b70: woken up on capability 0
52936b70: resuming capability 0
52936b70: deadlocked, forcing major GC...
52936b70: cap 0: requesting parallel GC
52936b70: ready_to_gc, grabbing GC threads
all threads:
threads on capability 0:
other threads:
thread 203 @ 0xb72b5c00 is blocked on an external call (TSO_DIRTY)
thread 202 @ 0xb72b5800 is blocked on an external call (TSO_DIRTY)
thread 201 @ 0xb72b5400 is blocked on an external call (TSO_DIRTY)
thread 200 @ 0xb72b5000 is blocked on an external call (TSO_DIRTY)
thread 199 @ 0xb72b4c00 is blocked on an external call (TSO_DIRTY)
thread 198 @ 0xb72b4800 is blocked on an external call (TSO_DIRTY)
thread 197 @ 0xb72b4400 is blocked on an external call (TSO_DIRTY)
thread 196 @ 0xb72b4000 is blocked on an external call (TSO_DIRTY)
thread 195 @ 0xb72b3c00 is blocked on an external call (TSO_DIRTY)
thread 194 @ 0xb72b3800 is blocked on an external call (TSO_DIRTY)
thread 193 @ 0xb72b3400 is blocked on an external call (TSO_DIRTY)
[snip (you get the idea)]
(I’ve always kind of wondered whether or not FFI calls should be considered deadlocked.)
Now the threads start coming back from FFI-land and idling:
b69feb70: cap 0: running thread 4 (ThreadRunGHC)
b69feb70: cap 0: waking up thread 3 on cap 0
b69feb70: cap 0: thread 3 appended to run queue
b69feb70: cap 0: thread 4 stopped (finished)
b69feb70: giving up capability 0
b69feb70: there are 2 spare workers
b69feb70: passing capability 0 to bound task 0xb75006d0
b61fdb70: returning; I want capability 0
b61fdb70: resuming capability 0
b61fdb70: cap 0: running thread 5 (ThreadRunGHC)
b59fcb70: returning; I want capability 0
b61fdb70: cap 0: thread 5 stopped (finished)
b61fdb70: giving up capability 0
b61fdb70: there are 3 spare workers
b61fdb70: passing capability 0 to worker 0xb59fcb70
b75006d0: woken up on capability 0
b75006d0: capability 0 is owned by another task
b51fbb70: returning; I want capability 0
b59fcb70: resuming capability 0
b59fcb70: cap 0: running thread 6 (ThreadRunGHC)
b59fcb70: cap 0: thread 6 stopped (finished)
b59fcb70: giving up capability 0
b49fab70: returning; I want capability 0
b59fcb70: there are 4 spare workers
b59fcb70: passing capability 0 to worker 0xb51fbb70
b51fbb70: resuming capability 0
b51fbb70: cap 0: running thread 7 (ThreadRunGHC)
b51fbb70: cap 0: thread 7 stopped (finished)
b51fbb70: giving up capability 0
b41f9b70: returning; I want capability 0
b51fbb70: there are 5 spare workers
I’ve actually cheated a little: the “there are X spare workers” debug statements I added myself. But this section is golden; we’re specifically interested in these lines:
b61fdb70: cap 0: thread 5 stopped (finished)
b61fdb70: giving up capability 0
The thread stops, but it doesn’t die; it just gives up the capability. These are two extremely good candidates for where the thread might alternately decide to kill itself.
Placemarkers. It’s time to bust out trusty old grep and figure out where these debug messages are being emitted from. Unfortunately, “5” and “finished” are probably dynamically generated, so “stopped” is the only real identifier. Fortunately, that’s specific enough for me to find the right line in the RTS:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R stopped .
./Capability.c: // list of this Capability. A worker can mark itself as stopped,
./Capability.c: if (!isBoundTask(task) && !task->stopped) {
./RaiseAsync.c: - all the other threads in the system are stopped (eg. during GC).
./RaiseAsync.c: // if we got here, then we stopped at stop_here
./Task.c: if (task->stopped) {
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsTrue;
./Task.c: task->stopped = rtsTrue;
./Task.c: debugBelch("task %p is %s, ", taskId(task), task->stopped ? "stopped" : "alive");
./Task.c: if (!task->stopped) {
./sm/GC.c: // The other threads are now stopped. We might recurse back to
./Schedule.c: "--<< thread %ld (%s) stopped: requesting a large block (size %ld)\n",
./Schedule.c: "--<< thread %ld (%s) stopped to switch evaluators",
./Schedule.c: // stopped. We need to stop all Haskell threads, including
./Trace.c: debugBelch("cap %d: thread %lu stopped (%s)\n", ### THIS IS THE ONE
./Task.h: rtsBool stopped; // this task has stopped or exited Haskell
./Task.h:// Notify the task manager that a task has stopped. This is used
./Task.h:// Put the task back on the free list, mark it stopped. Used by
./Interpreter.c: // already stopped at just now
./Interpreter.c: // record that this thread is not stopped at a breakpoint anymore
./win32/Ticker.c: // it still hasn't stopped.
That line in Trace.c is actually in a generic debugging function, traceSchedEvent_stderr, but fortunately there’s a big case statement on one of its arguments, tag:
case EVENT_STOP_THREAD: // (cap, thread, status)
    debugBelch("cap %d: thread %lu stopped (%s)\n",
               cap->no, (lnat)tso->id, thread_stop_reasons[other]);
    break;
So EVENT_STOP_THREAD is a good next grep. And sure enough:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R EVENT_STOP_THREAD .
./Trace.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./eventlog/EventLog.c: [EVENT_STOP_THREAD] = "Stop thread",
./eventlog/EventLog.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./eventlog/EventLog.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./Trace.h: HASKELLEVENT_STOP_THREAD(cap, tid, status)
./Trace.h: traceSchedEvent(cap, EVENT_STOP_THREAD, tso, status);
It looks to be an inline function in Trace.h:
INLINE_HEADER void traceEventStopThread(Capability *cap STG_UNUSED,
                                        StgTSO *tso STG_UNUSED,
                                        StgThreadReturnCode status STG_UNUSED)
{
    traceSchedEvent(cap, EVENT_STOP_THREAD, tso, status);
    dtraceStopThread((EventCapNo)cap->no, (EventThreadID)tso->id,
                     (EventThreadStatus)status);
}
Classy. So traceEventStopThread is the magic word, and sure enough:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R traceEventStopThread .
./Schedule.c: traceEventStopThread(cap, t, ret);
./Schedule.c: traceEventStopThread(cap, tso, THREAD_SUSPENDED_FOREIGN_CALL);
./Trace.h:INLINE_HEADER void traceEventStopThread(Capability *cap STG_UNUSED,
There are two plausible sites in Schedule.c.
Going digging. We first have to pick which site to inspect more closely. Fortunately, we notice that the second trace event corresponds to suspending the thread before going into a safe FFI call; that’s certainly not what we’re looking at here. Furthermore, the first is in the scheduler, which makes a lot of sense. But there’s nothing obvious in this vicinity that you might associate with saving a worker task away due to lack of work.
What about that giving up capability message? Some more grepping reveals it to be in the yieldCapability function (as one might expect). If we then trace backwards the calls to yieldCapability, we see it is invoked by scheduleYield, which is in turn called by the scheduler loop:
scheduleYield(&cap, task);
if (emptyRunQueue(cap)) continue; // look for work again

// Get a thread to run
t = popRunQueue(cap);
This is very, very interesting. It suggests that the capability itself will tell us whether or not there is work to do, and that yieldCapability is a promising function to look into further:
debugTrace(DEBUG_sched, "giving up capability %d", cap->no);

// We must now release the capability and wait to be woken up
// again.
task->wakeup = rtsFalse;
releaseCapabilityAndQueueWorker(cap);
That last call looks intriguing:
static void
releaseCapabilityAndQueueWorker (Capability* cap USED_IF_THREADS)
{
    Task *task;

    ACQUIRE_LOCK(&cap->lock);

    task = cap->running_task;

    // If the current task is a worker, save it on the spare_workers
    // list of this Capability. A worker can mark itself as stopped,
    // in which case it is not replaced on the spare_worker queue.
    // This happens when the system is shutting down (see
    // Schedule.c:workerStart()).
    if (!isBoundTask(task) && !task->stopped) {
        task->next = cap->spare_workers;
        cap->spare_workers = task;
    }
    // Bound tasks just float around attached to their TSOs.

    releaseCapability_(cap, rtsFalse);

    RELEASE_LOCK(&cap->lock);
}
We’ve found it!
Checking the area. The spare_workers queue looks like the queue in which worker threads without anything to do go to chill out. We should verify that this is the case:
int i;
Task *t;
for (i = 0, t = cap->spare_workers; t != NULL; t = t->next, i++) {}
debugTrace(DEBUG_sched, "there are %d spare workers", i);
As we saw in the debug statements above, this was indeed the case: the number of spare workers kept increasing:
54139b70: there are 199 spare workers
54139b70: passing capability 0 to worker 0x53938b70
53938b70: resuming capability 0
53938b70: cap 0: running thread 202 (ThreadRunGHC)
53938b70: cap 0: thread 202 stopped (blocked)
thread 202 @ 0xb727a400 is blocked on an MVar @ 0xb72388a8 (TSO_DIRTY)
53938b70: giving up capability 0
53938b70: there are 200 spare workers
53938b70: passing capability 0 to worker 0x53137b70
53137b70: resuming capability 0
53137b70: cap 0: running thread 203 (ThreadRunGHC)
53137b70: cap 0: thread 203 stopped (blocked)
thread 203 @ 0xb727a000 is blocked on an MVar @ 0xb72388a8 (TSO_DIRTY)
53137b70: giving up capability 0
53137b70: there are 201 spare workers
Writing up the solution. The patch from here is simple, since we’ve found the correct location: we check if the queue of spare workers has reached some threshold, and if it has, instead of saving ourselves to the queue we just clean up and then kill ourselves:
// count the spare workers, as in the loop above (t again starts at
// cap->spare_workers)
for (i = 1; t != NULL && i < 6; t = t->next, i++) {}
if (i >= 6) {
    debugTrace(DEBUG_sched, "Lots of spare workers hanging around, terminating this thread");
    releaseCapability_(cap, rtsFalse);
    RELEASE_LOCK(&cap->lock);
    pthread_exit(NULL);
}
And then we test to see that this has indeed worked:
ezyang@javelin:~/Dev/ghc-build/testsuite/tests/ghc-regress/ffi/should_run$ ./cleanupThreads
7
Postscript. There are some obvious deficiencies with this proof-of-concept. It’s not portable. We need to convince ourselves that this truly does all of the cleanup that the RTS expects a worker to do. Maybe our data representation could be more efficient (we certainly don’t need a linked list if the number of values we’ll be storing is fixed). But these are questions best answered by someone who knows the RTS better, so at this point I sent in the proof of concept for further review. Fingers crossed!
November 10, 2010

This is a post to get some Google juice for a problem that basically prevented Scripts from being able to cut over from Fedora 11 to Fedora 13. The cluster of new machines kept falling over from load, and we kept scratching our heads, wondering, “Why?”
Turns out, the following commit broke mod_fcgid in a pretty terrifying way: essentially, mod_fcgid becomes unable to manage its pools of running FastCGI processes, so it keeps spawning new ones until the system runs out of memory. This is especially obvious on systems with large numbers of generated virtual hosts, i.e. people using mod_vhost_ldap. It got fixed in mod_fcgid 2.3.6, which was released last weekend.
Unrelatedly. I’ve been sort of turning around in my head a series of Philosophy of Computer Science posts, where I try to identify interesting philosophical questions in our field beyond the usual topics of discussion (cough AI cough). The hope is to draw in a number of topics traditionally associated with philosophy of science, of mathematics, of biology, etc., and maybe pose a few questions of my own. One of the favorite pastimes of philosophers is to propose plausible-sounding theories and then come up with perplexing examples which seem to break them down, and it sounds like this in itself could generate some Zen koans as well, which are always fun.
November 8, 2010

This is a bit of a provocative post, and its impressions (I dare not promote them to the level of conclusions) should be taken with the amount of salt found in a McDonald’s Happy Meal. Essentially, I was doing some reading about medieval medicine and was struck by some of the similarities between it and computer engineering, which I attempt to describe below.
Division between computer scientists and software engineers. The division between those who studied medicine, the physics (a name derived from physica, or natural science), and those who practiced medicine, the empirics, was extremely pronounced. There was a mutual distrust (Petrarch wrote, “I have never believed in doctors nor ever will” (Porter 169)) that stemmed in part from the social division between the physics and the empirics. A physic would have obtained a doctorate from a university, having studied one of the three highest faculties possible (the others being theology and law), and tended to be among the upper strata of society. In fact, the actual art of medicine was not considered “worthy of study,” though the study of natural science was (Cook 407).
The division between computer scientists and software engineers is not as bad as the corresponding division in the 1500s, but there is a definite social separation (computer scientists work in academia, software engineers work in industry) and a communication gap between the two communities. In many other fields, a PhD is required to even be considered for a job; here, we see high school students starting up software companies (occasionally successfully) all the time, and if programming Reddit is any indication, there is a certain distaste for purely theoretical computer scientists.
The unsuitability of pure computer science for the real world. Though the study of pure medicine was highly regarded during this time, its theories and knowledge were tremendously off the mark. At the start of the 1500s the four humors of Hippocratic medicine were still widely believed: to be struck by disease was to have an imbalance of black bile, yellow bile, phlegm and blood, and thus the cure was to apply the counteracting humor, which justified such techniques as bloodletting (the practice of purposely bleeding a person). Even the fundamentals of how the human body worked were ill-understood: it was not until William Harvey and his De motu cordis et sanguinis (1628) that the earlier view, that food was concocted in the organs and then flowed outwards via the veins, arteries and nerves to the rest of the body, was challenged by the notion of the heart as a pump (Cook 426). If the circulatory theory was true, what did the other organs do? Harvey’s theory completely overturned the existing understanding of how the human body worked, and it was quite controversial.
I have enough faith in computer science that I don’t think most of our existing knowledge is fundamentally wrong. But I also think we know tremendously little about the actual nature of computation even at middling sizes, and this is a quite humbling fact. But I am also fundamentally optimistic about the future of computer science in dealing with large systems—more on this at the end.
Testing instead of formal methods. The lack of knowledge, however, did not stop the physicians (as distinct from the physics) from practicing their medicine. Even the academics recognized the importance of “medieval practica; handbooks listing disorders from head to toe with a description of symptoms and treatment” (Porter 172). The observational (Hippocratic) philosophy continued to hold great sway: Thomas Sydenham, when asked about the subject of dissection, stated “Anatomy—Botany—Nonsense! Sir, I know an old woman in Covent Garden who understand botany better, and as for anatomy, my butcher can dissect a joint full and well; now, young man, all that is stuff; you must go to the bedside, it is there alone you can learn disease.” (Porter 229)
In the absence of a convincing theoretical framework, empiricism rules. The way to gain knowledge is to craft experiments, conduct observations, and act accordingly. If a piece of code is buggy, how do you fix it? You add debug statements and observe the output, not construct a formal semantics and then prove the relevant properties. The day the latter is the preferred method of choice is a day when practitioners of formal methods across the world will rejoice.
No silver bullet. In the absence of reliable medical theories from the physics, quackery flourished in the eighteenth century, a century often dubbed the “Golden Age of Quackery.” (Porter 284) There was no need to explain why your wares worked: one simply needed to give a good show (“drawing first a crowd and then perhaps some teeth, both to the accompaniment of drums and trumpets” (Porter 285)), sell a few dozen bottles of your cure, and then move on to the next town. These “medicines” would claim to do anything from cure cancer to restore youth. While some of the quacks were merely charlatans, others earnestly believed in the efficacy of their treatments, and occasionally a quack remedy was actually effective.
I think the modern analogue to quackery is software methodologies in all shapes and sizes. Like quack medicines, some of these may be effective, but in the absence of scientific explanations we can only watch the show, buy in, and see if it works. And we can’t hope for any better until our underlying theories are better developed.
The future. Modern medical science was eventually saved, though not before years of inadequate theories and quackery had brought it to a state of tremendous disrepute. The complexity of the craft had to be legitimized, but this would not happen until a powerful scientific revolution built the foundations of modern medical practice.
Works referenced:
- Porter, Roy. The Greatest Benefit to Mankind. Fontana Press: 1999.
- Park, Katharine; Daston, Lorraine (eds.). The Cambridge History of Science, Volume 3: Early Modern Science. Cambridge University Press: 2006.
November 5, 2010

Someone told me it’s all happening at the zoo…
I’ve always thought dynamic programming was a pretty crummy name for the practice of storing sub-calculations to be used later. Why not call them table-filling algorithms? Indeed, thinking of a dynamic programming algorithm as one that fills in a table is quite a good way of thinking about it.
In fact, you can almost completely characterize a dynamic programming algorithm by the shape of its table and how the data flows from one cell to another. And if you know what this looks like, you can often just read off the complexity without knowing anything about the problem.
So what I did was collect up a bunch of dynamic programming problems from Introduction to Algorithms and draw up the tables and data flows. Here’s an easy one to start off with, which solves the Assembly-Line problem:

The blue indicates the cells we can fill in ‘for free’, since they have no dependencies on other cells. The red indicates cells that we want to figure out, in order to pick the optimal solution from them. And the grey indicates a representative cell along the way, and its data dependency. In this case, the optimal path for a machine to a given cell only depends on the optimal paths to the two cells before it. (Because, if there were a more optimal route, it would have shown up in one of my previous two cells!) We also see there are a constant number of arrows out of any cell and O(n) cells in this table, so the algorithm clearly takes O(n) time total.
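To see how directly the picture translates into code, here’s a Haskell sketch of the assembly-line recurrence (simplified to a constant transfer cost and zero entry/exit costs, where CLRS lets all of these vary). Each new pair of cells depends only on the previous pair, so a single fold fills the whole table in O(n):

import Data.List (foldl')

-- costs: one (time on line 1, time on line 2) pair per station
assemblyLine :: Int -> [(Int, Int)] -> Int
assemblyLine transfer costs = uncurry min (foldl' step (0, 0) costs)
  where
    -- column j's two cells, computed from column j-1's
    step (f1, f2) (a1, a2) =
      ( min (f1 + a1) (f2 + transfer + a1)
      , min (f2 + a2) (f1 + transfer + a2) )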
Here’s the next introduction example, optimal parenthesization of matrix multiplication.

Each cell contains the optimal parenthesization of the subset i to j of matrices. To figure out the value for a cell, we have to consider all of the possible combos of existing parentheticals that could have led to it (thus the multiple arrows). There are O(n²) boxes, and O(n) arrows per box, for O(n³) overall.
Here’s a nice boxy one for finding the longest shared subsequence of two strings. Each cell represents the longest shared subsequence of the first string up to x and the second string up to y. I’ll let the reader count the cells and arrows and verify the complexity is correct.

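Rendered as code, the boxy table becomes a fold over rows. Here’s a sketch computing just the length of the longest common subsequence: each cell consults only its left, upper and upper-left neighbours, so filling m rows of n cells is O(mn) time, and keeping a single row at a time is O(n) space:

import Data.List (foldl')

lcsLength :: Eq a => [a] -> [a] -> Int
lcsLength xs ys = last (foldl' nextRow (replicate (length ys + 1) 0) xs)
  where
    -- prev is the row for the previous character of xs; row refers to
    -- itself lazily so each cell can see its left neighbour
    nextRow prev x = row
      where
        row = 0 : zipWith3 cell ys (zip prev (tail prev)) row
        cell y (diag, up) left
          | x == y    = diag + 1
          | otherwise = max up left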
There aren’t that many ways to set up dynamic programming tables! Constructing optimal binary search trees acts a lot like optimal matrix parenthesization, but the indexes are a bit fiddly. (Oh, by the way, Introduction to Algorithms is 1-indexed; I’ve switched to 0-indexing here for my examples.)

Here we get into exercise land! The bitonic Euclidean traveling salesman problem is pretty well-known on the web, and its tricky recurrence relation has to do with the bottom edge. Each cell represents the optimal open bitonic route between i and j.

The lovely word wrapping problem, a variant of which lies at the heart of the Knuth TeX word wrapping algorithm, takes advantage of some extra information to bound the number of cells one has to look back. (The TeX algorithm does a global optimization, so the complexity would be O(n²) instead.) Each cell represents the optimal word wrapping of all the words up to that point.

Finally, the edit distance problem, where it seems like the authors decided to pile on as much complexity as they could muster, falls out nicely when you realize that each string operation they ask you to design corresponds to a single arrow to some earlier cell. Useful! Each cell is the optimal edit chain from that prefix of the source to that prefix of the destination.

And the zookeeper is very fond of rum.
Squares, triangles, rectangles, those were the tables I usually found. I’m curious to know if there are more exotic tables that DP algorithms have filled out. Send them in and I’ll draw them!
November 3, 2010

Should software engineers be required to implement the abstractions they use before using them? (Much like how, before you’re allowed to use a theorem in a math textbook, you have to prove it first.) A bit like reinventing the wheel for pedagogical purposes.
(I’ve been sick since Saturday, so it’s a Dead Edward Day. Hopefully I’ll clean up the DP Zoo post and present it with more annotations for this Friday.)