November 24, 2010

The On-Line Encyclopedia of Integer Sequences is quite a nifty website. Suppose that you’re solving a problem, and you come up with the following sequence of integers: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2... and you wonder to yourself: “huh, what’s that sequence?” Well, just type it in and the answer comes back: A007814, along with all sorts of tasty tidbits like constructions, closed forms, mathematical properties, and more. Even simple sequences like powers of two have a bazillion alternate interpretations and generators.
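For the record, A007814 is the ruler sequence: the nth term is the exponent of the largest power of 2 dividing n. Here is a minimal Haskell generator for it, just one of the many constructions the OEIS entry lists:

-- a(n) = exponent of the largest power of 2 dividing n
ruler :: Int -> Int
ruler n
  | odd n     = 0
  | otherwise = 1 + ruler (n `div` 2)

-- ghci> map ruler [1..12]
-- [0,1,0,2,0,1,0,3,0,1,0,2]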
This got me wondering: what are the integer sequences that every computer scientist should know? That is, the ones they should be able to see the first few terms of and think, “Oh, I know that sequence!” and then rack their brains for a little bit trying to remember the construction, closed form or some crucial property. For example, almost anyone with a basic math background will recognize the sequences 1, 2, 3, 4, 5; 0, 1, 4, 9, 16; or 1, 1, 2, 3, 5. The very first sequence I cited in this article holds a special place in my heart because I accidentally derived it while working on my article Adventures in Three Monads for The Monad.Reader. A little less familiar might be 1, 1, 2, 5, 14, 42 or 3, 7, 31, 127, 8191, 131071, 524287, 2147483647, but they are still quite important for computer scientists.
So, what integer sequences should every computer scientist know? (Alternatively, what’s your favorite integer sequence?)
November 22, 2010

I’ve been wanting to implement a count-min sketch for some time now; it’s a little less widely known than the Bloom filter, a closely related sketch data structure (that is, a probabilistic data structure that approximates answers to certain queries), but it seems like a pretty practical structure and has been used in some interesting ways.
Alas, when you want to implement a data structure that was proposed less than a decade ago and hasn’t found its way into textbooks yet, there are a lot of theoretical vagaries that get in the way. In this particular case, the theoretical vagary was selection of a universal hash family. Having not taken a graduate-level algorithms course yet, I did not know what a universal hash family was, so it was off to the books for me.
From my survey of course notes, papers and textbooks, I noticed two things.
First, there are a lot of different independence guarantees a universal hash family may have, each of which may go under many different names. Assume that our hash family H is a family of functions h : M → N, where M = {0, 1, ..., m−1} and N = {0, 1, ..., n−1} with m ≥ n. M corresponds to our “universe”, the possible values being hashed, while N is the range of the hash function.
A weak universal hash family, also called a weak 2-universal hash family and sometimes stated with the weak elided, is a hash family such that, for a hash function h chosen uniformly at random from H:

∀ x, y ∈ M, x ≠ y. Pr[h(x) = h(y)] ≤ 1/n
A strongly 2-universal hash family, also called a (strongly) 2-independent universal hash family and sometimes stated with 2-universal elided, is one that fulfills this condition:
∀ x,y ∈ M, a,b ∈ N.
Pr[h(x) = a ∧ h(y) = b] ≤ 1/n²
A (strongly) k-independent universal hash family generalizes the above notion, to the following condition:
∀ x₁, x₂, ..., x_k ∈ M, a₁, a₂, ..., a_k ∈ N.
Pr[h(x₁) = a₁ ∧ h(x₂) = a₂ ∧ ... ∧ h(x_k) = a_k] ≤ 1/n^k
Second, the reason why weak is commonly elided from weak universal hash family is that 2-universal hash families tend to also be 2-independent. Randomized Algorithms states “Most known constructions of 2-universal hash families actually yield a strongly 2-universal hash family. For this reason, the two definitions are generally not distinguished from one another” and asks the student to prove that if n = m = p is a prime number, Carter and Wegman’s hash family is strongly 2-universal. (I’ll state what this family is shortly.) So Wikipedia happily adopts the weak criterion and only briefly mentions 2-independence in the last section. (I have not edited the article because I’m not sure what change, if any, should be made.)
So, what’s Carter and Wegman’s universal hash family? Quite simple:
$h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$
given that p ≥ m is prime and $a, b \in \{0, 1, \ldots, p-1\}$. Except, uh, no one actually uses a modulus in practice. Here’s one example from Cormode’s implementation:
#define MOD 2147483647
#define HL 31

long hash31(long long a, long long b, long long x)
{
    long long result;
    long lresult;
    // return a hash of x using a and b mod (2^31 - 1)
    // may need to do another mod afterwards, or drop high bits
    // depending on d, number of bad guys
    // 2^31 - 1 = 2147483647
    result = (a * x) + b;
    result = ((result >> HL) + result) & MOD;
    lresult = (long) result;
    return(lresult);
}
This implementation is clearly correct:

- The multiplication and addition can’t overflow the long long result, and
- The second line takes advantage of our ability to do a fast modulus with Mersenne primes via a few alternate bitwise operations. Of course, in order to do this, we need to be very careful about which prime we pick. Mmm, magic numbers.
OK, so that’s very nice. There is a minor bit of sloppiness in that we haven’t explicitly ensured that n = m = p, so I’m not 100% convinced we preserve strong universality. But I haven’t worked out the Randomized Algorithms exercise so I don’t know how important this property is in practice.
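To make the definitions concrete, here is a minimal Haskell sketch of the family as stated (names mine), using Integer so we can afford a genuine modulus instead of the bit tricks above; I’ve followed the presentations that additionally require a ≠ 0:

import System.Random (randomRIO)

-- h_{a,b}(x) = ((a*x + b) mod p) mod n, where p >= m is prime
cwHash :: Integer -> Integer -> Integer -> Integer -> Integer -> Integer
cwHash p n a b x = ((a * x + b) `mod` p) `mod` n

-- draw a random member of the family
newCWHash :: Integer -> Integer -> IO (Integer -> Integer)
newCWHash p n = do
  a <- randomRIO (1, p - 1)
  b <- randomRIO (0, p - 1)
  return (cwHash p n a b)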
As an aside, this function also claims to implement this very universal hash family, but I have a hard time believing it:
Tools::UniversalHash::value_type Tools::UniversalHash::hash(
    UniversalHash::value_type x
) const
{
    uint64_t r = m_a[0];
    uint64_t xd = 1;

    for (uint16_t i = 1; i < m_k; i++)
    {
        xd = (xd * x) % m_P;
        r += (m_a[i] * xd) % m_P;
        // FIXME: multiplications here might overflow.
    }

    r = (r % m_P) & 0xFFFFFFFF;
    // this is the same as x % 2^32. The modulo operation with powers
    // of 2 (2^n) is a simple bitwise AND with 2^n - 1.

    return static_cast<value_type>(r);
}
We now turn our attention to multiply-carry, which Wikipedia claims is the fastest universal hash family currently known for integers. It’s designed to be easy to implement on computers: (unsigned) (a*x) >> (w-M) (with a odd) is all you need. Well, to be precise, it’s the fastest 2-universal hash family currently known: the relevant paper only gives the weak universality proof about Pr[h(x) = h(y)].
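Transcribed into Haskell as I understand it (a sketch for w = 32, names mine; Word32 multiplication wraps modulo 2^32, which plays the part of the unsigned truncation in the C version):

import Data.Bits (shiftR)
import Data.Word (Word32)

-- hash a 32-bit word down to mBits bits: multiply by a random odd a,
-- let the product wrap, and keep the top mBits bits
multiplyShift :: Word32 -> Int -> Word32 -> Word32
multiplyShift a mBits x = (a * x) `shiftR` (32 - mBits)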
So, my question is thus: is multiply-carry strongly universal? Motwani and Raghavan suggest it probably is, but I couldn’t dig up a proof.
Postscript. Fortunately, for the count-min sketch, we don’t actually need strong universality; I checked with Graham Cormode, and they only use 2-universality in their paper. But the original question still stands… on strictly theoretical grounds, anyway.
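For completeness, here is roughly what the sketch itself looks like once a hash family is in hand. This is a toy Haskell rendition (immutable arrays, names and types mine, not Cormode and Muthukrishnan’s implementation), assuming the hash functions were drawn from a 2-universal family:

import Data.Array

-- d rows of width w, one hash function per row
data CMS = CMS
  { cmsWidth  :: Int
  , cmsHashes :: [Int -> Int]
  , cmsTable  :: Array (Int, Int) Int
  }

emptyCMS :: Int -> [Int -> Int] -> CMS
emptyCMS w hs = CMS w hs (listArray ((0, 0), (length hs - 1, w - 1)) (repeat 0))

-- an occurrence of x bumps one counter in every row
update :: Int -> CMS -> CMS
update x (CMS w hs t) =
  CMS w hs (accum (+) t [((i, h x `mod` w), 1) | (i, h) <- zip [0 ..] hs])

-- the row minimum never underestimates the true count
query :: Int -> CMS -> Int
query x (CMS w hs t) = minimum [t ! (i, h x `mod` w) | (i, h) <- zip [0 ..] hs]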
Non sequitur. Here’s an interesting combinator for combining functions used in folds:
f1 <&> f2 = \(r1, r2) a -> (f1 r1 a, f2 r2 a)
It lets you bundle up two combining functions so that you can apply both of them to a list in one go:
(foldl f1 z1 xs, foldl f2 z2 xs) == foldl (f1 <&> f2) (z1, z2) xs
Flipping the combinator would make it work for right folds. This gives us the following cute implementation of the average function:
average = uncurry (/) . foldl' ((+) <&> (flip (const (+1)))) (0,0)
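For reference, a type signature for the combinator, plus a quick sanity check in GHCi (foldl' is from Data.List; note also that the lazy pair accumulator will build up thunks, so a strict pair type would be wiser for long lists):

(<&>) :: (r1 -> a -> r1) -> (r2 -> a -> r2) -> ((r1, r2) -> a -> (r1, r2))

-- ghci> average [1, 2, 3, 4]
-- 2.5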
Maybe we could write a rewrite rule to do this for us.
November 19, 2010

In which Edward tells some stories about Cambridge and posts lots of pictures.
Apparently, Alyssa B. Hacker (sic) went on the Cambridge-MIT exchange.

This is perhaps a phenomenon of having been around MIT too long, a campus which has a reputation for not being too picturesque (I can probably count the actually pretty spots on campus on one hand), but it’s real easy to stop appreciating how nice the buildings and architecture are around here. Look!



Granted, I don’t live in any of these places; I just pass by them on my way to lecture in central Cambridge. My college, Fitzwilliam, is not so pretty from the outside (pictures omitted), but absolutely gorgeous inside:

Erm, never mind the bags of shredded paper awaiting collection.

A gorgeously illustrated map of Fitzwilliam college (sorry about the glare):

If you squint, you can make out a location on the map called The Grove. There is quite a story behind this part of the college: it was owned by the Darwin family (as in, the Charles Darwin), and the college was built around the grove, which did not become part of the college until Emma Darwin died and the house was incorporated into the college.

It is so tremendously easy to wander into a room and see someone famous. Like Tony Hoare, inventor of Quicksort and Hoare logic. I told people I came to Cambridge University in part to get a taste of its theoretical flavour, and I have not been disappointed. I tremendously enjoyed Marcelo Fiore’s lectures on denotational semantics (maybe I’ll get a blog post or two out of what I learned in the class). Other lectures have been a somewhat mixed bag (but then again, when are they not?), but they’ve managed to fill out areas of my education that I didn’t know I didn’t know about. The section about skolemization from my Logic and Proof course reminds me of this blog post from a while back, bemoaning the fact that no one actually tells you that Skolem constants are how you actually implement a typechecker. Well, apparently, skolemization is a classic technique in the logic world, and it acts precisely the way you might expect it to in a type system (after all, the Curry-Howard isomorphism).
Also, our logic and proof supervisor gives us tea. :-) His room at Trinity college (where we hold our supervisions) is the same room that the brilliant mathematician Ramanujan stayed in while he was a fellow at Trinity. Speaking of which, it’s the hundredth anniversary of Russell and Whitehead’s Principia.
I’ve utterly failed (thus far) at actually doing any research, but I’m enjoying the theoretical talks and soaking in all of this foundational knowledge.

Speaking of which, the computer lab is not in the classical style. I guess you’d have a hard time convincing architects these days to construct old-style buildings. You’ll find oddities like libraries on stilts inside hardcore traditional colleges:

Ah, the march of modernity. If I weren’t taking History and Philosophy of Science, I might have been stuck in the modern buildings of West Cambridge, but fortunately there is a wonderful little bike path between the lab and Fitzwilliam college (which Nidhi mentioned to me, and which I managed to miss, ending up biking down the grass at the back of Churchill college and then alongside the road in the wrong direction because of a fence blocking my way):


The food might be bad in Britain, but you really can’t beat its sweets:

Alas, I have banned myself from purchasing any sweets, lest they mysteriously disappear before I return to my college. Nutella for breakfast has also proved a stunningly bad idea, and I have canned the practice in favor of a pastry and two pieces of fruit. Cambridge has a market square which is open every day (good!) but does not have very much local produce (understandable, but boo!)

Actually, I discovered the library on stilts because I was attempting to determine the location of an ICUSU event and, unlike at MIT, I don’t carry around my laptop everywhere with me while in Cambridge (gasp!) I eventually found the location:

If you say you are punting, it means quite a different thing in Cambridge than at MIT. (I consulted Wikipedia, which agrees with the Cambridge sense of the term.) Oh, did I mention, dogs punt too!

Punting is a bit different from sailing; in particular, you push the punt pole in the direction you want to go (whereas the tiller goes in the opposite direction you want to go.)

This post is getting a bit long, so maybe I’ll save more for later.
Postscript. I noticed a (not particularly insightful) relationship between currying and Church encoding today: namely, if you curry the destructor function (either, maybe, foldr, etc.) with the data, you get a function with the same type as the Church encoding of that data type.
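A minimal illustration with Maybe (helper name mine): maybe has type b -> (a -> b) -> Maybe a -> b, and moving the data argument to the front yields exactly the Church-encoded type forall r. r -> (a -> r) -> r.

toChurch :: Maybe a -> (r -> (a -> r) -> r)
toChurch m = \n j -> maybe n j m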
November 17, 2010

Ever try to go to haskell.org and notice that it was down? Well, now you have someone to complain to: the haskell.org committee has formed and apparently I’m on it. 8-)
One of the first things we’ll be doing is moving haskell.org from a server hosted at Yale (the hosting has been good, but the server will sometimes go down during the weekend and there will be no one around to kick it until Monday) to some dedicated hardware. I must admit, I do feel a bit sheepish for being on the committee but not having any bits (or direct experience) to help do the maintenance work; hopefully that will change soon.
November 15, 2010

And then a signal to the ri-i-i-ight.
One notable wart with readline is that if you ^C during the prompt, nothing happens, and if you do a second ^C (and weren’t buggering around with the signal handlers) your entire program unceremoniously terminates. That’s not very nice! Fortunately, readline appears to be one of the rare C libraries that actually put some work into making sure you can longjmp out of a signal handler without completely breaking the library’s internal state (they do this with liberal masking and unmasking, and with their own signal handler which cleans up and then rethrows the signal).
So I decided I was going to see if I could patch up readline to longjmp out of the signal handler (signal provided by yours truly) and give control back to Haskell. This monstrosity resulted.
static jmp_buf hs_readline_jmp_buf;
static struct sigaction old_sigpipe_action;

void hs_readline_sigaction (int signo, siginfo_t *info, void *data)
{
    sigaction(SIGALRM, &old_sigpipe_action, NULL);
    siglongjmp(hs_readline_jmp_buf, signo);
}

char *hs_readline (const char *s)
{
    struct sigaction action;
    int signo;
    sigset_t mask;
    memset(&action, 0, sizeof(struct sigaction));
    sigemptyset(&mask);
    sigaddset(&mask, SIGALRM);
    action.sa_sigaction = hs_readline_sigaction;
    action.sa_mask = mask;
    action.sa_flags = SA_RESETHAND;
    sigaction(SIGALRM, &action, &old_sigpipe_action);
    if (signo = sigsetjmp(hs_readline_jmp_buf, 1)) {
        return NULL;
    }
    char *r = readline(s);
    sigaction(SIGALRM, &old_sigpipe_action, NULL);
    return r;
}
It actually works pretty wonderfully, despite the somewhat circuitous route the signal takes: the SIGINT first gets handled by readline’s installed signal handler, which cleans up its changes to the terminal and then rethrows the signal to GHC’s signal handler. GHC tells the IO manager that a signal happened, and then goes back to the innards of readline (which reinstates all of its changes to the terminal). Then the IO manager reads out the signal and sends a ThreadKilled exception, which results in the RTS trying to interrupt the foreign call. The SIGALRM (actually, that’s a lie: the code in GHC sends a SIGPIPE, but readline doesn’t think SIGPIPE is a signal it should clean up after, so I changed it; better suggestions welcome) hits readline’s signal handler again, we clean up the terminal, and then we hit our own signal handler, which longjmps to a return NULL that takes us back to Haskell. And then the signal is caught and there is much rejoicing.
Unfortunately, almost all of that code is boilerplate, and I can’t stick it in a nice Haskell combinator, because when Haskell is executing there’s no stack to speak of, and I bet a setjmp FFI call would make the RTS very confused. It’s also not reentrant, although I doubt readline is reentrant either. And of course, nonlocal control transfer from a signal handler is something your Mom always told you not to do. So this approach probably doesn’t generalize. But it’s pretty amusing.
November 12, 2010

About a month ago I decided that it would be cool if I could fix the bug “GHC’s runtime never terminates unused worker threads.” Well, I just got around to looking at it today, and after wandering aimlessly around the twisty maze that is the GHC RTS for an hour or so, I finally found a light at the end of the tunnel, in the form of a heart-warmingly simple patch. I’ve sent mail off to Simon Marlow to make sure the light isn’t actually a train, but it occurred to me that it would be interesting to look at my command history and blog about the process by which I came to the conclusion that line 464 of Capability.c was the correct place to add my change, since this sort of mental journey is not one that ever really gets recorded anywhere, in any shape or form.
Warmups before running the maze. In a shifty maze like GHC, you want to make sure the guided route (i.e. a clean build) is working before trying anything fancy. I use a build tree separate from my source tree, so getting everything up to date involves:
cd ghc-clean
./darcs-all get
./darcs-all pull -a
cd ../ghc-build
lndir ../ghc-clean
perl boot && ./configure && make
inplace/bin/ghc-stage2 --interactive
When this has been resolved in a satisfactory manner (a non-trivial task on platforms like Windows), the code hunting can begin.
Grab your equipment. What? You mean to say you’ve wandered into this maze and you don’t even know how to tell when you’ve gotten to your destination? That’s no good… you’ll need a dowsing rod of some sort… something to tell you when you’ve got it right.
In this particular case, the original bug reporter had written up a small, incomplete test script, so the first thing I did was flesh it out into a script that required no human interaction. The benchmark for the new script was clear: /proc/PID/task should report a number substantially smaller than 200. To see that the current implementation is broken:
ezyang@javelin:~/Dev/ghc-build/testsuite/tests/ghc-regress/ffi/should_run$ ./cleanupThreads
203
Getting your bearings. Ok, so what do we want? We want threads to die instead of hanging around. There are two ways to do this: have the thread commit seppuku when it realizes it isn’t wanted, or have some manager kill the thread as necessary. The latter is generally considered poor form, since you want to make sure the threads aren’t doing anything critical that will get corrupted if they die. So seppuku it is. Here, now, there are two questions:
- When does the thread decide to go into a waiting pool? This is presumably where we’d want it to terminate itself instead.
- How would the thread decide whether or not it should hang around or bug out?
Mapping out the land. GHC has this little runtime flag called -Ds. It’s pretty useful: it dumps out a whole gaggle of debug information concerning threads, which is precisely what we’d like to look for. Our plan of action is to look at what the thread activity looks like in our test script, and identify the points at which threads should be dying instead of hanging around.
The very beginning of the log looks like this:
b75006d0: allocated 1 capabilities
b75006d0: new task (taskCount: 1)
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: starting new worker on capability 0
b75006d0: new worker task (taskCount: 2)
b75006d0: task exiting
b75006d0: new task (taskCount: 2)
b75006d0: returning; I want capability 0
b71ffb70: cap 0: schedule()
b71ffb70: giving up capability 0
Note the number b75006d0; that’s our main thread and it’s going to be quite a busy beaver. Here is the very first thread we spin off to make a foreign call, but it finishes fairly quickly and isn’t the foreign call we are looking for:
b75006d0: cap 0: created thread 1
b75006d0: cap 0: thread 1 appended to run queue
b75006d0: new bound thread (1)
b75006d0: cap 0: schedule()
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: thread 1 stopped (suspended while making a foreign call)
b75006d0: freeing capability 0
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: thread 1 stopped (suspended while making a foreign call)
b75006d0: freeing capability 0
b75006d0: returning; I want capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 1 (ThreadRunGHC)
b75006d0: cap 0: created thread 2
b75006d0: cap 0: thread 2 appended to run queue
b75006d0: cap 0: thread 1 stopped (finished)
Before long, we see a veritable avalanche of new threads being created and added to the run queue; these are our threads:
b75006d0: woken up on capability 0
b75006d0: resuming capability 0
b75006d0: cap 0: running thread 3 (ThreadRunGHC)
b75006d0: cap 0: created thread 4
b75006d0: cap 0: thread 4 appended to run queue
b75006d0: cap 0: created thread 5
b75006d0: cap 0: thread 5 appended to run queue
b75006d0: cap 0: created thread 6
b75006d0: cap 0: thread 6 appended to run queue
b75006d0: cap 0: created thread 7
b75006d0: cap 0: thread 7 appended to run queue
b75006d0: cap 0: created thread 8
b75006d0: cap 0: thread 8 appended to run queue
b75006d0: cap 0: created thread 9
b75006d0: cap 0: thread 9 appended to run queue
b75006d0: cap 0: created thread 10
b75006d0: cap 0: thread 10 appended to run queue
b75006d0: cap 0: created thread 11
b75006d0: cap 0: thread 11 appended to run queue
b75006d0: cap 0: created thread 12
b75006d0: cap 0: thread 12 appended to run queue
b75006d0: cap 0: created thread 13
The process continues until we’ve spawned them all:
54139b70: starting new worker on capability 0
54139b70: new worker task (taskCount: 201)
53938b70: cap 0: schedule()
53938b70: cap 0: running thread 202 (ThreadRunGHC)
53938b70: cap 0: thread 202 stopped (suspended while making a foreign call)
53938b70: starting new worker on capability 0
53938b70: new worker task (taskCount: 202)
53137b70: cap 0: schedule()
53137b70: cap 0: running thread 203 (ThreadRunGHC)
53137b70: cap 0: thread 203 stopped (suspended while making a foreign call)
53137b70: starting new worker on capability 0
53137b70: new worker task (taskCount: 203)
52936b70: cap 0: schedule()
And then, since there’s nothing to do (all of our threads are in FFI land), we go and run a major GC:
52936b70: woken up on capability 0
52936b70: resuming capability 0
52936b70: deadlocked, forcing major GC...
52936b70: cap 0: requesting parallel GC
52936b70: ready_to_gc, grabbing GC threads
all threads:
threads on capability 0:
other threads:
thread 203 @ 0xb72b5c00 is blocked on an external call (TSO_DIRTY)
thread 202 @ 0xb72b5800 is blocked on an external call (TSO_DIRTY)
thread 201 @ 0xb72b5400 is blocked on an external call (TSO_DIRTY)
thread 200 @ 0xb72b5000 is blocked on an external call (TSO_DIRTY)
thread 199 @ 0xb72b4c00 is blocked on an external call (TSO_DIRTY)
thread 198 @ 0xb72b4800 is blocked on an external call (TSO_DIRTY)
thread 197 @ 0xb72b4400 is blocked on an external call (TSO_DIRTY)
thread 196 @ 0xb72b4000 is blocked on an external call (TSO_DIRTY)
thread 195 @ 0xb72b3c00 is blocked on an external call (TSO_DIRTY)
thread 194 @ 0xb72b3800 is blocked on an external call (TSO_DIRTY)
thread 193 @ 0xb72b3400 is blocked on an external call (TSO_DIRTY)
[snip (you get the idea)]
(I’ve always kind of wondered whether or not FFI calls should be considered deadlocked.)
Now the threads start coming back from FFI-land and idling:
b69feb70: cap 0: running thread 4 (ThreadRunGHC)
b69feb70: cap 0: waking up thread 3 on cap 0
b69feb70: cap 0: thread 3 appended to run queue
b69feb70: cap 0: thread 4 stopped (finished)
b69feb70: giving up capability 0
b69feb70: there are 2 spare workers
b69feb70: passing capability 0 to bound task 0xb75006d0
b61fdb70: returning; I want capability 0
b61fdb70: resuming capability 0
b61fdb70: cap 0: running thread 5 (ThreadRunGHC)
b59fcb70: returning; I want capability 0
b61fdb70: cap 0: thread 5 stopped (finished)
b61fdb70: giving up capability 0
b61fdb70: there are 3 spare workers
b61fdb70: passing capability 0 to worker 0xb59fcb70
b75006d0: woken up on capability 0
b75006d0: capability 0 is owned by another task
b51fbb70: returning; I want capability 0
b59fcb70: resuming capability 0
b59fcb70: cap 0: running thread 6 (ThreadRunGHC)
b59fcb70: cap 0: thread 6 stopped (finished)
b59fcb70: giving up capability 0
b49fab70: returning; I want capability 0
b59fcb70: there are 4 spare workers
b59fcb70: passing capability 0 to worker 0xb51fbb70
b51fbb70: resuming capability 0
b51fbb70: cap 0: running thread 7 (ThreadRunGHC)
b51fbb70: cap 0: thread 7 stopped (finished)
b51fbb70: giving up capability 0
b41f9b70: returning; I want capability 0
b51fbb70: there are 5 spare workers
I’ve actually cheated a little: the “there are X spare workers” debug statements I added myself. But this section is golden; we’re specifically interested in these lines:
b61fdb70: cap 0: thread 5 stopped (finished)
b61fdb70: giving up capability 0
The thread stops, but it doesn’t die; it just gives up the capability. These are two extremely good candidates for where the thread might alternately decide to kill itself.
Placemarkers. It’s time to bust out trusty old grep and figure out where these debug messages are being emitted from. Unfortunately, “5” and “finished” are probably dynamically generated, so “stopped” is the only real identifier. Fortunately, that’s specific enough for me to find the right line in the RTS:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R stopped .
./Capability.c: // list of this Capability. A worker can mark itself as stopped,
./Capability.c: if (!isBoundTask(task) && !task->stopped) {
./RaiseAsync.c: - all the other threads in the system are stopped (eg. during GC).
./RaiseAsync.c: // if we got here, then we stopped at stop_here
./Task.c: if (task->stopped) {
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsFalse;
./Task.c: task->stopped = rtsTrue;
./Task.c: task->stopped = rtsTrue;
./Task.c: debugBelch("task %p is %s, ", taskId(task), task->stopped ? "stopped" : "alive");
./Task.c: if (!task->stopped) {
./sm/GC.c: // The other threads are now stopped. We might recurse back to
./Schedule.c: "--<< thread %ld (%s) stopped: requesting a large block (size %ld)\n",
./Schedule.c: "--<< thread %ld (%s) stopped to switch evaluators",
./Schedule.c: // stopped. We need to stop all Haskell threads, including
./Trace.c: debugBelch("cap %d: thread %lu stopped (%s)\n", ### THIS IS THE ONE
./Task.h: rtsBool stopped; // this task has stopped or exited Haskell
./Task.h:// Notify the task manager that a task has stopped. This is used
./Task.h:// Put the task back on the free list, mark it stopped. Used by
./Interpreter.c: // already stopped at just now
./Interpreter.c: // record that this thread is not stopped at a breakpoint anymore
./win32/Ticker.c: // it still hasn't stopped.
That line in Trace.c is actually in a generic debugging function, traceSchedEvent_stderr, but fortunately there’s a big case statement on one of its arguments, tag:
case EVENT_STOP_THREAD: // (cap, thread, status)
    debugBelch("cap %d: thread %lu stopped (%s)\n",
               cap->no, (lnat)tso->id, thread_stop_reasons[other]);
    break;
So EVENT_STOP_THREAD is a good next grep. And sure enough:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R EVENT_STOP_THREAD .
./Trace.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./eventlog/EventLog.c: [EVENT_STOP_THREAD] = "Stop thread",
./eventlog/EventLog.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./eventlog/EventLog.c: case EVENT_STOP_THREAD: // (cap, thread, status)
./Trace.h: HASKELLEVENT_STOP_THREAD(cap, tid, status)
./Trace.h: traceSchedEvent(cap, EVENT_STOP_THREAD, tso, status);
It looks to be an inline function in Trace.h:
INLINE_HEADER void traceEventStopThread(Capability *cap STG_UNUSED,
                                        StgTSO *tso STG_UNUSED,
                                        StgThreadReturnCode status STG_UNUSED)
{
    traceSchedEvent(cap, EVENT_STOP_THREAD, tso, status);
    dtraceStopThread((EventCapNo)cap->no, (EventThreadID)tso->id,
                     (EventThreadStatus)status);
}
Classy. So traceEventStopThread is the magic word, and sure enough:
ezyang@javelin:~/Dev/ghc-clean/rts$ grep -R traceEventStopThread .
./Schedule.c: traceEventStopThread(cap, t, ret);
./Schedule.c: traceEventStopThread(cap, tso, THREAD_SUSPENDED_FOREIGN_CALL);
./Trace.h:INLINE_HEADER void traceEventStopThread(Capability *cap STG_UNUSED,
There are two plausible sites in Schedule.c.
Going digging. We first have to pick which site to inspect more closely. Fortunately, we notice that the second trace event corresponds to suspending the thread before going into a safe FFI call; that’s certainly not what we’re looking at here. Furthermore, the first is in the scheduler, which makes a lot of sense. But there’s nothing obvious in this vicinity that you might associate with saving a worker task away due to lack of work.
What about that giving up capability message? Some more grepping reveals it to be in the yieldCapability function (as one might expect). If we then trace backwards the calls to yieldCapability, we see it is invoked by scheduleYield, which is in turn called by the scheduler loop:
scheduleYield(&cap, task);
if (emptyRunQueue(cap)) continue; // look for work again

// Get a thread to run
t = popRunQueue(cap);
This is very, very interesting. It suggests that the capability itself will tell us whether or not there is work to do, and that yieldCapability is a promising function to look into further:
debugTrace(DEBUG_sched, "giving up capability %d", cap->no);

// We must now release the capability and wait to be woken up
// again.
task->wakeup = rtsFalse;
releaseCapabilityAndQueueWorker(cap);
That last call looks intriguing:
static void
releaseCapabilityAndQueueWorker (Capability* cap USED_IF_THREADS)
{
    Task *task;

    ACQUIRE_LOCK(&cap->lock);

    task = cap->running_task;

    // If the current task is a worker, save it on the spare_workers
    // list of this Capability. A worker can mark itself as stopped,
    // in which case it is not replaced on the spare_worker queue.
    // This happens when the system is shutting down (see
    // Schedule.c:workerStart()).
    if (!isBoundTask(task) && !task->stopped) {
        task->next = cap->spare_workers;
        cap->spare_workers = task;
    }
    // Bound tasks just float around attached to their TSOs.

    releaseCapability_(cap, rtsFalse);

    RELEASE_LOCK(&cap->lock);
}
We’ve found it!
Checking the area. The spare_workers queue looks like the queue in which worker threads without anything to do go to chill out. We should verify that this is the case:
int i;
Task *t;
for (i = 0, t = cap->spare_workers; t != NULL; t = t->next, i++) {}
debugTrace(DEBUG_sched, "there are %d spare workers", i);
As we saw in the debug statements above, this was indeed the case: the number of spare workers kept increasing:
54139b70: there are 199 spare workers
54139b70: passing capability 0 to worker 0x53938b70
53938b70: resuming capability 0
53938b70: cap 0: running thread 202 (ThreadRunGHC)
53938b70: cap 0: thread 202 stopped (blocked)
thread 202 @ 0xb727a400 is blocked on an MVar @ 0xb72388a8 (TSO_DIRTY)
53938b70: giving up capability 0
53938b70: there are 200 spare workers
53938b70: passing capability 0 to worker 0x53137b70
53137b70: resuming capability 0
53137b70: cap 0: running thread 203 (ThreadRunGHC)
53137b70: cap 0: thread 203 stopped (blocked)
thread 203 @ 0xb727a000 is blocked on an MVar @ 0xb72388a8 (TSO_DIRTY)
53137b70: giving up capability 0
53137b70: there are 201 spare workers
Writing up the solution. The patch from here is simple, since we’ve found the correct location: we check if the queue of spare workers has reached some threshold, and if it has, instead of saving ourselves to the queue we just clean up and then kill ourselves:
// count the spare workers, as in the loop above (t again starts at
// cap->spare_workers)
for (i = 1; t != NULL && i < 6; t = t->next, i++) {}
if (i >= 6) {
    debugTrace(DEBUG_sched, "Lots of spare workers hanging around, terminating this thread");
    releaseCapability_(cap, rtsFalse);
    RELEASE_LOCK(&cap->lock);
    pthread_exit(NULL);
}
And then we test to see that this has indeed worked:
ezyang@javelin:~/Dev/ghc-build/testsuite/tests/ghc-regress/ffi/should_run$ ./cleanupThreads
7
Postscript. There are some obvious deficiencies with this proof-of-concept. It’s not portable. We need to convince ourselves that this truly does all of the cleanup that the RTS expects a worker to do. Maybe our data representation could be more efficient (we certainly don’t need a linked list if the number of values we’ll be storing is fixed). But these are questions best answered by someone who knows the RTS better, so at this point I sent in the proof of concept for further review. Fingers crossed!
November 10, 2010

This is a post to get some Google juice for a problem that basically prevented Scripts from being able to cut over from Fedora 11 to Fedora 13. The cluster of new machines kept falling over from load, and we kept scratching our heads, wondering, “Why?”
Turns out, the following commit broke mod_fcgid in a pretty terrifying way: essentially, mod_fcgid becomes unable to manage its pools of running FastCGI processes, so it keeps spawning new ones until the system runs out of memory. This is especially obvious on systems with large numbers of generated virtual hosts, i.e. people using mod_vhost_ldap. It got fixed in mod_fcgid 2.3.6, which was released last weekend.
Unrelatedly. I’ve been sort of turning around in my head a series of Philosophy of Computer Science posts, where I try to identify interesting philosophical questions in our field beyond the usual topics of discussion (cough AI cough). The hope is to draw in a number of topics traditionally associated with philosophy of science, of mathematics, of biology, etc., and maybe pose a few questions of my own. One of the favorite pastimes of philosophers is to propose plausible-sounding theories and then come up with perplexing examples which seem to break them down, and it sounds like this in itself could generate some Zen koans as well, which are always fun.
November 8, 2010

This is a bit of a provocative post, and its impressions (I dare not promote them to the level of conclusions) should be taken with the amount of salt found in a McDonald’s Happy Meal. Essentially, I was doing some reading about medieval medicine and was struck by some of the similarities between it and computer engineering, which I attempt to describe below.
Division between computer scientists and software engineers. The division between those who studied medicine, the physics (a name derived from physica, or natural science), and those who practiced medicine, the empirics, was extremely pronounced. There was a mutual distrust (Petrarch wrote, “I have never believed in doctors nor ever will” (Porter 169)) that stemmed in part from the social division between the physics and the empirics. A physic would have obtained a doctorate from a university, having studied one of the three highest faculties possible (the others being theology and law), and tended to be among the upper strata of society. In fact, the actual art of medicine was not considered “worthy of study,” though the study of natural science was (Cook 407).
The division between computer scientists and software engineers is not as bad as the corresponding division in the 1500s, but there is a definite social separation (computer scientists work in academia, software engineers work in industry) and a communication gap between the two communities. In many other fields, a PhD is required to even be considered for a job; here, we see high school students starting up software companies (occasionally successfully) all the time, and if programming Reddit is any indication, there is a certain distaste for purely theoretical computer scientists.
The unsuitability of pure computer science for the real world. Though the study of pure medicine was highly regarded during this time, its theories and knowledge were tremendously off the mark. At the start of the 1500s the four humors of Hippocratic medicine were still widely believed: to be struck by disease was to have an imbalance of black bile, yellow bile, phlegm and blood, and thus the cure was to apply the counteracting humor, which justified such techniques as bloodletting (the practice of purposely bleeding a person). Even the fundamentals of how the human body worked were ill-understood: it was not until William Harvey and his De motu cordis et sanguinis (1628) that the earlier view, that food was concocted in the organs and then flowed outwards via the veins, arteries and nerves to the rest of the body, was challenged by the notion of the heart as a pump (Cook 426). If the circulatory theory was true, what did the other organs do? Harvey’s theory completely overturned the existing understanding of how the human body worked, and it was quite controversial.
I have enough faith in computer science that I don’t think most of our existing knowledge is fundamentally wrong. But I also think we know tremendously little about the actual nature of computation even at middling sizes, and this is a quite humbling fact. But I am also fundamentally optimistic about the future of computer science in dealing with large systems—more on this at the end.
Testing instead of formal methods. The lack of knowledge, however, did not stop the physicians (as distinct from the physics) from practicing their medicine. Even the academics recognized the importance of “medieval practica; handbooks listing disorders from head to toe with a description of symptoms and treatment” (Porter 172). The observational (Hippocratic) philosophy continued to hold great sway: Thomas Sydenham, when asked about the subject of dissection, stated “Anatomy—Botany—Nonsense! Sir, I know an old woman in Covent Garden who understand botany better, and as for anatomy, my butcher can dissect a joint full and well; now, young man, all that is stuff; you must go to the bedside, it is there alone you can learn disease.” (Porter 229)
In the absence of a convincing theoretical framework, empiricism rules. The way to gain knowledge is to craft experiments, conduct observations, and act accordingly. If a piece of code is buggy, how do you fix it? You add debug statements and observe the output, not construct a formal semantics and then prove the relevant properties. The day the latter is the preferred method of choice is a day when practitioners of formal methods across the world will rejoice.
No silver bullet. In the absence of reliable medical theories from the physics, quackery flourished in the eighteenth century, a century often dubbed the “Golden Age of Quackery.” (Porter 284) There was no need to explain why your wares worked: one simply needed to give a good show (“drawing first a crowd and then perhaps some teeth, both to the accompaniment of drums and trumpets” (Porter 285)), sell a few dozen bottles of your cure, and then move on to the next town. These “medicines” would claim to do anything from cure cancer to restore youth. While some of the quacks were merely charlatans, others earnestly believed in the efficacy of their treatments, and occasionally a quack remedy was actually effective.
I think the modern analogue to quackery is software methodologies in all shapes and sizes. Like quack medicines, some of these may be effective, but in the absence of scientific explanations we can only watch the show, buy in, and see if it works. And we can’t hope for any better until our underlying theories are better developed.
The future. Modern medical science was eventually saved, though not before years of inadequate theories and quackery had brought it to a state of tremendous disrepute. The complexity of the craft had to be legitimized, but this would not happen until a powerful scientific revolution built the foundations of modern medical practice.
Works referenced:
- Porter, Roy. The Greatest Benefit to Mankind. Fontana Press: 1999.
- Park, Katharine; Daston, Lorraine (eds.). The Cambridge History of Science, Volume 3: Early Modern Science. Cambridge University Press: 2006.
November 5, 2010

Someone told me it’s all happening at the zoo…
I’ve always thought dynamic programming was a pretty crummy name for the practice of storing sub-calculations to be used later. Why not call them table-filling algorithms? Indeed, thinking of a dynamic programming algorithm as one that fills in a table is quite a good way of thinking about it.
In fact, you can almost completely characterize a dynamic programming algorithm by the shape of its table and how the data flows from one cell to another. And if you know what this looks like, you can often just read off the complexity without knowing anything about the problem.
So what I did was collect up a bunch of dynamic programming problems from Introduction to Algorithms and draw up the tables and data flows. Here’s an easy one to start off with, which solves the Assembly-Line problem:

The blue indicates the cells we can fill in ‘for free’, since they have no dependencies on other cells. The red indicates cells that we want to figure out, in order to pick the optimal solution from them. And the grey indicates a representative cell along the way, and its data dependency. In this case, the optimal path for a machine to a given cell only depends on the optimal paths to the two cells before it. (Because, if there were a more optimal route, it would have shown up in one of my previous two cells!) We also see there are a constant number of arrows out of any cell and O(n) cells in this table, so the algorithm clearly takes O(n) time total.
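To see how directly the picture translates into code, here’s a Haskell sketch of the assembly-line recurrence (simplified to a constant transfer cost and zero entry/exit costs, where CLRS lets all of these vary). Each new pair of cells depends only on the previous pair, so a single fold fills the whole table in O(n):

import Data.List (foldl')

-- costs: one (time on line 1, time on line 2) pair per station
assemblyLine :: Int -> [(Int, Int)] -> Int
assemblyLine transfer costs = uncurry min (foldl' step (0, 0) costs)
  where
    -- column j's two cells, computed from column j-1's
    step (f1, f2) (a1, a2) =
      ( min (f1 + a1) (f2 + transfer + a1)
      , min (f2 + a2) (f1 + transfer + a2) )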
Here’s the next introduction example, optimal parenthesization of matrix multiplication.

Each cell contains the optimal parenthesization of the subset i to j of matrices. To figure out the value for a cell, we have to consider all of the possible combos of existing parentheticals that could have led to it (thus the multiple arrows). There are O(n²) boxes, and O(n) arrows per box, for O(n³) overall.
Here’s a nice boxy one for finding the longest shared subsequence of two strings. Each cell represents the longest shared subsequence of the first string up to x and the second string up to y. I’ll let the reader count the cells and arrows and verify the complexity is correct.

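Rendered as code, the boxy table becomes a fold over rows. Here’s a sketch computing just the length of the longest common subsequence: each cell consults only its left, upper and upper-left neighbours, so filling m rows of n cells is O(mn) time, and keeping a single row at a time is O(n) space:

import Data.List (foldl')

lcsLength :: Eq a => [a] -> [a] -> Int
lcsLength xs ys = last (foldl' nextRow (replicate (length ys + 1) 0) xs)
  where
    -- prev is the row for the previous character of xs; row refers to
    -- itself lazily so each cell can see its left neighbour
    nextRow prev x = row
      where
        row = 0 : zipWith3 cell ys (zip prev (tail prev)) row
        cell y (diag, up) left
          | x == y    = diag + 1
          | otherwise = max up left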
There aren’t that many ways to set up dynamic programming tables! Constructing optimal binary search trees acts a lot like optimal matrix parenthesization, but the indexes are a bit fiddly. (Oh, by the way, Introduction to Algorithms is 1-indexed; I’ve switched to 0-indexing here for my examples.)

Here we get into exercise land! The bitonic Euclidean traveling salesman problem is pretty well-known on the web, and its tricky recurrence relation has to do with the bottom edge. Each cell represents the optimal open bitonic route between i and j.

The lovely word wrapping problem, a variant of which lies at the heart of the Knuth TeX word wrapping algorithm, takes advantage of some extra information to bound the number of cells one has to look back. (The TeX algorithm does a global optimization, so the complexity would be O(n²) instead.) Each cell represents the optimal word wrapping of all the words up to that point.

Finally, the edit distance problem, where it seems like the authors decided to pile on as much complexity as they could muster, falls out nicely when you realize that each string operation they ask you to design corresponds to a single arrow to some earlier cell. Useful! Each cell is the optimal edit chain from that prefix of the source to that prefix of the destination.

And the zookeeper is very fond of rum.
Squares, triangles, rectangles, those were the tables I usually found. I’m curious to know if there are more exotic tables that DP algorithms have filled out. Send them in and I’ll draw them!
November 3, 2010

Should software engineers be required to implement the abstractions they use before using them? (Much like how, before you’re allowed to use a theorem in a math textbook, you have to prove it first.) A bit like reinventing the wheel for pedagogical purposes.
(I’ve been sick since Saturday, so it’s a Dead Edward Day. Hopefully I’ll clean up the DP Zoo post and present it with more annotations for this Friday.)