March 16, 2011
Wherein I make fun of functional programming advocates.
In this essay, I’d like to discuss the ideologies of “imperative programming” and “functional programming” in terms of the language features they lean on: in particular, the mechanisms by which they allow developers to express themselves in less code. I propose that the set of features that make up imperative programming constitute a dominant programming monoculture that is partially incompatible with functional programming’s favored features, requiring functional programming advocates to do funny things to gain the attention of the programmers.
To give a flavor of what I mean, here are some frequently seen language features that increase expressiveness:
- Macros
- Concurrency
- Mutation
- Indirection
- Laziness
- Dynamic typing
- Polymorphism
- Higher-order functions
- Exceptions
- Eval
- Input/Output
- Continuations
- Anonymous functions
A few of those entries might make you laugh, because you might not understand how you could program without them. You may recognize a few that your favorite language supports well, a few that your language supports less well, and a few your language has no support for. The culture around the language will also have its folklore about what kinds of features are acceptable for use and which ones are not (think Pythonic or JavaScript: The Good Parts). The language you choose determines which features you get to know well.
Being expressive has a cost, which most developers figure out with a little experience in the field. There is a sort of natural selection going on here: language features that are well supported by languages, that other programmers know how to use, and that allow the job to get done are favored—in particular, the community effect reinforces the winner. As a result, we have a developer monoculture that is mostly comfortable with mutation, input/output, indirection, exceptions, polymorphism, etc. But even the bread-and-butter of current programming practice doesn’t come without cost: think about the famed division between “those who get pointers and those who don’t”, or the runtime costs of using exceptions in C++, or the representation complications of polymorphism (e.g. autoboxing in Java, and Go’s lack thereof).
When someone does functional programming advocacy, what they’re really doing is asking you to look more closely at some of the other mechanisms we have for increasing expressiveness. You might feel like these are the only voices you hear, because there’s not much point advocating something that everyone uses already. And you might feel like the enthusiasm is unjustified, because the feature seems bizarrely complicated (continuations, anyone?) or you’ve tried using it in your favorite language, and there’s nothing more painful than seeing someone try to do functional programming in Python. Fact of the matter is, it’s not easy to add these extra language features to the existing monoculture. The new features interact in very complex and subtle ways.
This is why a functional programming advocate will often ask you to give up some of your old tools of expression. They will ask you to give up shared-state mutation, because otherwise handling concurrency is really damn hard. They will ask you to give up dynamic typing, because otherwise higher-order functions become much harder to reason about. The rhetoric will edge on the side of “stop doing that!” because it’s the common practice—they don’t actually mean “stop it entirely” but to the poor functional programming advocate it seems like a little bit of hyperbole is necessary to get the point across—and you can do some pretty amazing things with these extra features.
I encourage programmers to learn about as many ways to express themselves as possible, even if their language or workplace won’t necessarily allow them to use the method. The reasons are manifold:
- “Any complex enough program eventually contains a poorly written implementation of Lisp.” Like it or not, eventually you will be faced with a hard problem that is handily dispatched by one of these well studied language features, and if you’re going to have to implement it by hand, you might as well know what it’s going to look like from the start. As the Gang of Four once said, language features that are not actually supported by the language often show up as design patterns; knowing the pattern makes your code clearer and cleaner.
- Conversely, if you are reading someone else’s code and they resort to using one of these patterns, knowing how the feature should work will greatly aid comprehension.
- Libraries and frameworks are considered essential to the working developer’s toolbox, yet they seem to grow and be obsoleted at a dizzying rate. Language features are eternal: the anonymous functions of 1936 (when Alonzo Church invented the lambda calculus) are still the anonymous functions of today.
- Language features are fun to learn about! Unlike “yet another API to memorize”, a language feature will tickle your brain and make you think very hard about what is going on.
tl;dr Certain programming language features increase developer expressiveness, and the “imperative programming methodology” captures the dominant monoculture containing those language features in wide use. But there are other ways of expressing oneself, and programmers are encouraged to explore these methods even when practicality requires them to give up some of their favorite expressive tools.
March 14, 2011
The perverse incentive structure of the Internet
Suppose you have a Random Question™ that you would like answered. For example, “Did Sir Francis Galton have any children?” It’s the type of thing you’d type into Google.

Answers.com is not a terribly good source, but it is at least somewhere to see if you can scrounge up some extra information. So suppose you dig a little deeper and discover that Francis Galton only married once and that this marriage was barren. So unless some historian’s account of Galton suffering from venereal disease (Kevles 12) indicates that he knocked up some ladies in Syria and managed to pop out four offspring, who happened to conveniently be named “Francis, Harriet, Matilda and Mark,” Answers.com is wrong.
Not really surprising. But that’s not what this essay is about.
What this essay is about is what you’re supposed to do when you encounter flagrant and unabashed misinformation on the Internet: that is, someone is wrong on the Internet. There are a few obvious answers:
- Disseminate corrective information,
- Fix it, or
- Ignore it.

Disseminating corrective information is really just a glorified term for “arguing”. And everyone knows that arguing on the Internet is perhaps one of the most undignified occupations one can assume. Nevertheless, it is an important mechanism for correcting incorrect information. As Djerassi once said (and I paraphrase), only a masochist would get into an argument with a fundamentalist about evolution, yet we would be much worse off if we had no such masochists willing to engage in this debate. For less contentious issues, the formula of the “dispelling myths” essay is a popular mechanism of presenting interesting, corrective information to a receptive audience.
I think the primary difficulties with this approach can be summed up in three words: cost, effectiveness and reach.
- Cost. Writing up an effective rebuttal takes time. Arguing is a time-consuming activity. If it’s not something you particularly care about, it makes much more sense to update your own belief structure, and not bother attempting to convince other people about your updated belief.
- Effectiveness. The truism is that “arguing with someone on the Internet is like arguing with a pig: it frustrates you and irritates the pig.” But social psychology research paints an even more troubling picture: even if you manage to convince someone that you’re right, their old (falsified) belief can still affect their decision-making process. Not only that, but true information may cause a person to believe more strongly in their false beliefs. Why bother?
- Reach. Even if you write a stunning argument that convinces everyone who reads it, there is still the problem of dissemination. The Internet makes it extremely easy to pick and choose what you want to read, to aggregate news from like-minded individuals, a phenomenon Ethan Zuckerman calls homophily. The fear is that we get caught up in our own comfortable information loops, making us much dumber.
The notion that you might be able to directly fix it is a relatively recent one, arriving with the advent of wikis and the social web. Perhaps the ability to edit an incorrect information source without restriction is new, but the ability to go to the source, to write a letter to the editor of a textbook, is one that has been around for a long time. There is something of a glorious tradition here, including things like Knuth’s reward checks that offer fame and minuscule amounts of money for those who find errors. Sure, you might need to argue with one person, but that’s all you need, and afterwards the original infelicity is corrected.
When you go to the source, you partially resolve the reach problem: new readers instantly have access to the corrected text (though this doesn’t do much for existing dead-tree copies). But cost and effectiveness are still present; perhaps even exacerbated. You have to convince a single person that you’re right; if you can’t manage that, you’ll have to fall back to a separate rebuttal. Sometimes content is abandoned, with the original author having no intention of updating it.
Unrestricted edit access has also given rise to edit warring. In this situation, you have to actively defend the modification you have made, and unfortunately, the winner is usually the one who has more time and resources. Edit warring is war by attrition.
Aside. There are some parallels with the world of open-source software. Fixing it is analogous to submitting a bug report or a patch; the need to convince the maintainers you are right translates into the act of writing a good bug report. If a project is abandoned, you’re out of luck; even if it’s not abandoned, you need to make the maintainer care about your specific problem. To fork is to disseminate corrective information, although there is an element of unrestricted edits mixed in with social coding websites that make it easy to find related copies.
Or you can not bother, and just ignore it. Maybe vent a little about the problem to your friends, who have no particular stake in the matter, file it away in your brain, and let life go on. In fact, this is precisely the disposition I see adopted by most of my colleagues. They don’t write for the public. They are more than happy to share their vast warehouse of knowledge and insights with their friends, with whom the exchange is usually effective and the cost not very high (they prefer informal mediums like face-to-face conversations and instant-messaging).
In fact, I’d hazard that only a fraction of what you might call the literati and experts of the world prolifically utilize this sort of broad-net information dissemination, that sites like Wikipedia, StackOverflow and Quora attempt to exploit. The incentive structure is simply not there! For the academic, communication with the general public is secondary to communication with your peers and your funders. For the entrepreneur, communication with your customers, and not the sort of communication we see here, is what is valued. For the software engineer, communication with upstream is beneficial insofar as it lets you stop maintaining your bug fixes. The mind boggles at how much untapped expertise there is out there (perhaps that’s what sites like Quora are going for.)
It would be folly to try to tell them all to not do that. After all, from their point of view, this is the most rational use of their time. We can try to trick individuals into pouring this information out, in the form of social incentives and addictive triggers, but by and large I think these individuals will not bite. They’re too clever for that.
Better hope you catch them in a fit of irrationality.
March 11, 2011
I was flipping through An Introduction to Non-Classical Logic by Graham Priest and the section on many-valued logics caught my eye. Many-valued logics are logics with more than the usual two truth values, true and false. The (strong) Kleene 3-valued logic K3 sets up the following truth tables with 1, 0 and x (which is thought to be some value that is neither true nor false):
NOT
  1 | 0
  x | x
  0 | 1

AND | 1  x  0
----+--------
 1  | 1  x  0
 x  | x  x  0
 0  | 0  0  0

OR  | 1  x  0
----+--------
 1  | 1  1  1
 x  | 1  x  x
 0  | 1  x  0

IMPLICATION | 1  x  0
------------+--------
     1      | 1  x  0
     x      | 1  x  x
     0      | 1  1  1
I’ve always thought many-valued logics were a bit of a “hack” to deal with the self-referentiality paradoxes, but in fact, Kleene invented his logic by thinking about what happens when partial functions are applied to values they are not defined for: a sort of denotation failure. So it’s not surprising that these truth tables correspond to the “parallel or” and “and” operators predicted by denotational semantics.
The reader is invited to consider whether or not one could use this logic for a Curry-Howard style correspondence; in particular, the law of the excluded middle is not valid in K3.
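To make the tables concrete, here is a small Haskell transcription (my own sketch; the names K3, not3, and3, or3 and imp3 are invented for illustration). Ordering the values F < U < T makes conjunction and disjunction exactly min and max:

```haskell
-- Strong Kleene 3-valued logic: T (1), F (0), and U (the x value).
-- With the ordering F < U < T, conjunction is min and disjunction is max.
data K3 = F | U | T deriving (Eq, Ord, Show)

not3 :: K3 -> K3
not3 T = F
not3 U = U
not3 F = T

and3, or3, imp3 :: K3 -> K3 -> K3
and3 = min
or3  = max
imp3 a b = or3 (not3 a) b  -- material implication, matching the table above
```

Note that or3 U (not3 U) evaluates to U, not T: the failure of the excluded middle mentioned above falls right out of the tables.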
March 9, 2011
This is a collection of WTFs due to misuse of mutable state.
We’ll start off with some Java. What do you expect this snippet of code to do?
Sensor Accel = sm.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
Sensor Orient = sm.getDefaultSensor(Sensor.TYPE_ORIENTATION);
sm.registerListener((SensorEventListener) this, Accel, sm.SENSOR_DELAY_FASTEST);
Ostensibly, it registers the current object to receive just accelerometer updates. But what if I told you getDefaultSensor was implemented like this:
public Sensor getDefaultSensor(int type) {
    if (sensors == null) {
        sensors = new Sensor(mContext, type);
        return sensors;
    } else if (sensors.checkList(type)) {
        sensors.addSensor(type);
        return sensors;
    } else {
        sensors.removeSensor(type);
        return sensors;
    }
}
This code completely fails to implement the expected semantics: there is a single sm-wide Sensor object (stored in sensors) that accumulates sensor types as getDefaultSensor is called. So in fact, this listener will receive events from both the accelerometer and the orientation sensor. The only saving grace is that when we register event listeners, we usually do want them to receive all events, so we might not notice if we weren’t looking too closely. This is real code from OpenIntents SensorSimulator.
Lest you think I only make fun of other people’s code, here is a diff from a project of my own:
@@ -197,13 +197,7 @@ def upload_all(tree, ftp, base):
ftp.cwd(base)
for blob in tree.blobs:
- logging.info('Uploading ' + '/'.join((base, blob.name)))
- try:
- ftp.delete(blob.name)
- except ftplib.error_perm:
- pass
- ftp.storbinary('STOR ' + blob.name, blob.data_stream)
- ftp.voidcmd('SITE CHMOD ' + format_mode(blob.mode) + ' ' + blob.name)
+ upload_blob(blob, ftp, base)
@@ -260,11 +254,25 @@ def upload_diff(diff, tree, ftp, base):
node = subtree/components[-1]
assert isinstance(node, Blob)
- logging.info('Uploading ' + full_path)
- ftp.storbinary('STOR ' + file, node.data_stream)
- ftp.voidcmd('SITE CHMOD ' + format_mode(node.mode) + ' ' + file)
- # Don't do anything if there isn't any item; maybe it
- # was deleted.
+ upload_blob(node, ftp, base)
It looks fairly plausible: I’ve factored out some common storbinary logic. Can you tell what the bug is? Here’s a hint.
The problem is that upload_all changes the current working directory on the FTP connection (mutable state!), while upload_diff does not (it works entirely from the base directory). The factored-out upload_blob function assumed upload_all-style working directory changes, and so all upload_diff uploads were dumped in the base directory. Mutability hurts modularity! The fix was to get rid of this mutation and manually calculate the full path; this also removed some delicate invariant preservation in the original upload_all implementation.
Paradoxically enough, though Haskell encourages you not to use mutation, when you do use it, Haskell’s expressive static type system gives you the unusual ability to statically encode complicated invariants about your mutation—invariants that would not have been necessary if you hadn’t used mutation. A small example of this is the ST monad, which uses rank-2 types to ensure that references to mutable memory cannot escape runST, the isolated “mutation thread.”
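To make this concrete, here is a minimal sketch of the ST monad in action (sumST is my own illustrative example): the rank-2 type runST :: (forall s. ST s a) -> a is what keeps the reference from leaking.

```haskell
import Control.Monad.ST
import Data.STRef

-- Sum a list using local mutable state.  The phantom `s` parameter,
-- universally quantified in runST's type, ensures the STRef cannot
-- escape this isolated mutation thread.
sumST :: Num a => [a] -> a
sumST xs = runST $ do
    ref <- newSTRef 0
    mapM_ (\x -> modifySTRef ref (+ x)) xs
    readSTRef ref
```

Returning ref itself, rather than its contents, would be rejected by the type checker, because the type variable s would escape its scope.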
Taken to the limit, you may find yourself knee-deep in advanced type system features if you try to statically rule out incorrect usages of a mutable API. I found this out when I worked on abcBridge and tried very hard to use types to prevent improper use of the underlying C library. Here is one relevant quote from the code:
-- | When you duplicate a network, node indexes may change, so if you
-- would like to use old references, they need to be translated first.
-- You can read this type as @Node n -> NT n2 (Node n2)@ or @Node n ->
-- NQ n2 (Node n2)@, with the condition that the @n2@ index was one
-- generated by 'branch' or 'speculate'.
translate :: (G.NetworkMonad m AIG (Dup n n2)) => Node n -> m AIG (Dup n n2) (Node (Dup n n2))
This truly is some WTF, rank-2-phantom-types code, but it grew out of a very specific bug I stumbled onto and was unconvinced that I’d remember to avoid in the future (can you guess what it was?). A curious reader may ask, why do I need to duplicate networks in the first place? Because some operations that the underlying library provides are destructive, and the only way I can provide the illusion of persistent networks is by duplicating before destruction.
In summary:
- Mutation is frequently not what people expect,
- Mutation is not modular, and
- Mutation is complicated.
Avoid it when you can!
March 7, 2011
They say that one doesn’t discover advanced type system extensions: rather, the type system extensions discover you! Nevertheless, it’s worthwhile to know what the tech tree for GHC’s type extensions is, so you can decide how much power (and the correspondingly headache-inducing error messages) you need. I’ve organized the relations in the following diagram with the following criteria in mind:
- Some extensions automatically enable other extensions (implies);
- Some extensions offer all the features another extension offers (subsumes);
- Some extensions work really nicely with other extensions (synergy);
- Some extensions offer equivalent (but differently formulated) functionality to another extension (equiv).
It’s also worth noting that the GHC manual divides these extensions into “Extensions to data types and type synonyms”, “Class and instances declarations”, “Type families” and “Other type system extensions”. I have them organized here a little differently.
Our first tech tree brings together two extensions: arbitrary-rank polymorphism and generalized algebraic data types.

Briefly:
- GADTSyntax permits ordinary data types to be written GADT-style (with explicit constructor signatures): data C where C :: Int -> C
- ExplicitForall allows you to explicitly state the quantifiers in polymorphic types: forall a. a -> a
- ExistentialQuantification allows types to be hidden inside a data constructor: data C = forall e. C e
- GADTs permits explicit constructor signatures whose result types can differ: data C a where C :: C a -> C b -> C (a, b). Subsumes ExistentialQuantification because existentially quantified data types are simply polymorphic constructors for which the type variable isn’t in the result.
- PolymorphicComponents allows you to write forall inside data type fields: data C = C (forall a. a)
- Rank2Types allows polymorphic arguments: f :: (forall a. a -> a) -> Int -> Int. This with GADTs subsumes PolymorphicComponents because data type fields with forall within them correspond to data constructors with rank-2 types.
- RankNTypes allows foralls nested arbitrarily deep in argument positions: f :: ((forall a. a -> a) -> Int) -> Int
- ImpredicativeTypes allows polymorphic functions and data structures to be parametrized over polymorphic types: Maybe (forall a. a -> a)
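As a small compilable illustration of this first tree, here is the classic GADT example (a toy typed expression language of my own, not from the GHC manual): each constructor signature refines the result index, so the evaluator needs no runtime tags.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor pins down the index of Expr, so pattern
-- matching in eval refines the type of `a`.
data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

eval :: Expr a -> a
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e
```

Note that an ill-typed term like Add (BoolE True) (IntE 1) is simply unwritable, which is the whole point.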
Our next tech tree deals with type class instances.

Briefly:
- TypeSynonymInstances permits macro-like usage of type synonyms in instance declarations: instance X String
- FlexibleInstances allows instances for more interesting type expressions, with restrictions to preserve decidability: instance MArray (STArray s) e (ST s) (frequently seen with multi-parameter type classes, which are not in the diagram)
- UndecidableInstances allows instances for more interesting type expressions with no restrictions, at the cost of decidability. See Oleg for a legitimate example.
- FlexibleContexts allows more type expressions in the constraints of functions and instance declarations: g :: (C [a], D (a -> b)) => [a] -> b
- OverlappingInstances allows instances to overlap if there is a most specific one: instance C a; instance C Int
- IncoherentInstances allows instances to overlap arbitrarily.
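For a tiny compilable taste of the bottom of this tree (Describe is a made-up class of my own): Haskell 98 rejects an instance head like Describe String, because String is a synonym for [Char] rather than a type constructor applied to type variables; TypeSynonymInstances and FlexibleInstances lift that restriction.

```haskell
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}

class Describe a where
  describe :: a -> String

-- Without the extensions above, GHC insists that the instance head
-- be a type constructor applied to type variables; [Char] (aka
-- String) doesn't qualify.
instance Describe String where
  describe s = "the string " ++ s

instance Describe Int where
  describe n = "the int " ++ show n
```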
Perhaps conspicuously missing from this diagram is MultiParamTypeClasses, which appears below.
Our final tech tree addresses programming with types:

Briefly:
- KindSignatures permits stating the kind of a type variable: m :: * -> *
- MultiParamTypeClasses allows type classes to range over multiple type variables: class C a b
- FunDeps allows restricting instances of multi-parameter type classes, helping resolve ambiguity: class C a b | a -> b
- TypeFamilies allows “functions” on types: data family Array e
The correspondence between functional dependencies and type families is well known, though not perfect (type families can be more wordy and can’t express certain equalities, but play more nicely with GADTs).
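The correspondence can be seen in a small example (the class names are my own invention): the same container/element relationship expressed once with a functional dependency and once with an associated type family.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             TypeFamilies, FlexibleInstances #-}

-- Functional dependency: the container type determines the element type.
class Container c e | c -> e where
  empty  :: c
  insert :: e -> c -> c

instance Container [a] a where
  empty  = []
  insert = (:)

-- The same relationship expressed as an associated type family.
class Container' c where
  type Elem c
  empty'  :: c
  insert' :: Elem c -> c -> c

instance Container' [a] where
  type Elem [a] = a
  empty'  = []
  insert' = (:)
```

In both versions, knowing the container type [Int] is enough to resolve the element type to Int; the fundep states the relation as a constraint, while the type family names it as a function.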
March 4, 2011
A Petri net is a curious little graphical modeling language for control flow in concurrency. They came up in this talk a few weeks ago: Petri-nets as an Intermediate Representation for Heterogeneous Architectures, but what I found interesting was how I could describe some common concurrency structures using this modeling language.
Here is, for example, the well venerated lock:

The way to interpret the graph is thus: each circle is a “petri dish” (place) that may contain some number of tokens. The square boxes (transitions) are actions that would like to fire, but in order to do so, all of the petri dishes feeding into them must have tokens. It’s the sort of representation you could turn into a board game!
If multiple transitions can fire off, we pick one of them and only that one succeeds; the ability for a token to flow down one or another arrow encodes nondeterminism in this model. In the lock diagram, only one branch can grab the lock token in the middle, but they return it once they exit the critical area (unlock).
Here is a semaphore:

It’s exactly the same, except that the middle place may contain more than one token. Of course, no one said that separate processes must wait before signalling. We can implement a simple producer-consumer chain like this:

Note that Petri net places are analogous to MVar (), though it takes a little care to ensure we are not manufacturing tokens out of thin air in Haskell, due to the lack of linear types. You may also notice that Petri nets say little about data flow; we can imagine the tokens as data, but the formalism doesn’t say much about what the tokens actually represent.
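The lock diagram can be transcribed into Haskell almost literally (newLock and withLock are my own illustrative names; a production version would use bracket, or the standard withMVar, for exception safety):

```haskell
import Control.Concurrent.MVar

-- A lock is a place that holds at most one token.
type Lock = MVar ()

newLock :: IO Lock
newLock = newMVar ()  -- start with the token present

withLock :: Lock -> IO a -> IO a
withLock l act = do
    takeMVar l     -- the "lock" transition fires, consuming the token
    r <- act       -- critical section
    putMVar l ()   -- the "unlock" transition returns the token
    return r
```

Nothing stops a buggy caller from calling putMVar a second time and minting a token out of thin air, which is exactly the linearity caveat mentioned above.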
March 2, 2011
The bug bit me in early 2009, during MIT’s independent activities period; really, it was two bugs. The first was 6.184, the re-animated introductory computer science class taught in Scheme—for obvious reasons. But I don’t think that was sufficient: I seem to recall thinking Scheme was interesting but not a language I actually wanted to code in. The second was a comment made by Anders Kaseorg after I finished delivering a talk, Introduction to Web Application Security (one of the few things that, as a freshman at MIT, I thought I knew well enough to give a lecture on). One of the emphases of the talk was types: that is, the fact that “string” doesn’t adequately represent the semantic content of most bits of text that float around in our applications these days. Haskell came up as a way of making your compiler make sure you didn’t mix up HTML with plain text.
Something must have clicked. That February, I wrote:
Wow. Haskell is pretty.
To which someone replied:
Don’t look too hard into the sun, your eyes will get burned.
And thus a statically-typed functional programmer was born.
Postscript. My first application in Haskell was a Laplace solver, with which I also learned about monads (because a map lookup returned a Maybe value, and Anders decided it would be a good idea to talk about do-notation and bind to elucidate how to handle it. I probably didn’t understand the explanation the first time around, but I did manage to get the program working.)
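For flavor, here is roughly the kind of situation that motivates such an explanation (a reconstruction of mine, not the actual Laplace solver code): chaining Data.Map lookups in the Maybe monad, where any failed lookup short-circuits the whole computation to Nothing.

```haskell
import qualified Data.Map as Map

-- Follow two hops through a map; if either lookup returns Nothing,
-- the do-block as a whole returns Nothing.
twoHops :: Ord k => Map.Map k k -> k -> Maybe (k, k)
twoHops m k = do
  v <- Map.lookup k m
  w <- Map.lookup v m
  return (v, w)
```

The do-notation desugars to Map.lookup k m >>= \v -> Map.lookup v m >>= \w -> return (v, w), which is precisely the bind-based explanation in question.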
February 28, 2011
two inflammatory vignettes
The term to cargo cult is one with derogatory connotations: it indicates the act of imitating the superficial exterior without actually understanding the underlying causal structure of the situation. The implication is that one should try to understand what one is doing, before doing it. There is, however, an ounce of truth in the practice of cargo culting: when you are in a situation in which you legitimately do not know what’s going on (e.g. the context of an experiment), it is safest to preserve as many superficial traits as possible, in case a “superficial” trait in fact has a deep, non-obvious connection to the system being studied. But in this regard, beneficial “cargo culting” is nothing like the islanders throwing up airstrips in hopes of attracting planes—understanding what conditions are applicable for this treatment is often the mark of experience: the novice ignores conditions that should be preserved and does not know how to probe deeper.
Hacking is the art of accidental generalization. It is developing a program under a single set of conditions (a hard-coded test input, a particular directory structure, a single URL) and (perhaps) hoping it will work in the more general case. Anything that gets in the way of specificity—proofs, types, security, verbosity, edge-cases, thinking—is the enemy for pure creation. It is the art of the present, and much profit and pleasure can be derived from it. It is the art of laser precision, each problem dealt with as it comes. It is an art that becomes more acceptable engineering practice with experience: one develops little internal censors that continually pipe up with mental flags where you need to give a little extra to make the generalization work. Novices are recommended to bring their check-lists along.
February 25, 2011
Most of my hacking cycles right now are going towards debugging the new code generator for GHC. The code generation stage of GHC takes the Spineless Tagless G-machine (STG) intermediate representation (IR) to the C-- high-level assembly representation; the old code generator essentially performed this step in one big bang. The new code generator is many things. It is a more modular, understandable and flexible codebase. It is a client of cutting edge research in higher-order frameworks for control-flow optimization.
It is also frickin’ hard to debug.
I used to get frustrated and give up if I couldn’t figure out what was causing a bug within a few hours of close analysis. Working on GHC has enlightened me about multi-day debugging: a well-defined bug that persists despite several days of intense analysis. (I’ve only managed this a few times in the past—I’m quite proud that I managed to pull together enough information to resolve “the bug”. What is “the bug”? Have you ever been browsing a MediaWiki site and then been mysteriously asked to download a PHP file? Yeah, that’s “the bug”.) It has exponentially increased my proficiency with gdb and has been an amazing romp in the theory and practice of compiler construction. I’ve felt stupid for not immediately understanding what in retrospect seem perfectly clear and obvious concepts. I’ve felt an amazing rush, not from when the problem is solved (though that certainly gives a good feeling), but when my plan of attack is making progress. I’ve seen my theories evolve from one to another to another, and have learned never to trust any experimental observation at first sight.
While the debugging process is not yet done (though I think I’m close to having a correct—but slow—new code generation pipeline), I thought I’d take out some time to describe the journey.
There are some classic reasons why debugging GHC is hard:
- I didn’t write GHC. (Ha!) Debugging not-your-code is hard.
- I’ve never written a compiler before.
- In fact, I had never taken a compilers class (currently being fixed, though I think I’ve learned a lot more, a lot more quickly, via GHC hacking).
Some reasons that come naturally from big projects:
- GHC takes a long time to compile, so think before you compile.
- GHC is big; there are a lot of possible places for bugs to creep in.
Some of it is because GHC is written in a functional programming language for compiling functional programs:
- Forget about line-by-line stepping: it’s all assembly and memory dumps.
- GHC is highly abstract, and you can’t really hope to understand what the code is doing by mapping it to some step-by-step execution.
- The generated code is in continuation passing style and doesn’t resemble any imperative execution scene you might have seen before. I will never again take a calling convention for granted.
Fascinatingly enough, while the bugs result in extremely strange behavior in compiled programs that takes ages to decipher, once the bad behavior is fully understood, the fix is usually a one-liner. It is this fact that makes debugging GHC frustrating and brilliant at the same time. Sometimes the code you’re debugging elsewhere is fundamentally mistaken, and you have to rewrite the whole thing; GHC’s code, by contrast, is fundamentally clear (a testament to those who wrote it), and a bug is usually just a small detail someone forgot to attend to. The solutions are like Mystery Hunt solutions: short, and you know when you’ve found it. Nothing messy like, “What is this actually supposed to do?”
I have the benefit of an existing code generation pipeline which I can use to compare my results with, although doing so is not trivial since the new code generator does go about the compilation in a fundamentally different way, and so sections of code are frequently not comparable.
I also have the benefit of a wondrous test suite which produces programs that reproducibly segfault with little fuss, and I have been relatively blessed with bugs that show up in single-threaded situations. My programs have well defined inputs and outputs, and I have sophisticated mechanisms for inspecting the internal state of the multipass compiler.
Here are some lessons I’ve taken to heart:
- Have utter and complete confidence in your build. If you have the slightest suspicion that your build didn’t precisely pick up on a change you made, make a new build tree and build from scratch. A lot of mysterious problems go away when you do a clean build.
- The GHC .gdbinit file is magical; I don’t know where I’d be without it, and I’m probably only using 20% of its macros at this point. disas is your friend, as are pstk and pmem. I can’t describe how many times I’ve used the go-back-in-time trick.
- It is always worth reducing your test input. Not only does it make the output you stare at simpler, the act of reduction will cause changes in irrelevant portions of the output while preserving the essential portions related to the bug. Probably my biggest breakthroughs came when I could show that another program exhibited the same bug.
- Do the background reading. A lot of the “undocumented” cleverness in GHC is actually described in the literature. Reading the Hoopl paper suddenly made the register spilling code lucid, and made me realize my princess was in another castle.
- Your first theory is usually wrong. You will fixate on the wrong detail. Keep all of the possibilities in mind, but at the very beginning, you should ask for more information. If you did not write the system, you probably don’t have any intuition about what the bug might be, and attempts to short-cut the analysis process will just leave you confused. (An annoying side effect is once you’ve done your research, you start thinking, “Man, if only I had known the system a bit better, I would have spotted that bug much more quickly.”)
- You still need a working theory to dictate what sort of information you are going to gather, but it’s really important to keep anomalies in mind as you begin to formulate a solution. For a more detailed analysis of this, see Test 4221 below. If a fix “works”, but there is an anomaly, keep trying to understand the situation.
- You will never collect a ton of data in one go, stare at it, and notice the critical pattern that resolves the bug. Debugging is all about asking the right questions, and on the path to this state, you will ask a lot of wrong questions.
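The test-reduction tip can even be partially mechanized, in the spirit of delta debugging. Here is an illustrative sketch (not anything GHC’s test suite actually ships): greedily try deleting contiguous chunks of the input, keeping any smaller input that still fails.

```haskell
-- A minimal, illustrative test-input reducer: repeatedly try removing
-- a contiguous chunk of the input; if the smaller input still fails,
-- keep it and continue; otherwise halve the chunk size and retry.
reduce :: ([a] -> Bool) -> [a] -> [a]
reduce fails input = go (length input `div` 2) input
  where
    go n xs
      | n < 1     = xs
      | otherwise =
          case [ xs' | xs' <- chunksRemoved n xs, fails xs' ] of
            (xs':_) -> go (min n (length xs' `div` 2)) xs'  -- progress: keep shrinking
            []      -> go (n `div` 2) xs                    -- no luck: try smaller chunks

    -- all variants of xs with one length-n chunk deleted
    chunksRemoved n xs =
      [ take i xs ++ drop (i + n) xs | i <- [0, n .. length xs - 1] ]

main :: IO ()
main = do
  -- pretend the "bug" fires whenever both 3 and 7 survive in the input
  let fails xs = 3 `elem` xs && 7 `elem` xs
  print (reduce fails [1 .. 10])  -- shrinks the input down to [3,7]
```

Real compiler inputs must stay well-formed, so in practice the chunks you delete are declarations or statements rather than arbitrary spans, but the loop is the same.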
Warning: Gory technical detail ahead.
My first job was to make the latest new code generation code compile with the latest GHC branch (it had bit-rotted a bit in the interim.) This went mostly smoothly, except for the fact that Norman Ramsey really likes polymorphic local definitions and MonoLocalBinds reared its ugly head in Hoopl and a few other modules.
Test 4030 was this “simple” program (simple is in quotes because, as Simon Peyton Jones put it, “that looks like a hard one to start with… threads, exceptions, etc.”):
main = do tid <- block $ forkIO $ let x = x in x
          killThread tid
The resulting code segfaulted in stg_BLACKHOLE_info when attempting to dereference “something”:
0x822a6e0 <stg_CAF_BLACKHOLE_info>: jmp 0x822a620 <stg_BLACKHOLE_info>
0x822a620 <stg_BLACKHOLE_info>: mov 0x4(%esi),%eax
0x822a623 <stg_BLACKHOLE_info+3>: test $0x3,%eax
0x822a628 <stg_BLACKHOLE_info+8>: jne 0x822a663 <stg_BLACKHOLE_info+67>
0x822a62a <stg_BLACKHOLE_info+10>: mov (%eax),%ecx -- SEGFAULT!
This something ended up being a new stack slot that Simon Marlow introduced when he had rewritten the blackholing scheme. The solution was to port these changes to the new code generator. I ended up manually reviewing every patch within the merge time window to ensure all changes had been ported, and probably squished a few latent bugs in the process. There’s no patch because I ended up folding this change into the merge (since the new blackholing scheme had not existed at the time the new code generator branch was frozen.)
Test ffi021 involved creating a pointer to an imported FFI function, and then dynamically executing it. (I didn’t even know you could do that with the FFI!):
type Malloc = CSize -> IO (Ptr ())
foreign import ccall unsafe "&malloc" pmalloc :: FunPtr Malloc
foreign import ccall unsafe "dynamic" callMalloc :: FunPtr Malloc -> Malloc
This ended up being a latent bug in the inline statement optimizer (not a bug in the new code generator, but a bug that the new codegen tickled). I got as far as concluding that it was an optimization bug in the native code generator before Simon Marlow identified the bug, and we got a one-line patch:
hunk ./compiler/cmm/CmmOpt.hs 156
- where infn (CmmCallee fn cconv) = CmmCallee fn cconv
+ where infn (CmmCallee fn cconv) = CmmCallee (inlineExpr u a fn) cconv
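The patch is a nice specimen of a classic class of transformation bug: the rewrite recursed into a call’s arguments but forgot the function position itself. A toy illustration, using a hypothetical mini expression language rather than GHC’s actual Cmm types:

```haskell
-- A toy expression language illustrating the class of bug fixed above:
-- an inliner that substitutes a variable everywhere *except* the
-- function position of a call silently leaves stale references behind.
data Expr = Var String | Lit Int | Call Expr [Expr]
  deriving (Eq, Show)

-- Correct inliner: substitute in every child, including the callee.
inline :: String -> Expr -> Expr -> Expr
inline u a (Var v) | v == u  = a
inline _ _ e@(Var _)         = e
inline _ _ e@(Lit _)         = e
inline u a (Call fn args)    = Call (inline u a fn) (map (inline u a) args)

-- Buggy variant, analogous to the unpatched code: the callee (fn)
-- is passed through untouched.
inlineBuggy :: String -> Expr -> Expr -> Expr
inlineBuggy u a (Call fn args) = Call fn (map (inlineBuggy u a) args)
inlineBuggy u a e              = inline u a e

main :: IO ()
main = do
  let e = Call (Var "f") [Var "f", Lit 1]
  print (inline      "f" (Lit 42) e)  -- substitutes in both positions
  print (inlineBuggy "f" (Lit 42) e)  -- leaves the callee as Var "f"
```

Bugs of this shape stay latent until some upstream change, like a new code generator, starts producing terms that exercise the missed position.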
This one (Test 4221, mentioned above) took three weeks to solve. The original test code was fairly complex and highly sensitive to code changes. My first theory was that we were attempting to access a variable that had never been spilled to the stack, but after talking to Simon Peyton Jones about how stack spilling worked I got the inkling that this might not actually be the problem, so I stopped attempting to understand the Hoopl code that did spilling and went back to analysis. There was another dead end with regard to optimization fuel, which I had hoped would help pinpoint the error but in fact doesn’t work yet. (Optimization fuel allows you to incrementally increase the number of optimizations applied, so you can binary search for the optimization that introduces the bug. Unfortunately, most of the so-called “optimizations” were actually essential program transformations on the way to machine code.)
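For what it’s worth, if fuel worked as advertised, the bisection itself would be simple. A sketch, where failsWith is a stand-in for “compile with that much fuel and run the test”, and where I assume failures are monotone in fuel (the essential-transformation problem is exactly what breaks this assumption):

```haskell
-- Sketch of bisecting on optimization fuel: find the smallest fuel
-- value at which a compile-and-run cycle starts failing. Assumes that
-- once a fuel value fails, every larger value fails too.
bisectFuel :: (Int -> Bool) -> Int -> Int -> Int
bisectFuel failsWith lo hi
  | lo >= hi  = hi
  | otherwise =
      let mid = (lo + hi) `div` 2
      in if failsWith mid
           then bisectFuel failsWith lo mid        -- bug already present
           else bisectFuel failsWith (mid + 1) hi  -- still fine; need more fuel

main :: IO ()
main = do
  -- pretend the 37th transformation applied is the one that
  -- introduces the bug
  let failsWith n = n >= 37
  print (bisectFuel failsWith 0 1000)  -- prints 37
```

Each probe is a full compile-and-run cycle, so even with monotonicity you only get a logarithmic number of rebuilds, not a free lunch.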
The breakthrough came when I realized that the bug persisted when I changed the types in the input program from CDouble to 64-bit integers, but not when I changed the types to 32-bit integers. This allowed me to identify the erroneous C-- code involving garbage collection and reduce the test case to a very small program which didn’t crash but showed the wrong code (since the program needed to run for a while in order to trigger a stack overflow at precisely the right place):
{-# LANGUAGE ForeignFunctionInterface #-}
module Main(main) where
import Foreign.C
foreign import ccall safe "foo" foo :: CLLong -> CLLong
-- Changing to unsafe causes stg_gc_l1 to not be generated
-- Changing to IO causes slight cosmetic changes, but it's still wrong
main = print (foo 0)
After a huge misunderstanding regarding the calling convention and futile attempts to find a bug in the stack layout code (I assumed that slot<foo> + 4 indicated a higher memory location; in fact, since the stack grows downward, it indicated a lower memory location than slot<foo>), I finally traced the problem to the stg_gc_* calling convention.
My first patch to fix this changed the callee (the stg_gc_* functions) to use the calling convention I observed the new code generator emitting, since I couldn’t see anything wrong with that code. But there was an anomalous bit: by this theory, all of the calls to the GC should have used the wrong calling convention, yet only doubles and 64-bit integers exhibited this behavior. My patch worked, but something was wrong. That something turned out to be the fact that 32-bit x86 has no general-purpose registers wider than 32 bits, which is why the code generator was spilling only these types of arguments onto the stack. I learned a little bit more about GHC’s virtual registers, and determined another one-line fix:
hunk ./compiler/cmm/CmmCallConv.hs 50
- (_, GC) -> getRegsWithNode
+ (_, GC) -> allRegs
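To see why only doubles and 64-bit integers were affected, here is a toy model of argument assignment; the register names and fall-through rule are illustrative, not GHC’s actual CmmCallConv code. Thirty-two-bit values go into general-purpose registers, while anything wider falls through to stack slots:

```haskell
-- Toy model of call-argument assignment on a 32-bit target: a W32
-- argument fits in a general-purpose register, but a W64 argument has
-- no register wide enough and falls through to a stack slot. This is
-- why only doubles and 64-bit integers exercised the buggy stack path.
data Width = W32 | W64 deriving (Eq, Show)
data Loc   = Reg String | StackSlot Int deriving (Eq, Show)

assignArgs :: [String] -> [Width] -> [Loc]
assignArgs regs args = go regs 0 args
  where
    go _ _ []              = []
    go (r:rs) off (W32:ws) = Reg r : go rs off ws   -- fits in a GP register
    go rs off (w:ws)       =                        -- too wide (or out of regs)
      StackSlot off : go rs (off + size w) ws
      where size W32 = 4
            size W64 = 8

main :: IO ()
main = print (assignArgs ["eax", "ebx", "ecx"] [W32, W64, W32])
```

Under a model like this, a calling-convention mismatch between caller and callee only becomes observable for the argument widths that take the stack path, which is exactly the anomaly observed above.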
This one is in progress. Fixing the GC bug resolved all of the remaining mysterious test suite failures (hooray), and with this I was able to recompile GHC with all of the libraries with the new code generator. This triggered test 2047 to start segfaulting.
It took me a little bit of time to establish that I had not introduced a bug from compiling the stage 2 compiler with the new codegen (which I had done overzealously) and confirm which library code had the bug, but once I had done so I managed to reduce it to the following program (which I had lovingly named “bagels”):
import Bagel
main = do
  l <- getContents
  length l `seq` putStr (sort l)
with sort defined in the module Bagel as such:
module Bagel where
-- a bastardized version of sort that still exhibits the bug
sort :: Ord a => [a] -> [a]
sort = mergeAll . sequences
  where
    sequences (a:xs) = compare a a `seq` [] : sequences xs
    sequences _ = []

    mergeAll [x] = x
    mergeAll xs = mergeAll (mergePairs xs)

    mergePairs (a:b:xs) = merge a b : mergePairs xs
    mergePairs xs = xs

    merge (a:as') (b:bs') = compare a a `seq` merge as' as'
    merge _ _ = []
and run with the following data:
$ hexdump master-data
0000000 7755 7755 7755 7755 7755 7755 7755 7755
*
000b040
This program has a number of curious properties. The segfault goes away if I:
- Turn off compacting GC
- Reduce the size of master-data
- Turn off optimizations
- Use the old codegen
- Put all of the code in one file
- Remove the seqs from ‘sort’ (which isn’t actually a sort)
- Remove the seqs from ‘main’
- Make the sort function monomorphic on Char
The current theory is that someone (either the new code generator or the compacting GC) is not handling a tag bit properly, but I haven’t quite figured out where yet. This is the only outstanding bug unique to the new code generator.
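For readers unfamiliar with pointer tagging: GHC stores constructor information in the low bits of word-aligned closure pointers (two bits on a 32-bit platform), and any code that follows or moves a closure has to strip the tag before dereferencing and re-apply it afterwards. A sketch of the invariant (illustrative only; this is not GHC’s actual GC code):

```haskell
import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word32)

-- Sketch of GHC-style pointer tagging on a 32-bit target: the low two
-- bits of a word-aligned pointer carry the constructor tag. A moving
-- collector must untag before following the pointer and retag after
-- relocating the object; forgetting either step is exactly the class
-- of bug suspected here.
tagBits :: Word32
tagBits = 3  -- two low bits on a 32-bit platform

getTag :: Word32 -> Word32
getTag p = p .&. tagBits

untag :: Word32 -> Word32
untag p = p .&. complement tagBits

retag :: Word32 -> Word32 -> Word32
retag p t = untag p .|. t

-- A careful GC preserves the tag across relocation.
moveObject :: (Word32 -> Word32) -> Word32 -> Word32
moveObject relocate p = retag (relocate (untag p)) (getTag p)

main :: IO ()
main = do
  let p = 0x8000 .|. 2        -- pointer carrying constructor tag 2
      relocate q = q + 0x100  -- pretend compaction slides the heap up
  print (getTag p)                              -- prints 2
  print (moveObject relocate p == 0x8100 .|. 2) -- tag survives the move
```

A collector that forgets the retag step produces a pointer that still dereferences fine but lies about its constructor, which is exactly the sort of corruption that only shows up later, under the right heap layout.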
The code generator produces some pretty idiotic code (as I noticed while I was reading pages and pages of C--), and it’s also pretty slow. Once correct, it’s time to optimize, in more ways than one.
February 23, 2011In his book Against Method, Paul Feyerabend writes the following provocative passage about ‘ad hoc approximations’, familiar to anyone who’s taken a physics course and thought, “Now where did they get that approximation from…”
The perihelion of Mercury moves along at a rate of about 5600" per century. Of this value, 5026" are geometric, having to do with the movement of the reference system, while 531" are dynamical, due to the perturbations in the solar system. Of these perturbations all but the famous 43" are accounted for by classical mechanics. This is how the situation is usually explained.
The explanation shows that the premise from which we derive 43" is not the general theory of relativity plus suitable initial conditions. The premise contains classical physics in addition to whatever relativistic assumptions are being made. Furthermore, the relativistic calculation, the so-called ‘Schwarzschild solution’, does not deal with the planetary system as it exists in the real world (i.e. our own asymmetric galaxy); it deals with the entirely fictional case of a central symmetrical universe containing a singularity in the middle and nothing else. What are the reasons for employing such an odd conjunction of premises?
The reason, according to the customary reply, is that we are dealing with approximations. The formulae of classical physics do not appear because relativity is incomplete. Nor is the centrally symmetric case used because relativity does not offer anything better. Both schemata flow from the general theory under special circumstances realized in our planetary system provided we omit magnitudes too small to be considered. Hence, we are using the theory of relativity throughout, and we are using it in an adequate manner.
Note, how this idea of an approximation differs from the legitimate idea. Usually one has a theory, one is able to calculate the particular case one is interested in, one notes that this calculation leads to magnitudes below experimental precision, one omits such magnitudes, and one obtains a vastly simplified formalism. In the present case, making the required approximations would mean calculating the full n-body problem relativistically (including long-term resonances between different planetary orbits), omitting magnitudes smaller than the precision of observation reached, and showing that the theory thus curtailed coincides with classical celestial mechanics as corrected by Schwarzschild. This procedure has not been used by anyone simply because the relativistic n-body problem has as yet withstood solution. There are not even approximate solutions for important problems such as, for example, the problem of stability (one of the first great stumbling blocks for Newton’s theory). The classical part of the explanans [the premises that explain our observations], therefore, does not occur just for convenience, it is absolutely necessary. And the approximations made are not a result of relativistic calculations, they are introduced in order to make relativity fit the case. One may properly call them ad hoc approximations.
Feyerabend is wonderfully iconoclastic, and I invite the reader to temporarily suspend their gut reaction to the passage. For those thinking, “Of course that’s what physicists do, otherwise we’d never get any work done,” consider the question: why do we have any reason to believe that the approximations are justified, that they will not affect the observable results of our calculations, that they actually reflect reality? One could adopt the viewpoint that such doubts are unproductive and get in the way of doing science, which we know from prior experience to work. But I think this argument does have important implications for prescriptivists in all fields—those who would like to say how things ought to be done (goodness, one sees a lot of that in the software field, even on this blog). Because, just as the student complains, “There is no way I could have possibly thought up that approximation,” or the mathematician winces and thinks, “There is no reason I should believe that approximation works,” if these approximations do exist and the course of science is to discover them, well, how do you do that?
The ivory tower is not free of the blemishes of real life, it seems.