May 23, 2011Recursion is one of those things that functional programming languages shine at—but it seems a bit disappointing that in many cases, you have to convert your beautiful recursive function back into iterative form. After all, iteration is what imperative languages do best, right?
Actually, explicitly tail-recursive functions in functional programming languages can be fairly beautiful: in fact, in the cases of complicated loops, they can be even prettier than their imperative counterparts. Take this midpoint line-drawing algorithm as an example:
circleMidpoint d r = go 0 (-r) k0
where k0 = 5 - 4 * r
x1 = ceiling (fromIntegral r / sqrt 2)
go x y k | x > x1 = return ()
| k > 0 = d (x,y) >> go (x+1) (y+1) (k+8*x+8*y+20)
| otherwise = d (x,y) >> go (x+1) y (k+8*x+12)
There are three loop variables: x, y and k, and depending on various conditions, some of them get updated in different ways. x is a bog-standard loop variable; ye old C-style for loop could handle it just fine. But y and k are updated differently depending on some loop conditions. But since they’re parameters to the go helper function, it’s always clear what the frequently changing variables are. You lose that nice structure in the imperative translation:
// global variables and loop variables are all mixed together
int k = 5 - 4 * r;
int y = -r;
int x1 = ceil(r/sqrt(2));
for (int x = 0; x <= x1; x++) { // only x is obviously an index var
draw(x, y);
if (k > 0) {
y++;
k += 8*x + 8*y + 20;
} else {
k += 8*x + 12;
}
// does it ever make sense for any code to live here?
}
I’ve also managed to introduce a bug in the process…
May 20, 2011This is an addendum to my second example in Anatomy of a thunk leak, in which I’d like to propose another solution to the space leak, involving computing the composition of all of these thunks. This solution is particularly notable because it preserves the denotation of the original function, that is, that f l (undefined, undefined) = (undefined, undefined). This should be surprising, because I claimed that it would be impossible for GHC to optimize a function with that had this denotation into one without the space leak by more eagerly evaluating some thunks. There is no contradiction: the optimization we would like to apply here is one of partial evaluation. Didn’t understand that? Don’t worry, a concrete example is coming soon.
As Heinrich Apfelmus points out, the space leak can be visualized as a large graph of expressions which has not been collapsed into a single value: 1 + (1 + (1 + (1 + (1 + (1 + ...))))). We can visualize this graph being built up in successive iterations of the function:

The point of introducing strictness (and thus changing the denotation of the function) is that we keep collapsing (evaluating) the tree.

But notice the value highlighted in red: we must know what this value is before we can do any computation. But if this value is unknown (or, in our case, if we don’t want to evaluate it while we are forming this graph), our strategy doesn’t really work. We can’t collapse the entire tree. However, (and this is the key), because addition is associative, we can rotate the tree, and then evaluate the (now left) subtree.

In effect, all of the thunks have been merged together: instead of 1 + 1 + 1 + X. we now have 3 + X. Simple! Here is the implementation:
f l (x0, x1) = go l (0, 0)
where go [] (!c0, !c1) = (c0 + x0, c1 + x1)
go (x:xs) !c = go xs (tick x c)
tick x (!c0, !c1) | even x = (c0, c1 + 1)
| otherwise = (c0 + 1, c1)
go is essentially the strict version of f, but at the end of the iteration it returns a pair with two thunks: c0 + x0 and c1 + x1, were both c0 and c1 have been fully evaluated.
Here’s another way of thinking of how we’re doing things:

It would be pretty cool if this could be done automatically, and it would pretty applicable in other domains too. Combining functions that are associative are a precious commodity when it comes to parallelization.
May 18, 2011In this post, we discuss the characteristics of a thunk leak, the leak that has come to symbolize the difficulties of “reasoning about space usage” in Haskell. I’ll consider a few examples of this type of leak and argue that these leaks are actually trivial to fix. Rather, the difficulty is when a thunk leak gets confused with other types of leaks (which we will cover in later posts).
Description
I’ll be describing the various leaks in two ways: I will first give an informal, concrete description using the metaphor I developed in the Haskell Heap series, and then I will give a more direct, clinical treatment at the end. If you can’t stand one form of explanation or the other, feel free to skip around.
Thunk leaks occur when too many wrapped presents (thunks) are lying around at the same time.

The creation of thunks is not necessarily a bad thing: indeed, most Haskell programs generate lots of thunks. Sometimes the presence of thunks on the heap is unavoidable. The problem is when they do not get evaluated in due course: like socks in the room of a lazy college student, they start piling up.

There is a precise sense by which the thunks “pile” up, which can be observed by looking at the presents the ghosts care about.

Each ghost cares about the next present in the pile (so the Grinch can’t steal them away), and we (the user) care about the present at the very bottom of the pile. Thus, when we open that present, the whole chain of presents comes toppling down (assuming there are not other references pointed to the pile).

The chain of thunks could really be any shape you want, though linear is the usual case.

What would fixing the problem look like? It’s certainly not waiting until the presents get piled up and then cleaning them up in one go (as our college student might do): the damage (big memory usage) has already been done!

Rather, we should be a bit more eager and open up our presents as we get them.

This strategy can fail, however. If opening the presents results in something even bigger than we started off with or if we might not need to open all the presents, we might be better off just being lazy about it.

There’s also the question of where all these presents came from in the first place. Maybe we were too eager about getting the presents in the first place…

In summary, a thunk leak is when a Haskell program builds up a large number of thunks that, if evaluated, would result in much smaller memory usage. This requires such thunks to have several properties:
- They must not have external references to them (since the idea is as the thunks are evaluated, their results can get garbage collected),
- They must perform some sort of reduction, rather than create a bigger data structure, and
- They should be necessary.
If (1) fails, it is much more probable that these thunks are legitimate and only incur a small overhead (and the real difficulty is an algorithmic one). If (2) fails, evaluating all of the thunks can exacerbate the memory situation. And if (3) fails, you might be looking at a failure of streaming, since thunks are being eagerly created but lazily evaluated (they should be lazily created as well).
Diagnosis
As with most space leaks, they usually only get investigated when someone notices that memory usage is unusually high. However, thunk leaks also tend to result in stack overflows when these thunk chains get reduced (though not always: a thunk chain could be tail recursive.) As with all performance tuning, you should only tune while you are doing measurements: otherwise, you may spend a lot of time optimizing something that is relatively insignificant (or worse yet, that GHC already optimized for you!)
The next line of diagnosis is the heap residency profile, which does not require you to recompile your program with profiling enabled. Just add -hT as an RTS flag. In the case of thunk leak, the heap profile is very tell-tale: a large chunk of the heap will be occupied with THUNK. Bingo!

Note. This diagnostic step is why I’ve chosen to distinguish between thunk leaks and live variable leaks. A thunk leak will have thunks dominating the heap because the thunks themselves are numerous and are consuming memory. A live variable leak may be caused by a thunk retaining extra memory, but the thunks themselves may not necessarily show up on the heap, because you only need one reachable thunk to cause memory to be retained.
Examples
I’ve distilled some examples in order to help illustrate the phenomenon in question, as well as give direct, source-level indications on all the possible ways you can go about fixing the leak. I’ll also give some examples of things that could have leaked, but didn’t because GHC was sufficiently clever (hooray for optimizations!) Runnable code can be found in the GitHub repository, which I will try to keep up-to-date.
We’ll first start with the classic space leak from naive iterative code:
main = evaluate (f [1..4000000] (0 :: Int))
f [] c = c
f (x:xs) c = f xs (c + 1)
It should be obvious who is accumulating the thunks: it’s c + 1. What is less obvious, is that this code does not leak when you compile GHC with optimizations. Why is this the case? A quick look at the Core will tell us why:
Main.$wf =
\ (w_s1OX :: [GHC.Integer.Type.Integer])
(ww_s1P0 :: GHC.Prim.Int#) ->
case w_s1OX of _ {
[] -> ww_s1P0;
: _ xs_a1MR -> Main.$wf xs_a1MR (GHC.Prim.+# ww_s1P0 1)
}
Notice that the type of c (renamed to ww_s1P0) is GHC.Prim.Int#, rather than Int. As this is a primitive type, it is unlifted: it is impossible to create thunks of this type. So GHC manages to avoid thunks by not creating them at all in the first place. Fixing the unoptimized case is as simple as making c strict, since addition of integers is a strict function.
It is not, in general, possible for GHC to do this kind of unboxing optimization without violating the semantics of our code. Our next piece of code looks at precisely such a case:
main = do
evaluate (f [1..4000000] (0 :: Int, 1 :: Int))
f [] c = c
f (x:xs) c = f xs (tick x c)
tick x (c0, c1) | even x = (c0, c1 + 1)
| otherwise = (c0 + 1, c1)
This space leaks both with and without optimizations. It also stack overflows.

It is not possible for GHC to optimize this code in such a way that the elements of the pair are eagerly evaluated without changing the semantics of the function f. Why is this the case? We consider an alternate call to f: f [1..4000000] (0, undefined). The current semantics of the function demand that the result be (2000000, undefined) (since anything added to undefined is undefined), which means we cannot do any evaluation until the inside of the resulting tuple is forced. If we only ever evaluate the tuple to whnf (as the call to evaluate does) or if we only ever use the first result, then no exception should be thrown. This is indeed the case if we replace 1 :: Int with undefined and run the program.
OK, that’s enough theory, how do we fix this bug? I could just give you a single answer, but I think it will be more informative if we consider a range of possible fixes and analyze their effect on the program. Hopefully, this will make space leaks less like casting the runes, and much more methodical.
Add a bang-pattern to c in f. This doesn’t work:
f [] !c = c
f (x:xs) !c = f xs (tick x c)

The insight is that we’ve not changed the semantics of the function at all: f l (undefined, undefined) still should result in (undefined, undefined), since seq doesn’t “look inside the tuple”. However, adding this bang-pattern may help in the construction of other solutions, if evaluating the tuple itself has other side-effects (as we might say, that ghost might open some presents for us).
Make the tuple in tick irrefutable. This is just confused:
tick x ~(c0, c1) | even x = (c0, c1 + 1)
| otherwise = (c0 + 1, c1)

Irrefutable patterns add laziness, not strictness, so it’s not surprising that the problem has gotten worse (note the memory usage is now up to 80M, rather than 40M).
Make tick strict. Notice that the x is already forced immediately by even x, so there’s no need to add a bang pattern there. So we just add bang patterns to c0 and c1:
tick x (!c0, !c1) | even x = (c0, c1 + 1)
| otherwise = (c0 + 1, c1)

These might look like a terrible graph, but look at the scale. 1.2 kilobytes. In general, if after you make a change to a Haskell program and you start seeing lots of bands again, it means you’ve fixed the leak. So we’ve fixed it!
Well, not quite. The unoptimized code still has a leak:

We fixed our space leak by enabling a GHC optimization, similar to the one that fixed our original space leak. Once again, the Core makes this clear:
Main.$wf :: [GHC.Integer.Type.Integer]
-> GHC.Types.Int
-> GHC.Types.Int
-> (# GHC.Types.Int, GHC.Types.Int #)
GHC has optimized the tuple away into an unboxed return and inlined the call to tick, as a result we don’t have any tuple thunks floating around. We could have manually performed this optimization, but it’s better to the let the compiler do it for us (and keep our code clean.)
Strictify tick and f. In analogy with the first example, now that tick is strict, if we strictify both places, the unoptimized code will also be fine. And indeed, it is.

It doesn’t help us much for the optimized case though! (There is essentially no change to the heap profile.)
Make the pair strict. Using a strict pair instead of the default lazy pair is equivalent to inserting bang patterns every where we pattern match on a tuple. It is thus equivalent to strictifying tick, and if you do this you will still need a little extra to get it working in the unoptimized case. This tends to work better when you control the data structure that is going into the loop, since you don’t need to change all of your data declarations.
Deep seq c. If a simple bang pattern for c doesn’t work, a deep bang pattern will:
f [] c = c
f (x:xs) c@(!_,!_) = f xs (tick x c)

Alternatively, you could have used rnf from the deep seq package. While this does work, I personally think that it’s better policy to just use a strict data type, if you’re going to be rnf’ing willy-nilly, you might as well keep things fully evaluated all the time.
I had another example, but I’m out of time for today! As some parting words, note that tuples aren’t the only lifted types floating around: everything from records to single data constructors (data I a = I a) to mutable references have these extra semantics which can have extra space costs. But identifying and fixing this particular problem is really easy: the heap profile is distinctive, the fix is easy and non-invasive, and you even have denotational semantics to aid your analysis of the code! All you need is a little extra knowledge.
Postscript. Apologies for the wildly varying graph axes and shifty colors. Try to focus on the shape and labeling. I’m still wrangling hp2pretty to get it to generate the right kinds of heap profiles, and I need a more consistent scaling mechanism and more consistent coloring. Experiments were done on GHC 6.12.3.
May 16, 2011A big thanks to everyone who everyone who sent in space leak specimens. All of the leaks have been inspected and cataloged by our experts, and we are quite pleased to open the doors of the space leak zoo to the public!
There are a few different types of space leak here, but they are quite different and a visitor would do well not to confuse them (the methods for handling them if encountered in the wild vary, and using the wrong technique could exacerbate the situation).
- A memory leak is when a program is unable to release memory back to the operating system. It’s a rare beast, since it has been mostly eliminated by modern garbage collection. We won’t see any examples of it in this series, though it is strictly possible for Haskell programs to exhibit this type of leak if they use non-bracketed low-level FFI allocation functions.
- A strong reference leak is when a program keeps around a reference to memory that in principle could be used but actually will never be used anymore. A confluence of purity and laziness make these types of leaks uncommon in idiomatic Haskell programs. Purity sidesteps these leaks by discouraging the use of mutable references, which can leak memory if they are not cleared when appropriate. Laziness sidesteps these leaks by making it difficult to accidentally generate too much of a data structure when you only want parts of it: we just use less memory to begin with. Of course, using mutable references or strictness in Haskell can reintroduce these errors (sometimes you can fix instances of the former with weak references—thus the name, “strong reference” leak), and live variable leaks (described below) are a type of strong reference leak that catch people unfamiliar with closures by surprise.
- A thunk leak is when a program builds up a lot of unevaluated thunks in memory which intrinsically take up a lot of space. It can be observed when a heap profile shows a large number of
THUNK objects, or when you stack overflow from evaluating a chain of these thunks. These leaks thrive on lazy evaluation, and as such are relatively rare in the imperative world. You can fix them by introducing appropriate strictness. - A live variable leak is when some closure (either a thunk or a lambda) contains a reference to memory that a programmer expects to have been freed already. They arise because references to memory in thunks and functions tend to be implicit (via live variables) as opposed to explicit (as is the case for a data record.) In instances where the result of the function is substantially smaller, these leaks can be fixed by introducing strictness. However, these are not as easy to fix as thunk leaks, as you must ensure all of the references are dropped. Furthermore, the presence of a large chunk of evaluated memory may not necessarily indicate a live variable leak; rather, it may mean that streaming has failed. See below.
- A streaming leak is when a program should only need a small amount of the input to produce a small amount of output, thus using only a small amount of memory, but it doesn’t. Instead, large amounts of the input are forced and kept in memory. These leaks thrive on laziness and intermediate data structures, but, unlike the case of thunk leaks, introducing strictness can exacerbate the situation. You can fix them by duplicating work and carefully tracking data dependencies.
- A stack overflow is when a program builds up a lot of pending operations that need to performed after the current execution. It can be observed when your program runs out of stack space. It is, strictly speaking, not a space leak, but an improper fix to a thunk leak or a streaming leak can convert it into a stack overflow, so we include it here. (We also emphasize these are not the same as thunk leaks, although sometimes they look the same.) These leaks thrive on recursion. You can fix them by converting recursive code to iterative style, which can be tail-call optimized, or using a better data structure. Turning on optimization also tends to help.
- A selector leak is a sub-species of a thunk leak when a thunk that only uses a subset of a record, but because it hasn’t been evaluated it causes the entire record to be retained. These have mostly been killed off by the treatment of GHC’s selector thunks by the RTS, but they also occasionally show due to optimizations. (See below.)
- An optimization induced leak is a camouflaged version of any of the leaks here, where the source code claims there is no leak, but an optimization by the compiler introduces the space leak. These are very tricky to identify; we would not put these in a petting zoo! (You can fix them by posting a bug to GHC Trac.)
- A thread leak is when too many Haskell threads are lying around. You can identify this by a contingent of TSO on the heap profile: TSO stands for thread-state object. These are interesting to debug, because there are a variety of reasons why a thread may not dying.
In the next post, we’ll draw some pictures and give examples of each of these leaks. As an exercise, I invite interested readers to categorize the leaks we saw last time.
Update. I’ve separated thunk leaks from what I will now call “live variable leaks,” and re-clarified some other points, especially with respect to strong references. I’ll expand on what I think is the crucial conceptual difference between them in later posts.
May 13, 2011Short post, longer ones in progress.
One of the really neat things about the Par monad is how it explicitly reifies laziness, using a little structure called an IVar (also known in the literature as I-structures). An IVar is a little bit like an MVar, except that once you’ve put a value in one, you can never take it out again (and you’re not allowed to put another value in.) In fact, this precisely corresponds to lazy evaluation.

The key difference is that an IVar splits up the naming of a lazy variable (the creation of the IVar), and specification of whatever code will produce the result of the variable (the put operation on an IVar). Any get to an empty IVar will block, much the same way a second attempt to evaluate a thunk that is being evaluated will block (a process called blackholing), and will be fulfilled once the “lazy computation” completes (when the put occurs.)
It is interesting to note that this construction was adopted precisely because laziness was making it really, really hard to reason about parallelism. It also provides some guidance for languages who might want to provide laziness as a built-in construct (hint: implementing it as a memoized thunk might not be the best idea!)
May 11, 2011I’m currently collecting non-stack-overflow space leaks, in preparation for a future post in the Haskell Heap series. If you have any interesting space leaks, especially if they’re due to laziness, send them my way.
Here’s what I have so far (unverified: some of these may not leak or may be stack overflows. I’ll be curating them soon).
import Control.Concurrent.MVar
-- http://groups.google.com/group/fa.haskell/msg/e6d1d5862ecb319b
main1 = do file <- getContents
putStrLn $ show (length $ lines file) ++ " " ++
show (length $ words file) ++ " " ++
show (length file)
-- http://www.haskell.org/haskellwiki/Memory_leak
main2 = let xs = [1..1000000::Integer]
in print (sum xs * product xs)
-- http://hackage.haskell.org/trac/ghc/ticket/4334
leaky_lines :: String -> [String]
leaky_lines "" = []
leaky_lines s = let (l, s') = break (== '\n') s
in l : case s' of
[] -> []
(_:s'') -> leaky_lines s''
-- http://stackoverflow.com/questions/5552433/how-to-reason-about-space-complexity-in-haskell
data MyTree = MyNode [MyTree] | MyLeaf [Int]
makeTree :: Int -> MyTree
makeTree 0 = MyLeaf [0..99]
makeTree n = MyNode [ makeTree (n - 1)
, makeTree (n - 1) ]
count2 :: MyTree -> MyTree -> Int
count2 r (MyNode xs) = 1 + sum (map (count2 r) xs)
count2 r (MyLeaf xs) = length xs
-- http://stackoverflow.com/questions/2777686/how-do-i-write-a-constant-space-length-function-in-haskell
leaky_length xs = length' xs 0
where length' [] n = n
length' (x:xs) n = length' xs (n + 1)
-- http://stackoverflow.com/questions/3190098/space-leak-in-list-program
leaky_sequence [] = [[]]
leaky_sequence (xs:xss) = [ y:ys | y <- xs, ys <- leaky_sequence xss ]
-- http://hackage.haskell.org/trac/ghc/ticket/917
initlast :: (()->[a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))
main8 = print $ case initlast (\()->[0..1000000000]) of
(init, last) -> (length init, last)
-- http://hackage.haskell.org/trac/ghc/ticket/3944
waitQSem :: MVar (Int,[MVar ()]) -> IO ()
waitQSem sem = do
(avail,blocked) <- takeMVar sem
if avail > 0 then
putMVar sem (avail-1,[])
else do
b <- newEmptyMVar
putMVar sem (0, blocked++[b])
takeMVar b
-- http://hackage.haskell.org/trac/ghc/ticket/2607
data Tree a = Tree a [Tree a] deriving Show
data TreeEvent = Start String
| Stop
| Leaf String
deriving Show
main10 = print . snd . build $ Start "top" : cycle [Leaf "sub"]
type UnconsumedEvent = TreeEvent -- Alias for program documentation
build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
build (Start str : es) =
let (es', subnodes) = build es
(spill, siblings) = build es'
in (spill, (Tree str subnodes : siblings))
build (Leaf str : es) =
let (spill, siblings) = build es
in (spill, Tree str [] : siblings)
build (Stop : es) = (es, [])
build [] = ([], [])
May 9, 2011Today, we discuss how presents on the Haskell Heap are named, whether by top-level bindings, let-bindings or arguments. We introduce the Expression-Present Equivalent Exchange, which highlights the fact that expressions are also thunks on the Haskell heap. Finally, we explain how this let-bindings inside functions can result in the creation of more presents, as opposed to constant applicative forms (CAFs) which exist on the Haskell Heap from the very beginning of execution.
When we’ve depicted presents on the Haskell Heap, they usually have names.

We’ve been a bit hush-hush about where these names come from, however. Partially, this is because the source of most of these names is straight-forward: they’re simply top-level bindings in a Haskell program:
y = 1
maxDiameter = 100
We also have names that come as bindings for arguments to a function. We’ve also discussed these when we talked about functions. You insert a label into the machine, and that label is how the ghost knows what the “real” location of x is:
f x = x + 3
pred = \x -> x == 2
So if I write f maxDiameter the ghost knows that wherever it sees x it should instead look for maxDiameter. But this explanation has some gaps in it. What if I write f (x + 2): what’s the label for x + 2?
One way to look at this is to rewrite this function in a different way: let z = x + 2 in f z, where z is a fresh variable: one that doesn’t show up anywhere else in the expression. So, as long as we understand what let does, we understand what the compact f (x + 2) does. I’ll call this the Expression-Present Equivalent Exchange.

But what does let do anyway?
Sometimes, exactly the same job as a top-level binding. These are Constant Applicative Forms (CAF).

So we just promote the variable to the global heap, give it some unique name and then it’s just like the original situation. We don’t even need to re-evaluate it on a subsequent call to the function. To reiterate, the key difference is free variables (see bottom of post for a glossary): a constant-applicative form has no free variables, whereas most let bindings we write have free variables.
Glossary. The definition of free variables is pretty useful, even if you’ve never studied the lambda calculus. The free variables of an expression are variables for which I don’t know the value of simply by looking at the expression. In the expression x + y, x and y are free variables. They’re called free variables because a lambda “captures” them: the x in \x -> x is not free, because it is defined by the lambda \x ->. Formally:
fv(x) = {x}
fv(e1 e2) = fv(e1) `union` fv(e2)
fv(\x -> e1) = fv(e1) - {x}
If we do have free variables, things are a little trickier. So here is an extended comic explaining what happens when you force a thunk that is a let binding.


Notice how the ghosts pass the free variables around. When a thunk is left unevaluated, the most important things to look at are its free variables, as those are the other thunks that will have been left unevaluated. It’s also worth repeating that functions always take labels of presents, never actual unopened presents themselves.
The rules are very simple, but the interactions can be complex!
Last time: How the Grinch stole the Haskell Heap
Technical notes. When writing strict mini-languages, a common trick when implementing let is to realize that it is actually syntax sugar for lambda application: let x = y in z is the same as (\x -> z) y. But this doesn’t work for lazy languages: what if y refers to x? In this case, we have a recursive let binding, and usually you need to use a special let-rec construction instead, which requires some mutation. But in a lazy language, it’s easy: making the binding will never evaluate the right-hand side of the equation, so I can set up each variable at my leisure. I also chose to do the presentation in the opposite way because I want people to always be thinking of names. CAFs don’t have names, but for all intents and purposes they’re global data that does get shared, and so naming it is useful if you’re trying to debug a CAF-related space leak.
Perhaps a more accurate translation for f (x + 2) is f (let y = x + 2 in y), but I thought that looked kind of strange. My apologies.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
May 6, 2011One of the persistent myths about Aristotelean physics—the physics that was proposed by the Ancient Greeks and held up until Newton and Galileo came along—is that Aristotle thought that “heavier objects fall more quickly than light objects”, the canonical example being that of a cannon ball and feather. Although some fraction of contemporary human society may indeed believe this “fact,” Aristotle had a far more subtle and well-thought out view on the matter. If you don’t believe me, an English translation of his original text (Part 8) adequately gives off this impression. Here is a relevant quote (emphasis mine):
We see the same weight or body moving faster than another for two reasons, either because there is a difference in what it moves through, as between water, air, and earth, or because, other things being equal, the moving body differs from the other owing to excess of weight or of lightness.
The “other things being equal” is a critical four words that are left out even of Roman accounts of the theory. See for example Roman philosopher Lucretius:
For whenever bodies fall through water and thin air, they quicken their descents in proportion to their weights…
While I do not mean to imply the Aristotle was correct or at all near a Newtonian conception of physics, Aristotle was cognizant of the fact that the shape of the falling object could affect its descent (“For a moving thing cleaves the medium either by its shape”) and in fact, the context of these quotes is not a treatise on how bodies move, but about how bodies move through a medium: in particular, Aristotle is attempting to argue how bodies might move in “the void.” But somehow, this statement got misinterpreted into a myth that the majority of Westerners believed up until the advent of Galilean physics. Where did all of these essential details go?
My belief is that Aristotle lacked the crucial conceptual framework—Newtonian physics—to fit these disparate facts together. Instead, Aristotle believed in the teleological principal, that all bodies had natural places which they natural moved toward, which failed to have any explanatory power for many instances of physical experience. What generalizations he were able to make were riddled with special cases and the need for careful thought, that made the ideas difficult to be communicated without losing something in the process. This is not to say the careful thinking is not necessary: even when we teach physics today, it is extremely easy to make the same mistakes that our historic predecessors made: the facts of the matter may all “be in Newton’s laws”, but it’s not necessarily obvious that this is the case!
But as you learn about the applications of a theory, about the places where it easily can go wrong, it is critical to remember that these thoughts and intuitions are to be hung on the grander tree of the unifying and adequate theory. Those who do not are forever doomed to juggling a confusing panoply of special cases and disjoint facts and cursed with the inability to concisely express their ideas. And though I have not the space to argue it here, I also claim that this applies for every other sort of discipline for which you must accumulate knowledge.
tl;dr — Facts without structure are facts easily forgotten.
May 4, 2011When we teach beginners about Haskell, one of the things we handwave away is how the IO monad works. Yes, it’s a monad, and yes, it does IO, but it’s not something you can implement in Haskell itself, giving it a somewhat magical quality. In today’s post, I’d like to unravel the mystery of the IO monad by describing how GHC implements the IO monad internally in terms of primitive operations and the real world token. After reading this post, you should be able to understand the resolution of this ticket as well as the Core output of this Hello World! program:
main = do
putStr "Hello "
putStrLn "world!"
Nota bene: This is not a monad tutorial. This post assumes the reader knows what monads are! However, the first section reviews a critical concept of strictness as applied to monads, because it is critical to the correct functioning of the IO monad.
The lazy and strict State monad
As a prelude to the IO monad, we will briefly review the State monad, which forms the operational basis for the IO monad (the IO monad is implemented as if it were a strict State monad with a special form of state, though there are some important differences—that’s the magic of it.) If you feel comfortable with the difference between the lazy and strict state monad, you can skip this section. Otherwise, read on. The data type constructor of the State monad is as follows:
newtype State s a = State { runState :: s -> (a, s) }
A running a computation in the state monad involves giving it some incoming state, and retrieving from it the resulting state and the actual value of the computation. The monadic structure involves threading the state through the various computations. For example, this snippet of code in the state monad:
do x <- doSomething
y <- doSomethingElse
return (x + y)
could be rewritten (with the newtype constructor removed) as:
\s ->
let (x, s') = doSomething s
(y, s'') = doSomethingElse s' in
(x + y, s'')
Now, a rather interesting experiment I would like to pose for the reader is this: suppose that doSomething and doSomethingElse were traced: that is, when evaluated, they outputted a trace message. That is:
doSomething s = trace "doSomething" $ ...
doSomethingElse s = trace "doSomethingElse" $ ...
Is there ever a situation in which the trace for doSomethingElse would fire before doSomething, in the case that we forced the result of the elements of this do block? In a strict language, the answer would obviously be no; you have to do each step of the stateful computation in order. But Haskell is lazy, and in another situation it’s conceivable that the result of doSomethingElse might be requested before doSomething is. Indeed, here is such an example of some code:
import Debug.Trace
f = \s ->
let (x, s') = doSomething s
(y, s'') = doSomethingElse s'
in (3, s'')
doSomething s = trace "doSomething" $ (0, s)
doSomethingElse s = trace "doSomethingElse" $ (3, s)
main = print (f 2)
What has happened is that we are lazy in the state value, so when we demanded the value of s'', we forced doSomethingElse and were presented with an indirection to s', which then caused us to force doSomething.
Suppose we actually did want doSomething to always execute before doSomethingElse. In this case, we can fix things up by making our state strict:
f = \s ->
case doSomething s of
(x, s') -> case doSomethingElse s' of
(y, s'') -> (3, s'')
This subtle transformation from let (which is lazy) to case (which is strict) lets us now preserve ordering. In fact, it will turn out, we won’t be given a choice in the matter: due to how primitives work out we have to do things this way. Keep your eye on the case: it will show up again when we start looking at Core.
Bonus. Interestingly enough, if you use irrefutable patterns, the case-code is equivalent to the original let-code:
f = \s ->
case doSomething s of
~(x, s') -> case doSomethingElse s' of
~(y, s'') -> (3, s'')
Primitives
The next part of our story are the primitive types and functions provided by GHC. These are the mechanism by which GHC exports types and functionality that would not be normally implementable in Haskell: for example, unboxed types, adding together two 32-bit integers, or doing an IO action (mostly, writing bits to memory locations). They’re very GHC specific, and normal Haskell users never see them. In fact, they’re so special you need to enable a language extension to use them (the MagicHash)! The IO type is constructed with these primitives in GHC.Types:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
In order to understand the IO type, we will need to learn about a few of these primitives. But it should be very clear that this looks a lot like the state monad…
The first primitive is the unboxed tuple, seen in code as (# x, y #). Unboxed tuples are syntax for a “multiple return” calling convention; they’re not actually real tuples and can’t be put in variables as such. We’re going to use unboxed tuples in place of the tuples we saw in runState, because it would be pretty terrible if we had to do heap allocation every time we performed an IO action.
The next primitive is State# RealWorld, which will correspond to the s parameter of our state monad. Actually, it’s two primitives, the type constructor State#, and the magic type RealWorld (which doesn’t have a # suffix, fascinatingly enough.) The reason why this is divided into a type constructor and a type parameter is because the ST monad also reuses this framework—but that’s a matter for another blog post. You can treat State# RealWorld as a type that represents a very magical value: the value of the entire real world. When you ran a state monad, you could initialize the state with any value you cooked up, but only the main function receives a real world, and it then gets threaded along any IO code you may end up having executing.
One question you may ask is, “What about unsafePerformIO?” In particular, since it may show up in any pure computation, where the real world may not necessarily available, how can we fake up a copy of the real world to do the equivalent of a nested runState? In these cases, we have one final primitive, realWorld# :: State# RealWorld, which allows you to grab a reference to the real world wherever you may be. But since this is not hooked up to main, you get absolutely no ordering guarantees.
Hello World
Let’s return to the Hello World program that I promised to explain:
main = do
putStr "Hello "
putStrLn "world!"
When we compile this, we get some core that looks like this (certain bits, most notably the casts (which, while a fascinating demonstration of how newtypes work, have no runtime effect), pruned for your viewing pleasure):
Main.main2 :: [GHC.Types.Char]
Main.main2 = GHC.Base.unpackCString# "world!"
Main.main3 :: [GHC.Types.Char]
Main.main3 = GHC.Base.unpackCString# "Hello "
Main.main1 :: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
Main.main1 =
\ (eta_ag6 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case GHC.IO.Handle.Text.hPutStr1
GHC.IO.Handle.FD.stdout Main.main3 eta_ag6
of _ { (# new_s_alV, _ #) ->
case GHC.IO.Handle.Text.hPutStr1
GHC.IO.Handle.FD.stdout Main.main2 new_s_alV
of _ { (# new_s1_alJ, _ #) ->
GHC.IO.Handle.Text.hPutChar1
GHC.IO.Handle.FD.stdout System.IO.hPrint2 new_s1_alJ
}
}
Main.main4 :: GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
Main.main4 =
GHC.TopHandler.runMainIO1 @ () Main.main1
:Main.main :: GHC.Types.IO ()
:Main.main =
Main.main4
The important bit is Main.main1. Reformatted and renamed, it looks just like our desugared state monad:
Main.main1 =
\ (s :: State# RealWorld) ->
case hPutStr1 stdout main3 s of _ { (# s', _ #) ->
case hPutStr1 stdout main2 s' of _ { (# s'', _ #) ->
hPutChar1 stdout hPrint2 s''
}}
The monads are all gone, and hPutStr1 stdout main3 s, while ostensibly always returning a value of type (# State# RealWorld, () #), has side-effects. The repeated case-expressions, however, ensure our optimizer doesn’t reorder the IO instructions (since that would have a very observable effect!)
For the curious, here are some other notable bits about the core output:
- Our
:main function (with a colon in front) doesn’t actually go straight to our code: it invokes a wrapper function GHC.TopHandler.runMainIO which does some initialization work like installing the top-level interrupt handler. unpackCString# has the type Addr# -> [Char], so what it does it transforms a null-terminated C string into a traditional Haskell string. This is because we store strings as null-terminated C strings whenever possible. If a null byte or other nasty binary is embedded, we would use unpackCStringUtf8# instead.putStr and putStrLn are nowhere in sight. This is because I compiled with -O, so these function calls got inlined.
The importance of being ordered
To emphasize how important ordering is, consider what happens when you mix up seq, which is traditionally used with pure code and doesn’t give any order constraints, and IO, for which ordering is very important. That is, consider Bug 5129. Simon Peyton Jones gives a great explanation, so I’m just going to highlight how seductive (and wrong) code that isn’t ordered properly is. The code in question is x `seq` return (). What does this compile to? The following core:
case x of _ {
__DEFAULT -> \s :: State# RealWorld -> (# s, () #)
}
Notice that the seq compiles into a case statement (since case statements in Core are strict), and also notice that there is no involvement with the s parameter in this statement. Thus, if this snippet is included in a larger fragment, these statements may get optimized around. And in fact, this is exactly what happens in some cases, as Simon describes. Moral of the story? Don’t write x `seq` return () (indeed, I think there are some instances of this idiom in some of the base libraries that need to get fixed.) The new world order is a new primop:
case seqS# x s of _ {
s' -> (# s', () #)
}
Much better!
This also demonstrates why seq x y gives absolutely no guarantees about whether or not x or y will be evaluated first. The optimizer may notice that y always gives an exception, and since imprecise exceptions don’t care which exception is thrown, it may just throw out any reference to x. Egads!
Further reading
- Most of the code that defines IO lives in the
GHC supermodule in base, though the actual IO type is in ghc-prim. GHC.Base and GHC.IO make for particularly good reading. - Primops are described on the GHC Trac.
- The ST monad is also implemented in essentially the exact same way: the unsafe coercion functions just do some type shuffling, and don’t actually change anything. You can read more about it in
GHC.ST.
May 2, 2011It is well known that unsafePerformIO is an evil tool by which impure effects can make their way into otherwise pristine Haskell code. But is the rest of Haskell really that pure? Here are a few questions to ask:
- What is the value of
maxBound :: Int? - What is the value of
\x y -> x / y == (3 / 7 :: Double) with 3 and 7 passed in as arguments? - What is the value of
os :: String from System.Info? - What is the value of
foldr (+) 0 [1..100] :: Int?
The answers to each of these questions are ill-defined—or you might say they’re well defined, but you need a little extra information to figure out what the actual result is.
- The Haskell 98 Report guarantees that the value of
Int is at least -2^29 to 2^29 - 1. But the precise value depends on what implementation of Haskell you’re using (does it need a bit for garbage collection purposes) and whether or not you’re on a 32-bit or 64-bit system. - Depending on whether or not the excess precision of your floating point registers is used to calculate the division, or if the IEEE standard is adhered to, this equality may or may not hold.
- Depending on what operating system the program is run on this value will change.
- Depending on the stack space allotted to this program at runtime, it may return a result or it may stack overflow.
In some respects, these constructs break referential transparency in an interesting way: while their values are guaranteed to be consistent during a single execution of the program, they may vary between different compilations and runtime executions of our program.
Is this kosher? And if it’s not, what are we supposed to say about the semantics of these Haskell programs?
The topic came up on #haskell, and I and a number of participants had a lively discussion about the topic. I’ll try to distill a few of the viewpoints here.
- The mathematical school says that all of this is very unsatisfactory, and that their programming languages should adhere to some precise semantics over all compilations and runtime executions. People ought to use arbitrary-size integers, and if they need modular arithmetic specify explicitly how big the modulus is (
Int32? Int64?) os is an abomination that should have been put in the IO sin bin. As tolkad puts it, “Without a standard you are lost, adrift in a sea of unspecified semantics. Hold fast to the rules of the specification lest you be consumed by ambiguity.” Limitations of the universe we live in are something of an embarrassment to the mathematician, but as long as the program crashes with a nice stack overflow they’re willing to live with a partial correctness result. An interesting subgroup is the distributed systems school which also care about the assumptions that are being made about the computing environment, but for a very practical reason. If multiple copies of your program are running on heterogeneous machines, you better not make any assumptions about pointer size on the wire. - The compile time school says that the mathematical approach is untenable for real world programming: one should program with compilation in mind. They’re willing to put up with a little bit of uncertainty in their source code programs, but all of the ambiguity should be cleared up once the program is compiled. If they’re feeling particularly cavalier, they’ll write their program with several meanings in mind, depending on the compile time options. They’re willing to put up with stack overflows, which are runtime determined, but are also a little uncomfortable with it. It is certainly better than the situation with
os, which could vary from runtime to runtime. The mathematicians make fun of them with examples like, “What about a dynamic linker or virtual machine, where some of the compilation is left off until runtime?” - The run time school says “Sod referential transparency across executions” and only care about internal consistency across a program run. Not only are they OK with stack overflows, they’re also OK with command line arguments setting global (pure!) variables, since those don’t change within the executions (they perhaps think
getArgs should have had the signature [String], not IO [String]), or variables that unsafely read in the contents of an external data file at program startup. They write things in docs like “This integer need not remain consistent from one execution of an application to another execution of the same application.” Everyone else sort of shudders, but it’s a sort of guilty pleasure that most people have indulged in at some point or another.
So, which school are you a member of?
Postscript. Since Rob Harper has recently posted another wonderfully iconoclastic blog post, and because his ending remarks are tangentially related to the topic of this post (purity), I thought I couldn’t help but sneak in a few remarks. Rob Harper states:
So why don’t we do this by default? Because it’s not such a great idea. Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible. Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming. You cannot easily convert between functional and monadic style without a radical restructuring of code. And you inevitably need unsafePerformIO to get anything serious done. In practical terms, you are deprived of the useful concept of a benign effect, and that just stinks!
I think Harper overstates the inability to write functional-style imperative programs in Haskell (conversions from functional to monadic style, while definitely annoying in practice, are relatively formulaic.) But these practical concerns do influence the day-to-day work of programmers, and as we’ve seen here, purity comes in all sorts of shades of gray. There is design space both upwards and downwards of Haskell’s current situation, but I think to say that purity should be thrown out entirely is missing the point.