April 29, 2011Today, we introduce the Grinch.

A formerly foul and unpleasant character, the Grinch has reformed his ways. He still has a penchant for stealing presents, but these days he does it ethically: he only takes a present if no one cares about it anymore. He is the garbage collector of the Haskell Heap, and he plays a very important role in keeping the Haskell Heap small (and thus our memory usage low)—especially since functional programs generate a lot of garbage. We’re not a particularly eco-friendly bunch.

The Grinch also collects garbage in traditional imperative languages, since the process is fundamentally the same. (We describe copying collection using Cheney’s algorithm here.) The Grinch first asks us what objects in the heap we care about (the roots). He moves these over to a new heap (evacuation), which will contain objects that will be saved. Then he goes over the objects in the new heap one by one, making sure they don’t point to other objects on the old heap. If they do, he moves those presents over too (scavenging). Eventually, he’s checked all of the presents in the new heap, which means everything left-over is garbage, and he drags it away.

But there are differences between the Haskell Heap and a traditional heap.
A traditional heap is one in which all of the presents have been opened: there will only be unwrapped boxes and gift-cards, so the Grinch only needs to check what gifts the gift-cards refer to in order to decide what else to scavenge.

However, the Haskell Heap has unopened presents, and the ghosts that haunt these presents are also pretty touchy when it comes to presents they know about. So the Grinch has to consult with them and scavenge any presents they point to.

How do presents become garbage? In both the Haskell Heap and a traditional heap, a present obviously becomes garbage if we tell the Grinch we don’t care about it anymore (the root set changes). Furthermore, if a gift-card is edited to point to a different present, the present it used to point to might also become unwanted (mutation). But what is distinctive about the Haskell Heap is this: after we open a present (evaluate a thunk), the ghost disappears into the ether, its job done. The Grinch may now be able to garbage collect the presents the ghost previously cared about.

Let’s review the life-cycle of a present on the Haskell Heap, in particular emphasizing the present’s relationship to other presents on the heap. (The phase names are actually used by GHC’s heap profiling.)

Suppose we want to minimize our memory usage by keeping the number of presents in our heap low. There are two ways to do this: we can reduce the number of presents we care about or we can reduce the number of presents we create. The former corresponds to making presents go dead, usually by opening a present and releasing any presents the now absent ghost cared about. The latter corresponds to avoiding function application, usually by not opening presents when unnecessary.
So, which one results in a smaller heap? It depends!

It is not true that only laziness causes space leaks on the heap. Excessive strictness can cause space leaks too. The key to fixing an identified space leak is figuring out which is the case. Nota bene: I’ve said a lot about space leaks, but I haven’t touched on a common space leak that plagues many people: space leaks on the stack. Stick around.
Last time: Functions produce the Haskell Heap
Next time: Bindings and CAFs on the Haskell Heap
Technical notes. The metaphor of the Grinch moving presents from one pile to another is only accurate if we assume copying garbage collections (GHC also has a compacting garbage collector, which operates differently), and some details (notably how the Grinch knows that a present has already been moved to the new heap, and how the Grinch keeps track of how far into the new heap he is) were skipped over. Additionally, the image of the Grinch “dragging off the garbage presents” is a little misleading: we just overwrite the old memory! Also, GHC doesn’t only have one heap: we have a generational garbage collector, which effectively means there are multiple heaps (and the Grinch visits the young heap more frequently than the old heaps.)
Presents and gift cards look exactly the same to a real garbage collector: a gift card is simply a pointer to a constructor info table and some pointer fields, whereas a present (thunk) is simply a pointer to the info table for the executable code and some fields for its closure variables. For Haskell, which treats data as code, they are one and the same. An implementation of anonymous functions in a language without them built-in might manually represent them as a data structure pointing to the static function code and extra space for its arguments. After all, evaluation of a lazy value is just a controlled form of mutation!

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 27, 2011We’ve talked about how we open (evaluate) presents (thunks) in the Haskell Heap: we use IO. But where do all of these presents come from? Today we introduce where all these presents come from, the Ghost-o-matic machine (a function in a Haskell program).

Using a function involves three steps.

We can treat the machine as a black box that takes present labels and pops out presents, but you can imagine the inside as having an unlimited supply of identical ghosts and empty present boxes: when you run the machine, it puts a copy of the ghost in the box.

If the ghosts we put into the presents are identical, do they all behave the same way? Yes, but with one caveat: the actions of the ghost are determined by a script (the original source code), but inside the script there are holes that are filled in by the labels you inserted into the machine.

Since there’s not actually anything in the boxes, we can precisely characterize a present by the ghost that haunts it.

A frequent problem that people who use the Ghost-o-matic run into is that they expect it to work the same way as the Strict-o-matic (a function in a traditional, strictly evaluated language.) They don’t even take the same inputs: the Strict-o-matic takes unwrapped, unhaunted (unlifted) objects and gift cards, and outputs other unhaunted presents and gift cards.

But it’s really easy to forget, because the source-level syntax for strict function application and lazy function application are very similar.

This is a point that must be thoroughly emphasized. In fact, in order to emphasize it, I’ve drawn two more pictures to reiterate what the permitted inputs and outputs for a Ghost-o-matic machine are.
Ghost-o-matics take labels of presents, not the actual presents themselves. This importantly means that the Ghost-o-matic doesn’t open any presents: after all, it only has labels, not the actual present. This stands in contrast to a Strict-o-matic machine which takes presents as inputs and opens them: one might call this machine the force function, of type Thunk a -> a. In Haskell, there is no such thing.

The Ghost-o-matic always creates a wrapped present. It will never produce an unwrapped present, even if there is no ghost haunting the present (the function was a constant).

We state previously that there is no force function in Haskell. But the function seq seems to do something very like forcing a thunk. A present haunted by a seq ghost, when opened, will cause two other presents to be opened (even if the first one is unnecessary). It seems like the first argument is forced; and so seq x x might be some reasonable approximation of force in an imperative language. But what happens when we actually open up a present haunted by the seq ghost?

Although the ghost ends up opening the present rather than us, it’s too late for it to do any good: immediately after the ghost opens the present, we would have gone to open it (which it already is). The key observation is that the seq x x ghost only opens the present x when the present seq x x is opened, and immediately after seq x x is opened we have to go open x by means of an indirection. The strictness of the seq ghost is defeated by the fact that it’s put in a present, not to be opened until x is desired.
One interesting observation is that the Strict-o-matic machine does things when its run. It can open presents, fire missiles or do other side effects.

But the Ghost-o-matic machine doesn’t do any of that. It’s completely pure.
To prevent confusion, users of the Strict-o-matic and Ghost-o-matic machines may find it useful to compare the the present creation life-cycle for each of the machines.

The lazy Ghost-o-matic machine is split into two discrete phases: the function application, which doesn’t actually do anything, just creates the present, and the actual opening of the present. The Strict-o-matic does it all in one bang—although it could output a present (that’s what happens when you implement laziness inside a strict language). But in a strict language, you have to do it all yourself.
The Ghost-o-matic is approved for use by both humans and ghosts.

This does mean that opening a haunted present may produces more presents. For example, if the present produces a gift card for presents that don’t already live on the heap.

For a spine-strict data structure, it can produce a lot more presents.

Oh, and one more thing: the Ghost-o-matic makes a great gift for ghosts and family. They can be gift-wrapped in presents too. After all, everything in Haskell is a present.

Technical notes. With optimizations, a function may not necessarily allocate on the heap. The only way to be sure is to check out what optimized Core the program produces. It’s also not actually true that traditional, strict functions don’t exist in Haskell: unboxed primitives can be used to write traditional imperative code. It may look scary, but it’s not much different than writing your program in ML.
I’ve completely ignored partial application, which ought to be the topic of a later post, but I will note that, internally speaking, GHC does try its very best to pass all of the arguments a function wants at application time; if all the arguments are available, it won’t bother creating a partially application (PAP). But these can be thought of modified Ghost-o-matics, whose ghost already has some (but not all) of its arguments. Gifted ghost-o-matics (functions in the heap) can also be viewed this way: but rather than pre-emptively giving the ghost some arguments, the ghost is instead given its free variables (closure).
Last time: Implementing the Haskell Heap in Python, v1
Next time: How the Grinch stole the Haskell Heap

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 25, 2011Here is a simple implementation of all of the parts of the Haskell heap we have discussed up until now, ghosts and all.
heap = {} # global
# ---------------------------------------------------------------------#
class Present(object): # Thunk
def __init__(self, ghost):
self.ghost = ghost # Ghost haunting the present
self.opened = False # Evaluated?
self.contents = None
def open(self):
if not self.opened:
# Hey ghost! Give me your present!
self.contents = self.ghost.disturb()
self.opened = True
self.ghost = None
return self.contents
class Ghost(object): # Code and closure
def __init__(self, *args):
self.tags = args # List of names of presents (closure)
def disturb(self):
raise NotImplemented
class Inside(object):
pass
class GiftCard(Inside): # Constructor
def __init__(self, name, *args):
self.name = name # Name of gift card
self.items = args # List of presents on heap you can redeem!
def __str__(self):
return " ".join([self.name] + map(str, self.items))
class Box(Inside): # Boxed, primitive data type
def __init__(self, prim):
self.prim = prim # Like an integer
def __str__(self):
return str(self.prim)
# ---------------------------------------------------------------------#
class IndirectionGhost(Ghost):
def disturb(self):
# Your present is in another castle!
return heap[self.tags[0]].open()
class AddingGhost(Ghost):
def disturb(self):
# Gotta make your gift, be back in a jiffy!
item_1 = heap[self.tags[0]].open()
item_2 = heap[self.tags[1]].open()
result = item_1.prim + item_2.prim
return Box(result)
class UnsafePerformIOGhost(Ghost):
def disturb(self):
print "Fire ze missiles!"
return heap[self.tags[0]].open()
class PSeqGhost(Ghost):
def disturb(self):
heap[self.tags[0]].open() # Result ignored!
return heap[self.tags[1]].open()
class TraceGhost(Ghost):
def disturb(self):
print "Tracing %s" % self.tags[0]
return heap[self.tags[0]].open()
class ExplodingGhost(Ghost):
def disturb(self):
print "Boom!"
raise Exception
# ---------------------------------------------------------------------#
def evaluate(tag):
print "> evaluate %s" % tag
heap[tag].open()
def makeOpenPresent(x):
present = Present(None)
present.opened = True
present.contents = x
return present
# Let's put some presents in the heap (since we can't make presents
# of our own yet.)
heap['bottom'] = Present(ExplodingGhost())
heap['io'] = Present(UnsafePerformIOGhost('just_1'))
heap['just_1'] = makeOpenPresent(GiftCard('Just', 'bottom'))
heap['1'] = makeOpenPresent(Box(1))
heap['2'] = makeOpenPresent(Box(2))
heap['3'] = makeOpenPresent(Box(3))
heap['traced_1']= Present(TraceGhost('1'))
heap['traced_2']= Present(TraceGhost('2'))
heap['traced_x']= Present(TraceGhost('x'))
heap['x'] = Present(AddingGhost('traced_1', '3'))
heap['y'] = Present(PSeqGhost('traced_2', 'x'))
heap['z'] = Present(IndirectionGhost('traced_x'))
print """$ cat src.hs
import Debug.Trace
import System.IO.Unsafe
import Control.Parallel
import Control.Exception
bottom = error "Boom!"
io = unsafePerformIO (putStrLn "Fire ze missiles" >> return (Just 1))
traced_1 = trace "Tracing 1" 1
traced_2 = trace "Tracing 2" 2
traced_x = trace "Tracing x" x
x = traced_1 + 3
y = pseq traced_2 x
z = traced_x
main = do
putStrLn "> evaluate 1"
evaluate 1
putStrLn "> evaluate z"
evaluate z
putStrLn "> evaluate z"
evaluate z
putStrLn "> evaluate y"
evaluate y
putStrLn "> evaluate io"
evaluate io
putStrLn "> evaluate io"
evaluate io
putStrLn "> evaluate bottom"
evaluate bottom
$ runghc src.hs"""
# Evaluating an already opened present doesn't do anything
evaluate('1')
# Evaluating an indirection ghost forces us to evaluate another present
evaluate('z')
# Once a present is opened, it stays opened
evaluate('z')
# Evaluating a pseq ghost may mean extra presents get opened
evaluate('y')
# unsafePerformIO can do anything, but maybe only once..
evaluate('io')
evaluate('io')
# Exploding presents may live in the heap, but they're only dangerous
# if you evaluate them...
evaluate('bottom')
Technical notes. You can already see some resemblance of the ghosts’ Python implementations and the actual Core GHC might spit out. Here’s an sample for pseq:
pseq =
\ (@ a) (@ b) (x :: a) (y :: b) ->
case x of _ { __DEFAULT -> y }
The case operation on x corresponds to opening x, and once it’s open we do an indirection to y (return heap['y'].open()). Here’s another example for the non-polymorphic adding ghost:
add =
\ (bx :: GHC.Types.Int) (by :: GHC.Types.Int) ->
case bx of _ { GHC.Types.I# x ->
case by of _ { GHC.Types.I# y ->
GHC.Types.I# (GHC.Prim.+# x y)
}
}
In this case, Box plays the role of GHC.Types.I#. See if you can come up with some of the other correspondences (What is pattern matching on bx and by? What is GHC.Prim.+# ?)
I might do the next version in C just for kicks (and because then it would actually look something like what a real heap in a Haskell program looks like.)
Last time: IO evaluates the Haskell Heap
Next time: Functions produce the Haskell Heap

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 24, 2011In today’s post, we focus on you, the unwitting person rooting around the Haskell heap to open a present. After all, presents in the Haskell heap do not spontaneously unwrap themselves.

Someone has to open the first present.

If the Haskell heap doesn’t interact with the outside world, no presents need to be opened: thus IO functions are the ones that will open presents. What presents they will open is not necessarily obvious for many functions, so we’ll focus on one function that makes it particularly obvious: evaluate. Which tells you to…

…open a present.
If you get a primitive value, you’re done. But, of course, you might get a gift card (constructor):

Will you open the rest of the presents? Despite that deep, dissatisfaction inside you, the answer is no. evaluate only asks you to open one present. If it’s already opened, there’s nothing for you to do.

Advanced tip: If you want to evaluate more things, make a present containing a ghost who will open those things for you! A frequently used example of this when lazy IO is involved was evaluate (length xs), but don’t worry too much if you don’t understand that yet: I haven’t actually said how we make presents yet!
Even though we’re only opening one present, many things can happen, as described in the last post. It could execute some IO…

This is our direct window into evaluation as it evolves: when we run programs normally, we can’t see the presents being opened up; but if we ask the ghost to also shout out when it is disturbed, we get back this information. And in fact, this is precisely what Debug.Trace does!

There are other ways to see what evaluation is going on. A present could blow up: this is the exploding booby-trapped present, also known as “bottom”.

Perhaps the explosion was caused by an undefined or error "Foobar".

Boom.
We’ll end on a practical note. As we’ve mentioned, you can only be sure that a present has been opened if you’ve explicitly asked for it to be opened from IO. Otherwise, ghosts might play tricks on you. After all, you can’t actually see the Haskell heap, so there’s no way to directly tell if a present has been opened or not.

If you’re unsure when a thunk is being evaluated, add a trace statement to it. If ghosts are being lazy behind your back, the trace statement will never show up.

More frequently, however, the trace statement will show up; it’ll just be later than you expect (the ghosts may be lazy, but they’ll eventually get the job done.) So it’s useful to prematurely terminate your program or add extra print statements demarcating various stages of your program.
Last time: Evaluation on the Haskell Heap
Next time: Implementing the Haskell Heap in Python, v1
Technical notes. Contrary to what I’ve said earlier, there’s no theoretical reason why we couldn’t spontaneously evaluate thunks on the heap: this evaluation approach is called speculative evaluation. Somewhat confusingly, IO actions themselves can be thunks as well: this corresponds to passing around values of IO a without actually “running” them. But since I’m not here to talk about monads, I’ll simply ignore the existence of presents that contain IO actions—they work the same way, but you have to keep the levels of indirection straight. And finally, of course infinite loops also count as bottom, but the image of opening one present for the rest of eternity is not as flashy as an exploding present.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 20, 2011
The Ghost of the Christmas Present
In today’s post, we’ll do a brief survey of the various things that can happen when you open haunted presents in the Haskell heap. Asides from constants and things that have already been evaluated, mostly everything on the Haskell heap is haunted. The real question is what the ghost haunting the present does.
In the simplest case, almost nothing!

Unlike gift-cards, you have to open the next present (Haskell doesn’t let you evaluate a thunk, and then decide not to follow the indirection…)

More commonly, the ghost was lazy and, when woken up, has to open other presents to figure out what was in your present in the first place!

Simple primitive operations need to open all of the presents involved.

But the ghost may also open another present for no particular reason…

or execute some IO…

Note that any presents he opens may trigger more ghosts:

Resulting in a veritable ghost jamboree, all to open one present!

The fact that opening a present (thunk) can cause such a cascading effect is precisely what makes the timing of lazy evaluation surprising to people who are used to all of the objects in their heap being unwrapped (evaluated) already. So the key to getting rid of this surprise is understanding when a ghost will decide it needs to unwrap a present (strictness analysis) and whether or not your presents are unwrapped already (amortized analysis).
Last time: The Haskell Heap
Next time: IO evaluates the Haskell Heap

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 18, 2011
The Haskell heap is a rather strange place. It’s not like the heap of a traditional, strictly evaluated language…

…which contains a lot of junk! (Plain old data.)
In the Haskell heap, every item is wrapped up nicely in a box: the Haskell heap is a heap of presents (thunks).

When you actually want what’s inside the present, you open it up (evaluate it).

Presents tend to have names, and sometimes when you open a present, you get a gift card (data constructor). Gift cards have two traits: they have a name (the Just gift card or the Right gift card), and they tell you where the rest of your presents are. There might be more than one (the tuple gift card), if you’re a lucky duck!

But just as gift cards can lie around unused (that’s how the gift card companies make money!), you don’t have to redeem those presents.
Presents on the Haskell heap are rather mischievous. Some presents explode when you open them, others are haunted by ghosts that open other presents when disturbed.

Understanding what happens when you open a present is key to understanding the time and space behavior of Haskell programs.

In this series, Edward makes a foray into the webcomic world in order to illustrate the key operational concepts of evaluation in a lazily evaluated language. I hope you enjoy it!
Next time: Evaluation on the Haskell Heap
Technical notes. Technically speaking, this series should be “The GHC Heap.” However, I’ll try to avoid as many GHC-isms as possible, and simply offer a metaphor for operationally reasoning about any kind of lazy language. Originally, the series was titled “Bomberman Teaches Lazy Evaluation,” but while I’ve preserved the bomb metaphor for thunks that error or don’t terminate, I like the present metaphor better: it in particular captures several critical aspects of laziness: it captures the evaluated/non-evaluated distinction and the fact that once a present is opened, it’s opened for everyone. The use of the term “boxed” is a little suggestive: indeed, boxed or lifted values in GHC are precisely the ones that can be nonterminating, whereas unboxed values are more akin to what you’d see in C’s heap. However, languages like Java also use the term boxed to refer to primitive values that look like objects. For clarity’s sake, we won’t be using the term boxed from now on (indeed, we won’t mention unboxed types).

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
April 16, 2011What is the Mailbox? It’s a selection of interesting email conversations from my mailbox, which I can use in place of writing original content when I’m feeling lazy. I got the idea from Matt Might, who has a set of wonderful suggestions for low-cost academic blogging.
From: Brent Yorgey
I see you are a contributor to the sup mail client. At least I assume it is you, I doubt there are too many Edward Z. Yangs in the world. =) I’m thinking of switching from mutt. Do you still use sup? Any thoughts/encouragements/cautions to share?
To: Brent Yorgey
Yeah! I still use Sup, and I’ve blogged a little about it in the past, which is still essentially the setup I use. Hmm, some notes:
- I imagine that you want to switch away from Mutt because of some set of annoyances that has finally gotten unbearable. I will warn you that no mail client is perfect; many of my friends used to use Sup and gave up on it along the way. (I hear they use GMail now.) I’ve stuck it out, partially due to inertia, partially because someone else took ezyang@gmail.com, but also partially because I think it’s worth it :-)
- One of the things that has most dramatically changed the way I read email is the distinction between inbox and unread, and not-inbox and unread. In particular, while I have a fairly extensive set of filters that tag mail I get from mailing lists, when I read things on a day to day basis, I check my inbox for important stuff, and then I check my not-inbox “for fun”, mostly not reading most emails (but skimming the subject headers.) This means checking my mail in the morning is about a ten minute deal. This is the deal-maker for me.
- You will almost definitely want to setup OfflineIMAP, because downloading a 80 message thread from the Internet gets old very quickly. But Sup doesn’t propagate back changes to your mailbox this way (in particular, ‘read’ in Sup doesn’t mean ‘read’ in your INBOX) unless you use some experimental code on the maildir-sync branch. I’ve been using it for some time now quite well, but it does make getting other upstream changes a bit of an ordeal.
- Getting Sup setup will take some time; the initial import takes a long time and tweaking also takes a bit of time. Make sure you don’t have any deadlines coming soon. I’ve also found that it’s not really possible to use Sup without dipping your hand into some Ruby hacking (though maybe that’s just me :-); right now I have four hand-crafted patches on top of my source tree, and rebasing to master is not something done likely. I’ve actually gotten into a bit of trouble being so hack happy, but it’s also nice to be able to fix things when you need to (Sup not working is /very/ disruptive). Unfortunately, Ruby is not statically typed. :-)
Hope that helps. Do feel free to shout out if you need some help.
April 13, 2011It is often said that the factorial function is the “Hello World!” of the functional programming language world. Indeed, factorial is a singularly useful way of testing the pattern matching and recursive facilities of FP languages: we don’t bother with such “petty” concerns as input-output. In this blog post, we’re going to trace the compilation of factorial through the bowels of GHC. You’ll learn how to read Core, STG and Cmm, and hopefully get a taste of what is involved in the compilation of functional programs. Those who would like to play along with the GHC sources can check out the description of the compilation of one module on the GHC wiki. We won’t compile with optimizations to keep things simple; perhaps an optimized factorial will be the topic of another post!
The examples in this post were compiled with GHC 6.12.1 on a 32-bit Linux machine.
Haskell
$ cat Factorial.hs
We start in the warm, comfortable land of Haskell:
module Factorial where
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)
We don’t bother checking if the input is negative to keep the code simple, and we’ve also specialized this function on Int, so that the resulting code will be a little clearer. But other then that, this is about as standard Factorial as it gets. Stick this in a file called Factorial.hs and you can play along.
Core
$ ghc -c Factorial.hs -ddump-ds
Haskell is a big, complicated language with lots of features. This is important for making it pleasant to code in, but not so good for machine processing. So once we’ve got the majority of user visible error handling finished (typechecking and the like), we desugar Haskell into a small language called Core. At this point, the program is still functional, but it’s a bit wordier than what we originally wrote.
We first see the Core version of our factorial function:
Rec {
Factorial.fact :: GHC.Types.Int -> GHC.Types.Int
LclIdX
[]
Factorial.fact =
\ (ds_dgr :: GHC.Types.Int) ->
let {
n_ade :: GHC.Types.Int
LclId
[]
n_ade = ds_dgr } in
let {
fail_dgt :: GHC.Prim.State# GHC.Prim.RealWorld -> GHC.Types.Int
LclId
[]
fail_dgt =
\ (ds_dgu :: GHC.Prim.State# GHC.Prim.RealWorld) ->
*_agj n_ade (Factorial.fact (-_agi n_ade (GHC.Types.I# 1))) } in
case ds_dgr of wild_B1 { GHC.Types.I# ds_dgs ->
letrec { } in
case ds_dgs of ds_dgs {
__DEFAULT -> fail_dgt GHC.Prim.realWorld#; 0 -> GHC.Types.I# 1
}
}
This may look a bit foreign, so here is the Core re-written in something that has more of a resemblance to Haskell. In particular I’ve elided the binder info (the type signature, LclId and [] that precede every binding), removed some type signatures and reindented:
Factorial.fact =
\ds_dgr ->
let n_ade = ds_dgr in
let fail_dgt = \ds_dgu -> n_ade * Factorial.fact (n_ade - (GHC.Int.I# 1)) in
case ds_dgr of wild_B1 { I# ds_dgs ->
case ds_dgs of ds_dgs {
__DEFAULT -> fail_dgt GHC.Prim.realWorld#
0 -> GHC.Int.I# 1
}
}
It’s still a curious bit of code, so let’s step through it.
- There are no longer
fact n = ... style bindings: instead, everything is converted into a lambda. We introduce anonymous variables prefixed by ds_ for this purpose. - The first let-binding is to establish that our variable
n (with some extra stuff tacked on the end, in case we had defined another n that shadowed the original binding) is indeed the same as ds_dgr. It will get optimized away soon. - Our recursive call to
fact has been mysteriously placed in a lambda with the name fail_dgt. What is the meaning of this? It’s an artifact of the pattern matching we’re doing: if all of our other matches fail (we only have one, for the zero case), we call fail_dgt. The value it accepts is a faux-token GHC.Prim.realWorld#, which you can think of as unit. - We see that our pattern match has been desugared into a case-statement on the unboxed value of
ds_dgr, ds_dgs. We do one case switch to unbox it, and then another case switch to do the pattern match. There is one extra bit of syntax attached to the case statements, a variable to the right of the of keyword, which indicates the evaluated value (in this particular case, no one uses it.) - Finally, we see each of the branches of our recursion, and we see we have to manually construct a boxed integer
GHC.Int.I# 1 for literals.
And then we see a bunch of extra variables and functions, which represent functions and values we implicitly used from Prelude, such as multiplication, subtraction and equality:
$dNum_agq :: GHC.Num.Num GHC.Types.Int
LclId
[]
$dNum_agq = $dNum_agl
*_agj :: GHC.Types.Int -> GHC.Types.Int -> GHC.Types.Int
LclId
[]
*_agj = GHC.Num.* @ GHC.Types.Int $dNum_agq
-_agi :: GHC.Types.Int -> GHC.Types.Int -> GHC.Types.Int
LclId
[]
-_agi = GHC.Num.- @ GHC.Types.Int $dNum_agl
$dNum_agl :: GHC.Num.Num GHC.Types.Int
LclId
[]
$dNum_agl = GHC.Num.$fNumInt
$dEq_agk :: GHC.Classes.Eq GHC.Types.Int
LclId
[]
$dEq_agk = GHC.Num.$p1Num @ GHC.Types.Int $dNum_agl
==_adA :: GHC.Types.Int -> GHC.Types.Int -> GHC.Bool.Bool
LclId
[]
==_adA = GHC.Classes.== @ GHC.Types.Int $dEq_agk
fact_ado :: GHC.Types.Int -> GHC.Types.Int
LclId
[]
fact_ado = Factorial.fact
end Rec }
Since +, * and == are from type classes, we have to lookup the dictionary for each type dNum_agq and dEq_agk, and then use this to get our actual functions *_agj, -_agi and ==_adA, which are what our Core references, not the fully generic versions. If we hadn’t provided the Int -> Int type signature, this would have been a bit different.
Simplified Core
ghc -c Factorial.hs -ddump-simpl
From here, we do a number of optimization passes on the core. Keen readers may have noticed that the unoptimized Core allocated an unnecessary thunk whenever n = 0, the fail_dgt. This inefficiency, among others, is optimized away:
Rec {
Factorial.fact :: GHC.Types.Int -> GHC.Types.Int
GblId
[Arity 1]
Factorial.fact =
\ (ds_dgr :: GHC.Types.Int) ->
case ds_dgr of wild_B1 { GHC.Types.I# ds1_dgs ->
case ds1_dgs of _ {
__DEFAULT ->
GHC.Num.*
@ GHC.Types.Int
GHC.Num.$fNumInt
wild_B1
(Factorial.fact
(GHC.Num.-
@ GHC.Types.Int GHC.Num.$fNumInt wild_B1 (GHC.Types.I# 1)));
0 -> GHC.Types.I# 1
}
}
end Rec }
Now, the very first thing we do upon entry is unbox the input ds_dgr and pattern match on it. All of the dictionary nonsense has been inlined into the __DEFAULT branch, so GHC.Num.* @ GHC.Types.Int GHC.Num.$fNumInt corresponds to multiplication for Int, and GHC.Num.- @ GHC.Types.Int GHC.Num.$fNumInt corresponds to subtraction for Int. Equality is nowhere to be found, because we could just directly pattern match against an unboxed Int.
There are a few things to be said about boxing and unboxing. One important thing to notice is that the case statement on ds_dgr forces this variable: it may have been a thunk, so some (potentially large) amount of code may run before we proceed any further. This is one of the reasons why getting backtraces in Haskell is so hard: we care about where the thunk for ds_dgr was allocated, not where it gets evaluated! But we don’t know that it’s going to error until we evaluate it.
Another important thing to notice is that although we unbox our integer, the result ds1_dgs is not used for anything other than pattern matching. Indeed, whenever we would have used n, we instead use wild_B1, which corresponds to the fully evaluated version of ds_dgr. This is because all of these functions expect boxed arguments, and since we already have a boxed version of the integer lying around, there’s no point in re-boxing the unboxed version.
STG
ghc -c Factorial.hs -ddump-stg
Now we convert Core to the spineless, tagless, G-machine, the very last representation before we generate code that looks more like a traditional imperative program.
Factorial.fact =
\r srt:(0,*bitmap*) [ds_sgx]
case ds_sgx of wild_sgC {
GHC.Types.I# ds1_sgA ->
case ds1_sgA of ds2_sgG {
__DEFAULT ->
let {
sat_sgJ =
\u srt:(0,*bitmap*) []
let {
sat_sgI =
\u srt:(0,*bitmap*) []
let { sat_sgH = NO_CCS GHC.Types.I#! [1];
} in GHC.Num.- GHC.Num.$fNumInt wild_sgC sat_sgH;
} in Factorial.fact sat_sgI;
} in GHC.Num.* GHC.Num.$fNumInt wild_sgC sat_sgJ;
0 -> GHC.Types.I# [1];
};
};
SRT(Factorial.fact): [GHC.Num.$fNumInt, Factorial.fact]
Structurally, STG is very similar to Core, though there’s a lot of extra goop in preparation for the code generation phase:
- All of the variables have been renamed,
- All of the lambdas now have the form
\r srt:(0,*bitmap*) [ds_sgx]. The arguments are in the list at the rightmost side: if there are no arguments this is simply a thunk. The first character after the backslash indicates whether or not the closure is re-entrant (r), updatable (u) or single-entry (s, not used in this example). Updatable closures can be rewritten after evaluation with their results (so closures that take arguments can’t be updateable!) Afterwards, the static reference table is displayed, though there are no interesting static references in our program. NO_CCS is an annotation for profiling that indicates that no cost center stack is attached to this closure. Since we’re not compiling with profiling it’s not very interesting.- Constructor applications take their arguments in square brackets:
GHC.Types.I# [1]. This is not just a stylistic change: in STG, constructors are required to have all of their arguments (e.g. they are saturated). Otherwise, the constructor would be turned into a lambda.
There is also an interesting structural change, where all function applications now take only variables as arguments. In particular, we’ve created a new sat_sgJ thunk to pass to the recursive call of factorial. Because we have not compiled with optimizations, GHC has not noticed that the argument of fact will be immediately evaluated. This will make for some extremely circuitous intermediate code!
Cmm
ghc -c Factorial.hs -ddump-cmm
Cmm (read “C minus minus”) is GHC’s high-level assembly language. It is similar in scope to LLVM, although it looks more like C than assembly. Here the output starts getting large, so we’ll treat it in chunks. The Cmm output contains a number of data sections, which mostly encode the extra annotated information from STG, and the entry points: sgI_entry, sgJ_entry, sgC_ret and Factorial_fact_entry. There are also two extra functions __stginit_Factorial_ and __stginit_Factorial which initialize the module, that we will not address.
Because we have been looking at the STG, we can construct a direct correspondence between these entry points and names from the STG. In brief:
sgI_entry corresponded to the thunk that subtracted 1 from wild_sgC. We’d expect it to setup the call to the function that subtracts Int.sgJ_entry corresponded to the thunk that called Factorial.fact on sat_sgI. We’d expect it to setup the call to Factorial.fact.sgC_ret is a little different, being tagged at the end with ret. This is a return point, which we will return to after we successfully evaluate ds_sgx (i.e. wild_sgC). We’d expect it to check if the result is 0, and either “return” a one (for some definition of “return”) or setup a call to the function that multiplies Int with sgJ_entry and its argument.
Time for some code! Here is sgI_entry:
sgI_entry()
{ has static closure: False update_frame: <none>
type: 0
desc: 0
tag: 17
ptrs: 1
nptrs: 0
srt: (Factorial_fact_srt,0,1)
}
ch0:
if (Sp - 24 < SpLim) goto ch2;
I32[Sp - 4] = R1; // (reordered for clarity)
I32[Sp - 8] = stg_upd_frame_info;
I32[Sp - 12] = stg_INTLIKE_closure+137;
I32[Sp - 16] = I32[R1 + 8];
I32[Sp - 20] = stg_ap_pp_info;
I32[Sp - 24] = base_GHCziNum_zdfNumInt_closure;
Sp = Sp - 24;
jump base_GHCziNum_zm_info ();
ch2: jump stg_gc_enter_1 ();
}
There’s a bit of metadata given at the top of the function, this is a description of the info table that will be stored next to the actual code for this function. You can look at CmmInfoTable in cmm/CmmDecl.hs if you’re interested in what the values mean; most notably the tag 17 corresponds to THUNK_1_0: this is a thunk that has in its environment (the free variables: in this case wild_sgC) a single pointer and no non-pointers.
Without attempting to understand the code, we can see a few interesting things: we are jumping to base_GHCziNum_zm_info, which is a Z-encoded name for base GHC.Num - info: hey, that’s our subtraction function! In that case, a reasonable guess is that the values we are writing to the stack are the arguments for this function. Let’s pull up the STG invocation again: GHC.Num.- GHC.Num.$fNumInt wild_sgC sat_sgH (recall sat_sgH was a constant 1).base_GHCziNum_zdfNumInt_closureis Z-encodedbase GHC.Num $fNumInt, so there is our dictionary function.stg_INTLIKE_closure+137is a rather curious constant, which happens to point to a statically allocated closure representing the number1. Which means at last we haveI32[R1 + 8], which must refer towild_sgC(in factR1is a pointer to this thunk’s closure on the stack.) You may ask, what dostg_ap_pp_infoandstg_upd_frame_infodo, and why isbase_GHCziNum_zdfNumInt_closureat the very bottom of the stack? The key is to realize that in fact, we’re placing three distinct entities on the stack: an argument forbase_GHCziNum_zm_info, astg_ap_pp_infoobject with a closure containingI32[R1 + 8]andstg_INTLIKE_closure+137, and astg_upd_frame_infoobject with a closure containingR1. We’ve delicately setup a Rube Goldberg machine, that when run, will do the following things: 1. Insidebase_GHCziNum_zm_info, consume the argumentbase_GHCziNum_zdfNumInt_closureand figure out what the right subtraction function for this dictionary is, put this function on the stack, and then jump to its return point, the next info table on the stack,stg_ap_pp_info. 2. Insidestg_ap_pp_info, consume the argument thatbase_GHCziNum_zm_infocreated, and apply it with the two argumentsI32[R1 + 8]andstg_INTLIKE_closure+137. (As you might imagine,stg_ap_pp_infois very simple.) 3. The subtraction function runs and does the actual subtraction. It then invokes the next info table on the stackstg_upd_frame_infowith this argument. 4. Because this is an updateable closure (remember theucharacter in STG?), willstg_upd_frame_infothe result of step 3 and use it to overwrite the closure pointed to byR1(the original closure of the thunk) with a new closure that simply contains the new value. It will then invoke the next info table on the stack, which was whatever was on the stack when we enteredsgI_Entry. Phew! And now there’s the minor question ofif (Sp - 24 < SpLim) goto ch2;which checks if we will overflow the stack and bugs out to the garbage collector if so.sgJ_entrydoes something very similar, but this time the continuation chain isFactorial_facttostg_upd_frame_infoto the great beyond. We also need to allocate a new closure on the heap (sgI_info), which will be passed in as an argument:: sgJ_entry() { has static closure: False update_frame: <none> type: 0 desc: 0 tag: 17 ptrs: 1 nptrs: 0 srt: (Factorial_fact_srt,0,3) } ch5: if (Sp - 12 < SpLim) goto ch7; Hp = Hp + 12; if (Hp > HpLim) goto ch7; I32[Sp - 8] = stg_upd_frame_info; I32[Sp - 4] = R1; I32[Hp - 8] = sgI_info; I32[Hp + 0] = I32[R1 + 8]; I32[Sp - 12] = Hp - 8; Sp = Sp - 12; jump Factorial_fact_info (); ch7: HpAlloc = 12; jump stg_gc_enter_1 (); } And finally,sgC_retactually does computation:: sgC_ret() { has static closure: False update_frame: <none> type: 0 desc: 0 tag: 34 stack: [] srt: (Factorial_fact_srt,0,3) } ch9: Hp = Hp + 12; if (Hp > HpLim) goto chb; _sgG::I32 = I32[R1 + 3]; if (_sgG::I32 != 0) goto chd; R1 = stg_INTLIKE_closure+137; Sp = Sp + 4; Hp = Hp - 12; jump (I32[Sp + 0]) (); chb: HpAlloc = 12; jump stg_gc_enter_1 (); chd: I32[Hp - 8] = sgJ_info; I32[Hp + 0] = R1; I32[Sp + 0] = Hp - 8; I32[Sp - 4] = R1; I32[Sp - 8] = stg_ap_pp_info; I32[Sp - 12] = base_GHCziNum_zdfNumInt_closure; Sp = Sp - 12; jump base_GHCziNum_zt_info (); } ...though not very much of it. We grab the result of the case split fromI32[R1 + 3](R1 is a tagged pointer, which is why the offset looks weird.) We then check if its zero, and if it is we shovestg_INTLIKE_closure+137(the literal 1) into our register and jump to our continuation; otherwise we setup our arguments on the stack to do a multiplicationbase_GHCziNum_zt_info. The same dictionary passing dance happens. And that’s it! While we’re here, here is a brief shout-out to “Optimised Cmm”, which is just Cmm but with some minor optimisations applied to it. If you’re *really* interested in the correspondence to the underlying assembly, this is good to look at. :: ghc -c Factorial.hs -ddump-opt-cmm Assembly ------------ :: ghc -c Factorial.hs -ddump-asm Finally, we get to assembly. It’s mostly the same as the Cmm, minus some optimizations, instruction selection and register allocation. In particular, all of the names from Cmm are preserved, which is useful if you’re debugging compiled Haskell code with GDB and don’t feel like wading through assembly: you can peek at the Cmm to get an idea for what the function is doing. Here is one excerpt, which displays some more salient aspects of Haskell on x86-32:: sgK_info: .Lch9: leal -24(%ebp),%eax cmpl 84(%ebx),%eax jb .Lchb movl $stg_upd_frame_info,-8(%ebp) movl %esi,-4(%ebp) movl $stg_INTLIKE_closure+137,-12(%ebp) movl 8(%esi),%eax movl %eax,-16(%ebp) movl $stg_ap_pp_info,-20(%ebp) movl $base_GHCziNum_zdfNumInt_closure,-24(%ebp) addl $-24,%ebp jmp base_GHCziNum_zm_info .Lchb: jmp *-8(%ebx) Some of the registers are pinned to registers we saw in Cmm. The first two lines are the stack check, and we can see that%ebpis always set to the value ofSp.84(%ebx)must be whereSpLim; indeed,%ebxstores a pointer to theBaseRegstructure, where we store various “register-like” data as the program executes (as well as the garbage collection function, see*-8(%ebx)). Afterwards, a lot of code moves values onto the stack, and we can see that%esicorresponds toR1. In fact, once you’ve allocated all of these registers, there aren’t very many general purpose registers to actually do computation in: just%eaxand%edx. Conclusion ------------- That’s it: factorial all the way down to the assembly level! You may be thinking several things: * *Holy crap! The next time I need to enter an obfuscated C contest, I’ll just have GHC generate my code for me.* GHC’s internal operational model is indeed very different from any imperative language you may have seen, but it is very regular, and once you get the hang of it, rather easy to understand. * *Holy crap! I can’t believe that Haskell performs at all!* Remember we didn’t compile with optimizations at all. The same module compiled with-O`` is considerably smarter.
Thanks for reading all the way! Stay tuned for the near future, where I illustrate action on the Haskell heap in comic book format.
April 11, 2011I was supposed to have another post about Hoopl today, but it got scuttled when an example program I had written triggered what I think is a bug in Hoopl (if it’s not a bug, then my mental model of how Hoopl works internally is seriously wrong, and I ought not write about it anyway.) So today’s post will be about the alleged bug Hoopl was a victim of: bugs from using the wrong variable.
The wrong variable, if I’m right, is that of the missing apostrophe:
; let (cha', fbase') = mapFoldWithKey
- (updateFact lat lbls)
+ (updateFact lat lbls')
(cha,fbase) out_facts
Actually, this bug tends to happen a lot in functional code. Here is one bug in the native code generation backend in GHC that I recently fixed with Simon Marlow:
- return (Fixed sz (getRegisterReg use_sse2 reg) nilOL)
+ return (Fixed size (getRegisterReg use_sse2 reg) nilOL)
And a while back, when I was working on abcBridge, I got an infinite loop because of something along the lines of:
cecManVerify :: Gia_Man_t -> Cec_ParCec_t_ -> IO Int
- cecManVerify a b = handleAbcError "Cec_ManVerify" $ cecManVerify a b
+ cecManVerify a b = handleAbcError "Cec_ManVerify" $ cecManVerify' a b
How does one defend against these bugs? There are various choices:
Mutate/shadow the old variable away
This is the classic solution for any imperative programmer: if some value is not going to be used again, overwrite it with the new value. You thus get code like this:
$string = trim($string);
$string = str_replace('/', '_', $string);
$string = ...
You can do something similar in spirit in functional programming languages by reusing the name for a new binding, which shadows the old binding. But this is something of a discouraged practice, as -Wall might suggest:
test.hs:1:24:
Warning: This binding for `a' shadows the existing binding
bound at test.hs:1:11
Eliminate the variable with point-free style
In the particular case that the variable is used in only one place, in this pipeline style, it’s fairly straightforward to eliminate it by writing a pipeline of functions, moving the code to point-free style (the “point” in “point-free” is the name for variable):
let z = clipJ a . clipI b . extendIJ $ getIJ (q ! (i-1) ! (j-1))
But this tends to work less well when an intermediate value needs to be used multiple times. There’s usually a way to arrange it, but “multiple uses” is a fairly good indicator of when pointfree style will become incomprehensible.
View patterns
View patterns are a pretty neat language extension that allow you to avoid having to write code that looks like this:
f x y = let x' = unpack x
in ... -- using only x'
With {-# LANGUAGE ViewPatterns #-}, you can instead write:
f (unpack -> x) y = ... -- use x as if it were x'
Thus avoiding the need to create a temporary name that may be accidentally used, while allowing yourself to use names.
Turn on warnings
It only took a few seconds of staring to see what was wrong with this code:
getRegister (CmmReg reg)
= do use_sse2 <- sse2Enabled
let
sz = cmmTypeSize (cmmRegType reg)
size | not use_sse2 && isFloatSize sz = FF80
| otherwise = sz
--
return (Fixed sz (getRegisterReg use_sse2 reg) nilOL)
That’s right: size is never used in the function body. GHC will warn you about that:
test.hs:1:24: Warning: Defined but not used: `b'
Unfortunately, someone turned it off (glare):
{-# OPTIONS -w #-}
-- The above warning supression flag is a temporary kludge.
-- While working on this module you are encouraged to remove it and fix
-- any warnings in the module. See
-- http://hackage.haskell.org/trac/ghc/wiki/Commentary/CodingStyle#Warnings
-- for details
Use descriptive variable names and types
Haskell programmers tend to have a penchant for short, mathematical style names like f, g, h, x, y, z, when the scope a variable is used isn’t very large. Imperative programmers tend to find this a bit strange and unmaintainable. The reason why this is a maintainable style in Haskell is the static type system: if I’m writing the function compose f g, where f :: a -> b and g :: b -> c, I’m certain not to accidentally use g in the place of f: it will type error! If all of the semantic information about what is in the variable is wrapped up in the type, there doesn’t seem to be much point in reiterating it. Of course, it’s good not to go too far in this direction: the typechecker won’t help you much when there are two variables, both with the type Int. In that case, it’s probably better to use a teensy bit more description. Conversely, if you refine your types so that the two variables have different types again, the possibility of error goes away again.
April 8, 2011Once you’ve determined what dataflow facts you will be collecting, the next step is to write the transfer function that actually performs this analysis for you!

Remember what your dataflow facts mean, and this step should be relatively easy: writing a transfer function usually involves going through every possible statement in your language and thinking about how it changes your state. We’ll walk through the transfer functions for constant propagation and liveness analysis.
Here is the transfer function for liveness analysis (once again, in Live.hs):
liveness :: BwdTransfer Insn Live
liveness = mkBTransfer live
where
live :: Insn e x -> Fact x Live -> Live
live (Label _) f = f
live n@(Assign x _) f = addUses (S.delete x f) n
live n@(Store _ _) f = addUses f n
live n@(Branch l) f = addUses (fact f l) n
live n@(Cond _ tl fl) f = addUses (fact f tl `S.union` fact f fl) n
live n@(Call vs _ _ l) f = addUses (fact f l `S.difference` S.fromList vs) n
live n@(Return _) _ = addUses (fact_bot liveLattice) n
fact :: FactBase (S.Set Var) -> Label -> Live
fact f l = fromMaybe S.empty $ lookupFact l f
addUses :: S.Set Var -> Insn e x -> Live
addUses = fold_EN (fold_EE addVar)
addVar s (Var v) = S.insert v s
addVar s _ = s
live is the meat of our transfer function: it takes an instruction and the current fact, and then modifies the fact in light of that information. Because this is a backwards transfer (BwdTransfer), the Fact x Live passed to live are the dataflow facts after this instruction, and our job is to calculate what the dataflow facts are before the instruction (the facts flow backwards).
If you look closely at this function, there’s something rather curious going on: in the line live (Label _) f = f, we simply pass out f (which ostensibly has type Fact x Live) as the result. How does that work? Well, Fact is actually a type family:
type family Fact x f :: *
type instance Fact C f = FactBase f
type instance Fact O f = f
Look, it’s the O and C phantom types again! If we recall our definition of Insn (in IR.hs):
data Insn e x where
Label :: Label -> Insn C O
Assign :: Var -> Expr -> Insn O O
Store :: Expr -> Expr -> Insn O O
Branch :: Label -> Insn O C
Cond :: Expr -> Label -> Label -> Insn O C
Call :: [Var] -> String -> [Expr] -> Label -> Insn O C
Return :: [Expr] -> Insn O C
That means for any of the instructions that are open on exit (x = O for Label, Assign and Store), our function gets Live, whereas for an instruction that is closed on exit (x = C for Branch, Cond, Call and Return), we get FactBase Live, which is a map of labels to facts (LabelMap Live)—for reasons we will get to in a second.
Because the type of our arguments actually change depending on what instruction we receive, some people (GHC developers among them) prefer to use the long form mkBTransfer3, which takes three functions, one for each shape of node. The rewritten code thus looks like this:
liveness' :: BwdTransfer Insn Live
liveness' = mkBTransfer3 firstLive middleLive lastLive
where
firstLive :: Insn C O -> Live -> Live
firstLive (Label _) f = f
middleLive :: Insn O O -> Live -> Live
middleLive n@(Assign x _) f = addUses (S.delete x f) n
middleLive n@(Store _ _) f = addUses f n
lastLive :: Insn O C -> FactBase Live -> Live
lastLive n@(Branch l) f = addUses (fact f l) n
lastLive n@(Cond _ tl fl) f = addUses (fact f tl `S.union` fact f fl) n
lastLive n@(Call vs _ _ l) f = addUses (fact f l `S.difference` S.fromList vs) n
lastLive n@(Return _) _ = addUses (fact_bot liveLattice) n
(with the same definitions for fact, addUses and addVar).
With this in mind, it should be fairly easy to parse the code for firstLive and middleLive. Labels don’t change the set of live libraries, so our fact f is passed through unchanged. For assignments and stores, any uses of a register in that expression makes that register live (addUses is a utility function that calculates this), but if we assign to a register, we lose its previous value, so it is no longer live. Here is some pseudocode demonstrating:
// a is live
x = a;
// a is not live
foo();
// a is not live
a = 2;
// a is live
y = a;
If you’re curious out the implementation of addUses, the fold_EE and fold_EN functions can be found in OptSupport.hs:
fold_EE :: (a -> Expr -> a) -> a -> Expr -> a
fold_EN :: (a -> Expr -> a) -> a -> Insn e x -> a
fold_EE f z e@(Lit _) = f z e
fold_EE f z e@(Var _) = f z e
fold_EE f z e@(Load addr) = f (f z addr) e
fold_EE f z e@(Binop _ e1 e2) = f (f (f z e2) e1) e
fold_EN _ z (Label _) = z
fold_EN f z (Assign _ e) = f z e
fold_EN f z (Store addr e) = f (f z e) addr
fold_EN _ z (Branch _) = z
fold_EN f z (Cond e _ _) = f z e
fold_EN f z (Call _ _ es _) = foldl f z es
fold_EN f z (Return es) = foldl f z es
The naming convention is as follows: E represents an Expr, while N represents a Node (Insn). The left letter indicates what kind of values are passed to the combining function, while the right letter indicates what is being folded over. So fold_EN folds all Expr in a Node and calls the combining function on it, while fold_EE folds all of the Expr inside an Expr (notice that things like Load and Binop can contain expressions inside themselves!) The effect of fold_EN (fold_EE f), then, is that f will be called on every expression in a node, which is exactly what we want if we’re checking for uses of Var.
We could have also written out the recursion explicitly:
addUses :: S.Set Var -> Insn e x -> Live
addUses s (Assign _ e) = expr s e
addUses s (Store e1 e2) = expr (expr s e1) e2
addUses s (Cond e _ _) = expr s e
addUses s (Call _ _ es _) = foldl expr s es
addUses s (Return es) = foldl expr s es
addUses s _ = s
expr :: S.Set Var -> Expr -> Live
expr s e@(Load e') = addVar (addVar s e) e'
expr s e@(Binop _ e1 e2) = addVar (addVar (addVar s e) e1) e2
expr s e = addVar s e
But as you can see, there’s a lot of junk involved with recursing down the structure, and you might accidentally forget an Expr somewhere, so using a pre-defined fold operator is preferred. Still, if you’re not comfortable with folds over complicated datatypes, writing out the entire thing in full at least once is a good exercise.
The last part to look at is lastLives:
lastLive :: Insn O C -> FactBase Live -> Live
lastLive n@(Branch l) f = addUses (fact f l) n
lastLive n@(Cond _ tl fl) f = addUses (fact f tl `S.union` fact f fl) n
lastLive n@(Call vs _ _ l) f = addUses (fact f l `S.difference` S.fromList vs) n
lastLive n@(Return _) _ = addUses (fact_bot liveLattice) n
There are several questions to ask.
Why does it receive a FactBase Live instead of a Live? This is because, as the end node in a backwards analysis, we may receive facts from multiple locations: each of the possible paths the control flow may go down.
In the case of a Return, there are no further paths, so we use fact_bot liveLattice (no live variables). In the case of Branch and Call, there is only one further path l (the label we’re branching or returning to), so we simply invoke fact f l. And finaly, for Cond there are two paths: tl and fl, so we have to grab the facts for both of them and combine them with what happens to be our join operation on the dataflow lattice.
Why do we still need to call addUses? Because instructions at the end of basic blocks can use variables (Cond may use them in its conditional statement, Return may use them when specifying what it returns, etc.)
What’s with the call to S.difference in Call? Recall that vs is the list of variables that the function call writes its return results to. So we need to remove those variables from the live variable set, since they will get overwritten by this instruction:
f (x, y) {
L100:
goto L101
L101:
if x > 0 then goto L102 else goto L104
L102:
// z is not live here
(z) = f(x-1, y-1) goto L103
L103:
// z is live here
y = y + z
x = x - 1
goto L101
L104:
ret (y)
}
You should already have figured out what fact does: it looks up the set of dataflow facts associated with a label, and returns an empty set (no live variables) if that label isn’t in our map yet.
Once you’ve seen one Hoopl analysis, you’ve seen them all! The transfer function for constant propagation looks very similar:
-- Only interesting semantic choice: values of variables are live across
-- a call site.
-- Note that we don't need a case for x := y, where y holds a constant.
-- We can write the simplest solution and rely on the interleaved optimization.
--------------------------------------------------
-- Analysis: variable equals a literal constant
varHasLit :: FwdTransfer Node ConstFact
varHasLit = mkFTransfer ft
where
ft :: Node e x -> ConstFact -> Fact x ConstFact
ft (Label _) f = f
ft (Assign x (Lit k)) f = Map.insert x (PElem k) f
ft (Assign x _) f = Map.insert x Top f
ft (Store _ _) f = f
ft (Branch l) f = mapSingleton l f
ft (Cond (Var x) tl fl) f
= mkFactBase constLattice
[(tl, Map.insert x (PElem (Bool True)) f),
(fl, Map.insert x (PElem (Bool False)) f)]
ft (Cond _ tl fl) f
= mkFactBase constLattice [(tl, f), (fl, f)]
ft (Call vs _ _ bid) f = mapSingleton bid (foldl toTop f vs)
where toTop f v = Map.insert v Top f
ft (Return _) _ = mapEmpty
The notable difference is that, unlike liveness analysis, constant propagation analysis is a forward analysis FwdTransfer. This also means the type of the function is Node e x -> f -> Fact x f, rather than Node e x -> Fact x f -> f: when the control flow splits, we can give different sets of facts for the possible outgoing labels. This is used to good effect in Cond (Var x), where we know that if we take the first branch the condition variable is true, and vice-versa. The rest is plumbing:
Branch: An unconditional branch doesn’t cause any of our variables to stop being constant. Hoopl will automatically notice if a different path to that label has contradictory facts and convert the mappings to Top as notice, using our lattice’s join function. mapSingleton creates a singleton map from the label l to the fact f.Cond: We need to create a map with two entries, which is can be done conveniently with mkFactBase, where the last argument is a list of labels to maps.Call: A function call is equivalent to assigning lots of unknown variables to all of its return variables, so we set all of them to unknown with toTop.Return: Goes nowhere, so an empty map will do.
Next time, we’ll talk about some of the finer subtleties about transfer functions and join functions, and discuss graph rewriting, and wrap it all up with some use of Hoopl’s debugging facilities to observe how Hoopl rewrites a graph.