ezyang's blog

Anatomy of "You could have invented..."

February 23, 2012

The You could have invented… article follows a particular scheme:

Introduce an easy to understand problem,
Attempt to solve the problem, but get stuck doing it the “obvious” way,
Introduce an easy to understand insight,
Methodically work out the rest of the details, arriving at the final result.

Why does framing the problem this way help?

While the details involved in step 4 result in a structure which is not necessarily obvious (thus giving the impression that the concept is hard to understand), the insight is very easy to understand and the rest is just “monkey-work”. The method of deriving the solution is more compressible than the solution itself, so it is easier to learn.
Picking a very specific, easy-to-understand problem helps ground us in a concrete example, whereas the resulting structure might be too general to get a good intuition off of.

It’s very important that the problem is easy to understand, and the process of “working out the details” is simple. Otherwise, the presentation feels contrived. This method is also inappropriate when the audience is in fact smart enough to just look at the end-result and understand on an intuitive level what is going on. Usually, this is because they have already seen the examples. But for the rest of us, it is a remarkably effective method of pedagogy.

I’ll take this opportunity to dissect two particular examples. The first is Dan Piponi’s canonical article, You Could Have Invented Monads. Here are the four steps:

Suppose we want to debug pure functions, with type signature f :: Float -> Float, so that they can also return a string message what happened, f' :: Float -> (Float, String).
If we want to compose these functions, the obvious solution of threading the state manually is really annoying.
We can abstract this pattern out as a higher-order function.
We can do the same recipe on a number of other examples, and then show that it generalizes. The generalization is a Monad.

The second is article of mine, You could have invented Zippers:

I want to do two things: have access to both the parent and children of nodes in a tree, and do a persistent update of the tree (e.g. without mutation.)
If we do the obvious thing, we have to update all the nodes in the tree.
We only need to flip one pointer, so that it points to the parent.
We can create a new data structure which holds this information (which, quite frankly, is a little ugly), and then show this procedure generalizes. The generalization is a Zipper.

So the next time you try to explain something that seems complicated on the outside, but simple on the inside, give this method a spin! Next time: You could have invented fractional cascading.

Postscript. This is a common way well-written academic papers are structured, though very few of them are titled as such. One noticeable difference, however, is that often the “detail work” is not obvious, or requires some novel technical methods. Sometimes, a researcher comes across a really important technical method, and it diffuses throughout the community, to the point where it is obvious to anyone working in the field. In some respects, this is one thing that characterizes a truly successful paper.

Transcript of "Inventing on Principle"

February 20, 2012

Here is a full transcript to Github of Bret Victor’s “Inventing on Principle”. It was transcribed by me, An Yu and Tal Benisty.

Below is a copy of the transcript which I will endeavor to keep up to date with the Github copy. The original content was licensed under CC-BY.

[[0:07]] So, unlike the previous session, I don’t have any prizes to give out. I’m just going to tell you how to live your life.

[[0:14]] This talk is actually about a way of living your life that most people don’t talk about. As you approach your career, you’ll hear a lot about following your passion, or doing something you love. I’m going to talk about something kind of different. I’m going to talk about following a principle — finding a guiding principle for your work, something you believe is important and necessary and right, and using that to guide what you do.

[[0:46]] There are three parts to this talk. I’m first going to talk about the principle that guides a lot of my work, and try to give you a taste of what comes out of that. And I’m going to talk about some other people that have lived this way; what their principles are, what they believe in. But these are all just examples, to help you think about what you believe in, and how you want to live your life.

[[1:10]] So to begin with me: Ideas are very important to me. I think that bringing ideas into the world is one of the most important things that people do. And I think that great ideas, in the form of great art, stories, inventions, scientific theories, these things take on lives of their own, which give meaning to our lives as people. So, I think a lot about how people create ideas and how ideas grow. And in particular, what sorts of tools create a healthy environment for ideas to grow. Now, I’ve spent a lot of time over the years making creative tools, using creative tools, thinking about them a lot, and here’s something I’ve come to believe: Creators need an immediate connection to what they’re creating. That’s my principle. Creators need an immediate connection to what they create. And what I mean by that is when you’re making something, if you make a change, or you make a decision, you need to see the effect of that immediately. There can’t be a delay, and there can’t be anything hidden. Creators have to be able to see what they’re doing. Now I’m going to show a series of cases where I noticed that that principle was violated and I’ll show you what I did about that, and then I’m going to talk about the larger context in which I do this work.

[[2:32]] So, to begin with, let’s think about coding. Here’s how coding works: you type a bunch of code into a text editor, kind of imagining in your head what each line of code is going to do. And then you compile and run, and something comes out. So in this case, this is just JavaScript, drawing to a Canvas, and it draws this little scene with a tree. But if there’s anything wrong with the scene, or if I go and make changes, or if I have further ideas, I have to go back to the code, and I edit the code, compile and run, see what it looks like. Anything wrong, I go back to the code. Most of my time is spent working in the code, working in a text editor blindly, without an immediate connection to this thing, which is what I’m actually trying to make.

[[3:20]] So I feel that this goes against this principle that I have, that creators need an immediate connection to what they’re making, so I have tried to come up with a coding environment that would be more in line with this principle I have. So what I have here is I’ve got this picture on the side, and the code on the side, and this part draws the sky and this draws the mountains and this draws the tree, and when I make any change to the code, the picture changes immediately. So the code and the picture are always in sync; there is no compile and run. I just change things in the code and I see things change in the picture. And now that we have this immediate connection between the code and the picture, we can start thinking about ways of changing the code other than typing. So for example if this number here is the length of the branches. If I want to control that number, I just point my mouse to it, hold down the control key, and I can dial it up and down. So I can see what it looks like for big branches or small branches, and I can kind of converge on what feels right to me artistically. And this works great on any part of the code, I just point to it, and dial it up and down. Some of these numbers here, I know what they do but it’s kind of surprising to see them do it. [chuckles] And other ones are just completely surprising. [more chuckles]

[[4:48]] So down here, I’ve got this for loop where I’m counting to sixteen, I’m putting sixteen little pink blossoms on every branch. And I can turn it down for less blossoms or turn it up for more. But, look at what I’m doing here: I’m just kind of moving the number up and down around twenty or so: and it has this really interesting shimmering effect; it kind of looks as if the wind was blowing through the tree. And the first time I saw this I immediately started thinking about how I could use this effect for an animation. How would I ever have discovered that if I had to compile and run between every change? So much of art, so much of creation is discovery, and you can’t discover anything if you can’t see what you’re doing.

[[5:33]] So I’ve shown you adjusting the code, let’s add some code. I’d like to put a sun up here in the sky, so I’ll go to the end of the drawSky function, and I’ll want to fill a circle, so I start typing context.fillCircle, and as soon as I start typing, I get this autocomplete list of the different fill methods. So these are the things I can type there: fillCircle, fillRect, fillText. And as I move up and down this autocomplete list, I immediately see what each of them is doing. So, I don’t have to imagine what it would do from the method name. I don’t have to look at the documentation, I just see it, immediately.

[[6:12]] So I want a circle, and I’m going to adjust the x coordinate and the y coordinate, change the radius a bit. That looks about right. Probably it should be yellow, so I’m going to set the fill style, context.fillStyle, same autocomplete as before, choose fillStyle, gives me white by default, and I can change that color code the same way I change any number, I hit the control key, and I get a color palette. So I can choose a nice yellow for my sun. Although, the white was kind of interesting, I thought. I kind of didn’t expect that. But, with white, it now looks like the moon instead, right? Hey look, it’s night time! [chuckles] So having this immediate connection allows ideas to surface and develop in ways that would be impossible before.

[[7:17]] But there’s still a problem here, I think, which is I’ve got this picture, and I’ve got this code over here and I have to maintain the mapping between the two in my head. So I’ve got all these lines of code, and just looking at this line I don’t immediately know what it does. So here’s what I can do. I can hold down the option key, and my cursor changes to a magnifying glass, and now as I roll over each line of code, it’s highlighting in the picture what’s being drawn in that line. So, if I want to know what’s going on in this function, I just roll down the function and see what highlights. So here I’ve got two calls to drawMountain; I don’t know which is which; well, that’s that mountain, and that’s that mountain. And this has to work the other way too; if I see part of the picture, I have to know what code was responsible for drawing it. So I do the same thing; I hold down the option key, and now as I move over each pixel of the picture, you’ll see on the right it’s jumping to the line of code that drew that pixel. So that drew the sky, and that drew the tree, and that drew the blossom. So this is really important for maintaining that mapping, but it’s also really useful just for navigating around. So you know, I want to make the sun a little bit bigger; I jump there, and make it a little bigger. Or I want to bring up the tree a little bit; I jump there, and bring up the tree a little bit; I want to bring up the mountains a little bit, so I jump there, bring up the mountains a little bit; and I can make these changes as quickly as I think of them, and that is so important to the creative process. To be able to try ideas as you think of them. If there is any delay in that feedback loop, between thinking of something and seeing it, and building on it, then there is this whole world of ideas which will just never be. These are thoughts that we can’t think.

[[9:36]] Ideas are very important to me. And the thing about ideas is that ideas start small. Ideas start out tiny, weak and fragile. In order to develop and mature, ideas need an environment where the creator can nurture them. Kind of take care of them, feed them, and shape their growth. And to me, that’s what the principle of immediate connection is all about. And because ideas are so precious to me, when I see this principle violated, when I see ideas stillborn or stunted because their creator couldn’t see what they were doing, I feel that’s wrong. And not wrong in the sense of violating some UI guideline or going against some best practice, but wrong in a deeper sense then that. And I’ll come back to this, but I want to show you another example of following this principle.

[[10:26]] So in this code here, there is no state, there is no persistent state, there is no time, there is no interactivity. And I was thinking about how we would handle those aspects of coding in a way that’s in line with these principles I have: creators need immediate connection. So what I have here is a little platform game. Here is my little guy, he can run around, he can jump, he can die [chuckles]. And the code for him is over here. So this code makes him run around, this makes him jump, this makes him collide with things… and down here, I’ve got some code for this little turtle. And the turtle is not doing much right now because I haven’t finished writing his code, so, I’m just going to do that right now. Say on each tick his x position plus equals his direction times the time interval, one sixtieth of a second times some speed, which, um, I dunno? Could be fast, could be slow, if it’s negative, he walks backwards. [chuckles] And these are all ideas I can use for other enemies but I think turtles are supposed to be slow, so let’s set that speed for our turtle. And up here, I’ve got some code that says, when my guy collides with the turtle, he gets some Y velocity, so he bounces into the air, and the turtle gets stomped. So that looks like that. And the turtle gets up after a bit.

[[12:01]] The problem is, I don’t want the player to be able to get up here yet. I want the player to bounce off the turtle, and go through this little passageway down here. And he’ll have to go around and solve puzzles and whatnot, and then come back and get the star. So, the turtle is too bouncy right now. Now of course I can just turn that down on the code, and now I can try it but now it’s not bouncy enough. So while it’s nice that I can adjust the code while it’s running, without having to stop and recompile and find my place again, I can’t immediately see what I need to see, which is whether or not he can make that jump.

[[12:43]] So here’s what I’m going to do. I’m going to bounce off the turtle, and pause the game. So I pause the game, and now there’s this slider up here, which lets me rewind through time. And now, I can rewind to back before I made the jump, and change the code, make it less bouncy, and now when I move it forward, it’s going to simulate it, using the same input controls, the same keyboard commands recorded as before, but with the new code. [applause]

[[13:20]] This is not good enough. [laughter] I need to be able to see the changes immediately. I need to be able to see immediately whether or not my bounciness is correct. None of this stuff. And if you have a process in time, and you want to see changes immediately, you have to map time to space. So here’s what I’m going to do. I’m going to bounce off my turtle, pause the game, and now hit this button here, which shows my guy’s trail. So now I can see where he’s been. And when I rewind, this trail in front of him is where he is going to be. This is his future. And when I change the code, I change his future. [gasps] So I can find exactly the value I need, so when I hit play, he slips right in there. [applause]

[[14:26]] So, creators need to be able to see what they’re doing. If you’re designing something embedded in time you need to be able to control time. You need to be able to see across time, otherwise you’re designing blind.

[[14:40]] As I was playing with this, I noticed it’s fun to play with gravity. So I can make gravity a little negative and he starts to float up in the air. [laughter] And I can kind of play with that and try to get him to stay there. And you could probably make an entire game around just the mechanic here, it’s gravity manipulation. In fact, I bet I could fiddle with any part of this code and come up with an idea for a game. Even if I just comment out the first statement in the code, now my guy can’t move left - he can only move right. Which sounds kinda silly, but Terry Cavanagh actually made a beautiful game around that concept called “Don’t Look Back”. Terry Cavanagh, he made another really wonderful game which you might have seen, called “VVVVVV”, spelled as the letter v six times. And, the way that game works, is that you can’t jump. Instead, you can only flip upside down, and you fall up instead of falling down. So it kinda works like this. You can walk on the ceiling or you can walk around on the ground. And so you have these levels which kinda look like this, and you kinda walk around… You have to learn how to navigate a terrain like this. And so if you had like a, something like that, you wouldn’t be able to jump over it. You’d have to flip over and flip over; he got an amazing amount of gameplay out of this concept.

[[16:07]] So again, being able to try ideas as you think of them. [pause] This example, and the last one with the tree, these are both very visual programs; we’re able to see our changes just by seeing how the picture changes. So I was thinking about, how we could do more abstract coding that’s more in line with this principle. How can we write a generic algorithm in such a way that we can see what we’re doing. So as an example, let’s take a look at binary search. Super quick refresher on how binary search works: you have an array of values that are in order, and you have a key, which is the value you’re trying to locate within the array. And you keep track of two variables, which are the lower and upper bounds of where you think that value could possibly be; right now, it could be anywhere. And you look right in the middle of that range - if what you find is too small, then the key has to be after that. Look in the middle of the range, if what you find is too big, the key has to be before that. And you kinda keep subdividing your range until you narrow in on the value you’re looking for. And in code, binary search looks like this. And from my perspective, you can’t see anything here. You can’t see anything. I see the word ‘array’, but I don’t actually see an array. And so in order to write code like this, you have to imagine an array in your head, and you essentially have to play computer. You have to simulate in your head what each line of code would do on a computer. And to a large extent, the people that we consider to be skilled software engineers are just those people that are really good at playing computer. But if we’re writing our code on a computer… why are we simulating what a computer would do in our head? Why doesn’t the computer just do it… and show us?

[[18:06]] So. Let’s write binary search. Function “binary search” takes a key and an array. And then over here on this side, it’s saying “Ok, it takes a key and an array, such as what? Give me an example; I need something to work with here.” So, for instance, my array might be ‘a’, ‘b’, ‘c’, ’d’, ’e’, ‘f’. And let’s say for instance we’re looking for the ’d’. So now let’s start coding. The lower bound starts out as zero. Over here it says ’low equals zero’, nothing amazing there. Upper bound starts out at the end of the array, so high equals array length minus one. And over here, it says ‘high equals five’. So I have my abstract formula in the code. Over here, it’s giving me the concrete value corresponding to these example arguments. So I don’t have to maintain this picture in my head; it’s just showing it to me.

[[19:09]] So now I need the index in the middle of the array, so I’m going to take the average of those two. Mid equals low plus high over two, and… well, that’s obviously not right. Two point five is not a valid array index. So I guess I need to round this off. So I’m going to add the floor function and it rounded it down to two. And I caught that bug literally the second I typed it, instead of writing the entire function in twenty unit tests. So now I get the value out of the array… and then I need to subdivide my range, which, so there’s an if statement which I’ll just paste in here. So in this case, the value I found is less than the key, so it’s taking this first branch of the if statement. This is adjusting the lower bound. Of course if the key was smaller, then it would take this branch of the if statement and adjust the upper bound. Or, if the key was ‘c’, then we would’ve just happened to find it on the first shot, and we’d return the index.

[[20:14]] So this is the first iteration of this algorithm. And now what we need to do, is loop. We’ve subdivided the array, we need to keep subdividing until we narrow in on what we’re looking for. So, we need to loop; I will just loop. While 1, do all this. And now what we have are three columns corresponding to three iterations of this loop. So this first column here is exactly what you saw before. Low and high span the entire array, we found a ‘c’, it was too low, so we adjusted our lower bound, and loop up to here. Second iteration, bounds are tighter; we found an ’e’. Adjust the upper bound. Third iteration, loop up to here; low and high are the same. We’ve narrowed it down to a single candidate - it’s indeed the key we’re looking for, and we returned this index. So there’s nothing hidden here; you see exactly what the algorithm is doing at every point. And I can go up to here and try different keys, so I can see how the algorithm behaves for these different input arguments.

[[21:20]] And by looking across this data, I can develop an intuition for how this algorithm works. So I’m trying different keys here, and say I try looking for a ‘g’. And this looks a little different. It’s not actually returning. And the reason for this is, I’m looking for a key which is not actually in the array. And the only way of breaking out of this loop, is by finding the key. So it’s kinda stuck here looping forever. So we can take a look at this and see what went wrong, where’s the algorithm going off the rails. These first few iterations look fine, but this iteration looks weird, because low is greater than high. Our range is completely collapsed. So if we get to this point, then we know the key can’t be found. So I see this faulty condition, and I say, “Oh, that’s not right; low has to be less than or equal to high.” Okay, well, I’ll just put that over as the condition of my while statement. Low, less than equal to high, and then that would break out of the loop, and I would return some signal to say that it couldn’t be found. So here we have three iterations of the loop, couldn’t be found, we return a not found value. So that’s what it might be like to write an algorithm without a blindfold on. [applause]

[[22:45]] So I’ve got this principle, again, that creators need to be able to see what they’re doing. They need this immediate connection with that they’re making. And I’ve tried to show this principle through three coding examples, but that’s just because this is a software engineering conference, I thought I was supposed to talk about programming. But to me, this principle has nothing to do with programming in particular. It has to do with any type of creation. So I’d like to show you a couple more demos, just to show you the breadth of what I have in mind here.

[[23:17]] So, to begin with, let’s take a look at a different branch of engineering. So here I have an electronic circuit that I drew. I’m not quite done drawing it, so let me finish up there. And we’ll put 2. And now we have a working circuit. I mean I assume it’s a working circuit. I don’t actually see anything working here. So this is exactly the same as writing code, that we work in a static representation. But what we actually care about is the data. The values of the variables, so we can’t see that here. Now in a circuit, the variables are the voltages on these different wires. So each of these wires has a voltage that’s changing over time, and we have to be able to see that. Now, if I was building this circuit on a lab bench, building it physically, I could at least take an oscilloscope and kinda poke around and see what’s going on in the different wires, what’s going on here, or here. So at the very least, I should be able to do that. So what I have here, is a plot of the voltage on this wire over time. You can see it’s high, it’s low, high and low, so this is clearly oscillating. If I built this physically, also I would be able to see the circuit doing something. In this case I have these two LED’s up here. These are LED’s, little lights, presumably they’re there for a reason. I can hit Play, and watch it simulate out in real time. So now you can see what the circuit is doing.

[[24:50]] In order to design a circuit like this, you have to understand the voltage on every wire. You have to understand how all the voltages are changing throughout the entire circuit. And just like coding, either the environment shows that to you, or you simulate it in your head. And I have better things to do with my head than simulating what electrons are doing. So what I’m gonna do, I’m gonna spread these out a bit. So same circuit, spread out a little bit, and I’m going to add the voltage at every node. So now you can see every voltage throughout the circuit. And I can even hit Play and watch it all kind of simulate out in real time.

[[25:30]] Although, what I prefer to do, is just move my mouse over it, and I can kind of look in areas that are interesting to me and see what the values are. I can compare any two nodes. So if you look at say the node over here, while I mouse over this one, you see the shadow of the one I’m mousing over is overlaid on that. The shadow of the one I’m mousing over is actually overlaid on all of them. And so I can compare any two nodes just by mousing over one of them and looking at the other one.

[[26:00]] And again, I can immediately see results of my changes. So I’ve got this 70k resistor here. I want to change its value, I just click and drag it, and now I see the waveforms change immediately. And you’ll notice that when I click and drag, it leaves behind the shadow of the waveform before I started dragging, so I can compare. I can immediately see the results of my changes.

[[26:26]] Two golden rules of information design: Show the data, show comparisons. That’s all I’m doing here. But even this isn’t quite good enough. What we’re seeing here are the voltages, but in electronics there are actually two data types. There is voltage and there is current. And what we’re not seeing is the current, flowing through each of these components. And in order to design a circuit, you need to understand both the voltage and the current. You need to understand the interplay between the two. That’s what analog design is.

[[26:51]] So what I’m gonna do is spread these out a little bit more. And now I’m gonna replace each of these components with a plot of the current going throw it over time. So each of these blue boxes represents a component. And you can see which component it is, because it has a little badge in the corner, a little icon, but now you can see everything that’s going on in the circuit. You can see how the current changes, you can see how the voltage and the current changes. There’s nothing hidden, there’s nothing to simulate in your head.

[[27:22]] So what we have here is a different way of representing the circuit. Just in general, you could draw any circuit with these blocks and instead of being made out of little squiggly symbols, it’s made out of data. And I think it’s important to ask: Why do we have these squiggly symbols in the first place? Why do they exist? They exist because they’re easy to draw with pencil on paper. This is not paper. So when you have a new medium, you have to rethink these things. You have to think how can this new medium allow us to have a more immediate connection to what we’re making. How can this new medium allow us to work in such a way that we can see what we’re doing.

[[28:00]] And it’s really the same situation with programming. Our current conception of what a computer program is — a list of textual definitions that you hand to a compiler — that’s derived straight from Fortran and ALGOL in the late ‘50’s. Those languages were designed for punchcards. So you’d type your program on a stack of cards, and hand them to the computer operator (it’s the guy in the bottom picture), and you would come back later. So there was no such thing as interactivity back then. And that assumption is baked into our current notions of what programming is.

[[28:34]] C was designed for teletypes. That’s Ken Thompson and Dennis Ritchie up there. Ritchie made C. And there are no video displays in this picture. Ritchie is basically typing on a fancy typewriter that types back to him. Any time you use a console or a terminal window, you’re emulating a teletype. And even today, people still think of a REPL or an interactive top-level as being interactive programming. Because that’s the best thing you could do on a teletype.

[[29:06]] So I have one more demo I want to show because I want to emphasize that this principle, immediate connection, is not even about engineering, it’s about any type of creation. So I want to move to a different field entirely, so let’s think about animation.

[[29:22]] So I’ve got this painting here, of a tree and a leaf on it and I want to make a little video with the leaf kinda drifting down the tree. And the normal way of doing this in a conventional animation package like Flash, is through keyframes. So you basically say where you want the leaf to be at different points in time, and then you hit Play and see what it looks like. So, I’m gonna say: ok, at frame 20, I’m gonna create a keyframe and the leaf should be there. And at frame 40, create a keyframe and the leaf should be there, and I’m just totally guessing here. I can not see the motion. I can not feel the timing, I’m just throwing things in time and space.

[[30:12]] So I’ve got this leaf at different points in time, and I’m gonna add a tween, which tells Flash to connect the dots. And then I’m gonna hit Play and see what it looks like. And it looks ridiculous, it looks like billiard balls bouncing back and forth.

[[30:32]] And the thing is I kind of know what I want, right? It’s a leaf. I want a leaf drifting down from a tree. And I can even perform that with my hand: leaf drifting down from a tree. But Flash doesn’t know how to listen to my hand. But maybe there’s a new medium that does know something about listening to my hand.

[[30:57]] So what I’m gonna show you here is a little app I made for performing animation. And we’re not really set up to do a live demo off the iPad so I’m just gonna play you a video of me making a video. The way this scene is going to play out is the leaf is gonna kind of drift down from the tree, and the scene is gonna pan over and the rabbit is gonna do something. And two things: one, this is going to move pretty quickly, and second, I’m going to be using both hands at almost all times. So I’ve got these different layers, the background, the mid-ground and the foreground. I’m choosing which layer to move using my left thumb. I’m gonna move my leaf to its position. I’m gonna move my bunny off stage and start time rolling. Now I’m gonna perform the leaf drifting down from the tree. Run it back, check out how that looked. The motion looks pretty good but the leaf kinda needs to rock back and forth. So I’m gonna pull out a rotation controller, run it back, find where the leaf is about to break off, and record the rotation. And I added a little flip there just because it felt right at the moment. It wasn’t even planned. Stop, because I want to pan over. So I’m gonna drag a whole bunch of layers at once, I grab all the layers into a list, I turn down the sensitivity of the background layers so they’ll move slower for a kind of parallax effect. I only want to move horizontally so I pull out a horizontal dragger and check out how it looks. I don’t quite like the parallax so I adjust the sensitivities just a little bit, try it out again, I like that better, so I get ready to go, I run it back to the beginning so I can get back into the rhythm of the piece. The leaf hits, I wait a beat, and I start panning. And I don’t know how many frames I waited, I don’t know how long it was, I went when it felt right.

[[32:50]] So I panned over this winter scene and kind of slowed down to a stop. And then I run it back, because I want to do something with my bunny. Throw away these tools because I’m done with them. And wait until I think my bunny should move and he hops away. And I have got a few different poses for my bunny. So I pull those out. And then I find the point where the bunny is about to take off the ground. Which is right there. I switch his pose and I kind of toggle between the poses as he hops away. And then I run it back because I wanna check out how it looked and I’m just gonna bring that up full screen for you. This is the piece.

[[33:50]] So I made that in 2 minutes, performing with my hands like a musical instrument. Very immediate connection between me and what I was trying to make. [applause]

[[34:08]] One of the inspirations for this tool was an animation that I tried to make several years ago. Not that one but it also began with a leaf drifting down from a tree. And I spent all day in Flash trying to keyframe that leaf. Couldn’t do it. And so that was the end of that. I still have my storyboards. Sometimes I play the music I wrote for the piece. But the piece itself is locked in my head. And so I always think about the millions of pieces that are locked in millions of heads. And not just animation, and not just art, but all kinds of ideas. All kinds of ideas including critically important ideas, world-changing inventions, life-saving scientific discoveries. These are all ideas that must be grown. And without an environment in which they can grow, or their creator can nurture them with this immediate connection, many of these ideas will not emerge. Or they’ll emerge stunted.

[[35:14]] So I have this principle that creators need an immediate connection and all of those demos that I just showed you simply came from me looking around, noticing places where this principle was violated, and trying to fix that. It’s really all I did. I just followed this guiding principle and it guided me to what I had to do.

[[35:40]] But I haven’t said much about the most important part of the story, which is why. Why I have this principle. Why I do this.

[[35:51]] When I see a violation of this principle, I don’t think of that as an opportunity. When I see creators constrained by their tools, their ideas compromised, I don’t say: Oh good, an opportunity to make a product. An opportunity to start a business. Or an opportunity to do research or contribute to a field. I’m not excited by finding a problem to solve. I’m not in this for the joy of making things. Ideas are very precious to me. And when I see ideas dying, it hurts. I see a tragedy. To me it feels like a moral wrong, it feels like an injustice. And if I think there’s anything I can do about it, I feel it’s my responsibility to do so. Not opportunity, but responsibility.

[[36:44]] Now this is just my thing. I’m not asking you to believe in this the way I believe I do. My point here is that these words that I’m using: Injustice, Responsibility, Moral wrong, these aren’t the words we normally hear in a technical field. We do hear these words associated with social causes. So things like censorship, gender discrimination, environmental destruction. We all recognize these things as moral wrongs. Most of us wouldn’t see a civil rights violation and think “Oh good, an opportunity.” I hope not.

[[37:23]] Instead, we’ve been very fortunate to have people throughout history who recognized these social wrongs and saw it as their responsibility to address them. And so there’s this activist lifestyle where these persons dedicate themselves to fighting for a cause that they believe in. And the purpose of this talk is to tell you that this activist lifestyle is not just for social activism. As a technologist, you can recognize a wrong in the world. You can have a vision of what a better world could be. And you can dedicate yourself to fighting for a principle. Social activists typically fight by organizing but you can fight by inventing.

[[38:07]] So now I’d like to tell you about a few other people who have lived this way, starting with Larry Tesler. Larry has done a lot of wonderful things in his life, but the work I’m gonna tell you about he did in the mid ’70s at Xerox PARC. And at the time, there really wasn’t such a thing as personal computers. The notion of personal computing was very young and Larry and his colleagues at PARC felt that they had transformative potential, that personal computing could change how people thought and lived. And I think all of us in this room would agree that they turned out to be right about that.

[[38:43]] But at the time, software interfaces were designed around modes. So, in a text editor for instance, you couldn’t just type and have words appear on the screen like on a typewriter. You would be in command mode and if you wanted to insert text you’d have to press I to go into insert mode then Escape back out to command mode or maybe you’d hit A to go into append mode. Or if you wanted to move text around you’d hit M go to the Move mode and then you’d have to select and you’d be in the mode to select and move things around. And Larry would watch people using the computer — they actually pioneered the concept of software user studies, another thing that he did — but he would watch people using the software and he found that many people even after training and weeks of use, many people would not become comfortable with the computer.

[[39:30]] And he believed that it was these modes that were to blame. That the complexity of modes was a kind of barrier that many people couldn’t cross. And so this kind of represented a threat to this dream of what personal computing could be. So Larry made it his personal mission to eliminate modes from software. And he formed a principle: No person should be trapped in a mode. His slogan that he would go around saying was ‘Don’t mode me in’ and he had it printed on his t-shirt. And this principle informed everything that he did. He thought about it with all the work that he did. And eventually he came up with a text editor called Gypsy, which essentially introduced text editing as we know today. There was an insertion point. And when you typed, words appeared on the screen. To select text, he invented modeless selection with click and drag. So you just click and drag over the text you want to select like using a highlighter — one of the first uses of drag. To move text around, he invented these commands that he called Cut, Copy, Paste. You select and cut. Later on you paste in whenever you’re ready. You’re never trapped in a mode, you never have to switch between modes. When you hit W on the keyboard you get W on the screen. Always.

[[40:48]] And he would watch people using his software and he found that someone who had never seen a computer before — which was most people back then — could be up and running in like half an hour. So this was clearly a transformative change in enabling people to connect with computers. And his ideas about modelessness spread to the rest of the desktop interface which was then being invented at PARC at the same time. And today they’re so ingrained in the computing experience that today we kind of take them for granted.

[[41:20]] Now I said that Larry made the elimination of modes his personal mission. That’s actually his words, and if you think he’s exaggerating, here’s Larry’s license plate for the last 30 years. Nowadays of course Larry has a website, at nomodes.com and he’s on twitter: @nomodes. And so like I said, Larry has done a lot of amazing work in his career but his self identity is clearly associated with this cause.

[[41:46]] And so I’d like to ask: What exactly did Larry do? Like how could we best describe what Larry did? A typical biography might say Larry Tesler invented Cut, Copy, Paste. Which is true, but I think that’s really misleading, because this invention was very different than say, Thomas Edison inventing the phonograph. Edison basically just stumbled over the technology for audio recording and he built it out as a novelty. And he came up with this list of possible applications for his technology but he didn’t have any cultural intent. Whereas what Larry did was entirely a reaction to a particular cultural context.

[[42:41]] So another thing that you might hear is that Larry Tesler solved the problem of modeless text manipulation. Solved the problem. And obviously that’s true, he worked on this problem for a long time, eventually solved it. But I think that’s really misleading too, because this problem that he solved only existed in his own head. Nobody else saw this as a problem. For everybody else modes were just how computers worked. There wasn’t anything wrong with them any more than we think there’s something wrong with having two arms. It just kind of was a fact of life.

[[43:18]] So the first thing that Larry did was that he recognized a wrong that had been unacknowledged in the culture. And the thing is, that’s how many great social changes began as well. So a 150 years ago, Elizabeth Cady Stanton had to stand up and say: women should vote. And everybody else said ‘That’s crazy, what are you talking about’. Today, we recognize gender discrimination as a wrong. Back then, it was part of society, it was invisible. She had to recognize it, and she had to fight it. And to me, that’s a much closer model to what Larry did than the Thomas Edison model of inventing a bunch of random technology so he could patent it.

[[44:01]] Now to be clear I’m not making any judgements about the relative importance or the impact of these two people, I’m just talking about their motivations and their approach. Both of them recognized a cultural wrong, they envisioned a world without that wrong and they dedicated themselves to fighting for a principle. She fought by organizing, he fought by inventing.

[[44:23]] And many other seminal figures in computing had similar motivations. So certainly Doug Engelbart. Doug Engelbart basically invented interactive computing. The concept of putting information on a screen. Navigating through it. Looking at information in different ways. Pointing at things and manipulating them. He came up with all this at a time when real-time interaction with a computer was just almost unheard of. Today he is best known as the inventor of the mouse, but what he really invented is this entirely new way of working with knowledge. His explicit goal from the beginning was to enable mankind to solve the world’s urgent problems. And his vision, he had this vision of what he called knowledge workers using complex powerful information tools to harness their collective intelligence. And he only got into computers because he had a hunch that these new things called computer things could help him realize that vision. Everything that he did was almost single-mindedly driven by pursuing this vision.

[[45:26]] Here’s Alan Kay. Alan Kay ran the lab at Xerox PARC where we got the desktop interface, so things like windows and icons, command menus. He also invented object-oriented programming and lots of other things. His goal, and I quote, was to ‘amplify human reach, and bring new ways of thinking to a faltering civilization that desperately needed it.’ Isn’t that great? His approach was through children. He believed that if children became fluent in thinking in the medium of the computer, meaning if programming was a form of basic literacy like reading and writing, then they’d become adults with new forms of critical thought, and new ways of understanding the world. And we’d have this more enlightened society, similar to the difference that literacy brought to society. And everything that he did, everything he invented, came out of pursuing this vision, this goal with children. And following principles that he adopted from Piaget and Montessori, Jerome Bruner, these people who would study how children think.

[[46:37]] And the figure probably most widely associated with software activism is Richard Stallman. Stallman started the GNU project which today makes up a big chunk of any Linux system. He also started the Free Software Foundation, wrote GCC, GPL, many, many other things. His principle is that software must be free, as in freedom, and he has very precise meaning associated with that statement. He’s always been very clear that software freedom is a matter of moral right and wrong. And he has taken a particularly uncompromising approach in his own life to that.

[[47:10]] All of these tremendously influential people dedicated their lives to fighting for a particular ideal with a very clear sense of right and wrong. Often really fighting against an authority or mainstream that did not recognize their wrong as being wrong. And today, the world is still very far from any of their ideals and so they still see a world in crisis and they keep fighting. They’re always fighting.

[[47:41]] Now I’m not saying you have to live this way. I’m not saying that you should live this way. What I’m saying is that you can. That this lifestyle is an option that’s available to you. And it’s not one you’re gonna hear about much. Your career counselor is not going to come back to you and say you should start a personal crusade. In a social field, they might, but not in technology. Instead the world will try to make you define yourself by a skill.

[[48:08]] That’s why you have a major in college. That’s why you have a job title. You are a software engineer. And you’ll probably specialize to be a database engineer or a front-end engineer, and you’ll be given front-ends and asked to engineer them. And that could be worthwhile and valuable, and if you want to spend your life pursuing excellence and practicing a skill, you can do that. That is the path of a craftsman. That is the most common path. The only other path you really hear about much is the path of the problem solver. So I see entrepreneurship and academic research as kind of two sides of that coin. There is the field. There’s the set of problems in that field, or needs in the market. You go in, you choose one, you work it, you make your contribution there. Maybe later on, you choose another problem, you work it, you make your contribution there. Again, that could be worthwhile and valuable and if that’s what you want to do, then you can take that path.

[[49:04]] But I don’t see Larry Tesler on either of those paths. I wouldn’t say that he was contributing to the field of user experience design because there was no such thing. He didn’t choose some open problem to solve, he came up with some problem that only existed in his own head, and no one else even recognized. And certainly he did not define himself by his craft, he defined himself by his cause. By the principle he fought to uphold. And I’m sure if you look at Wikipedia it will say that he’s a computer scientist or a user experience something but to me that’s like saying Elizabeth Cady Stanton was a community organizer. No, Elizabeth Cady Stanton established the principle of women’s suffrage. That’s who she was. That was the identity she chose and Larry Tesler established the principle of modelessness. He had this vision, and he brought the world to that vision.

[[50:01]] So, you can choose this life. Or maybe it will end up choosing you. It might not happen right away. It can take time to find a principle because finding a principle is essentially a form of self-discovery, that you’re trying to figure out what your life is supposed to be about. What you want to stand for as a person. Took me like a decade. Ten years before any real understanding of my principles solidified. That was my twenties. When I was young I felt I had to live this way but I would get little glimmers of what mattered to me but no big picture. It was really unclear. This was very distressing for me. What I had to do was just do a lot of things. Make many things, make many types of things. Study many things, experience many, many things. And use all these experiences as a way of analyzing myself. Taking all these experiences and saying ‘Does this resonate with me?’. Does this repel me? Do I not care? Building up this corpus of experiences that I felt very strongly about for some reason and trying to make sense of it. Trying to figure out why. What is this secret ingredient to all these experiences that I’m reacting to so strongly.

[[51:16]] Now I think everyone’s different. And all the guys I talked about they have their own origin stories which you can read about. I will just say that confining yourself to practicing a single skill can make it difficult to get that broad range of experience which seems to be so valuable for this sort of work.

[[51:35]] And finally, if you choose to follow a principle, a principle can’t just be any old thing you believe in. You’ll hear a lot of people say they that want to make software easier to use. Or they want to delight their users. Or they want to make things simple. That’s a really big one right now. Everyone wants to make things simple. And those are nice thoughts and maybe kind of give you a direction to go in but they’re too vague to be directly actionable. Larry Tesler likes simplicity. But his principle was this specific nugget of insight: No person should be trapped in a mode. And that is a powerful principle because it gave him a new way of seeing the world. It divided the world in right and wrong in a fairly objective way. So, he could look at somebody selecting text and ask: Is this person in a mode? Yes or no? If yes, he had to do something about that. And likewise, I believe that creators need powerful tools. It’s a nice thought, but it doesn’t really get me anywhere. My principle is that creators need this immediate connection. So I can watch you changing a line of code and I can ask: Did you immediately see the effect of that change? Yes or no? If no, I got to do something about that.

[[52:52]] And again, all those demos that I showed you came out of me doing that, of me following this principle and letting it lead me to exactly what I needed to do. So if you’re guiding a principle and bodies of specific insight, it will guide you. And you will always know if what you’re doing is right.

[[53:19]] There are many ways to live your life. That’s maybe the most important thing you can realize in your life, is that every aspect of your life is a choice. But there are default choices. You can choose to sleepwalk through your life and accept the path that’s been laid out for you. You can choose to accept the world as it is. But you don’t have to. If there is something in the world you feel is a wrong and you have a vision for what a better world could be, you can find your guiding principle. And you can fight for a cause. So after this talk, I’d like you to take a little time and think about what matters to you. What you believe in. And what you might fight for.

[[54:06]] Thank you. [applause]

Travel: Spring 2012 Edition

February 18, 2012

For various reasons (mostly PhD-related) I will be traveling a bit over the next month.

February 29 to March 2 in Princeton, NJ
March 5 to March 7 in Pittsburgh, PA
March 9 to March 12 in Palo Alto, CA

Let me know if you’re any of these areas and want to say hi!

How to build DRM you can trust

February 15, 2012

Abstract. Proof-carrying code can be used to implement a digital-rights management scheme, in the form of a proof-verifying CPU. We describe how this scheme would work and argue that DRM implemented this way is both desirable and superior to trusted (“treacherous”) computing schemes. This scheme permits users to retain control over their own machines, while allowing for specific limitations on software capabilities. The ability to impose these limitations will become especially important when 3D printers and biosynthesis machines become ubiquitous. This essay assumes some technical knowledge, although no background in formal methods is required. (If you know how proof-carrying code works, go away; this essay is not for you.)

~ * ~

It is a truth universally acknowledged that digital-rights management schemes are universally harmful to users. Existing DRM schemes are annoying, counterproductive, buggy and fundamentally ineffective. As implemented, they are nearly indistinguishable from spyware.

I’d like to challenge this assumption.

I have no interest in defending the state of current DRM technology. But I do think there is a way to do it better. My goal is to convince you that proof-based digital-rights management could work; that it has a sound theoretical foundation, that it offers a useful subset of DRM-like functionality, and that it does so in a way that avoids many of the privacy, security and trust problems associated with extant trusted computing platforms. I’d like to describe what this system would look like, and what its implications would be (for example, it does offer some control, but it certainly does not solve the analog hole). Unfortunately, this system doesn’t exist yet; the technology underlying formal methods is still being actively researched and is not yet ready for industry.

Why do I feel compelled to speak up about this “ivory tower vaporware” now? We are currently in the midst of what Cory Doctorow calls “The War on General Purpose Computing”, with bills like SOPA/PIPA being considered by Congress, and technical standards like UEFI being aggressively pushed by large software vendors. I feel that it is critical that we convince industries with a stake in digital rights management to invest in this formal methods research. The tools being pursued right now, namely, trusted (“treacherous”) computing, may enable the effective enforcement of DRM for the first time in human history, but it will come at the cost of general purpose computing as we know it today.

How to build proof-based DRM¶

Because we can’t describe the implications of a system without describing the system itself, the first thing to do is describe how proof-based digital rights management would be implemented. This description will also set the stage for a discussion of some of the issues surrounding such a system; primarily, whether or not this system is possible and whether or not it is desirable.

Proof-based DRM consists of two components. The first component is a proof verifier, which takes a theorem and a proof of that theorem as inputs, and returns a yes/no answer on whether or not the proof is correct. (We’ll discuss in more detail what exactly a “theorem” and a “proof” is in this context soon.) The second component is a set of theorems, which will describe the desired behavior of software that runs on this hardware (the DRM policies). These two components are integrated into the hardware and collectively serve as the gatekeepers for the programs that will run on the CPU. Code that is to be loaded and run on this CPU must first pass through the proof verifier chip; if the proof is in order, the code the user provided may be directly executed, its adherence to some digital rights policy ensured by the force of logic. (Nota bene: in the rest of this essay, we will not consider the issues of trusting the underlying hardware; a deficiency of our essay, but it is a deep and thorny issue, that also affects CPUs in use today.)

The proof verifier is the core of this system. It can be thought of as a “little mathematician”: someone who reviews a proof in order to check that it is correct. He is furnished with a set of assumptions and a goal (the “theorem”), and a series of deductions from the assumptions to the goals (the “proof”). All the verifier needs to do is, for each goal, check that every step logically follows from the previous one. “P, and P implies Q. Therefore, Q!” Proof verifiers are relatively well studied, and there exist multiple implementations, among which include Coq, HOL Light, Isabelle and F*. Usually, these are written in software, but there is also ongoing research on the design of proof verifiers suitable for embedded devices.

Let’s delve into the operation of a proof verifier a little more. The first input of a proof verifier is the theorem which is to be proved. So the very first task placed upon the user of a proof verifier is to state the theorem in a way that a computer can understand. It’s certainly not reasonable to expect the proof verifier to understand a paragraph of English or some LaTeX equations! What we do is write down mathematical logic as a computer language, this is the language we write our theorems in. Take, as an example, the statement of Fermat’s Last Theorem: no three positive integers a, b, and c, could satisfy $a^n + b^n = c^n$, for any $n > 2$. In Coq, this statement could be written as forall (x y z:Z) (n:nat), x^(n+3) + y^(n+3) = z^(n+3) -> x <= 0 \/ y <= 0 \/ z <= 0. Read out loud, it says “for all x, y, z which are integers (Z), and for all n which are natural numbers (nat), if it is the case that $x^{n+3} + y^{n+3} = z^{n+3}$, then either x, y or z is less than or equal to zero.” While the “computer-friendly” version looks a little different from the informal version (we’ve taken the contrapositive of the statement to avoid negations, and we’ve used addition by three to express the fact that the exponent must be greater than two), they are fairly similar.

Unfortunately, saying in similarly precise language what it means for a program to be “memory-safe” (that is, it never dereferences an invalid pointer) is considerably more difficult. Transforming an informal statement into a formal one is something of an art, for which there is no “correct” answer. During this process, you must balance competing desires: the statement should be easy for a human to understand, but it should also be easy to prove in a computer. Even in the case of Fermat’s theorem, we’ve elided some details: what does it actually mean for something to be an integer or a natural number? What is exponentiation? For that matter, what is addition? What does it mean for two integers to be equal? Fortunately, there are conventional answers for these questions; even in the case of more complicated properties like “memory-safety”, there is a reasonable understanding of the general design principles behind writing a theorem like this down.

For proof-based DRM, we need to scale up security properties like “memory-safety” to the properties one might need to enforce in a digital rights management scheme. How do we show that a program never leaks the contents of a hardware-based private key or prove that a program transmits within a limited set of radio frequencies set by the law? The possibilities multiply, as do the risks. As we move from the realm of well-defined concepts to more irregular, fuzzy ones, it becomes more difficult to tell if a formalization does what we want, or if it is merely a set of rules with a loophole. A criminal may abide by the letter of the law, but not the spirit. In a computer system, there is no judge which can make this assessment. So a reasonable question is whether or not there are any properties which we have some hope of formalizing. Fortunately, the answer is yes; we’ll return to this question in the next section, as it will influence what kinds of DRM we can hope to implement.

The second input of a proof verifier is a proof. Now, we’ve claimed that a proof is a long list (actually, it is more like a tree, as the proof of one goal may require you to prove a few subgoals) of logical steps, each of which can easily be checked. Now, if you have ever attempted to read a real mathematics proof, you’ll know that checking if a proof is correct is never quite this simple. Mathematical proofs leave out steps. Like a writer, a mathematician optimizes his proof for his audience. If they have some relevant background knowledge, he will elide information in order to ensure the clarity of the higher-level structure of a proof. You see, the mathematician is not only interested in what is true, but why it is true. We cannot be so facile when it comes to proofs for computer consumption: as a dumb calculating machine, the computer requires every step of the proof to be explicitly spelled out. This is one of the primary challenges of mechanically checked proofs: a human may write out a three line proof for Euclid’s algorithm, but for the computer, you might end up writing a page. For more complicated theorems about computer programs, a verification project can easily get crushed by the sheer amount of code involved. Scaling up automated theorem proving technology is yet another area of active research, with current technology on the level of abstraction as assembly languages are for traditional programming.

However, once we are in possession of a verified, mechanized proof, we have something that a traditional mathematician does not: assurance that the proof is correct, and that the program has the property we demanded of it. (On the contrary, mathematical proofs published in papers can be, and sometimes are, wrong! Though, even more interesting, the theorem still ends up being true anyway.) This is a good situation to be in: by the principle of proof erasure, we’re allowed to ignore the elaborate proofs we constructed and execute the program directly. We can run the program straight on our hardware, without any untrusted DRM-enabled operating system running underneath. We’ll return to this question later, when we compare our scheme to existing “treacherous computing” schemes.

So what have we covered here? We’ve described how a proof verifier works by looking more closely at its two inputs: theorems and proofs, and touched over some of the active research areas involved with scaling this technology for the real world. In the next two sections, I’d like to go in more detail about two particular aspects of this system as they apply to digital rights management: the theorems associated with DRM, and the relationship between this scheme and existed “trusted computing” schemes.

What policies are machine-checkable?¶

The MPAA would quite appreciate it there was a way to make it impossible to copy a digital video. But even the most sophisticated technological scheme cannot work around the fact that I can simply setup a video recorder trained on my screen as I watch the movie: this is the so-called “analog hole”, a fundamental limitation of any copy protection scheme. Proof-based DRM cannot be directly used to eliminate the copying of static materials, such as books, music or movies.

Does this mean that proof-based DRM has no useful applications? The new capability we have gained through this scheme is the ability to select what code will run on the hardware. Any legal ramifications are strictly side effects of this technological enforcement. Furthermore, since general purpose computing devices are ubiquitous, a proof-certifying CPU gains us nothing unless the hardware itself affords something extra. A proof-verifying CPU can be thought of as a specialized appliance like a toaster or a microwave, both of which are only interesting insofar as much as they perform non-computational tasks (such as toast bread or heat food).

But there are certainly many interesting hardware peripherals for which this would be useful. In fact, modern CPUs already have some of the specialized chips which are developing along these lines: the Trusted Platform Module, a specification for a cryptoprocessor that can be used to securely store cryptographic keys, is present in most modern laptops. Intel’s Trusted Execution Technology allows the specification of “curtained” regions of memory, which cannot necessarily be accessed by the operating system. The creation of these features has been driven by the trusted (“treacherous”) computing movement, and these are features that can be used both for good and for evil. In a proof-based DRM world, we can give users far more precise control over this secret data, as the user is permitted to write whatever code they want to manipulate it: they simply need to prove that this code won’t get leaked outside of the module. This is information-flow control analysis, which lets us track the flow of private data. (For secret data which has a low number of bits, such as secret keys, a lot of care needs to be taken to mitigate timing attacks, which can be used to slowly leak data in non-obvious ways.) This private data could even be proprietary code, in the case of using a proof-verifying CPU to assist in the distribution of software. This would be more flexible than current software distribution schemes, which are either “in the cloud” (software-as-a-service), or where a proprietary, all-in-one “application box” must be physically hosted by the customer.

Another important application of proof-based DRM is for auditing; e.g. the guaranteed logging of certain events to external stores. The logging store may be some sort of write-only device, and we guarantee that all of the relevant events processed by our CPU are sent to this device by proving that, whenever the event associated with an auditing requirement is triggered, the associated logging call will also be called. This would be a great boon for electronic voting systems, a class of machines which are particularly plagued by unaccountability.

However, looking even further into the future, I think the most interesting hardware peripherals will not truly be peripherals. Rather, the relationship will be inverted: we will be looking at specialized machinery, which happens to need the power of a general purpose computer. We do not ride in cars and fly in planes: we ride in computers and fly in computers. But just as I would prefer my car not to be hackable, or become infected by a computer virus, I would prefer the computational power of a car computer to be limited. This is precisely what proof-based DRM does: it restricts the set of runnable programs. 3D printers and biosynthesis machines are other examples of “peripherals”, which I suspect governments around the world will have a large interest in regulating.

Coming up with useful theorems that define the very nature of the device in question is a bit more challenging: how do you define the line between legitimate use, or an attempt to jam an aircraft radio, create a bioweapon or forge currency? How do you mathematically specify what it means to be “working car software”, as opposed “car software that will cause accidents”? The key insight here is that while it is impossible to completely encapsulate what it means to “be a radio” or “be a car”, we can create useful partial specifications which are practical. Rather than state “steering works properly”, we can state, “given an appropriate input by the wheel, within a N millisecond latency an appropriate mechanical output will be generated.” Rather than state “GPS works”, we can state, “during the operation of GPS, information recording the current location of the vehicle is not transmitted to the public Internet.”

It is also possible to modularize the specification, so that extremely complicated aspects of operation can be verified by an independent systems, and our proof checker only verifies that the independent system is invoked and acted upon. Here are two concrete examples:

We can require the specific ranges of parameters, ranges which may be mandated by law. In the case of simple parameters, such as radio frequencies, what is permissible could be built into the specification; more complicated enforcement rules may rely on black box chips which can be invoked for a yes-no answer. It’s important to note that while the internal implementation of such a chip would not be visible to the user, it would have limited influence over the behavior of a program (only being called when our program code), nor would they have network access, to “phone home”. The specification may simply stipulate that these subroutines are invoked and their result (success or failure) handled appropriately.
We can require the implementation of steganographic protocols for watermarking and anti-counterfeiting measures. These can rely on an identity that is unknown to the manufacturer of the device, but immutably initialized upon the first usage of the device and can aid law enforcement if the originating device is acquired by a proper search and seizure that does not impinge on fourth amendment rights. Verifying that a code adheres to such a protocol requires formalizing what it means for a steganographic protocol to be correct, and also demonstrating that no other input/output invocations interfere with stenographic output.

It should be clear that the ability to limit what code is run on a device has practical applications. Indeed, it should seem that proof-based DRM is very similar to the trusted (treacherous) computing platform. So, in the next section, I would like to directly describe the differences between these two schemes.

Why is this better?¶

What is the difference between proof-based DRM and current trusted (“treacherous”) computing schemes? Both operate by limiting the code that may be run directly on a piece of hardware.

I think the easiest way to see the difference is to consider how each defines the set of permissible programs that may be run on a computer. In trusted computing, this set is defined to be the set of programs signed by a private key held by some corporation. The corporation has complete control over the set of programs that you are allowed to run. Want to load your own software? No can do: it hasn’t been signed.

In proof-based DRM, the set of allowed programs is larger. It will include the code that the corporation provides, but it will also include any other program that you or the open-source community could write, so long as it provides the appropriate proofs for the digital policies. This means, for example, that there is no reason to choke down proprietary software which may have rootkits installed, may spy on your usage habits, etc. The user retains control. The theorems are public knowledge, and available for anyone to inspect, analyze, and base their own software off of.

What if the user isn’t able to, in practice, load his own code? Given the current difficulties in theorem proving, it is certainly a concern that what may happen in practice is that corporations will generate specifications that are overfitted to their proprietary code: that have such stringent parameters on the code’s operation that it would be effectively impossible for anyone to run anything else. Or, even more innocuously, the amount of effort involved with verifying software will put it out of reach for all but well-funded corporations. Unfortunately, this is a question that cannot be resolved at the moment: we have no data in this area. However, I have reason to be optimistic. One reason is that it has been extremely important for current formal methods work for specifications to be simple; a complicated specification is more likely to have bugs and is harder to prove properties about. Natural selection will take its course here, and if a company is tempted to slip in a “back door” to get their code in, this back door will also be exploitable by open source hackers. Another reason to be optimistic is that we may be able to develop correct-by-construction programming languages, languages which you write in, and when compiled, automatically provide you with the proof for the particular theorems you were looking for.

And of course, there is perhaps one last reason. Through the ages, we’ve seen that open source hackers are highly motivated. There is no reason to believe this will not be the case here.

Conclusion¶

Of course, demonstrating that proof-based digital-rights management is “better” than the rather bad current alternatives doesn’t mean that it is “good.” But, through the course of this essay, I’ve touched upon many the reasons why I think such a scheme could be valuable. It would provide further impetus in the development of proof-carrying code, a technology that is interesting in and of itself. It would provide a sound basis for limiting the functionality of general purpose computers, in circumstances where you really don’t want a general purpose computing device, without requiring rootkits or spyware. The ability to more precisely say what ways you would like your product be used gives producers more flexibility when negotiating in the market (for example, if I can ensure a video game will have limited distribution during its first month, I may be willing to sell it for less). And as general purpose computers gain the ability to influence reality in unprecedented ways, there will be a growing desire for this technology that provides this capability. I think that it is uncontroversial that many, powerful bodies will have an interest in controlling what can be run on certain computing devices. Cory Doctorow has said that “all attempts at controlling PCs will converge on rootkits”; it should be at least worth considering if there is another way.

POPL

January 28, 2012

Last night, I returned from my very first POPL, very exhausted, and very satisfied. It was great putting faces to names, chatting with potential PhD supervisors (both from the US and in the UK), and reveling in the atmosphere.

Highlights from my files:

Tony Hoare, on being awarded the ACM SIGPLAN Programming Languages Achievement Award, even though he has received so many other awards, “…it makes me feel a little guilty. I didn’t ask for it!”
Hoare: “I like mistakes; if I find it’s my fault, I can rectify it. If it’s someone else’s fault, there’s not much you can do.”
Hoare: “Simplicity is not an engineering goal.”
Tony had just described how he retracted an accepted paper because the reasoning on it was intricate, and he didn’t think it presented program proving in a good light. The interviewer complimented him for his principles, saying that if he had a paper accepted which he didn’t think was up to snuff, he probably wouldn’t have the courage to retract it. It was “so brave.” To which Tony replies, “Well, I wish you could!” [laughter] “Unfortunately, the pressure to publish has increased. We feel obliged to publish every year, and the quality of the average paper is not improved.”
One recurring theme: Tony Hoare mentioned that proofs and tests were not rivals, really they were just the same thing… just different. In the “Run your Research” talk, these theme came up again, where this time the emphasis was executable papers (the “run” in “Run your Research”).
“Don’t just support local reasoning, demand it!” (Brute force proofs not allowed!)
On JavaScript: “null is sort of object-y, while undefined is sort of primitive-y.”
The next set come from the invited speaker from the computer networks community. “During 9/11, on average, the Internet was more stable. The reason for this was the NetOps went home.” (on the causes of network outages in practice.)
“Cisco routers have twenty million lines of code: there’s actually an Ada interpreter in there, as well as a Lisp interpreter.”
“It used to be acceptable to go down for 100ms… now we have video games.”
Speaker: “I am the walrus.” (Goo goo g’ joob.)
“I believe we do not reuse theorems. We reuse proof methods, but not the actual theorems. When we write papers, we create very shallow models, and we don’t build on previous work. It’s OK. It’s the design. It doesn’t matter too much. The SML standard was distributed with a bug report, with 100+ mistakes in the original definition. Doesn’t detract from its impact.”
“Put on the algebraic goggles.”
“The Navy couldn’t install it [a system for detecting when classified words were being transmitted on insecure channels], because doing so would be admitting there was a mistake.”
“Move to something really modern, like Scheme!” (It’s like investing one trillion, and moving from the thirteenth century to the fourteenth.)
Strother Moore (co-creator of ACL2): “After retirement, I’ll work more on ACL2, and then I’ll die.”
“What’s the best way to make sure your C program is conforming?” “Write deterministic code.” [laughter]

Modelling IO: MonadIO and beyond

January 24, 2012

The MonadIO problem is, at the surface, a simple one: we would like to take some function signature that contains IO, and replace all instances of IO with some other IO-backed monad m. The MonadIO typeclass itself allows us to transform a value of form IO a to m a (and, by composition, any function with an IO a as the result). This interface is uncontroversial and quite flexible; it’s been in the bootstrap libraries ever since it was created in 2001 (originally in base, though it migrated to transformers later). However, it was soon discovered that when there were many functions with forms like IO a -> IO a, which we wanted to convert into m a -> m a; MonadIO had no provision for handling arguments in the negative position of functions. This was particularly troublesome in the case of exception handling, where these higher-order functions were primitive. Thus, the community began searching for a new type class which captured more of IO.

While the semantics of lift were well understood (by the transformer laws), it wasn’t clear what a more powerful mechanism looked like. So, early attacks at the problem took the approach of picking a few distinguished functions which we wanted, placing them in a typeclass, and manually implementing lifted versions of them. This lead to the development of the already existing MonadError class into a more specialized MonadCatchIO class. However, Anders Kaseorg realized that there was a common pattern to the implementation of the lifted versions of these functions, which he factored out into the MonadMorphIO class. This approach was refined into the MonadPeelIO and MonadTransControlIO typeclasses. However, only MonadError was in the core, and it had failed to achieve widespread acceptance due to some fundamental problems.

I believe it is important and desirable for the community of library writers to converge on one of these type classes, for the primary reason that it is important for them to implement exception handling properly, a task which is impossible to do if you want to export an interface that requires only MonadIO. I fully expected monad-control to be the “winner”, being the end at a long lineage of type classes. However, I think it would be more accurate to describe MonadError and MonadCatchIO as one school of thought, and MonadMorphIO, MOnadPeelIO and MonadTransControlIO as another.

In this blog post, I’d like to examine and contrast these two schools of thought. A type class is an interface: it defines operations that some object supports, as well as laws that this object abides by. The utility in a type class is both in its generality (the ability to support multiple implementations with one interface) as well as its precision (the restriction on permissible implementations by laws, making it easier to reason about code that uses an interface). This is the essential tension: and these two schools have very different conclusions about how it should be resolved.

Modelling exceptions¶

This general technique can be described as picking a few functions to generalize in a type class. Since a type class with less functions is preferable to one with more (for generality reasons), MonadError and MonadCatchIO have a very particular emphasis on exceptions:

class (Monad m) => MonadError e m | m -> e where
  throwError :: e -> m a
  catchError :: m a -> (e -> m a) -> m a

class MonadIO m => MonadCatchIO m where
  catch   :: Exception e => m a -> (e -> m a) -> m a
  block   :: m a -> m a
  unblock :: m a -> m a

Unfortunately, these functions are marred by some problems:

MonadError encapsulates an abstract notion of errors which does not necessarily include asynchronous exceptions. That is to say, catchError undefined h will not necessarily run the exception handler h.
MonadError is inadequate for robust handling of asynchronous exceptions, because it does not contain an interface for mask; this makes it difficult to write bracketing functions robustly.
MonadCatchIO explicitly only handles asynchronous exceptions, which means any pure error handling is not handled by it. This is the “finalizers are sometimes skipped” problem.
MonadCatchIO, via the MonadIO constraint, requires the API to support lifting arbitrary IO actions to the monad (whereas a monad designer may create a restricted IO backed monad, limiting what IO actions the user has access to.)
MonadCatchIO exports the outdated block and unblock function, while modern code should use mask instead.
MonadCatchIO exports an instance for the ContT transformer. However, continuations and exceptions are known to have nontrivial interactions which require extra care to handle properly.

In some sense, MonadError is a non-sequitur, because it isn’t tied to IO at all; perfectly valid instances of it exist for non-IO backed monads as well. MonadCatchIO is closer; the latter three points are not fatal ones could be easily accounted for:

class MonadException m where
  throwM  :: Exception e => e -> m a
  catch   :: Exception e => m a -> (e -> m a) -> m a
  mask    :: ((forall a. m a -> m a) -> m b) -> m b

(With a removal of the ContT instance.) However, the “finalizers are sometimes skipped” problem is a bit more problematic. In effect, it is the fact that there may exist zeros which a given instance of MonadCatchIO may not know about. It has been argued that these zeros are none of MonadCatchIO’s business; one inference you might draw from this is that if you have short-circuiting which you would like to respect finalizers installed using MonadException, it should be implemented using asynchronous exceptions. In other words, ErrorT is a bad idea.

However, there is another perspective you can take: MonadException is not tied just to asynchronous exceptions, but any zero-like value which obeys the same laws that exceptions obey. The semantics of these exceptions are described in the paper Asynchronous Exceptions in Haskell. They specify exactly the interaction of masking, throw and catch, as well as how interrupts can be introduced by other threads. In this view, whether or not this behavior is prescribed by the RTS or by passing pure values around is an implementation detail: as long as an instance is written properly, zeros will be properly handled. This also means that it is no longer acceptable to provide a MonadException instance for ErrorT e, unless we also have an underlying MonadException for the inner monad: we can’t forget about exceptions on the lower layers!

There is one last problem with this approach: once the primitives have been selected, huge swaths of the standard library have to be redefined by “copy pasting” their definitions but having them refer to the generalized versions. This is a significant practical hurdle for implementing a library based on this principle: it’s simply not enough to tack a liftIO to the beginning of a function!

I think an emphasis on the semantics of the defined type class will be critical for the future of this lineage of typeclasses; this is an emphasis that hasn’t really existed in the past. From this perspective, we define with our typeclass not only a way to access otherwise inaccessible functions in IO, but also how these functions should behave. We are, in effect, modeling a subset of IO. I think Conal Elliott would be proud.

There is a lively debate going on about extensions to the original semantics of asynchronous exceptions, allowing for the notion of “recoverable” and “unrecoverable” errors. (It’s nearer to the end of the thread.)

Threading pure effects¶

This technique can be described as generalizing the a common implementation technique which was used to implement many of the original functions in MonadCatchIO. These are a rather odd set of signatures:

class Monad m => MonadMorphIO m where
  morphIO :: (forall b. (m a -> IO b) -> IO b) -> m a

class MonadIO m => MonadPeelIO m where
  peelIO :: m (m a -> IO (m a))

class MonadBase b m => MonadBaseControl b m | m -> b where
  data StM m :: * -> *
  liftBaseWith :: (RunInBase m b -> b a) -> m a
  restoreM :: StM m a → m a
type RunInBase m b = forall a. m a -> b (StM m a)

The key intuition behind these typeclasses is that they utilize polymorphism in the IO function that is being lifted in order to thread the pure effects of the monad stack on top of IO. You can see this as the universal quantification in morphIO, the return type of peelIO (which is IO (m a), not IO a), and the StM associated type in MonadBaseControl. For example, Int -> StateT s IO a, is equivalent to the type Int -> s -> IO (s, a). We can partially apply this function with the current state to get Int -> IO (s, a); it should be clear then that as long as the IO function we’re lifting lets us smuggle out arbitrary values, we can smuggle out our updated state and reincorporate it when the lifted function finishes. The set of functions which are amenable to this technique are precisely the ones for which this threaded is possible.

As I described in this post, this means that you won’t be able to get any transformer stack effects if they aren’t returned by the function. So perhaps a better word for MonadBaseControl is not that it is unsound (although it does admit strange behavior) but that it is incomplete: it cannot lift all IO functions to a form where the base monad effects and the transformer effects always occur in lockstep.

This has some interesting implications. For example, this forgetfulness is in fact precisely the reason why a lifted bracketing function will always run no matter if there are other zeros: finally by definition is only aware of asynchronous exceptions. This makes monad-control lifted functions very explicitly only handling asynchronous exceptions: a lifted catch function will not catch an ErrorT zero. However, if you manually implement finally using lifted versions of the more primitive functions, finalizers may be dropped.

It also suggests an alternate implementation strategy for monad-control: rather than thread the state through the return type of a function, it could instead be embedded in a hidden IORef, and read out at the end of the computation. In effect, we would like to embed the semantics of the pure monad transformer stack inside IO. Some care must be taken in the forkIO case, however: the IORefs need to also be duplicated appropriately, in order to maintain thread locality, or MVars used instead, in order to allow coherent non-local communication.

It is well known that MonadBaseControl does not admit a reasonable instance for ContT. Mikhail Vorozhtsov has argued that this is too restrictive. The difficulty is that while unbounded continuations do not play nice with exceptions, limited use of continuation passing style can be combined with exceptions in a sensible way. Unfortunately, monad-control makes no provision for this case: the function it asks a user to implement is too powerful. It seems the typeclasses explicitly modeling a subset of IO are, in some sense, more general! It also highlights the fact that these type classes are first and foremost driven by an abstraction of a common implementation pattern, rather than any sort of semantics.

Conclusion¶

I hope this essay has made clear why I think of MonadBaseControl as an implementation strategy, and not as a reasonable interface to program against. MonadException is a more reasonable interface, which has a semantics, but faces significant implementation hurdles.

monad-control is tricky

January 23, 2012

Editor’s note. I’ve toned down some of the rhetoric in this post. The original title was “monad-control is unsound”.

MonadBaseControl and MonadTransControl, from the monad-control package, specify an appealing way to automatically lift functions in IO that take “callbacks” to arbitrary monad stacks based on IO. Their appeal comes from the fact that they seem to offer a more general mechanism than the alternative: picking some functions, lifting them, and then manually reimplementing generic versions of all the functions built on top of them.

Unfortunately, monad-control has rather surprising behavior for many functions you might lift.

For example, it doesn’t work on functions which invoke the callback multiple times:

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.Trans.Control
import Control.Monad.State

double :: IO a -> IO a
double m = m >> m

doubleG :: MonadBaseControl IO m => m a -> m a
doubleG = liftBaseOp_ double

incState :: MonadState Int m => m ()
incState = get >>= \x -> put (x + 1)

main = execStateT (doubleG (incState)) 0 >>= print

The result is 1, rather than 2 that we would expect. If you are unconvinced, suppose that the signature of double was Identity a -> Identity a, e.g. a -> a. There is only one possible implementation of this signature: id. It should be obvious what happens, in this case.

If you look closely at the types involved in MonadBaseControl, the reason behind this should become obvious: we rely on the polymorphism of a function we would like to lift in order to pass StM m around, which is the encapsulated “state” of the monad transformers. If this return value is discarded by IO, as it is in our function double, there is no way to recover that state. (This is even alluded to in the liftBaseDiscard function!)

My conclusion is that, while monad-control may be a convenient implementation mechanism for lifted versions of functions, the functions it exports suffer from serious semantic incoherency. End-users, take heed!

Postscript. A similar injunction holds for the previous versions of MonadBaseControl/MonadTransControl, which went by the names MonadPeel and MonadMorphIO.

Mystery Hunt and the Scientific Endeavour

January 16, 2012

It can be hard to understand the appeal of spending three days, without sleep, solving what some have called “the hardest recreational puzzles in the world,”; but over this weekend, hundreds of people converged on the MIT campus to do just that, as part of MIT Mystery Hunt. To celebrate the finding of the coin, I’d like to share this little essay that I found in my files, which compares Mystery Hunt and the scientific endeavour. (If you are not familiar with Mystery Hunt, I recommend listening to the linked This American Life program.)

Thomas Kuhn, in his famous book The Structure of Scientific Revolutions, states that “normal science” is “puzzle solving”: what he means is that the every day endeavors of scientists are the addressing of small, tractable problems, these problems are “puzzles” not “grand mysteries of the universe.” Kuhn goes on to describe what is involved with normal science: generation of facts, increasing the fit between theory and observation, and paradigm articulation. We will see that, as one might expect, these activities are part of “normal” puzzle solving. But (perhaps more unexpectedly) Popperian falsification and Kuhnian revolutions also have something to say about this situation. There are limits to the analogy of puzzles to science, the perhaps most striking of which is that a puzzle has a single, definite solution. But because it is not possible to call up the puzzle writers midway through a puzzle and ask them, “Am I on the right track?” (you are only allowed to phone in the final answer) the intermediate steps still are somewhat informative of the practice of science. In this context, I argue that Popper assumes a microscopic view of scientific progress, whereas Kuhn assumes a macroscopic view of scientific progress.

What is a Mystery Hunt puzzle? This is a question that, at first glance, may seem to defy answering: puzzles can vary from a crossword to an album of images of birds to a single audio file of seemingly random electronic pitches. The answer is always a phrase, perhaps “KEPLERS THIRD LAW” or “BOOTS,” but a puzzle, like a scientific problem, doesn’t come with instructions for how to solve it. Thus, every Mystery Hunt puzzle starts off in Kuhn’s pre-science stage: without any theory about how the puzzle works, puzzlers roll up their sleeves and start collecting miscellaneous facts that may be relevant to the puzzle at hand. If there are pictures of birds, one starts off identifying what the birds are; if there are short video clips of people waving flags, one starts off decoding the semaphore messages. This is a stage that doesn’t require very much insight: there is an obvious thing to do. Some of the information collected may be irrelevant (just as the Linnaean classification of species was broadly modified in light of modern information about observable characteristics of plants and animals), but all-in-all this information forms a useful basis for theory formation. But while Popper doesn’t have much to say about data collection, Kuhn’s concept of the theory-ladenness of observation is important. The theory-ladenness of observation states that it is impossible to make an observation without some preconceptions and theories about how the world works. For example, a list of images of birds may suggest that each bird needs to be identified, but in the process of this identification it may be realized that the images corresponded directly to watercolor engravings from Audubon’s birds of America (in which case the new question is, which plates?) Even during pre-science, small theories are continually being invented and falsified.

Once the data has been collected, a theory about how all the data is to be put together must be formed: this step is called “answer extraction”. In regular science, forming the right theory is something that can take many, many years; for a puzzle, the process is accelerated by a collection of pre-existing theories that an experienced puzzler may attempt (e.g. each item has a numbering which corresponds to a letter) as well as hints that a puzzle writer may place in a puzzle’s flavor text and title (e.g. “Song of birds” refers to “tweeting” refers to “Twitter”, which means the team should take an extracted phrase to mean a Twitter account.) Naïve inductivism suggests that unprejudiced observation of the data should lead to a theory which explains all of the information present. However, puzzles are specifically constructed to resist this sort of straightforward observation: instead, what more frequently happens is akin to Popper’s falsificationism, where theories are invented out of sheer creativity (or perhaps historical knowledge of previous puzzles) and then they are tested against the puzzle. To refer once again to the bird puzzle, one proposed theory may be that the first letter of the scientific names of the birds spells a word in the English language. When the scientific names are gathered and the first letters found not to form a word, the theory is falsified, and we go and look for something else to do. This makes the Popperian view highly individualistic: while only some people may come up with the “good ideas”, anyone can falsify a theory by going out and doing the necessary calculation. Sophisticated falsification allows for the fact that someone might go out and do the calculation incorrectly (thus sending the rest of the puzzlers on a wild goose hunt until someone comes back to the original idea and realizes it actually does work.) However, it only explains the scientific endeavor at very high resolution: it explains the process for a single theory; and we’ll see that Kuhn’s paradigms help broaden our perspective on the overall puzzle solving endeavor.

Kuhn states that normal science organizes itself around paradigms, which are characterized by some fundamental laws (Maxwell’s equations, Newton’s laws) as well conventions for how these laws are to be used to solve problems. Unlike a theory, a paradigm cannot be “falsified”, at least in the Popperian sense: a paradigm can accommodate abnormalities, which may resolve themselves after further investigation. The difference is one of scope. So, in a puzzle, while we might have a straightforwardly falsifiable theory “the first letters form a word,” a more complicated, thematic idea such as “the puzzle is Dr. Who themed” is closer to a paradigm, in that the precise mechanism by which you get answer from “Dr. Who” is unspecified, to be resolved by “normal puzzling.” The paradigm is vague, but it has “the right idea”, and with sufficient effort, the details can be worked out. Which paradigm is to be worked on is a social process: if a puzzler comes in freshly to a puzzle and finds that a group of people is already working within one paradigm, he is more likely to follow along those lines. Of course, if this group is stuck, they may call in someone from the outside precisely to think outside of the paradigm. In this manner, revolutions, as Kuhn describes them, occur. When a paradigm is failing to make progress (e.g. failing to admit an answer), the puzzlers will continue to attempt to apply it until a convincing alternative paradigm comes along, at which point they may all jump to this new method. The revolution is compressed, but it still carries many common traits: including the lone puzzler who still thinks the old method will admit an answer with “just a little more work.” (Sometimes they’re right!)

We see a striking correspondence between the activities of a scientist working in a lab, and the activities of a scientist working on a Mystery Hunt puzzle. If you’ll allow me to indulge in a little psychoanalysis, I think this similarity is part of the reason why Mystery Hunt is so appealing to people with a science, technology, engineering and maths background: in your every day life, you are faced with the most vicious puzzles known to man, as they are by definition the ones that have resisted any attempt to resolution. Months can go by without any progress. In Mystery Hunt, you are once again faced with some of the most vicious puzzles known to man. But there’s a difference. You see, in Mystery Hunt, there is an answer. And you can find it.

Happy puzzling!

Problem Set: The Codensity Transformation

January 7, 2012

Have you ever wondered how the codensity transformation, a surprisingly general trick for speeding up the execution of certain types of monads, worked, but never could understand the paper or Edward Kmett’s blog posts on the subject?

Look no further: below is a problem set for learning how this transformation works.

The idea behind these exercises is to get you comfortable with the types involved in the codensity transformation, achieved by using the types to guide yourself to the only possible implementation. We warm up with the classic concrete instance for leafy trees, and then generalize over all free monads (don’t worry if you don’t know what that is: we’ll define it and give some warmup exercises).

Experience writing in continuation-passing style may be useful, although in practice this amounts to “listen to the types!”

Solutions and more commentary may be found in Janis Voigtlander’s paper “`Asymptotic Improvement of Computations over Free Monads. <http://www.iai.uni-bonn.de/~jv/mpc08.pdf>`_”

To read more, see Edward Kmett’s excellent article which further generalizes this concept:

If there is a demand, I can add a hints section for the exercises.

{-# LANGUAGE Rank2Types, MultiParamTypeClasses, FlexibleInstances #-}
import Prelude hiding (abs)

_EXERCISE_ = undefined

-----------------------------------------------------------------------------
-- Warmup: Hughes lists
-----------------------------------------------------------------------------

-- Experienced Haskellers should feel free to skip this section.

-- We first consider the problem of left-associative list append.  In
-- order to see the difficulty, we will hand-evaluate a lazy language.
-- For the sake of being as mechanical as possible, here are the
-- operational semantics, where e1, e2 are expressions and x is a
-- variable, and e1[e2/x] is replace all instances of x in e1 with e2.
--
--        e1 ==> e1'
--   ---------------------
--     e1 e2 ==> e1' e2
--
--   (\x -> e1[x]) e2 ==> e1[e2/x]
--
-- For reference, the definition of append is as follows:
--
--      a ++ b = foldr (:) b a
--
-- Assume that, on forcing a saturated foldr, its third argument is
-- forced, as follows:
--
--                e1 ==> e1'
--    -----------------------------------
--      foldr f e2 e1 ==> foldr f e2 e1'
--
--  foldr f e2 (x:xs) ==> f x (foldr f e2 xs)
--
-- Hand evaluate this implementation by forcing the head constructor,
-- assuming 'as' is not null:

listsample as bs cs = (as ++ bs) ++ cs

-- Solution:
--
--        (as ++ bs) ++ cs
--      = foldr (:) cs (as ++ bs)
--      = foldr (:) cs (foldr (:) bs as)
--      = foldr (:) cs (foldr (:) bs (a:as'))
--      = foldr (:) cs (a : foldr (:) b as')
--      = a : foldr (:) cs (foldr (:) bs as')
--
-- Convince yourself that this takes linear time per append, and that
-- processing each element of the resulting tail of the list will also
-- take linear time.

-- We now present Hughes lists:

type Hughes a = [a] -> [a]

listrep :: Hughes a -> [a]
listrep = _EXERCISE_

append :: Hughes a -> Hughes a -> Hughes a
append = _EXERCISE_

-- Now, hand evaluate your implementation on this sample, assuming all
-- arguments are saturated.

listsample' a b c = listrep (append (append a b) c)

-- Solution:
--
--        listrep (append (append a b) c)
--      = (\l -> l []) (append (append a b) c)
--      = (append (append a b) c) []
--      = (\z -> (append a b) (c z)) []
--      = (append a b) (c [])
--      = (\z -> a (b z)) (c [])
--      = a (b (c []))
--
-- Convince yourself that the result requires only constant time per
-- element, assuming a, b and c are of the form (\z -> a1:a2:...:z).
-- Notice the left-associativity has been converted into
-- right-associative function application.

-- The codensity transformation operates on similar principles.  This
-- ends the warmup.

-----------------------------------------------------------------------------
-- Case for leafy trees
-----------------------------------------------------------------------------

-- Some simple definitions of trees

data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Here is the obvious monad definition for trees, where each leaf
-- is substituted with a new tree.

instance Monad Tree where
    return = Leaf
    Leaf a   >>= f = f a
    Node l r >>= f = Node (l >>= f) (r >>= f)

-- You should convince yourself of the performance problem with this
-- code by considering what happens if you force it to normal form.

sample = (Leaf 0 >>= f) >>= f
    where f n = Node (Leaf (n + 1)) (Leaf (n + 1))

-- Let's fix this problem.  Now abstract over the /leaves/ of the tree

newtype CTree a = CTree { unCTree :: forall r. (a -> Tree r) -> Tree r }

-- Please write functions which witness the isomorphism between the
-- abstract and concrete versions of trees.

treerep :: Tree a -> CTree a
treerep = _EXERCISE_

treeabs :: CTree a -> Tree a
treeabs = _EXERCISE_

-- How do you construct a node in the case of the abstract version?
-- It is trivial for concrete trees.

class Monad m => TreeLike m where
    node :: m a -> m a -> m a
    leaf :: a -> m a
    leaf = return

instance TreeLike Tree where
    node = Node

instance TreeLike CTree where
    node = _EXERCISE_

-- As they are isomorphic, the monad instance carries over too.  Don't
-- use rep/abs in your implementation.

instance Monad CTree where
    return = _EXERCISE_
    (>>=)  = _EXERCISE_ -- try explicitly writing out the types of the arguments

-- We now gain efficiency by operating on the /abstracted/ version as
-- opposed to the ordinary one.

treeimprove :: (forall m. TreeLike m => m a) -> Tree a
treeimprove m = treeabs m

-- You should convince yourself of the efficiency of this code.
-- Remember that expressions inside lambda abstraction don't evaluate
-- until the lambda is applied.

sample' = treeabs ((leaf 0 >>= f) >>= f)
    where f n = node (leaf (n + 1)) (leaf (n + 1))

-----------------------------------------------------------------------------
-- General case
-----------------------------------------------------------------------------

-- Basic properties about free monads

data Free f a = Return a | Wrap (f (Free f a))
instance Functor f => Monad (Free f) where
    return = _EXERCISE_
    (>>=)  = _EXERCISE_ -- tricky!

-- Leafy trees are a special case, with F as the functor. Please write
-- functions which witness this isomorphism.

data F a = N a a

freeFToTree :: Free F a -> Tree a
freeFToTree = _EXERCISE_

treeToFreeF :: Tree a -> Free F a
treeToFreeF = _EXERCISE_

-- We now define an abstract version of arbitrary monads, analogous to
-- abstracted trees.  Witness an isomorphism.

newtype C m a = C { unC :: forall r. (a -> m r) -> m r }

rep :: Monad m => m a -> C m a
rep = _EXERCISE_

abs :: Monad m => C m a -> m a
abs = _EXERCISE_

-- Implement the monad instance from scratch, without rep/abs.

instance Monad (C m) where
    return = _EXERCISE_
    (>>=)  = _EXERCISE_ -- also tricky; if you get stuck, look at the
                        -- implementation for CTrees

-- By analogy of TreeLike for free monads, this typeclass allows
-- the construction of non-Return values.

class (Functor f, Monad m) => FreeLike f m where
    wrap :: f (m a) -> m a

instance Functor f => FreeLike f (Free f) where
    wrap = Wrap

instance FreeLike f m => FreeLike f (C m) where
    -- Toughest one of the bunch. Remember that you have 'wrap' available for the
    -- inner type as well as functor and monad instances.
    wrap = _EXERCISE_

-- And for our fruits, we now have a fully abstract improver!

improve :: Functor f => (forall m. FreeLike f m => m a) -> Free f a
improve m = abs m

-- Bonus: Why is the universal quantification over 'r' needed?   What if
-- we wrote C r m a = ...?  Try copypasting your definitions for that
-- case.

Why iteratees are hard to understand

January 4, 2012

There are two primary reasons why the low-level implementations of iteratees, enumerators and enumeratees tend to be hard to understand: purely functional implementation and inversion of control. The strangeness of these features is further exacerbated by the fact that users are encouraged to think of iteratees as sinks, enumerators as sources, and enumeratees as transformers. This intuition works well for clients of iteratee libraries but confuses people interested in digging into the internals.

In this article, I’d like to explain the strangeness imposed by the purely functional implementation by comparing it to an implementation you might see in a traditional, imperative, object-oriented language. We’ll see that concepts which are obvious and easy in an imperative setting are less-obvious but only slightly harder in a purely functional setting.

Types¶

The following discussion uses nomenclature from the enumerator library, since at the time of the writing it seems to be the most popular implementation of iteratees currently in use.

The fundamental entity behind an iteratee is the Step. The usual intuition is that is represents the “state” of an iteratee, which is either done or waiting for more input. But we’ve cautioned against excessive reliance on metaphors, so let’s look at the types instead:

data Step a b = Continue (Stream a -> m (Step a b)) | Yield b
type Iteratee     a b =                  m (Step a b)
type Enumerator   a b = Step a b ->      m (Step a b)
type Enumeratee o a b = Step a b -> Step o (Step a b)

I have made some extremely important simplifications from the enumerator library, most of important of which is explicitly writing out the Step data type where we would have seen an Iteratee instead and making Enumeratee a pure function. The goal of the next three sections is to explain what each of these type signatures means; we’ll do this by analogy to the imperative equivalents of iteratees. The imperative programs should feel intuitive to most programmers, and the hope is that the pure encoding should only be a hop away from there.

Step/Iteratee¶

We would like to design an object that is either waiting for input or finished with some result. The following might be a proposed interface:

interface Iteratee<A,B> {
  void put(Stream<A>);
  Maybe<B> result();
}

This implementation critically relies on the identity of an object of type Iteratee, which maintains this identity across arbitrary calls to put. For our purposes, we need to translate put :: IORef s -> Stream a -> IO () (first argument is the Iteratee) into a purely functional interface. Fortunately, it’s not too difficult to see how to do this if we understand how the State monad works: we replace the old type with put :: s -> Stream a -> s, which takes the original state of the iteratee (s = Step a b) and some input, and transforms it into a new state. The final definition put :: Step a b -> Stream a -> m (Step a b) also accomodates the fact that an iteratee may have some other side-effects when it receives data, but we are under no compulsion to use this monad instance; if we set it to the identity monad our iteratee has no side effects (StateT may be the more apt term here). In fact, this is precisely the accessor for the field in the Continue constructor.

Enumerator¶

We would like to design an object that takes an iteratee and feeds it input. It’s pretty simple, just a function which mutates its input:

void Enumerator(Iteratee<A,B>);

What does the type of an enumerator have to say on the matter? :

type Enumerator a b = Step a b -> m (Step a b)

If we interpret this as a state transition function, it’s clear that an enumerator is a function that transforms an iteratee from one state to another, much like the put. Unlike the put, however, the enumerator takes no input from a stream and may possibly cause multiple state transitions: it’s a big step, with all of the intermediate states hidden from view.

The nature of this transformation is not specified, but a common interpretation is that the enumerator repeatedly feeds an input to the continuation in step. An execution might unfold to something like this:

-- s :: Step a b
-- x0, x1 :: Stream a
case s of
    Yield r -> return (Yield r)
    Continue k -> do
        s' <- k x0
        case s' of
            Yield r -> return (Yield r)
            Continue k -> do
                s'' <- k x1
                return s''

Notice that our type signature is not:

type Enumerator a b = Step a b -> m ()

as the imperative API might suggest. Such a function would manage to run the iteratee (and trigger any of its attendant side effects), but we’d lose the return result of the iteratee. This slight modification wouldn’t do either:

type Enumerator a b = Step a b -> m (Maybe b)

The problem here is that if the enumerator didn’t actually manage to finish running the iteratee, we’ve lost the end state of the iteratee (it was never returned!) This means you can’t concatenate enumerators together.

It should now be clear why I have unfolded all of the Iteratee definitions: in the enumerator library, the simple correspondence between enumerators and side-effectful state transformers is obscured by an unfortunate type signature:
type Enumerator a b = Step a b -> Iteratee a b
Oleg’s original treatment is much clearer on this matter, as he defines the steps themselves to be the iteratees.

Enumeratee¶

At last, we are now prepared to tackle the most complicated structure, the enumeratee. Our imperative hat tells us a class like this might work:

interface Enumeratee<O,A,B> implements Iteratee<O,B> {
    Enumeratee(Iteratee<A,B>);
    bool done();
    // inherited from Iteratee<O,B>
    void put(Stream<O>);
    Maybe<B> result();
}

Like our original Iteratee class, it sports a put and result operation, but upon construction it wraps another Iteratee: in this sense it is an adapter from elements of type O to elements of type A. A call to the outer put with an object of type O may result in zero, one or many calls to put with an object of type A on the inside Iteratee; the call to result is simply passed through. An Enumeratee may also decide that it is “done”, that is, it will never call put on the inner iteratee again; the done method may be useful for testing for this case.

Before we move on to the types, it’s worth reflecting what stateful objects are involved in this imperative formulation: they are the outer Enumeratee and the inner Iteratee. We need to maintain two, not one states. The imperative formulation naturally manages these for us (after all, we still have access to the inner iteratee even as the enumeratee is running), but we’ll have to manually arrange for this is in the purely functional implementation.

Here is the type for Enumeratee:

type Enumeratee o a b = Step a b -> Step o (Step a b)

It’s easy to see why the first argument is Step a b; this is the inner iteratee that we are wrapping around. It’s less easy to see why Step o (Step a b) is the correct return type. Since our imperative interface results in an object which implements the Iteratee<O,B> interface, we might be tempted to write this signature instead:

type Enumeratee o a b = Step a b -> Step o b

But remember; we need to keep track of two states! We have the outer state, but what of the inner one? In a situation similar reminiscent of our alternate universe Enumerator earlier, the state of the inner iteratee is lost forever. Perhaps this is not a big deal if this enumeratee is intended to be used for the rest of the input (i.e. done always returns false), but it is quite important if we need to stop using the Enumeratee and then continue operating on the stream Step a b.

By the design of iteratees, we can only get a result out of an iteratee once it finishes. This forces us to return the state in the second parameter, giving us the final type:

type Enumeratee o a b = Step a b -> Step o (Step a b)

“But wait!” you might say, “If the iteratee only returns a result at the very end, doesn’t this mean that the inner iteratee only gets updated at the end?” By the power of inversion of control, however, this is not the case: as the enumeratee receives values and updates its own state, it also executes and updates the internal iteratee. The intermediate inner states exist; they are simply not accessible to us. (This is in contrast to the imperative version, for which we can peek at the inner iteratee!)

Another good question is “Why does the enumerator library have an extra monad snuck in Enumeratee?”, i.e. :
Step a b -> m (Step o (Step a b))
My understanding is that the monad is unnecessary, but may be useful if your Enumeratee needs to be able to perform a side-effect prior to receiving any input, e.g. for initialization.

Conclusion¶

Unfortunately, I can’t claim very much novelty here: all of these topics are covered in Oleg’s notes. I hope, however, that this presentation with reference to the imperative analogues of iteratees makes the choice of types clearer.

There are some important implications of using this pure encoding, similar to the differences between using IORefs and using the state monad:

Iteratees can be forked and run on different threads while preserving isolation of local state, and
Old copies of the iteratee state can be kept around, and resumed later as a form of backtracking (swapping a bad input for a newer one).

These assurances would not be possible in the case of simple mutable references. There is one important caveat, however, which is that while the pure component of an iteratee is easily reversed, we cannot take back any destructive side-effects performed in the monad. In the case of forking, this means any side-effects must be atomic; in the case of backtracking, we must be able to rollback side-effects. As far as I can tell, the art of writing iteratees that take advantage of this style is not well studied but, in my opinion, well worth investigating. I’ll close by noting that one of the theses behind the new conduits is that purity is not important for supporting most stream processing. In my opinion, the jury is still out.

« Newer Posts Older Posts »