ezyang's blog

the arc of software bends towards understanding

Writing generator friendly code

I’ve come a long way from complaining to the html5lib list that the Python version gratuitously used generators, making it hard to port to PHP. Having now drunk the laziness kool-aid in Haskell, I enjoy trying to make my code fit the generator idiom. While Python generators have notable downsides compared to infinite lazy lists (for example, forking them for multiple uses is nontrivial), they’re pretty nice.

Unfortunately, the majority of code I see that expects to see lists isn’t robust enough to accept generators too, and it breaks my heart when I have to say list(generator). I’ll forgive you if you’re expecting O(1) accesses of arbitrary indexes in your internal code, but all too often I see code that only needs sequential access, only to botch it all up by calling len(). Duck typing won’t save you there.

The trick for making code generator friendly is simple: use the iteration interface. Don’t mutate the list. Don’t ask for arbitrary items. Don’t ask for the length. This is also a hint that for i in range(0, len(l)) is absolutely the wrong way to traverse a list; if you need indices, use enumerate.
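For concreteness, here is a small Python sketch of the two styles (the function names are mine, not anyone's API). The first version calls len() and dies on a generator; the second sticks to the iteration interface and happily accepts either:

```python
def broken_average(xs):
    # Botched: len() raises TypeError on a generator, even though
    # this computation only ever needed sequential access.
    return sum(xs) / len(xs)

def running_average(xs):
    # Generator friendly: only the iteration interface is used,
    # and enumerate supplies the count we would have asked len() for.
    total = 0.0
    count = 0
    for count, x in enumerate(xs, start=1):
        total += x
    return total / count

gen = (x * x for x in range(1, 5))   # yields 1, 4, 9, 16
print(running_average(gen))          # works on generators and lists alike
```

Duck typing saves the second version precisely because it never asks the argument to be anything more than iterable.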

Update (September 1, 2012). Hilariously enough, PHP has finally gotten generators.

History as documentation

It’s real easy to argue about the utility, style and implementation of source code comments, those good ole’ pals of code that try to add supplementary information when the pure code isn’t enough.

However, to focus solely on the latest snapshot of any particular source file is to miss a wealth of information that is not inside the file; namely, the history of the file and the genealogy of every line that graces the file. This is not so relevant when you are rapidly prototyping functionality and versions of the file in source control history represent incomplete, half-baked figments of thought, but once a codebase transitions into a more maintenance-oriented workflow, the history takes on a keen and unusual importance. In particular:

  • A log of the evolution of the file over time can illustrate what the original intent of the module was, and then how it got retrofitted or extended or hacked up over time. If you have inherited code from someone else that you need to rearchitect, what better way to get in the heads of the original designers than to study the revisions they went through?
  • Any particular line may have simply been part of the ambient code present during the initial check-in, or it may have been touched by a highly targeted commit addressing some issue. In this case, the output of git blame is highly relevant for identifying why that particular line might be special, or why a subtly different permutation is incorrect. In the case of delocalized changes, associating a line with a commit can give you the fast pass to understanding how one operation is orchestrated with many others for some global effect.

Developers should be highly encouraged to write impeccably descriptive commit messages (with the diff in hand: never write a commit message without the diff in hand) for the sake of those who may pick through the logs in the future. It’s ok to even be a little wordier than you might be in an inline comment, since:

  • Log messages never grow old: they are always relevant to the revision they are attached to!
  • A good commit message facilitates code review, since it poses an informal specification of what the change does, which an external observer can then take and verify against the code. Otherwise, the reviewer would have to determine both the intended and actual semantics of the code, stylistic issues aside.

Finally, a few words about keeping the history clean and easy to use:

  • Logically organized patch sets mean that any given change is immediately relevant to the log message. If you push a big commit which contains lots of semantic changes, the reader has to disambiguate which particular semantic change is associated with which part of the diff. It is certainly worth your time to git add -p to stage hunks individually.
  • Make high quality diffs, which avoid touching unnecessary code. High traffic mailing lists such as LKML which receive many patches have published patch submission guidelines in order to make diffs as readable as possible to a possible reviewer. Even if you don’t need to convince a temperamental upstream to take your changes, later in time you may care about your diffs.
  • Stylistic changes are highly disruptive to git blame output, since they result in a line being marked as changed even though no semantic difference took place. If you must make them, keep them in strictly separate commits. Infrequent is best.
  • Utilize history rewriting to allow for cheap commits which are polished up later for submission.

The Art of Posing a Problem

Last week, I was talking with Alexey Radul to figure out some interesting research problems that I can cut my teeth on. His PhD thesis discusses “propagation networks”, which he argues are a more general substrate for computation than traditional methods. It’s a long work, and it leaves open many questions, both theoretical and practical. I’m now tackling one very small angle with regards to the implementation of the system, but while we were still figuring a problem out, Alexey commented on how much work he was realizing it takes to do a good job of giving someone a problem.

I wholeheartedly agree, though my experiences come from a different domain: SIPB. Some of the key problems with assigning projects to interested prospectives include:

  • Many projects are extremely large and complex, and in many cases it’s simply not possible to assign someone an interesting, high-level project and expect them to make significant headway. They’re more likely to progress with “wax on, wax off” style training, but that’s not interesting.
  • No one ever tells you what they’re interested in! Even if you ask, you’ll probably get the answer, “Eh, I’d be up for anything.” As someone who has used this phrase before, I also emphatically understand that this is not true; people have different interests and will enjoy the same task dramatically differently.
  • It’s easy to exert too much or too little control over the direction of the project. Too much control and you’ve defined the entire technical specification for the person, taken away their creative input, made them feel bad when they’ve not managed to get work done, and are bound to be dismayed when they failed to understand your exacting standards in the first place. Too little control and the person can easily get lost or waste hours fighting incidental issues and not the core of the problem.
  • Being a proper mentor is a time-consuming process, even if you exert minimal control. Once the person comes back with a set of patches, you still have to read them, make sure they’re properly tested, and send back reviews on how the patches need to be revised (and for all but the most trivial of changes, revisions will be inevitable). You might wonder why you didn’t just do the damn task yourself. Reframing the problem as a purely educational exercise can also be disappointing, if not done properly.
  • As people refine the art of bootstrapping, the number of possible projects they can work on explodes, and what makes you think that they’re going to work on your project? People decide what they want to work on, whether it’s because they made it themselves, or it’s in a field they’re interested in, or it’s a tool they use day to day, and if you don’t get the person to buy in, you can easily lose them.

I imagine similar tensions come up for open-source project maintainers, internship programs and Google Summer of Code organizers. And I still have no feeling for what strategies actually work in this space, even though I’ve certainly been on both sides of the fence. I’d love to hear from people who have tried interesting strategies and had them work!

Comonads and Convolutions

> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> import Control.Comonad
> import Data.List

That scary Control.Comonad import from category-extras is going to be the subject of today’s post. We’re going to look at one possible implementation of comonads for non-empty lists that model causal time-invariant systems, systems whose outputs depend only on inputs that are in the past. We will see that computation in these systems follows a comonadic structure and that one instance of this structure strongly enforces causality and weakly enforces time-invariance.

Our causal lists are simply a newtype of list with the added restriction that they are non-empty; causal is a “smart constructor” that enforces this restriction. We use GeneralizedNewtypeDeriving to get the Functor instance for free.

> newtype Causal a = Causal [a]
>    deriving (Functor, Show)
>
> causal :: a -> [a] -> Causal a
> causal x xs = Causal (x:xs)
>
> unCausal :: Causal a -> [a]
> unCausal (Causal xs) = xs
>
> type Voltage = Float

Background. (If you’re already familiar with signal processing, feel free to skip this section.) One such system models point-to-point communication of voltage samples across an imperfect wire channel. In an ideal world, we would very much like to be able to pretend that any voltage I put into this channel would instantly perfectly transmit this voltage to the other end of the channel. In practice, we’ll see any number of imperfections, including time to rise and fall, a delay, ringing and noise. Noise is a party pooper, so we’re going to ignore it for the purposes of this post.

To a first approximation, we can impose the following important conditions on our system:

  • Causality. Our wire can’t peek into the future and transmit some voltage before it has even gotten it.
  • Time-invariance. Any signal will get the same response whether or not it gets sent now or later.
  • Linearity. A simple and useful approximation for wires, which states this mathematical property: if an input x1 results in an output y1, and an input x2 results in an output y2, then the input Ax1 + Bx2 results in the output Ay1 + By2. This also means we get superposition, which is an important technique that we’ll use soon.

When you see a linear time-invariant system, it means that we get to use a favorite mathematical tool, the convolution.

Discrete convolutions. The overall structure of the discretized computation that a channel performs is [Voltage] -> [Voltage]; that is, we put in a sequence of input voltage samples, and get out another sequence of output voltage samples. On the other hand, the discrete convolution is the function calculated by (with variables suggestively named):

(u ∗ f)[n] = sum from m = -∞ to ∞ of f[m]u[n-m]

It’s not quite obvious why the convolution is the mathematical abstraction we’re looking for here, so we’ll sketch a brief derivation.

One special case of our computation is when the input corresponds to [1, 0, 0, 0 ...], called the unit sample. In fact, due to linearity and time-invariance, the output that our system gives when posed with the unit sample, the unit sample response, precisely specifies the behavior of a system for all inputs: any possible input sequence could be composed of any number of delayed and scaled unit samples, and linearity says we can sum all of the results together to get a result.

A list is actually a function ℕ → a, and we can extend the domain to be over integers if we propose the convention f[n] = 0 for n < 0. Suppose that f[n] represents our input samples varying over time, δ[n] represents a unit sample (δ[0] = 1, δ[n] = 0 for all other n; you’ll commonly see δ[n-t], which is a unit sample at time t), and u[n] represents our unit sample response. Then, we decompose f[n] into a series of unit samples:

f[n] = f[0]δ[n] + f[1]δ[n-1] + ...

and then use linearity to retrieve our response g[n]:

g[n] = f[0]u[n] + f[1]u[n-1] + ...
     = sum from m = 0 to ∞ of f[m]u[n-m]

which looks just like the discrete convolution, just without the -∞ bound. Remember that we defined f[m] = 0 for m < 0, so the two are actually equivalent.
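The derivation can be checked numerically before we commit to the Haskell. Here is a throwaway Python sketch (the names are mine) that evaluates g[n] = Σ f[m]u[n−m] directly, convolving a unit step input with a finite unit sample response:

```python
def lti_response(u, f, n_samples):
    # g[n] = sum_{m=0..n} f[m] * u[n-m], treating u as zero
    # outside the bounds of the list (our f[n] = 0 for n < 0 convention).
    def u_at(k):
        return u[k] if 0 <= k < len(u) else 0
    return [sum(f[m] * u_at(n - m) for m in range(n + 1))
            for n in range(n_samples)]

usr = [1, 2, 5, 2, 1]              # a finite unit sample response
step = [1] * 8                     # the unit step input, f[n] = 1
print(lti_response(usr, step, 8))  # [1, 3, 8, 10, 11, 11, 11, 11]
```

The output settles at 11, the sum of the unit sample response, exactly as superposition predicts for a step input.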

I’d like to linger on that final mathematical definition for a moment, before writing out the equivalent Haskell. We originally stated that the input-response computation had the type [Voltage] -> [Voltage]; however, in our math, we’ve actually defined something of type [Voltage] -> Voltage: a channel-specific function that takes all of the inputs up to time n, i.e. f[0]..f[n], and returns a single output g[n]. I’ve written the following definition in a suggestive curried form to reflect this:

> ltiChannel :: [Voltage] -> Causal Voltage -> Voltage
> ltiChannel u = \(Causal f) -> sum $ zipWith (*) (reverse f) u

The unit sample response may be a finite or infinite list; for reasons of efficiency, a finite list is recommended:

> usr :: [Voltage]
> usr = [1,2,5,2,1]

Comonads. By now, it should be clear where we’ve been working towards: we have ltiChannel usr :: Causal Voltage -> Voltage and we want: Causal Voltage -> Causal Voltage. This is precisely the form of computation that the comonad induces! For your convenience, here is the definition of the Copointed and Comonad type classes:

class Functor f => Copointed f where
    extract :: f a -> a

class Copointed w => Comonad w where
    duplicate :: w a -> w (w a)
    extend :: (w a -> b) -> w a -> w b

The Copointed instance is straightforward, but demonstrates why Causal must contain a non-empty list:

> instance Copointed Causal where
>    extract (Causal xs) = head xs

The Comonad instance can be defined using either duplicate or extend; both have default implementations defined in terms of each other. Deriving these default implementations is left as an exercise to the reader; we’ll define both here:

> instance Comonad Causal where
>    extend f  = Causal . map (f . Causal) . tail . inits . unCausal
>    duplicate = Causal . map      Causal  . tail . inits . unCausal

The intent of the code is somewhat obscured by the unwrapping and wrapping of Causal; for a pure list the instance would look like this:

instance Comonad [] where
    extend f  = map f . tail . inits
    duplicate = tail . inits

The function duplicate really gets to the heart of what this comonad instance does: we take our input list and transform it into a list of histories, each one extending one step further than the last. The tail tags along to drop the first value of inits, which is an empty list. duplicate builds up w (w a), and then the user-supplied function tears it back down to w b (if you think of monads, the lifted user function builds up m (m b), and then join tears it back down to m b.)

One quick test to make sure it works:

> unitStep :: Causal Voltage
> unitStep = Causal (repeat 1)
>
> result :: Causal Voltage
> result = unitStep =>> ltiChannel usr

and sure enough, the result is:

Causal [1.0, 3.0, 8.0, 10.0, 11.0, 11.0, ...]

=>> is a flipped extend, and the comonadic equivalent of the monadic >>=.

Enforced invariants. Structuring our computation in this form (as opposed to writing the darn convolution out explicitly) gives us some interesting enforced invariants in our code. Our channels need not be linear; I could have squared all of the inputs before convolving them with the unit sample response, and that certainly would not be linear. However, any channel we write must be causal and will usually be time-invariant: it must be causal because we never pass any values from the future to the user function, and it is weakly time-invariant because we don’t explicitly tell the user function how far along the input stream it is. In practice, with our implementation, it could divine this information using length; we could get stronger guarantees by employing a combinator that reverses the list and then appends repeat 0:

> tiChannel :: ([Voltage] -> Voltage) -> Causal Voltage -> Voltage
> tiChannel f (Causal xs) = f (reverse xs ++ repeat 0)
>
> ltiChannel' :: [Voltage] -> Causal Voltage -> Voltage
> ltiChannel' u = tiChannel (\xs -> sum $ zipWith (*) u xs)

u in this case must be finite; an infinite unit sample response can be truncated at some point to specify how precise our computation should be.

Open question. The unit sample response has been expressed in our sample code as [Voltage], but it really is Causal Voltage. Unfortunately, the comonad doesn’t seem to specify mechanisms for combining comonadic values the same way the list monad automatically combines the results of computations for each of the values of a list. I’m kind of curious how something like that might work.

Type manipulation: Tricks of the trade

I present here a few pieces of folklore, well known to those practiced in Haskell, that I’ve found to be really useful techniques when analyzing code whose types don’t seem to make any sense. We’ll build practical techniques for reasoning about types, to be able to derive the type of fmap fmap fmap by ourselves. Note that you could just ask GHCI what the type is, but that would spoil the fun! (More seriously, working out the example by hand, just like a good problem set, helps develop intuition for what might be happening.)

Currying and types. Three type signatures that have superficial similarities are a -> b -> c, (a -> b) -> c and a -> (b -> c). If you don’t have a visceral feel for Haskell’s automatic currying, it can be easy to confuse the three. In this particular case, a -> b -> c, which reads as “takes two arguments a and b and returns c”, is equivalent to a -> (b -> c), read “takes a and returns a function that takes b and returns c”. These are distinct from (a -> b) -> c, read “takes a function of a -> b and returns c”. A visual rule you can apply, in these cases, is that parentheses that are flush with the right side of the type signature can be freely added or removed, whereas any other parentheses cannot be.

Higher-order functions. If I pass an Int to id :: a -> a, it’s reasonably obvious that id takes the shape of Int -> Int. If I pass a function a -> a to id :: a -> a, id then takes the shape (a -> a) -> a -> a. Personally, I find the overloading of type parameters kind of confusing, so if I have a cadre of functions that I’m trying to derive the type of, I’ll give them all unique names. Since id id is a tad trivial, we’ll consider something a little nastier: (.) (.). Recall that (.) :: (b -> c) -> (a -> b) -> a -> c. We’re not actually going to use those letters for our manipulation: since our expression has two instances of (.), we’ll name the first a and the second b, and we’ll number them from one to three. Then:

(.) :: (a2 -> a3) -> (a1 -> a2) -> a1 -> a3
(.) :: (b2 -> b3) -> (b1 -> b2) -> b1 -> b3

Slightly less aesthetically pleasing, but we don’t have any more conflicting types. The next step is to identify what equivalences are present in the type variables, and eliminate redundancy. Since we’re passing the second (.) to the first (.) as an argument:

(a2 -> a3) == (b2 -> b3) -> (b1 -> b2) -> b1 -> b3

to which you might say, “those function signatures don’t look anything alike!” which leads us to our next point:

Currying and type substitution. If your function’s type is n-ary, and the type you’re trying to match it against is m-ary, curry so that your function is m-ary too! So, if you have a -> b -> c, and you want to pass it as d -> e, then you actually have a -> (b -> c), and thus d == a and e == (b -> c). A curious case is when it goes in the other direction, in which case d -> e is actually restricted to be d -> (e1 -> e2), where e == (e1 -> e2) and the obvious equalities hold.

To go back to our original example, the second (.) would be grouped as such:

(.) :: (b2 -> b3) -> ((b1 -> b2) -> b1 -> b3)

and we achieve the type equalities:

a2 == (b2 -> b3)
a3 == (b1 -> b2) -> b1 -> b3

Now, let’s substitute in these values for the first (.):

(.) :: ((b2 -> b3) -> (b1 -> b2) -> b1 -> b3) ->
       (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

and drop the first argument, since it’s been applied:

(.) (.) :: (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

You might be wondering what that monstrous type signature does…

Interpreting type signatures. A great thing about polymorphic types is that there’s not very much non-pathological behavior that can be specified: because the type is fully polymorphic, we can’t actually stick our hand in the box and use the fact that it’s actually an integer. This property makes programs like Djinn, which automatically derive a function’s contents given a type signature, possible, and with a little practice, you can figure it out too.

Working backwards: we first take a look at b3. There’s no way for our function to magically generate a value of type b3 (excluding undefined or bottom, which counts as pathological), so there’s got to be something else among our arguments that generates it. And sure enough, it’s the first argument, but we need to pass it a1 and b2 first:

(.) (.) w x y z = w undefined undefined

We repeat the process for each of those types in turn: where is a1 specified? Well, we pass it in as the second argument. Where is b2 specified? Well, we have another function y :: b1 -> b2, but we need a b1 which is z. Excellent, we now have a full implementation:

(.) (.) w x y z = w x (y z)
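If you want to convince yourself of this result without GHC, the same shape can be checked with curried lambdas in, say, Python (a sketch; compose here stands in for (.), and w, y are arbitrary functions of my choosing):

```python
# Curried function composition, a stand-in for Haskell's (.).
compose = lambda f: lambda g: lambda x: f(g(x))

# (.) (.), i.e. compose applied to itself once.
cc = compose(compose)

# Per the derived type, cc w x y z should equal w x (y z).
w = lambda a: lambda b: (a, b)   # some curried 2-ary function, a1 -> b2 -> b3
y = lambda b1: b1 + 1            # b1 -> b2
print(cc(w)("tag")(y)(41))       # ('tag', 42), i.e. w "tag" (y 41)
```

Tracing the application by hand gives cc(w)(x)(y)(z) = w(x)(y(z)), matching the implementation derived above.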

Pointfree style as operator composition. So, we now know what (.) (.) does, but we don’t really have a good motivation for why this might be the case. (By motivation, I mean, look at (.) (.) taking function composition at face value, and then realizing, “oh yes, it should do that.”) So what we’d really like to focus on is the semantics of (.), namely function composition, and the fact that we’re currying it. One line of thought might be:

  1. Function composition is defined to be (f . g) x = f (g x).
  2. We’re partially applying the composition, so actually we have (f.) g x, but g is missing. (if the (f.) looks funny to you, compare it to (2+), which is partially applied addition. Note that addition is commutative, so you’re more likely to see (+2), which becomes (x+2) when applied.)
  3. f is actually another composition operator. Since functional composition is single-argument oriented, we want to focus on the curried version of (.), which takes a function and returns a function (1) that takes another function (2) and a value and returns the first function applied to the result of the second function applied to the value.
  4. Read out the arguments. Since (f.) is on the outside, the first argument completes the curry. The next argument is what will actually get passed through the first argument, and the result of that will get passed through f. The return value of that is another function, but (barring the previous discussion) we haven’t figured out what that would be yet. Still, we’ve figured out what the first two arguments might look like.

If we now cheat and look at the type signature, we see our hypotheses are verified:

(.) (.) :: (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

The first argument g :: a1 -> b2 -> b3 completes the curry, and then the next argument is fed straight into it, so it would have to be a1. The resulting value b2 -> b3 is fed into the next composition operator (notice that it’s not a single variable, since the next composition forces it to be a 1-ary function) and is now waiting for another function to complete the curry, which is the next argument b1 -> b2 (i.e. b1 -> b2 -> b3). Then it’s a simple matter of supplying the remaining arguments.

I find thinking of functions as partially applied and waiting to be “completed” leads to a deeper intuitive understanding of what a complex chain of higher order functions might do.

Putting it together. It is now time to work out the types for fmap fmap fmap. We first write out the types for each fmap:

fmap :: (Functor f) => (a1 -> a2) -> f a1 -> f a2
fmap :: (Functor g) => (b1 -> b2) -> g b1 -> g b2
fmap :: (Functor h) => (c1 -> c2) -> h c1 -> h c2

Perform application and we see:

(a1 -> a2) == (b1 -> b2) -> g b1 -> g b2
f a1 == (c1 -> c2) -> h c1 -> h c2

Luckily enough, we have enough arguments to fill up the first fmap, so that’s one layer less of complexity. We can also further break these down:

-- from the first argument
a1 == b1 -> b2
a2 == g b1 -> g b2
-- from the second argument
a1 == h c1 -> h c2
f == (->) (c1 -> c2)

The last equality stems from the fact that there’s only one reasonable functor instance for (c1 -> c2) -> h c1 -> h c2; the functor for functions i.e. the reader monad, taking (c1 -> c2) as its “read-in”.

We can do a few more simplifications:

h c1 -> h c2 == b1 -> b2
b1 == h c1
b2 == h c2

Substitute everything in, and now we see:

fmap fmap fmap :: (Functor g, Functor h) =>
   (c1 -> c2) -> g (h c1) -> g (h c2)

Interpret the types and we realize that fmap fmap fmap does a “double” lift of a function c1 -> c2 to two functors. So we can run fmap fmap fmap (+2) [Just 3] and get back [Just 5] (utilizing the functor instance for the outer list and the inner maybe).

We also notice that the f functor dropped out; this is because it was forced to a specific form, so really fmap fmap fmap == fmap . fmap. This makes it even more obvious that we’re doing a double lift: the function is fmap‘ed once, and then the result is fmap‘ed again.

We can even use this result to figure out what (.) (.) (.) (or (.) . (.)) might do; for functions, fmap = (.), so a normal function is lifted into one reader context by the first fmap, and another reader context by the second fmap. So we’d expect (.) . (.) :: (a -> b) -> (r2 -> r1 -> a) -> (r2 -> r1 -> b) (recall that f a if f = (->) r becomes r -> a) and indeed, that is the case. Compose composed with compose is merely a compose that can take a 2-ary function as its second argument and “do the right thing”!
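The “double lift” reading carries over to any language with a map. A quick Python analogue (my own construction, with a list standing in for the outer functor and None standing in for Nothing in a Maybe-like inner functor):

```python
def fmap_list(f, xs):
    # The list functor's fmap.
    return [f(x) for x in xs]

def fmap_maybe(f, x):
    # A Maybe-like fmap: None plays the role of Nothing.
    return None if x is None else f(x)

# The analogue of fmap . fmap: lift f through both functor layers at once.
double_lift = lambda f: lambda xs: fmap_list(lambda x: fmap_maybe(f, x), xs)

print(double_lift(lambda n: n + 2)([3, None]))  # [5, None]
```

Just as with fmap fmap fmap (+2) [Just 3], the function reaches through both layers, and absent values in the inner layer pass through untouched.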

Why we stay up late

I was having a discussion a few nights ago about attention, and someone mentioned the fact that contiguous blocks of time are precious. It’s obvious once it’s been pointed out to you, and with it in mind I’ve noticed my tendency to bucket useful activities into various categories: checking email, reading news feeds and simple tasks fall into the “less than an hour” time bucket, while really actually creating software, fixing hard bugs or reading code fall into the “more than an hour, preferably several hours” time bucket. I’ve recognized that attempting to tackle the “more than an hour” bucket in the snatches of time between classes and extracurriculars is simply wasteful.

But unlike the programmer working a day job that Paul Graham describes in his essay, I am a college student. I mostly don’t do my work during the day; that’s the time to imbibe information from lectures and recitations. In particular, I’m an MIT student, which means that there’s no 5:00 “oh, time to go home and relax” period; it’s work all the time (and you grab precious moments of relief with flights of procrastination when you can.) My mornings and evenings are saturated by meetings: they require me to physically relocate myself and pay attention to something else on someone else’s schedule. And throw in orchestra rehearsal from 7:30-10PM or small ensemble rehearsal from 5-7PM or SIPB meetings from 7:30-8:30PM and you’ve got a recipe for fragmentation.

When can you get that uninterrupted time? Only late at night, because when you are working on that problem set at two in the morning, there is one good thing: no one has scheduled a meeting at 2:30 AM. And while it’s definitely a shame that these benefits may be outweighed by the intoxication of sleep deprivation, you’re not going to find this sort of contiguous block of time anywhere else. And that’s why we stay up late.

Hunting for constraints

> import Data.List
> import Control.Monad

The following question appeared as part of a numbers-based puzzle in the 2010 MIT Mystery Hunt:

His final level on Wizard of Wor was equal to the smallest number that can be written as the sum of 4 non-zero squares in exactly 9 ways

I’d like to explore constraint search in Haskell to solve this problem. The hope is to find a (search) program that directly reflects the problem as posed, and also gives us an answer!

Because we are looking for the smallest number, it makes sense to start testing from a small number and start counting up. We’ll assume that the answer to this question won’t overflow Int.

Now, we need to test if it can be written as the sum of 4 non-zero squares in exactly 9 ways. This problem reduces to “how many ways can n be written as the sum of squares”, which is another search problem.

We’ll assume that 4+1+1+1 and 1+4+1+1 don’t count as distinct for the purposes of our nine ways. This leads to the first piece of cleverness: if we impose a non-decreasing ordering on our roots, we once again get uniqueness.

We also need to bound our search space; while fair search can help to some degree with infinite failure, our implementation will be much simpler if we can do some early termination. A very simple condition to terminate on is if the sum of the squares exceeds the number we’re looking for.

Consider the case where we are matching for x, and we have candidate roots a, b and c. Then, the maximum the remaining square can be is x - a^2 - b^2 - c^2, and the maximum value for d is the floor of the square root of that quantity. Square roots are cheap, and we’re using machine-size integers, so things are good.

> floorSqrt :: Int -> Int
> floorSqrt = floor . sqrt . fromIntegral
>
> sumSquares :: [Int] -> Int
> sumSquares as = sum (map (^2) as)
>
> rootMax :: Int -> [Int] -> Int
> rootMax x as = floorSqrt (x - sumSquares as)

From there, we just write out the search for distinct sums of squares of a number:

> searchSumFourSquares :: Int -> [(Int, Int, Int, Int)]
> searchSumFourSquares x = do
>       a <- [1..(rootMax x [])]
>       b <- [a..(rootMax x [a])]
>       c <- [b..(rootMax x [a,b])]
>       d <- [c..(rootMax x [a,b,c])]
>       guard $ sumSquares [a,b,c,d] == x
>       return (a,b,c,d)

And from there, the solution falls out:

> search :: Maybe Int
> search = findIndex (==9) (map (length . searchSumFourSquares) [0..])

We cleverly use [0..] so that the index is the same as the number itself. Alternative methods might use tuples.
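As a sanity check on the Haskell, the same search transcribes almost mechanically into Python (the function names here are mine), with the same ordering trick and square-root bounds:

```python
from math import isqrt

def four_square_sums(x):
    # All (a, b, c, d) with 1 <= a <= b <= c <= d and
    # a^2 + b^2 + c^2 + d^2 == x, mirroring searchSumFourSquares.
    out = []
    for a in range(1, isqrt(x) + 1):
        for b in range(a, isqrt(x - a*a) + 1):
            for c in range(b, isqrt(x - a*a - b*b) + 1):
                rest = x - a*a - b*b - c*c
                d = isqrt(rest)
                if d >= c and d > 0 and d*d == rest:
                    out.append((a, b, c, d))
    return out

def smallest_with_ways(k):
    # Count up from 0, mirroring findIndex over [0..].
    n = 0
    while len(four_square_sums(n)) != k:
        n += 1
    return n
```

For example, four_square_sums(4) finds only (1, 1, 1, 1), so smallest_with_ways(1) is 4; the puzzle's answer is smallest_with_ways(9).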

Scheming environments

Environments are first-class objects in MIT/GNU Scheme. This is neat, because an integral part of the lexical structure of a Scheme is also a data-structure in its own right, able to encode data and behavior. In fact, the environment data structure is precisely what Yegge calls property lists, maps that can be linked up with inheritance. So not only is it a data structure, it’s a highly general one too.

Even without the ability to pass around an environment as a first-class variable, you can still leverage its syntactic ties to stash away local state in an object-oriented manner. The data is only accessible inside procedures that point to the appropriate environment frame, and traditionally the closure returns a lambda (with its double-bubble pointed to the newly born environment frame) that acts as the only interface into the enclosing state. This requires a fair amount of boilerplate, since this lambda has to support every possible operation you might want to do to the innards of the function.
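The pattern (and its boilerplate) looks much the same in Python, which also closes over enclosing frames; here is a hypothetical counter, not from the post, where the returned dispatch lambda is the sole interface to the captured state:

```python
def make_counter():
    # State lives in the enclosing frame; the returned closure is the
    # only way in, so every supported operation needs explicit dispatch.
    count = [0]  # a mutable cell, since the binding itself is closed over

    def dispatch(op):
        if op == "increment":
            count[0] += 1
            return count[0]
        elif op == "value":
            return count[0]
        raise ValueError("unknown operation: " + op)

    return dispatch

counter = make_counter()
counter("increment")
counter("increment")
print(counter("value"))  # 2
```

Every new operation on the hidden state means another branch in dispatch, which is exactly the boilerplate that first-class environments let you skip.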

With a first-class environment object, however, you can futz around with the closure’s bindings arbitrarily. Unfortunately, there’s no way to directly get-current-environment (except for the top-level REPL environment, which doesn’t count), so we resort to the following trick:

(procedure-environment (lambda () '()))

procedure-environment can grab the environment pointer from the double-bubble for some procedure. So, we force the creation of a double-bubble pointing to the environment we care about with the empty lambda.

I most recently used this technique in a 6.945 project, in which I used a lambda to generate a bunch of procedures with various parameters swapped out (encouraging code-reuse), something akin to the time-honored trick of including C files multiple times with different macro definitions. Instead of returning these procedures as a hash-table which then people would have to explicitly call, I just returned the environment, and thus any consumer could enter “a different universe” by using an appropriate environment.

Scheme is pretty unique in its celebration of environments as first-class objects. I could try this trick in Python, but the func_closure attribute on functions is read-only, and also Python has some pretty lame scoping rules. A shame, since this technique allows for some lovely syntactic simplifications.

Comment. Oleg Kiselyov writes in to mention that “*MIT Scheme* specifically is unique in its celebration of environments as first class environments,” and notes that even some developers of MIT Scheme have second thoughts about the feature. It makes code difficult to optimize, and is both theoretically and practically dangerous: theoretically dangerous because environments are really an implementation detail, and practically dangerous because they make it very difficult to reason about code.

From in-person discussions, I’m not surprised that Sussman’s favored dialect of Scheme allows such a dangerous feature; Sussman has always been in favor of letting people have access to dangerous toys and trusting them to use them correctly.

Sources of music

I love listening to music, especially new and interesting pieces that I’ve never heard before. Unfortunately, since I’m a somewhat frugal spender, my personal collection of music grows very slowly; perhaps a bit too slowly for my own tastes. When I need new music, where do I turn?

  • MixApp is a collaborative music listening application. At its worst, it can be used simply as an extension of your current music library: you can search the libraries of any friends who are online and queue up their music for yourself. But the serendipitous part of MixApp is when you’ve dropped into a room of people you don’t know and music you don’t know, but man, it sounds good, and suddenly you’re being taken on a sonic adventure across artists you’ve never heard of and a genre you’ve just discovered and wham: you just got MixApp’ed. Update: MixApp is dead (the founders went on to build Meteor), though replacements like turntable.fm have been popping up.
  • Pandora and last.fm are both pretty reliable methods to get a stream of genre appropriate singles, one after another. The serendipity level is not as nice as MixApp, though, so I don’t find myself turning to these much.
  • There’s not really much that can beat a good radio host. People like David Garland and John Schaefer have such a diverse and deep palette of musical knowledge, and they’ve had every evening for who knows how many years to hone the craft of sharing this with the listeners of public radio. I was very pleased when WQXR finally managed to get a high-quality internet stream back online.
  • I was room-skipping on MixApp one evening, and was caught by the Kleptones’ latest album Uptime/Downtime. I have nothing against mix artists: the whole tradition of music has been founded upon borrowing, stealing, and building upon earlier work, and in many cases an adept mix artist can improve upon the “popular music” material the mix draws from. Or sometimes the source material is just really awesome and should be listened to in its own right: one of the most interesting musical adventures I’ve had recently was taking the samples list for Uptime/Downtime and listening to each source piece in turn.
  • Rehearsal with an orchestra, wind ensemble, small ensemble, or really any type of ensemble affords several months to get intimately familiar with a particular piece of music. I would never have gotten the chance to fully appreciate contemporary works such as Bells for Stokowski or Persichetti’s Masquerade for Band without this really in-depth exploration of a piece.

I should consider myself extremely lucky to be living in an era where new music is constantly at my fingertips. How do you seek out new and interesting music?

Nested loops and exceptions (Oleg Kiselyov)

Editorial. Today we interrupt our regularly scheduled programming to bring you a guest post by Oleg Kiselyov, reinterpreting our previous post about nested loops and continuations with exceptions.


Hello!

I noticed your recent article about nested loops and continuations. I should have commented on it using the provided form, but I was not sure how formatting would come out. The comment includes a lot of code. Please feel free to post the code in whole or in part, or do anything else with it.

The thesis of my comment is that callCC is not necessary for the implementation of break and continue in single and nested loops. We observe that the continuations of each iteration and of the entire loop are invoked either 0 or 1 time (but never more than once). That is the pattern of exceptions. So, the problem posed by your article can be solved with exceptions. Here are several variations of the solution.

First, a few preliminaries: this message is the complete literate Haskell code.

> import Prelude hiding (break, catch)
> import Control.Monad
> import Control.Monad.Trans

Alas, ErrorT in Control.Monad.Error has the stupid Error constraint. So, we have to write our own Exception monad transformer. The code below is standard.

> newtype ExT e m a = ExT{unExT :: m (Either e a)}
>
> instance Monad m => Monad (ExT e m) where
>     return  = ExT . return . Right
>     m >>= f = ExT $ unExT m >>= either (return . Left) (unExT . f)
>
> instance MonadTrans (ExT e) where
>     lift m = ExT $ m >>= return . Right
>
> instance MonadIO m => MonadIO (ExT e m) where
>     liftIO m = ExT $ liftIO m >>= return . Right
>
> raise :: Monad m => e -> ExT e m a
> raise = ExT . return . Left
>
> catch :: Monad m => ExT e m a -> (e -> ExT e' m a) -> ExT e' m a
> catch m h = ExT $ unExT m >>= either (unExT . h) (return . Right)
>
> runExT :: Monad m => ExT e m a -> m a
> runExT m = unExT m >>= either (const $ fail "Unhandled exc") return

We are ready to code the first solution, for simple, non-nested loops. The idea is to treat ‘break’ and ‘continue’ as exceptions. After all, both control operators cause computations to be skipped—which is what exceptions do. We define the datatype of our ‘exceptions’:

> data BC = Break | Cont
>
> break, continue :: Monad m => ExT BC m a
> break    = raise Break
> continue = raise Cont

Here is the code for the loop: it catches exceptions at some points:

> for_in :: Monad m => [a] -> (a -> ExT BC m ()) -> m ()
> for_in xs f = runExT $ mapM_ iter xs `catch` hbreak
>  where 
>  iter x = catch (f x) hcont
>  hcont  Cont  = return () -- handle Cont, re-raise Break
>  hcont  e     = raise e
>  hbreak Break = return ()
>  hbreak Cont  = return ()     -- Shouldn't happen actually

Here is your test:

> loopLookForIt1 :: IO ()
> loopLookForIt1 =
>     for_in [0..100] $ \x -> do
>         when (x `mod` 3 == 1) $ continue
>         when (x `div` 17 == 2) $ break
>         lift $ print x

Running it:

> tf1 = loopLookForIt1 :: IO ()

prints 23 numbers starting with 0, 2, 3 and ending with 30, 32, 33.

We have to generalize to nested loops. Two solutions are apparent. I would call the first one ‘dynamic’. We index the exceptions by levels, which are natural numbers. Level 0 pertains to the current loop, level 1 is for the parent loop, etc.

> data BCN = BCN BC Int         -- Add the level of breaking

Operators break and continue now take a number: how many loop scopes to break out of. I think Perl has a similar breaking-with-number operator.

> breakN    = raise . BCN Break
> continueN = raise . BCN Cont

The new iterator:

> for_inN :: Monad m => [a] -> (a -> ExT BCN m ()) -> ExT BCN m ()
> for_inN xs f = mapM_ iter xs `catch` hbreak
>  where 
>  iter x = catch (f x) hcont
>  hcont  (BCN Cont 0)  = return () -- handle Cont, re-raise Break
>  hcont  e             = raise e
>  -- If the exception is for a parent, re-raise it, decrementing its level
>  hbreak (BCN Break 0) = return ()
>  hbreak (BCN Cont 0)  = return ()     -- Shouldn't happen actually
>  hbreak (BCN exc n)   = raise (BCN exc (n-1))

The single-loop test now looks as follows.

> loopLookForItN :: ExT BCN IO ()
> loopLookForItN =
>     for_inN [0..100] $ \x -> do
>         when (x `mod` 3 == 1) $ continueN 0
>         when (x `div` 17 == 2) $ breakN 0
>         lift $ print x
>
> tfN = runExT loopLookForItN :: IO ()

We can now write the nested loop test. I took the liberty of enhancing the example in your article so as to exercise all cases:

> loopBreakOuter1 :: ExT BCN IO ()
> loopBreakOuter1 =
>     for_inN [1,2,3] $ \x -> do
>         lift $ print x
>         for_inN [4,5,6] $ \y -> do
>             lift $ print y
>             when (y == 4) $ continueN 0
>             when (x == 1) $ breakN 0
>             when (x == 3) $ breakN 1
>             when (y == 5) $ continueN 1
>             breakN 1
>         lift $ print x
>
> tbN1 = runExT loopBreakOuter1 :: IO ()

The result is the sequence of numbers: 1 4 5 1 2 4 5 3 4 5

There exists another solution to the nested-loop problem, which I call ‘static’. What if we just iterate the single-loop solution? We can nest ExT BC monad transformers to any given depth. To refer to a particular layer in the transformer stack, we use lift. We can reuse the for_in iterator and the operators break and continue defined earlier. We write the nested test as follows:

> loopBreakOuterS1 :: IO ()
> loopBreakOuterS1 =
>     for_in [1,2,3] $ \x -> do
>         liftIO $ print x
>         for_in [4,5,6] $ \y -> do
>             liftIO $ print y
>             when (y == 4) $ continue
>             when (x == 1) $ break
>             when (x == 3) $ lift $ break
>             when (y == 5) $ lift $ continue
>             lift $ break
>         liftIO $ print x
> tbS1 = loopBreakOuterS1 :: IO ()
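Oleg’s ‘dynamic’ solution transcribes almost mechanically into languages with ordinary exceptions. A sketch in Python (the names are mine, not Oleg’s), indexing the signal by how many loop scopes it still has to escape, just like BCN:

```python
class LoopSignal(Exception):
    """Control signal carrying the number of loop scopes still to escape."""
    def __init__(self, level=0):
        self.level = level

class Break(LoopSignal): pass
class Continue(LoopSignal): pass

def for_in(xs, body):
    for x in xs:
        try:
            body(x)
        except Continue as e:
            if e.level > 0:
                raise Continue(e.level - 1)  # meant for an enclosing loop
            # level 0: fall through to the next iteration
        except Break as e:
            if e.level > 0:
                raise Break(e.level - 1)     # meant for an enclosing loop
            return
```

Running the single-loop test from above with this for_in (raising Continue() when x mod 3 == 1 and Break() when x div 17 == 2 over 0..100) yields the same 23 numbers, 0, 2, 3, …, 33, as loopLookForIt1.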

I guess the lesson here might be that callCC is often not needed (I would argue that callCC is never needed, but that’s the argument for another time). Here is another example of simple exceptions sufficing where call/cc was thought to be required:

http://okmij.org/ftp/Computation/lem.html

Cheers,
Oleg