ezyang's blog

the arc of software bends towards understanding

Being an expert considered harmful

It’s a sunny day in your advanced symbolic programming class. Your teacher has just started going over monads—in Scheme, though—and you sit in the back of the classroom snarking about little tidbits of knowledge you know from Haskell. Suddenly, the teacher says (quite earnestly too), “Edward here seems to know a lot about monads. Why don’t we have him come up and teach them to the class?” Suddenly, you’re up expounding types to people who have never used Haskell before and failing utterly to explain to people how the continuation monad works. Only after several iterations do you manage to partially rewrite the presentation in a form that doesn’t assume fluency in Haskell. You’ve fallen into the expert trap.

You’re an expert. You are in possession of in-depth knowledge, have accumulated wisdom and intuition, and all-in-all can work much more effectively than others within your domain. You might have an ego; you might get into hot arguments with other experts. Or you might be very unassuming and thoughtful; your expertise has little to do with your ego.

But unless you’ve been paying attention to the prerequisite knowledge you assume, you will be terrible at teaching your area of expertise. Your expertise gets in the way of teaching effectively, for the expert assumes too much prerequisite knowledge.

What do I mean when I speak of prerequisite knowledge? I don’t mean prerequisite “facts”—what an iterative algorithm for solving linear equations is, how to reverse a list using a fold, how to do X in my favorite framework. I do mean foundational knowledge: abstractions and higher-order primitives to think with—linear algebra, reduction with higher-order operators, the architecture of said framework. The former answers “how”; the latter answers “why.”

All of engineering and mathematics is perpetually in search of the right abstraction to tackle a problem. Perhaps the most striking change that occurs when you’ve put the problem in the right representation is that it becomes substantially shorter and easier to manipulate at a higher level. It’s no surprise that Newton needed to invent Calculus in order to develop his ideas about physics. The high-level programming languages and systems we build today would have been inconceivable in pure assembly language or silicon.

Finding and understanding the right abstraction is enlightenment: it makes hard things easy and impossible things possible. Calculations that used to take a page now are succinctly described in a sentence. The structure of the verbose system is encoded into the abstraction, leaving behind the salient pieces of the problem. Much of the same could be said for programs: before the advent of high level languages, assembly programs could fit on a few pages and be understood by a single programmer. They had to be. Modern software has gone far beyond.

In both cases, an expert will look at this new formulation, and immediately understand. The beginner, perhaps familiar with but not proficient in this encoding, has to now work out the underlying foundation again (or risk stumbling around with a simpler but faulty set of premises).

You might say, “Well, that’s not the problem of the expert; they just don’t have the prerequisites! I will teach them this topic once they learn that foundation.” This is not acceptable. It is true that formal education can grant them a familiarity with the basic primitives and relations of the abstraction; it is especially effective at weeding out false conceptions. But the facility that an expert has with an abstraction comes only from spending time “in the trenches”, using and applying the abstraction to bigger problems.

You might say, “I am not that uncharitable; I’ll teach the prerequisites too.” You might even expect to be able to impart knowledge upon the listener! Undelude yourself. In all but the simplest topics (those where merely stating the solution is enough to illuminate), they won’t understand if you simply lecture to them. The teaching is just a roadmap for the doing, which is the only way to truly get a visceral feel for any hard problem.

What you should say is, “I am just one lantern in the novice’s arsenal of understanding. I seek to illuminate precisely what the novice doesn’t think to look at.” In fact, there is an easy way to fulfill this purpose: force the novice to teach! They will start off with a very limited and ill-defined mental model of the concept: of the many roads to understanding, there is only one that they know. Their explanation will expose, in brutal detail, all of their missteps and all of the knowledge you left implicit. They will be asked questions, and those questions will force them to clarify their understanding of this path. Eventually they will feel confident in their knowledge of the path, and if they continue learning, that path will expand to encompass many paths, different routes to understanding. The novice has become an expert. But, as the Buddha might say, they are the ones who must discover enlightenment. The teacher merely shows them the path.

How to use Vim's textwidth like a pro

There are lots of little blog posts containing advice about various one-line options you can do in Vim. This post falls into that category, but I’m hoping to do a more comprehensive view into one small subsystem of Vim’s configuration: automatic line wrapping.

When programming, automatic line wrapping can be a little obnoxious, because even if a piece of code hangs past the recommended 72/80 column width, you probably don’t want to break it immediately; but if you’re writing a text document or an email message, that is precisely the behavior you want. By default, Vim does no automatic line wrapping for you; the trick is being able to toggle it on and off when you want it.

Here are the configuration options you care about:

  • textwidth (or tw): controls the wrap width you would like to use. Use :set tw=72 to set the wrap width; by default it’s zero, which disables line wrapping. Even when this value is set, you’re entirely at the whim of the formatoptions below, which are often filetype sensitive.
  • formatoptions (or fo): controls whether or not automatic text wrapping is enabled, depending on whether or not the t flag is set. Toggle the flag on with :set fo+=t, and toggle it off with :set fo-=t. There are also a number of auxiliary format options, but they’re not as important.
  • wrapmargin (or wm): controls when to wrap based on terminal size; I generally find using this to be a bad idea.

Understanding the interaction between these options is important. Here is a short table of interactions:

  • tw=0 fo=cq wm=0: No automatic wrapping, rewrapping will wrap to 79
  • tw=72 fo=cq wm=0: No automatic wrapping, rewrapping will wrap to 72
  • tw=0 fo=cqt wm=0: No automatic wrapping, rewrapping will wrap to 79
  • tw=0 fo=cqt wm=5: Automatic wrapping at a 5 col right margin
  • tw=72 fo=cqt wm=0: Automatic wrapping at col 72

Notice that to get automatic wrapping you need fo+=t as well as a nonzero tw or wm. Note also that some filetypes will automatically give you fo+=t, while others won’t.
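One way to put this together (the filetype names here are just illustrative) is a vimrc sketch that keeps coding buffers unwrapped while opting prose-oriented filetypes in:

```vim
" Coding default: no automatic wrapping, but gq still rewraps to 72
set textwidth=72
set formatoptions-=t
" Opt in to automatic wrapping for prose-like filetypes
autocmd FileType mail,markdown setlocal formatoptions+=t
```

The FileType autocmd runs after any ftplugin has fiddled with formatoptions, so this setting gets the last word.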

Here are the keystrokes you care about:

  • gq: performs a “formatting operation”, which in our universe means “rewrap the text.” This will respect leading indent and symbolic characters, which is usually nice but a little obnoxious if you’re reflowing a bullet point (since the text will suddenly acquire asterisks in front of everything).
  • The paragraph motions. The big one is vip (the preceding v puts us in visual mode, for selection), which selects an “inner paragraph”; this means that if you’re anywhere inside of a paragraph, you can type vip and have the entire thing instantly selected for you, possibly to run gq on subsequently. vap is similar, but selects “a paragraph” including the trailing blank lines, which is more appropriate if you want to, say, delete it. The curly braces move you between paragraphs.

The value of formatoptions will drastically change the way Vim behaves, so I highly recommend keeping it displayed somewhere you can reference it quickly. I use:

set statusline=...[%{&fo}]...

You probably have a statusline of your own; just add that small snippet (minus the ellipses) somewhere convenient. For further good measure, I explicitly set fo-=t in my vimrc, to prevent myself from being surprised (since I primarily code in Vim).

One more neat trick:

augroup vimrc_autocmds
  autocmd BufEnter * highlight OverLength ctermbg=darkgrey guibg=#592929 
  autocmd BufEnter * match OverLength /\%74v.*/
augroup END

This will highlight all characters past column 74 (tweak that number as desired) in dark grey (tweak that color as desired); it’s a nice visual cue, when automatic line wrapping isn’t turned on, that you should think about breaking the line.

Third-party unattended upgrades in three steps

unattended-upgrades is a nifty little package that will go ahead and automatically install updates for you as they become available. No serious system administrator should use this (you are testing updates before pushing them to the servers, right?) but for many personal uses automatic updates are really what you want; if you run sudo aptitude full-upgrade and don’t read the changelog, you might as well turn on unattended upgrades. You can do this by adding the line APT::Periodic::Unattended-Upgrade "1" to /etc/apt/apt.conf.d/10periodic (thanks Ken!)

Of course, the default configuration they give you in /etc/apt/apt.conf.d/50unattended-upgrades only pulls updates from their security repository, and they only give you a commented out line for normal updates. People have asked, “well, how do I pull automatic updates from other repositories?” Maybe you have installed Chromium dailies; seeing the “you have updates” icon every day can be kind of tiresome.

Well, here’s how you do it:

  1. Find out what URL the PPA you’re interested in points to. You can dig this up by looking at /etc/apt/sources.list or /etc/apt/sources.list.d/ (the former is if you manually added a PPA at some point; the latter is likely if you used add-apt-repository).
  2. Navigate to that URL in your browser. Navigate to dists, and then navigate to the name of the distribution you’re running (for me, it was karmic). Finally, click Release. (For those who want to just enter the whole URL, it’s http://example.com/apt/dists/karmic/Release).
  3. You will see a number of fields of the form Fieldname: Value. Find the Origin field and the Suite field; these two values are what go into Allowed-Origins.

For example, the Ksplice repository has the following Release file:

Origin: Ksplice
Label: Ksplice
Suite: karmic
Codename: karmic
Version: 9.10
Date: Sun, 07 Feb 2010 20:51:12 +0000
Architectures: amd64 i386
Components: ksplice
Description: Ksplice packages for Ubuntu 9.10 karmic

This translates into the following configuration:

Unattended-Upgrade::Allowed-Origins {
       "Ksplice karmic";
};

And that’s it! Go forth and make your systems more secure through more timely updates.

Bonus tip. You can turn on unattended kernel updates via Ksplice by editing /etc/uptrack/uptrack.conf and setting autoinstall = yes.

Writing generator friendly code

I’ve come a long ways from complaining to the html5lib list that the Python version gratuitously used generators, making it hard to port to PHP. Having now drunk the laziness kool-aid in Haskell, I enjoy trying to make my code fit the generator idiom. While Python generators have notable downsides compared to infinite lazy lists (for example, forking them for multiple use is nontrivial), they’re pretty nice.

Unfortunately, the majority of code I see that expects to see lists isn’t robust enough to accept generators too, and it breaks my heart when I have to say list(generator). I’ll forgive you if you’re expecting O(1) accesses of arbitrary indexes in your internal code, but all too often I see code that only needs sequential access, only to botch it all up by calling len(). Duck typing won’t save you there.

The trick for making code generator friendly is simple: use the iteration interface. Don’t mutate the list. Don’t ask for arbitrary items. Don’t ask for the length. This also is a hint that for i in range(len(l)) is absolutely the wrong way to traverse a list; if you need indices, use enumerate.
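To make the point concrete, here is a minimal sketch (the function names are my own) of the same traversal written list-only and generator-friendly:

```python
def total_list_only(items):
    # Breaks on generators: len() and indexing assume a sequence.
    total = 0
    for i in range(len(items)):
        total += items[i]
    return total

def total_friendly(items):
    # Works on any iterable; enumerate supplies indices if you need them.
    total = 0
    for i, item in enumerate(items):
        total += item
    return total

total_friendly([1, 2, 3])                 # lists work: 6
total_friendly(x * x for x in range(4))   # so do generators: 14
```

Passing a generator to the first version raises a TypeError at len(), while the second consumes either happily.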

Update (September 1, 2012). Hilariously enough, PHP has finally gotten generators.

History as documentation

It’s real easy to argue about the utility, style and implementation of source code comments, those good ole’ pals of code that try to add supplementary information when the pure code isn’t enough.

However, to focus solely on the latest snapshot of any particular source file is to miss a wealth of information that is not inside the file; namely, the history of the file and the genealogy of every line that graces the file. This is not so relevant when you are rapidly prototyping functionality and versions of the file in source control history represent incomplete, half-baked figments of thought, but once a codebase transitions into a more maintenance-oriented workflow, the history takes on a keen and unusual importance. In particular:

  • A log of the evolution of the file over time can illustrate what the original intent of the module was, and then how it got retrofitted or extended or hacked up over time. If you have inherited code from someone else that you need to rearchitect, what better way to get into the heads of the original designers than to study the revisions they went through?
  • Any particular line may have simply been part of the ambient code present during the initial check-in, or it may have been touched by a highly targeted commit addressing some issue. In this case, the output of git blame is highly relevant for identifying why that particular line might be special, or why a subtly different permutation is incorrect. In the case of delocalized changes, associating a line with a commit can give you the fast pass to understanding how one operation is orchestrated with many others for some global effect.

Developers should be highly encouraged to write impeccably descriptive commit messages (with the diff in hand: never write a commit message without the diff in hand) for the sake of those who may pick through the logs in the future. It’s ok to even be a little wordier than you might be in an inline comment, since:

  • Log messages never grow old: they are always relevant to the revision they are attached to!
  • A good commit message facilitates code review, since it poses an informal specification of what the change does, which an external observer can then take and verify against the code. Otherwise, the reviewer would have to determine both the intended and actual semantics of the code, stylistic issues aside.

Finally, a few words about keeping the history clean and easy to use:

  • Logically organized patch sets mean that any given change is immediately relevant to the log message. If you push a big commit which contains lots of semantic changes, the reader has to disambiguate which particular semantic change is associated with which part of the diff. It is certainly worth your time to git add -p to stage hunks individually.
  • Make high quality diffs, which avoid touching unnecessary code. High traffic mailing lists such as LKML which receive many patches have published patch submission guidelines in order to make diffs as readable as possible to a possible reviewer. Even if you don’t need to convince a temperamental upstream to take your changes, later in time you may care about your diffs.
  • Stylistic changes are highly disruptive to git blame output, since they result in a line being marked as changed even though no semantic difference took place. If you must make them, keep them strictly in their own commits, and make them infrequently.
  • Utilize history rewriting to allow for cheap commits which are polished up later for submission.

The Art of Posing a Problem

Last week, I was talking with Alexey Radul to figure out some interesting research problems that I can cut my teeth on. His PhD thesis discusses “propagation networks”, which he argues is a more general substrate for computation than traditional methods. It’s a long work, and it leaves open many questions, both theoretical and practical. I’m now tackling one very small angle with regards to the implementation of the system, but while we were still figuring out a problem, Alexey commented on how much work it takes to do a good job of giving someone a problem.

I wholeheartedly agree, though my experiences come from a different domain: SIPB. Some of the key problems with assigning projects to interested prospectives include:

  • Many projects are extremely large and complex, and in many cases it’s simply not possible to assign someone an interesting, high-level project and expect them to make significant headway. They’re more likely to make progress with “wax on, wax off” style training, but that’s not interesting.
  • No one ever tells you what they’re interested in! Even if you ask, you’ll probably get the answer, “Eh, I’d be up for anything.” As someone who has used this phrase before, I also emphatically understand that this is not true; people have different interests and will enjoy the same task dramatically differently.
  • It’s easy to exert too much or too little control over the direction of the project. Too much control and you’ve defined the entire technical specification for the person, taken away their creative input, made them feel bad when they’ve not managed to get work done, and are bound to be dismayed when they failed to understand your exacting standards in the first place. Too little control and the person can easily get lost or waste hours fighting incidental issues and not the core of the problem.
  • Being a proper mentor is a time-consuming process, even if you exert minimal control. Once the person comes back with a set of patches, you still have to read them, make sure they’re properly tested, and send back reviews on how the patches need to be revised (and for all but the most trivial of changes, this will be inevitable). You might wonder why you didn’t just do the damn task yourself. Reframing the problem as a purely educational exercise can also be disappointing, if not done properly.
  • As people refine the art of bootstrapping, the number of possible projects they can work on explodes, and what makes you think that they’re going to work on your project? People decide what they want to work on, whether it’s because they made it themselves, or it’s in a field they’re interested in, or it’s a tool they use day-by-day, and if you don’t get the person to buy in, you can easily lose them.

I imagine similar tensions come up for open-source project maintainers, internship programs and Google Summer of Code organizers. And I still have no feeling for what strategies actually work in this space, even though I’ve certainly been on both sides of the fence. I’d love to hear from people who have tried interesting strategies and had them work!

Comonads and Convolutions

> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> import Control.Comonad
> import Data.List

That scary Control.Comonad import from category-extras is going to be the subject of today’s post. We’re going to look at one possible implementation of comonads for non-empty lists that model causal time-invariant systems, systems whose outputs depend only on past and present inputs. We will see that computation in these systems follows a comonadic structure, and that one instance of this structure strongly enforces causality and weakly enforces time-invariance.

Our causal lists are simply a newtype of list with the added restriction that they are non-empty; causal is a “smart constructor” that enforces this restriction. We use GeneralizedNewtypeDeriving to get the Functor instance for free.

> newtype Causal a = Causal [a]
>    deriving (Functor, Show)
>
> causal :: a -> [a] -> Causal a
> causal x xs = Causal (x:xs)
>
> unCausal :: Causal a -> [a]
> unCausal (Causal xs) = xs
>
> type Voltage = Float

Background. (If you’re already familiar with signal processing, feel free to skip this section.) One such system models point-to-point communication of voltage samples across an imperfect wire channel. In an ideal world, we would very much like to be able to pretend that any voltage I put into this channel would instantly perfectly transmit this voltage to the other end of the channel. In practice, we’ll see any number of imperfections, including time to rise and fall, a delay, ringing and noise. Noise is a party pooper, so we’re going to ignore it for the purposes of this post.

To a first approximation, we can impose the following important conditions on our system:

  • Causality. Our wire can’t peek into the future and transmit some voltage before it has even gotten it.
  • Time-invariance. Any signal will get the same response whether or not it gets sent now or later.
  • Linearity. A simple and useful approximation for wires, which states this mathematical property: if an input x1 results in an output y1, and an input x2 results in an output y2, then the input Ax1 + Bx2 results in the output Ay1 + By2. This also means we get superposition, which is an important technique that we’ll use soon.

When you see a linear time-invariant system, it means that we get to use a favorite mathematical tool, the convolution.

Discrete convolutions. The overall structure of the discretized computation that a channel performs is [Voltage] -> [Voltage]; that is, we put in a sequence of input voltage samples, and get out another sequence of output voltage samples. On the other hand, the discrete convolution is the function calculated by (with variables suggestively named):

(u ∗ f)[n] = sum from m = -∞ to ∞ of f[m]u[n-m]

It’s not quite obvious why the convolution is the mathematical abstraction we’re looking for here, so we’ll sketch a brief derivation.

One special case of our computation is when the input corresponds to [1, 0, 0, 0 ...], called the unit sample. In fact, due to linearity and time-invariance, the output that our system gives when posed with the unit sample, the unit sample response, precisely specifies the behavior of a system for all inputs: any possible input sequence could be composed of any number of delayed and scaled unit samples, and linearity says we can sum all of the results together to get a result.

A list is actually a function ℕ → a, and we can extend the domain to be over integers if we propose the convention f[n] = 0 for n < 0. Suppose that f[n] represents our input samples varying over time, δ[n] represents a unit sample (δ[0] = 1, δ[n] = 0 for all other n; you’ll commonly see δ[n-t], which is a unit sample at time t), and u[n] represents our unit sample response. Then, we decompose f[n] into a series of unit samples:

f[n] = f[0]δ[n] + f[1]δ[n-1] + ...

and then use linearity to retrieve our response g[n]:

g[n] = f[0]u[n] + f[1]u[n-1] + ...
     = sum from m = 0 to ∞ of f[m]u[n-m]

which looks just like the discrete convolution, just without the -∞ bound. Remember that we defined f[m] = 0 for m < 0, so the two are actually equivalent.

I’d like to linger on that final mathematical definition for a moment, before writing out the equivalent Haskell. We originally stated that the input-response computation had the type [Voltage] -> [Voltage]; however, in our math, we’ve actually defined a function [Voltage] -> Voltage, a channel-specific function that takes all of the inputs up to time n, i.e. f[0]..f[n], and returns the single output g[n]. I’ve written the following definition in a suggestively curried form to reflect this:

> ltiChannel :: [Voltage] -> Causal Voltage -> Voltage
> ltiChannel u = \(Causal f) -> sum $ zipWith (*) (reverse f) u

The unit sample response may be a finite or infinite list; for reasons of efficiency, a finite list is recommended:

> usr :: [Voltage]
> usr = [1,2,5,2,1]

Comonads. By now, it should be clear where we’ve been working towards: we have ltiChannel usr :: Causal Voltage -> Voltage and we want: Causal Voltage -> Causal Voltage. This is precisely the form of computation that the comonad induces! For your convenience, here is the definition of the Copointed and Comonad type classes:

class Functor f => Copointed f where
    extract :: f a -> a

class Copointed w => Comonad w where
    duplicate :: w a -> w (w a)
    extend :: (w a -> b) -> w a -> w b

The Copointed instance is straight-forward, but demonstrates why the Causal must contain a non-empty list:

> instance Copointed Causal where
>    extract (Causal xs) = head xs

The Comonad instance can be defined using either duplicate or extend; both have default implementations defined in terms of each other. Deriving these default implementations is left as an exercise to the reader; we’ll define both here:

> instance Comonad Causal where
>    extend f  = Causal . map (f . Causal) . tail . inits . unCausal
>    duplicate = Causal . map      Causal  . tail . inits . unCausal

The intent of the code is somewhat obscured by the unwrapping and wrapping of Causal; for a pure list the instance would look like this:

instance Comonad [] where
    extend f  = map f . tail . inits
    duplicate = tail . inits

The function duplicate really gets to the heart of what this comonad instance does: we take our input list and transform it into a list of histories, each one extending one step further than the last. The tail tags along to drop the first value of inits, which is an empty list. duplicate builds up w (w a), and then the user-supplied function tears it back down to w b (if you think of monads, the lifted user function builds up m (m b), and then join tears it back down to m b).
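The same shape is easy to see outside Haskell; here is a throwaway Python sketch of my own, using plain finite lists rather than Causal:

```python
def duplicate(xs):
    # All non-empty prefixes of xs: the "list of histories",
    # mirroring tail . inits on a finite list.
    return [xs[:i] for i in range(1, len(xs) + 1)]

def extend(f, xs):
    # Apply a whole-history function f to each history.
    return [f(h) for h in duplicate(xs)]

duplicate([1, 2, 3])    # [[1], [1, 2], [1, 2, 3]]
extend(sum, [1, 2, 3])  # running sums: [1, 3, 6]
```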

One quick test to make sure it works:

> unitStep :: Causal Voltage
> unitStep = Causal (repeat 1)
>
> result :: Causal Voltage
> result = unitStep =>> ltiChannel usr

and sure enough, the result is:

Causal [1.0, 3.0, 8.0, 10.0, 11.0, 11.0, ...]

=>> is a flipped extend, and the comonadic equivalent of the monadic >>=.
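If you’d like to sanity-check the arithmetic outside of Haskell, here is a rough Python transliteration of ltiChannel (my own sketch, not part of the literate code above):

```python
usr = [1, 2, 5, 2, 1]  # unit sample response

def lti_channel(u, history):
    # history is f[0..n], oldest sample first; reverse it so the
    # newest sample lines up with u[0], then take the dot product.
    # zip truncates to the shorter list, matching zipWith.
    return sum(f * h for f, h in zip(reversed(history), u))

unit_step = [1.0] * 6
result = [lti_channel(usr, unit_step[:n + 1]) for n in range(6)]
# result == [1.0, 3.0, 8.0, 10.0, 11.0, 11.0], matching the output above
```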

Enforced invariants. Structuring our computation in this form (as opposed to writing the darn convolution out explicitly) gives us some interesting enforced invariants in our code. Our channels need not be linear; I could have squared all of the inputs before convolving them with the unit sample response, and that certainly would not be linear. However, any channel we write must be causal and will usually be time-invariant: it must be causal because we never pass any values from the future to the user function, and it is weakly time-invariant because we don’t explicitly let the user function know how far along the input stream it is. In practice, with our implementation, it could divine this information using length; we could get stronger guarantees by employing a combinator that reverses the list and then appends repeat 0:

> tiChannel :: ([Voltage] -> Voltage) -> Causal Voltage -> Voltage
> tiChannel f (Causal xs) = f (reverse xs ++ repeat 0)
>
> ltiChannel' :: [Voltage] -> Causal Voltage -> Voltage
> ltiChannel' u = tiChannel (\xs -> sum $ zipWith (*) u xs)

u in this case must be finite; an infinite unit sample response can be truncated at some point to specify how precise our computation should be.

Open question. The unit sample response has been expressed in our sample code as [Voltage], but it really is Causal Voltage. Unfortunately, the comonad doesn’t seem to specify mechanisms for combining comonadic values the same way the list monad automatically combines the results of computations for each of the values of a list. I’m kind of curious how something like that might work.

Type manipulation: Tricks of the trade

I present here a few pieces of folklore, well known to those practiced in Haskell, that I’ve found to be really useful techniques when analyzing code whose types don’t seem to make any sense. We’ll build practical techniques for reasoning about types, to be able to derive the type of fmap fmap fmap by ourselves. Note that you could just ask GHCi what the type is, but that would spoil the fun! (More seriously, working out the example by hand, just like a good problem set, helps develop intuition for what might be happening.)

Currying and types. Three type signatures that have a superficial similarity are a -> b -> c, (a -> b) -> c and a -> (b -> c). If you don’t have a visceral feel for Haskell’s automatic currying, it can be easy to confuse the three. In this particular case, a -> b -> c, which reads as “takes two arguments a and b and returns c”, is equivalent to a -> (b -> c), read “takes a and returns a function that takes b and returns c.” These are distinct from (a -> b) -> c, read “takes a function of a -> b and returns c.” A visual rule you can apply, in these cases, is that parentheses that are flush with the right side of the type signature can be freely added or removed, whereas any other parentheses cannot be.
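In Python terms (a loose analogy of my own, since Python doesn’t curry automatically), the distinction looks like this:

```python
# a -> b -> c, i.e. a -> (b -> c): the right-flush parens are implicit,
# so this is a function returning a function.
f = lambda a: lambda b: a + b
f(1)(2)             # 3

# (a -> b) -> c: genuinely different, it takes a *function* as input.
g = lambda h: h(10)
g(lambda x: x * 2)  # 20
```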

Higher-order functions. If I pass an Int to id :: a -> a, it’s reasonably obvious that id takes the shape of Int -> Int. If I pass a function a -> a to id :: a -> a, id then takes the shape (a -> a) -> a -> a. Personally, I find the overloading of type parameters kind of confusing, so if I have a cadre of functions that I’m trying to derive the type of, I’ll give them all unique names. Since id id is a tad trivial, we’ll consider something a little nastier: (.) (.). Recall that (.) :: (b -> c) -> (a -> b) -> a -> c. We’re not actually going to use those letters for our manipulation: since our expression has two instances of (.), we’ll name the first a and the second b, and we’ll number them from one to three. Then:

(.) :: (a2 -> a3) -> (a1 -> a2) -> a1 -> a3
(.) :: (b2 -> b3) -> (b1 -> b2) -> b1 -> b3

Slightly less aesthetically pleasing, but we don’t have any more conflicting types. The next step is to identify what equivalences are present in the type variables, and eliminate redundancy. Since we’re passing the second (.) to the first (.) as an argument:

(a2 -> a3) == (b2 -> b3) -> (b1 -> b2) -> b1 -> b3

to which you might say, “those function signatures don’t look anything alike!” which leads us to our next point:

Currying and type substitution. If your function’s type is n-ary, and the type you’re trying to match it against is m-ary, curry so that your function is m-ary too! So, if you have a -> b -> c, and you want to pass it as d -> e, then you actually have a -> (b -> c), and thus d == a and e == (b -> c). A curious case is the other direction, in which case d -> e is actually restricted to be d -> (e1 -> e2), where e == (e1 -> e2) and the obvious equalities hold.

To go back to our original example, the second (.) would be grouped as such:

(.) :: (b2 -> b3) -> ((b1 -> b2) -> b1 -> b3)

and we achieve the type equalities:

a2 == (b2 -> b3)
a3 == (b1 -> b2) -> b1 -> b3

Now, let’s substitute in these values for the first (.):

(.) :: ((b2 -> b3) -> (b1 -> b2) -> b1 -> b3) ->
       (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

and drop the first argument, since it’s been applied:

(.) (.) :: (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

You might be wondering what that monstrous type signature does…

Interpreting type signatures. A great thing about polymorphic types is that there’s not very much non-pathological behavior that can be specified: because the type is fully polymorphic, we can’t actually stick our hand in the box and use the fact that it’s actually an integer. This property makes programs like Djinn, which automatically derive a function’s contents given a type signature, possible, and with a little practice, you can figure it out too.

Working backwards: we first take a look at b3. There’s no way for our function to magically generate a value of type b3 (excluding undefined or bottom, which counts as pathological), so there’s got to be something else among our arguments that generates it. And sure enough, it’s the first argument, but we need to pass it a1 and b2 first:

(.) (.) w x y z = w undefined undefined

We repeat the process for each of those types in turn: where is a1 specified? Well, we pass it in as the second argument. Where is b2 specified? Well, we have another function y :: b1 -> b2, but we need a b1 which is z. Excellent, we now have a full implementation:

(.) (.) w x y z = w x (y z)
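We can check the derived definition against the real thing; compDotDot is just a name invented here to pin (.) (.) to the type we derived:

```haskell
-- (.) (.), at the type worked out above
compDotDot :: (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3
compDotDot = (.) (.)

-- By the derivation, compDotDot w x y z should equal w x (y z):
sanity :: Bool
sanity = compDotDot (+) 1 (*2) 3 == 1 + (2 * 3)   -- both sides are 7
```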

Pointfree style as operator composition. So, we now know what (.) (.) does, but we don’t really have a good motivation for why this might be the case. (By motivation, I mean, look at (.) (.) taking function composition at face value, and then realizing, “oh yes, it should do that.”) So what we’d really like to focus on is the semantics of (.), namely function composition, and the fact that we’re currying it. One line of thought might be:

  1. Function composition is defined to be (f . g) x = f (g x).
  2. We’re partially applying the composition, so actually we have (f.) g x, but g is missing. (If the (f.) looks funny to you, compare it to (2+), which is partially applied addition. Note that addition is commutative, so you’re more likely to see (+2), which becomes (x+2) when applied.)
  3. f is actually another composition operator. Since function composition is single-argument oriented, we want to focus on the curried version of (.), which takes a function and returns a function (1) that takes another function (2) and a value, and returns the first function applied to the result of the second function applied to the value.
  4. Read out the arguments. Since (f.) is on the outside, the first argument completes the curry. The next argument is what will actually get passed through the first argument, and the result of that will get passed through f. The return value of that is another function, but (barring the previous discussion) we haven’t figured out what that would be yet. Still, we’ve figured out what the first two arguments might look like.

If we now cheat and look at the type signature, we see our hypotheses are verified:

(.) (.) :: (a1 -> b2 -> b3) -> a1 -> (b1 -> b2) -> b1 -> b3

The first argument g :: a1 -> b2 -> b3 completes the curry, and then the next argument is fed straight into it, so it would have to be a1. The resulting value b2 -> b3 is fed into the next composition operator (notice that it’s not a single variable, since the next composition forces it to be a 1-ary function) and is now waiting for another function to complete the curry, which is the next argument b1 -> b2 (yielding b1 -> b3). Then it’s a simple matter of supplying the remaining arguments.

I find thinking of functions as partially applied and waiting to be “completed” leads to a deeper intuitive understanding of what a complex chain of higher order functions might do.

Putting it together. It is now time to work out the types for fmap fmap fmap. We first write out the types for each fmap:

fmap :: (Functor f) => (a1 -> a2) -> f a1 -> f a2
fmap :: (Functor g) => (b1 -> b2) -> g b1 -> g b2
fmap :: (Functor h) => (c1 -> c2) -> h c1 -> h c2

Perform application and we see:

(a1 -> a2) == (b1 -> b2) -> g b1 -> g b2
f a1 == (c1 -> c2) -> h c1 -> h c2

Luckily, we have enough arguments to fill up the first fmap, so that’s one less layer of complexity. We can further break these equalities down:

-- from the first argument
a1 == b1 -> b2
a2 == g b1 -> g b2
-- from the second argument
a1 == h c1 -> h c2
f == (->) (c1 -> c2)

The last equality stems from the fact that there’s only one reasonable functor instance for (c1 -> c2) -> h c1 -> h c2: the functor for functions, i.e. the reader monad, taking (c1 -> c2) as its “read-in”.
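For the function functor, fmap is just composition, which is easy to see in action (readerExample is a made-up name):

```haskell
-- For the function functor ((->) r), fmap f g feeds the argument
-- through g first, then f -- i.e. fmap f g == f . g.
readerExample :: Int
readerExample = fmap (+1) (*2) 3   -- (+1) ((*2) 3) == 7
```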

We can do a few more simplifications:

h c1 -> h c2 == b1 -> b2
b1 == h c1
b2 == h c2

Substitute everything in, and now we see:

fmap fmap fmap :: (Functor g, Functor h) =>
   (c1 -> c2) -> g (h c1) -> g (h c2)

Interpret the types and we realize that fmap fmap fmap does a “double” lift of a function c1 -> c2 to two functors. So we can run fmap fmap fmap (+2) [Just 3] and get back [Just 5] (utilizing the functor instance for the outer list and the inner maybe).
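The claim is easy to check; doubleLift is just a name for illustration:

```haskell
-- fmap fmap fmap at the type we derived: lift a function through
-- two functor layers at once.
doubleLift :: (Functor g, Functor h) => (c1 -> c2) -> g (h c1) -> g (h c2)
doubleLift = fmap fmap fmap

-- Through the list functor, then the Maybe functor:
-- doubleLift (+2) [Just 3, Nothing] == [Just 5, Nothing]
```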

We also notice that the f functor dropped out; this is because it was forced to a specific form, so really fmap fmap fmap == fmap . fmap. This makes it even more obvious that we’re doing a double lift: the function is fmap‘ed once, and then the result is fmap‘ed again.

We can even use this result to figure out what (.) (.) (.) (or (.) . (.)) might do: for functions, fmap = (.), so a normal function is lifted into one reader context by the first fmap, and into another reader context by the second fmap. So we’d expect (.) . (.) :: (a -> b) -> (r2 -> r1 -> a) -> (r2 -> r1 -> b) (recall that f a, with f = (->) r, is r -> a), and indeed, that is the case. Compose composed with compose is merely a compose that can take a 2-ary function as its second argument and “do the right thing!”
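A quick sketch of that “2-ary compose” in use (composeTwo is an invented name for (.) . (.)):

```haskell
-- Post-process the result of a 2-ary function with a 1-ary one.
composeTwo :: (a -> b) -> (r2 -> r1 -> a) -> r2 -> r1 -> b
composeTwo = (.) . (.)

-- composeTwo negate (+) 1 2 == negate (1 + 2) == -3
```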

Why we stay up late

I was having a discussion a few nights ago about attention, and someone mentioned the fact that contiguous blocks of time are precious. It’s obvious once it’s been pointed out to you, and with it in mind I’ve noticed my tendency to bucket useful activities into various categories: checking email, reading news feeds and simple tasks fall into the “less than an hour” time bucket, while really actually creating software, fixing hard bugs or reading code fall into the “more than an hour, preferably several hours” time bucket. I’ve recognized that attempting to tackle the “more than an hour” bucket in the snatches of time between classes and extracurriculars is simply wasteful.

But unlike the programmer working a day job that Paul Graham describes in his essay, I am a college student. I mostly don’t do my work during the day; that’s the time to imbibe information from lectures and recitations. In particular, I’m an MIT student, which means that there’s no 5:00 “oh, time to go home and relax” period; it’s work all the time (and you grab precious moments of relief with flights of procrastination when you can.) My mornings and evenings are saturated by meetings: they require me to physically relocate myself and pay attention to something else on someone else’s schedule. And throw in orchestra rehearsal from 7:30-10PM or small ensemble rehearsal from 5-7PM or SIPB meetings from 7:30-8:30PM and you’ve got a recipe for fragmentation.

When can you get that uninterrupted time? Only late at night, because when you are working on that problem set at two in the morning, there is one good thing: no one has scheduled a meeting at 2:30 AM. And while it’s definitely a shame that these benefits may be outweighed by the intoxication of sleep deprivation, you’re not going to find this sort of contiguous block of time anywhere else. And that’s why we stay up late.

Hunting for constraints

> import Data.List
> import Control.Monad

The following question appeared as part of a numbers-based puzzle in the 2010 MIT Mystery Hunt:

His final level on Wizard of Wor was equal to the smallest number that can be written as the sum of 4 non-zero squares in exactly 9 ways

I’d like to explore constraint search in Haskell to solve this problem. The hope is to find a (search) program that directly reflects the problem as posed, and also gives us an answer!

Because we are looking for the smallest number, it makes sense to start testing from a small number and start counting up. We’ll assume that the answer to this question won’t overflow Int.

Now, we need to test if it can be written as the sum of 4 non-zero squares in exactly 9 ways. This problem reduces to “how many ways can n be written as the sum of squares”, which is another search problem.

We’ll assume that 4+1+1+1 and 1+4+1+1 don’t count as distinct for the purposes of our nine ways. This leads to the first piece of cleverness: if we impose a non-decreasing order on our squares, we once again get uniqueness.

We also need to bound our search space; while fair search can help to some degree with infinite failure, our implementation will be much simpler if we can do some early termination. A very simple condition to terminate on is if the sum of the squares exceeds the number we’re looking for.

Consider the case where we are matching for x, and we have candidate roots a, b and c. Then the maximum the remaining square can be is x - a^2 - b^2 - c^2, and the maximum value for d is the floor of its square root. Square roots are cheap, and we’re using machine-size integers, so things are good.

> floorSqrt :: Int -> Int
> floorSqrt = floor . sqrt . fromIntegral
>
> sumSquares :: [Int] -> Int
> sumSquares as = sum (map (^2) as)
>
> rootMax :: Int -> [Int] -> Int
> rootMax x as = floorSqrt (x - sumSquares as)

From there, we just write out the search for distinct sums of squares of a number:

> searchSumFourSquares :: Int -> [(Int, Int, Int, Int)]
> searchSumFourSquares x = do
>       a <- [1..(rootMax x [])]
>       b <- [a..(rootMax x [a])]
>       c <- [b..(rootMax x [a,b])]
>       d <- [c..(rootMax x [a,b,c])]
>       guard $ sumSquares [a,b,c,d] == x
>       return (a,b,c,d)

And from there, the solution falls out:

> search :: Maybe Int
> search = findIndex (==9) (map (length . searchSumFourSquares) [0..])

We cleverly use [0..] so that the index is the same as the number itself. Alternative methods might use tuples.
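As a sanity check, the definitions above (restated here so the snippet stands alone, outside the literate file) agree with the smallest case: 4 = 1+1+1+1 is the first number with exactly one representation as four non-zero squares.

```haskell
import Control.Monad (guard)
import Data.List (findIndex)

floorSqrt :: Int -> Int
floorSqrt = floor . sqrt . (fromIntegral :: Int -> Double)

sumSquares :: [Int] -> Int
sumSquares = sum . map (^ 2)

rootMax :: Int -> [Int] -> Int
rootMax x as = floorSqrt (x - sumSquares as)

-- Non-decreasing a <= b <= c <= d makes each sum unique.
searchSumFourSquares :: Int -> [(Int, Int, Int, Int)]
searchSumFourSquares x = do
  a <- [1 .. rootMax x []]
  b <- [a .. rootMax x [a]]
  c <- [b .. rootMax x [a, b]]
  d <- [c .. rootMax x [a, b, c]]
  guard (sumSquares [a, b, c, d] == x)
  return (a, b, c, d)

-- searchSumFourSquares 4 == [(1,1,1,1)]
-- findIndex (== 1) (map (length . searchSumFourSquares) [0..]) == Just 4
```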