ezyang's blog

the arc of software bends towards understanding

Accessing lazy structures from C

Someone recently asked on haskell-beginners how to access a lazy (and potentially infinite) data structure from C. I failed to find any example code on how to do this, so I wrote some myself. May this help you in your C-calling-Haskell endeavours!

The main file Main.hs:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types
import Foreign.StablePtr
import Control.Monad

lazy :: [CInt]
lazy = [1..]

main = do
    pLazy <- newStablePtr lazy
    test pLazy -- we let C deallocate the stable pointer with cfree

chead = liftM head . deRefStablePtr
ctail = newStablePtr . tail <=< deRefStablePtr
cfree = freeStablePtr

foreign import ccall test :: StablePtr [CInt] -> IO ()
foreign export ccall chead :: StablePtr [CInt] -> IO CInt
foreign export ccall ctail :: StablePtr [CInt] -> IO (StablePtr [CInt])
foreign export ccall cfree :: StablePtr a -> IO ()
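
(When GHC compiles Main.hs, the foreign export declarations cause it to generate a Main_stub.h header declaring chead, ctail and cfree, which the C file includes below.)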

The C file export.c:

#include <HsFFI.h>
#include <stdio.h>
#include "Main_stub.h"

void test(HsStablePtr l1) {
    int x = chead(l1);
    printf("first = %d\n", x);
    HsStablePtr l2 = ctail(l1);
    int y = chead(l2);
    printf("second = %d\n", y);
    cfree(l2);
    cfree(l1);
}

And a simple Cabal file to build it all:

Name:                export
Version:             0.1
Cabal-version:       >=1.2
Build-type:          Simple

Executable export
  Main-is:             Main.hs
  Build-depends:       base
  C-sources:           export.c
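
If everything is wired up correctly, the C test function walks the first two elements of the infinite list, so (given lazy = [1..] above) running the executable should print:

first = 1
second = 2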

Happy hacking!

Transparent xmobar

Things I should be working on: graduate school personal statements.

What I actually spent the last five hours working on: transparent xmobar.

image

It uses the horrible “grab Pixmap from root X window” hack. You can grab the patch here, but I haven’t put in enough effort to actually make this a configurable option; if you just compile that branch, you’ll get an xmobar that is at 100/255 transparency, tinted black. (The algorithm needs a bit of work to generalize over different tints properly; suggestions solicited!) Maybe someone else will cook up a more polished patch. (Someone should also drum up a more complete set of XRender bindings!)

This works rather nicely with trayer, which supports near-identical tint and transparency behavior. Trayer is also nice on Oneiric, because it sizes the new battery icon sensibly, whereas stalonetray doesn’t. If you’re wondering why the fonts look antialiased, that’s because I compiled with XFT support.

(And yes, apparently I have 101% battery capacity. Go me!)

Update. Feature has been prettified and made configurable. Adjust alpha in your config file: 0 is transparent, 255 is opaque. I’ve submitted a pull request.

Ubuntu Oneiric upgrade (Thinkpad/Xmonad)

I upgraded from Ubuntu Natty Narwhal to Oneiric Ocelot (11.10) today. Lots of things broke. In order:

  • “Could not calculate the upgrade.” No indication of what the error might be; in my case, the error ended up being old orphaned OpenAFS kernel modules (for kernels that no longer existed). I also took the opportunity to clean up my PPAs.
  • “Reading changelogs.” apt-listchanges isn’t particularly useful, and I don’t know why I installed it. But it’s really painful when it’s taking more time to read changelogs than to install your software. Geoffrey suggested gdb -p `pgrep apt-listchanges` and then forcing it to call exit(0), which worked like a charm. Had to do this several times; I thought it was infinitely looping.
  • Icons didn’t work, menus were ugly. Go to “System Settings > Appearance” and set a new theme; in all likelihood your old theme went away. This AskUbuntu question gave a clue.
  • Network Manager stopped working. For some inscrutable reason the default NetworkManager config file /etc/NetworkManager/NetworkManager.conf has managed=false for ifupdown. Flip back to true.
  • New window manager, new defaults to dunk you in Unity at least once. Just make sure you pick the right window manager from the little gear icon.
  • gnome-power-manager went away. If you fix icons, a not-so-useful icon will show up anyway when you load gnome-settings-daemon.
  • “Waiting for network configuration.” There were lots of suggestions here. My /var/run and /var/lock were borked, so I followed these instructions; I also hear that you should punt wlan0 from /etc/network/interfaces and remove it from /etc/udev/rules.d/70-persistent-net.rules. I also commented out the sleeps in /etc/init/failsafe.conf for good measure.
  • Default GHC is 7.0.3! Blow away your .cabal (but hold onto .cabal/config) and go reinstall Haskell Platform. Don’t forget to make sure you install profiling libraries, and grab xmonad and xmonad-contrib. Note that previous haskell-platform installs will be rather broken, on account of missing GHC 6 binaries (you can reinstall them, but it looks like they get replaced).
  • ACPI stopped knowing about X, so if you have scripts for handling rotation, source /usr/share/acpi-support/power-funcs and run getXuser and getXconsole.
  • DBUS didn’t start. This is due to leftover pid and socket files; see this bug.
  • Was mysteriously fscking my root drive on every boot. Check your pass param in /etc/fstab; should be 0.
  • Redshift was mysteriously being reset by xrandr calls; I worked around this by calling it oneshot immediately after running xrandr.
  • Not sure if this was related to the upgrade, but I fixed an annoyance where suspend-checking (in case you are coming out of hibernate) was taking a really long time at boot. Set resume to the right swap partition in /etc/initramfs-tools/conf.d/resume and run update-initramfs -u with great prejudice.

Unresolved annoyances: X11 autolaunching in DBUS, the power icon doesn’t always properly show AC information and is too small in stalonetray, xmobar doesn’t support percentage battery and AC coloring simultaneously (I have a patch), a totem built from scratch segfaults.

How to read Haskell like Python

tl;dr — Save this page for future reference.

Have you ever been in the situation where you need to quickly understand what a piece of code in some unfamiliar language does? If the language looks a lot like one you’re comfortable with, you can usually guess what large amounts of the code do, even if you’re not completely familiar with how all the language features work.

For Haskell, this is a little more difficult, since Haskell syntax looks very different from traditional languages. But there’s no really deep difference here; you just have to squint at it the right way. Here is a fast, mostly incorrect, and hopefully useful guide for interpreting Haskell code like a Pythonista. By the end, you should be able to interpret this fragment of Haskell (some code elided with ...):

runCommand env cmd state = ...
retrieveState = ...
saveState state = ...

main :: IO ()
main = do
    args <- getArgs
    let (actions, nonOptions, errors) = getOpt Permute options args
    opts <- foldl (>>=) (return startOptions) actions
    when (null nonOptions) $ printHelp >> throw NotEnoughArguments
    command <- fromError $ parseCommand nonOptions
    currentTerm <- getCurrentTerm
    let env = Environment
            { envCurrentTerm = currentTerm
            , envOpts = opts
            }
    saveState =<< runCommand env command =<< retrieveState

Types. Ignore everything you see after :: (similarly, you can ignore type, class, instance and newtype). Some people claim that types help them understand code; if you’re a complete beginner, things like Int and String will probably help, and things like LayoutClass and MonadError won’t. Don’t worry too much about it.
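
For example, faced with a signature like this (a made-up one, just for illustration), read off that parseCommand is a function and move on; everything after the :: is for the compiler, not for you:

parseCommand :: [String] -> Either ParseError Command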


Arguments. f a b c translates into f(a, b, c). Haskell code omits parentheses and commas. One consequence of this is we sometimes need parentheses for arguments: f a (b1 + b2) c translates into f(a, b1 + b2, c).


Dollar sign. Since complex statements like a + b are pretty common and Haskellers don’t really like parentheses, the dollar sign is used to avoid parentheses: f $ a + b is equivalent to the Haskell code f (a + b) and translates into f(a + b). You can think of it as a big opening left parenthesis that automatically closes at the end of the line (no need to write )))))) anymore!) In particular, if you stack them up, each one creates a deeper nesting: f $ g x $ h y $ a + b is equivalent to f (g x (h y (a + b))) and translates into f(g(x, h(y, a + b))) (though some consider this bad practice).

In some code, you may see a variant of $: <$> (with angled brackets). You can treat <$> the same way as you treat $. (You might also see <*>; pretend that it’s a comma, so f <$> a <*> b translates to f(a, b). There’s not really an equivalent for regular $.)


Backticks. x `f` y translates into f(x,y). The thing in the backticks is a function, usually binary, and the things to the left and right are the arguments.


Equals sign. Two possible meanings. If it’s at the beginning of a code block, it just means you’re defining a function:

doThisThing a b c = ...
  ==>
def doThisThing(a, b, c):
  ...

Or if you see it near a let keyword, it’s acting like an assignment operator:

let a = b + c in ...
  ==>
a = b + c
...

Left arrow. Also acts like an assignment operator:

a <- createEntry x
  ==>
a = createEntry(x)

Why don’t we use an equals sign? Shenanigans. (More precisely, createEntry x has side effects. More accurately, it means that the expression is monadic. But that’s just shenanigans. Ignore it for now.)


Right arrows. They’re complicated. We’ll get back to them later.


Do keyword. Line noise. You can ignore it. (It does give some information, namely that there are side effects below, but you never see this distinction in Python.)


Return. Line noise. Also ignore. (You’ll never see it used for control flow.)


Dot. f . g $ a + b translates to f(g(a + b)). Actually, in a Python program you’d probably have been more likely to see:

x = g(a + b)
y = f(x)

But Haskell programmers are allergic to extra variables.


Bind and fish operators. You might see things like =<<, >>=, <=< and >=>. These are basically just more ways of getting rid of intermediate variables:

doSomething >>= doSomethingElse >>= finishItUp
  ==>
x = doSomething()
y = doSomethingElse(x)
finishItUp(y)

Sometimes a Haskell programmer decides that it’s prettier if you do it in the other direction, especially if the variable is getting assigned somewhere:

z <- finishItUp =<< doSomethingElse =<< doSomething
  ==>
x = doSomething()
y = doSomethingElse(x)
z = finishItUp(y)

The most important thing to do is to reverse engineer what’s actually happening by looking at the definitions of doSomething, doSomethingElse and finishItUp: it will give you a clue what’s “flowing” across the fish operator. If you do that, you can read <=< and >=> the same way (they actually do function composition, like the dot operator). Read >> like a semicolon (e.g. no assignment involved):

doSomething >> doSomethingElse
  ==>
doSomething()
doSomethingElse()

Partial application. Sometimes, Haskell programmers will call a function, but they won’t pass enough arguments. Never fear; they’ve probably arranged for the rest of the arguments to be given to the function somewhere else. Ignore it, or look for functions which take anonymous functions as arguments. Some of the usual culprits include map, fold (and variants), filter, the composition operator ., and the fish operators (=<<, etc.). This happens a lot to the numeric operators: (+3) translates into lambda x: x + 3.
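
For instance, here’s how partial application typically shows up in the wild (a sketch; map applies its function to every element of the list):

map (+3) xs
  ==>
map(lambda x: x + 3, xs)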


Control operators. Use your instinct on these: they do what you think they do! (Even if you think they shouldn’t act that way.) So if you see when (x == y) $ doSomething x, it reads like “when x equals y, call doSomething with x as an argument.”

Ignore the fact that you couldn’t actually translate that into when(x == y, doSomething(x)), since that would result in doSomething always being called. In fact, when(x == y, lambda: doSomething(x)) is more accurate, but it might be more comfortable to just pretend that when is also a language construct.

if and case are built-in keywords. They work the way you’d expect them to.


Right arrows (for real!) Right arrows have nothing to do with left arrows. Think of them as colons: they’re always near the case keyword and the backslash symbol, the latter of which is lambda: \x -> x translates into lambda x: x.

Pattern matching using case is a pretty nice feature, but a bit hard to explain in this blog post. Probably the easiest approximation is an if..elif..else chain with some variable binding:

case moose of
  Foo x y z -> x + y * z
  Bar z -> z * 3
  ==>
if isinstance(moose, Foo):
  x = moose.x # the variable binding!
  y = moose.y
  z = moose.z
  return x + y * z
elif isinstance(moose, Bar):
  z = moose.z
  return z * 3
else:
  raise Exception("Pattern match failure!")

Bracketing. You can tell something is a bracketing function if it starts with with. They work like context managers do in Python:

withFile "foo.txt" ReadMode $ \h -> do
  ...
  ==>
with open("foo.txt", "r") as h:
  ...

(You may recall the backslash from earlier. Yes, that’s a lambda. Yes, withFile is a function. Yes, you can define your own.)


Exceptions. throw, catch, catches, throwIO, finally, handle and all the other functions that look like this work essentially the way you expect them to. They may look a little funny, however, because none of these are keywords: they’re all functions, and follow all those rules. So, for example:

trySomething x `catch` \(e :: IOException) -> handleError e
  ===
catch (trySomething x) (\(e :: IOException) -> handleError e)
  ==>
try:
  trySomething(x)
except IOError as e:
  handleError(e)

Maybe. If you see Nothing, it can be thought of as None. So isNothing x tests if x is None. What’s the opposite of it? Just. For example, isJust x tests if x is not None.

You might see a lot of line noise associated with keeping Just and None in order. Here’s one of the most common ones:

maybe someDefault (\x -> ...) mx
  ==>
if mx is None:
  result = someDefault
else:
  x = mx
  result = ...

Here’s one specific variant, for when a null is an error condition:

maybe (error "bad value!") (\x -> ...) mx
  ==>
if mx is None:
  raise Exception("bad value!")
x = mx
...

Records. They work the way you’d expect them to, although Haskell lets you create fields that have no names:

data NoNames = NoNames Int Int
data WithNames = WithNames {
  firstField :: Int,
  secondField :: Int
}

So NoNames would probably be represented as a tuple (1, 2) in Python, and WithNames as a class:

class WithNames:
  def __init__(self, firstField, secondField):
    self.firstField = firstField
    self.secondField = secondField

Then creation is pretty simple: NoNames 2 3 translates into (2, 3), and WithNames 2 3 or WithNames { firstField = 2, secondField = 3 } translates into WithNames(2, 3).

Accessors are a little different. The most important thing to remember is that Haskellers put their accessors before the variable, whereas you might be more familiar with them coming after. So field x translates to x.field. How do you spell x.field = 2? Well, you can’t really do that. You can copy one with modifications, though:

return $ x { field = 2 }
  ==>
y = copy(x)
y.field = 2
return y

Or you can make one from scratch if you replace x with the name of the data structure (it starts with a capital letter). Why do we only let you copy data structures? This is because Haskell is a pure language; but don’t let that worry you too much. It’s just another one of Haskell’s quirks.


List comprehensions. They originally came from the Miranda-Haskell lineage! There are just more symbols:

[ x * y | x <- xs, y <- ys, y > 2 ]
  ==>
[ x * y for x in xs for y in ys if y > 2 ]

It also turns out that Haskellers often prefer list comprehensions written in multi-line form (perhaps they find them easier to read). They look something like this:

do
  x <- xs
  y <- ys
  guard (y > 2)
  return (x * y)

So if you see a left arrow and it doesn’t really look like it’s doing side effects, maybe it’s a list comprehension.


More symbols. Lists work the way you would expect them to in Python; [1, 2, 3] is in fact a list of three elements. A colon, like x:xs, means construct a list with x at the front and xs at the back (cons, for you Lisp fans). ++ is list concatenation. !! means indexing. Backslash means lambda. If you see a symbol you don’t understand, try looking for it on Hoogle (yes, it works on symbols!).
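
Here are a few of those in the same notation as before (rough sketches, not exact equivalents):

x:xs
  ==>
[x] + xs

xs ++ ys
  ==>
xs + ys

xs !! 2
  ==>
xs[2]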


More line noise. The following functions are probably line noise, and can probably be ignored. liftIO, lift, runX (e.g. runState), unX (e.g. unConstructor), fromJust, fmap, const, evaluate, an exclamation mark before an argument (f !x), seq, a hash sign (e.g. I# x).


Bringing it all together. Let’s return to the original code fragment:

runCommand env cmd state = ...
retrieveState = ...
saveState state = ...

main :: IO ()
main = do
    args <- getArgs
    let (actions, nonOptions, errors) = getOpt Permute options args
    opts <- foldl (>>=) (return startOptions) actions
    when (null nonOptions) $ printHelp >> throw NotEnoughArguments
    command <- fromError $ parseCommand nonOptions
    currentTerm <- getCurrentTerm
    let env = Environment
            { envCurrentTerm = currentTerm
            , envOpts = opts
            }
    saveState =<< runCommand env command =<< retrieveState

With some guessing, we can pop out this translation:

def runCommand(env, cmd, state):
   ...
def retrieveState():
   ...
def saveState(state):
   ...

def main():
  args = getArgs()
  (actions, nonOptions, errors) = getOpt(Permute(), options, args)
  opts = **mumble**
  if not nonOptions:
     printHelp()
     raise NotEnoughArguments
  command = parseCommand(nonOptions)

  currentTerm = getCurrentTerm()
  env = Environment(envCurrentTerm=currentTerm, envOpts=opts)

  state = retrieveState()
  result = runCommand(env, command, state)
  saveState(result)

This is not bad for a very superficial understanding of Haskell syntax. (There’s only one obviously untranslatable bit, which requires knowing what a fold is. Not all Haskell code is folds; I’ll repeat, don’t worry about it too much!)

Most of the things I have called “line noise” actually have very deep reasons behind them, and if you’re curious about the actual reasons for these distinctions, I recommend learning how to write Haskell. But if you’re just reading Haskell, I think these rules should be more than adequate.

Thanks to Keegan McAllister, Mats Ahlgren, Nelson Elhage, Patrick Hurst, Richard Tibbetts, Andrew Farrell and Geoffrey Thomas for comments. Also thanks to two kind denizens of #python, asdf and talljosh`, for acting as Python-using guinea pigs.

Postscript. If you’re really curious what foldl (>>=) (return startOptions) actions does, it implements the chain of responsibility pattern. Hell yeah.
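
To see why, unfold the foldl by hand (a sketch; action1 and friends are hypothetical elements of actions):

foldl (>>=) (return startOptions) [action1, action2, action3]
  ==
return startOptions >>= action1 >>= action2 >>= action3

Each action gets a chance to inspect and transform the options before handing them to the next.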

The new Reflections on Trusting Trust

In his classic essay Reflections on Trusting Trust, Ken Thompson describes a self-replicating compiler bug which is undetectable by source code inspection. The self-replication is made possible by the fact that most compilers are self-compiling: old versions of a compiler are used to compile new ones, and if the old version is malicious, it can slip the same bug into the new compiler when it detects it is compiling itself.

A new trend is precisely this self-hosting process, but for self-certifying typecheckers: typecheckers which are used to prove their own correctness. (Note that these are powerful typecheckers, close to being able to check arbitrary theorems about code.) This may seem a little odd, since I could write a trivial typechecker which always claimed it was correct. In order to work around this, we must bootstrap the correctness proof by proving the typechecker correct in another language (in the case of F*, this language is Coq). Once this has been done, we can then use this verified typechecker to check a specification of itself. This process is illustrated below.

image

image

image

The question then is whether or not such self-certifying typecheckers are similarly vulnerable to the problem Ken described for self-hosting compilers. For argument’s sake, let’s assume that the backend compiler and runtime are certified (a strong assumption that is almost universally untrue, including for F*). Since the typechecker can’t insert malicious bugs into the programs it compiles (it only, you know, typechecks), one would have to rely on a bug in the source code itself. Surely such a bug would be obvious!

This is unclear: we have certified our implementation, but what of our specification? In Coq, we proved various theorems about the soundness and adequacy of our type system, which give us at least some hope that it is correct in the way we expect. But these proofs are nowhere to be seen in the emancipated F* world. If we want to evolve our specification (less plausible for a full-blooded dependently typed language, but within the realm of possibility for a less powerful one), we must turn back to Coq and adjust the relevant theorems. Otherwise, we run the risk of changing our type system to an unsound one.

image

image

Fortunately, that’s all we have to do: we can use the old F* typechecker to certify the new one, rather than attempt to export certificates and reverify them with Coq. All told, though, don’t throw out your Coq code yet… not, at least, if you think your type system may change in the future.

Obviously Correct

What do automatic memory management, static types and purity have in common? They are methods which take advantage of the fact that we can make programs obviously correct (for some partial definition of correctness) upon visual inspection. Code using automatic memory management is obviously correct for a class of memory bugs. Code using static types is obviously correct for a class of type bugs. Code using purity (no mutable references or side effects) is obviously correct for a class of concurrency bugs. When I take advantage of any of these techniques, I don’t have to prove my code has no bugs: it just is, automatically!

Unfortunately, there’s a catch. What all of these “obviously correct” methodologies ask you to do is sacrifice varying degrees of expressiveness at their altar. No more pointer tricks. No more playing fast and loose with data representation. No more mutation. If this expressiveness was something most people really didn’t want anyway (e.g. memory management), it is happily traded away. But if it’s something they want, well, as language designers, we’re making it harder for people to do things that they want to do, and it shouldn’t surprise us when they grab their torches and pitchforks and storm the ivory tower, assertions about correctness and maintainability be damned.

It seems to me that we must fight fire with fire: if we’re going to take away features, we better be giving them compelling new features. With static types you also get pattern matching, QuickCheck style property testing, and performance benefits. With purity, you get software transactional memory and speculative evaluation. Discovering and implementing more of these “killer apps” is the key to adoption. (Some research that I’m currently doing with Adam Chlipala is leveraging purity to offer automatic caching for web applications. It’s not much, but I think it’s in the right direction.)

I still have a fanatical devotion to correctness. But these days, I suspect that for most people, it’s something bitter, like medicine, to be taken with some better tasting features. That’s fine. Our challenge, as programming language researchers, is to exploit correctness to bring tangible benefits now, rather than nebulous maintainability benefits later.

Thanks Nelson Elhage and Keegan McAllister for their comments.


Postscript: Performance of static types versus dynamic types. An earlier draft of this post pointed at Quora’s decision to move to Scala from Python as a clear indicator of this fact. Unfortunately, as several pre-readers pointed out, there are too many confounding factors to make this claim definitive: CPython was never explicitly engineered for performance, whereas the JVM had decades of work poured into it. So I’ll have to leave you with a more theoretical argument for the performance of static types: the optimization techniques of runtime just-in-time compilers for dynamic languages involve identifying sections of code which are actually statically typed, and compiling them into the form a static compiler would produce. So, if you know this information ahead of time, you will always do better than if you learn it later: it’s only a question of degree. (Of course, this doesn’t address the possibility that a JIT can identify information that would have been difficult to determine statically.)

Postscript: Software transactional memory. Joe Duffy had a great retrospective on transactional memory and the experience he had attempting to implement it for Microsoft’s stack. And despite a great enthusiasm for this idea, it’s interesting to note this quote:

Throughout all of this, we searched and searched for the killer TM app. It’s unfair to pin this on TM, because the industry as a whole still searches for a killer concurrency app. But as we uncovered more successes in the latter, I became less and less convinced that the killer concurrency apps we will see broadly deployed in the next 5 years needed TM. Most enjoyed natural isolation, like embarrassingly parallel image processing apps. If you had sharing, you were doing something wrong.

Richard Tibbetts points out that concurrency is often addressed at an architectural level lower than what most working programmers want to deal with, and so while STM is a killer application for those platforms, most developers don’t want to think about concurrency at all.

Polyglot programming

Being back in town over MIT’s Infinite Activities Period is making me think about what kind of short lecture I want to try teaching. I’ve been turning over the idea of a polyglot programming class in my head: the idea is that while most people feel really comfortable with only one or two programming languages, you can learn how to make this knowledge work for you in almost any programming language you could possibly encounter.

Unfortunately, I don’t have a good idea of what these skills actually are, nor do I have a sense for what kinds of things people would want to know. Nor do I think that I could jam this into two hours of lecturing: topics that I probably would want to cover are:

History of Programming Languages. Knowing how all the lineages tie together will help you figure out when a language feature will work the way you expect it to (since they just stole it from another language in the same line), and when, actually, it won’t work at all. It lets you nicely encapsulate the main big ideas of language features, which you can then explore the infinite variations of. It gives you groups of languages which mostly have the same idioms.

Street smarts and bootstrapping. What are the first things you should look for when you’re getting acquainted with a new language? Syntax? Cheat sheets? Tutorials? How to organize this torrent of information, what to do first, where to ask questions, what to learn how to do. How to interpret error messages you know nothing about. How to navigate the development ecosystem and assess libraries you know nothing about. How to source dive code in languages you know nothing about. Common bumps on the road towards Hello World. Unusual and universal ways of debugging.

Interoperability and FFI. What are the basic building blocks for higher-level data types in most of these languages, and what do they look like in memory? How do you make lots of different languages talk to each other efficiently? How about garbage collection, reference pinning and concurrency? Common impedance mismatches between languages.

Suggestions and comments would be appreciated.

Why you shouldn't do a PhD in systems

The opinions presented in this post are not necessarily mine. I’m just one very confused undergraduate senior with a lot of soul searching to do.

When I tell my friends, “I’m going to get a PhD,” I sometimes get the response, “Good for you!” But other times, I get the response, “Why would you want to do that?” as if I was some poor, misguided soul, brainwashed by a society which views research as the “highest calling”, the thing that the best go to, while the rest go join industry. “If you’re a smart hacker and you join industry,” they say, “you will have more fun immediately, making more impact and more money.”

Time. Getting a PhD takes a lot of time, they tell me. Most programs like to advertise something like five, or maybe six years, but if you actually look at the statistics, it could potentially stretch to nine or ten years. That’s a really huge chunk of your life: effectively all of your twenties and, honestly, there might be much better things for you to do with these tender years of your life. A PhD is merely a stepping stone, a necessary credential to be taken seriously in academia but not anything indicative of having made a significant contribution to a field. And for a stepping stone, it’s an extremely time consuming one.

There are many other things that could have happened during this time. You could have begun a startup and seen it get acquired or sink over a period of three years: six years seems like two lifetimes in a context like that. You could have begun a career as a professional in an extremely hot job market for software engineers, hopping from job to job until you found work that you were genuinely interested in: as a PhD student you are shackled to your advisor and your university. The facilities for change are so incredibly heavyweight: if you switch advisors you effectively have to start over, and it’s easy to think, “What did I do with the last three years of my life?”

Money. There is one thing you didn’t do in those last few years: make money. PhDs are the slave labor that make the academic complex run. It’s not that universities aren’t well funded by grants: indeed, the government spends large amounts of money funding research programs. But most of this money never finds its way to PhDs: you’re looking at a $30k stipend, when a software engineer can easily be making $150k to $200k in a few years at a software company. Even once you make it into a tenure track position, you are still routinely making less than people working for industry. You don’t go into academia expecting to get rich.

Scarcity. Indeed, you shouldn’t go into academia expecting to get much of anything. The available tenured positions are greatly outstripped by the number of PhD applicants, to the point that your bid into the academic establishment is more like a lottery ticket. You have to be doing a postdoc—that is, in a holding pattern—at precisely the right time when a tenure position becomes vacated (maybe the professor died), or spend years building up your network of contacts in academia in hopes of landing a position through that connection. Most people don’t make it, even at a second or third tier university. The situation is similar for industrial research labs, which become rarer and rarer by the year: Microsoft Research is highly selective, and as a prospective PhD you are making a ten year bet about its ability to survive. Intel Labs certainly didn’t.

Tenure isn’t all that great. But even if you do make it to tenure, it isn’t actually all that great. You’ve spent the last decade and a half fighting for the position in a competitive environment that doesn’t allow for any change of pace (god forbid you disappear for a year while your tenure clock is ticking), and now what? You are now going to stay at the institution you got tenure at for the rest of your life: all you might have is a several month sabbatical every few years. It’s the ivory handcuffs, and you went through considerably more effort to put them on than that guy who went to Wall Street. And as for the work? Well, you still have to justify your work and get grant funding, you still need to serve on committees and do other tasks which you simply must do which are not at all related to your research. In industry, you could simply hire someone to handle your post on your university’s “Disciplinary Committee”—in academia, that’s simply not how it works.

Lack of access. And if you are a systems researcher, you don’t even get the facilities you need to do the large-scale research that is really interesting. Physicists get particle accelerators, biologists get giant labs, but what does the systems researcher get? Certainly not a software system used by millions of users around the world. To get that sort of system, you have to go to industry. Just ask Matt Welsh, who made a splash leaving a tenured position at Harvard to go join Google. Working in that context lets you actually go and see if your crazy ideas work.

Don’t do it now. Of course, at this point I’m mentally protesting that this is all incredibly unfair to a PhD, that you do get more freedom, that maybe some people don’t care that much about money, that this is simply a question about value systems, and really, for some people, it’s the right decision. I might say that your twenties are also the best time to do your PhD, that academia is the correct late-career path, that you can still do a startup as a professor.

Perhaps, they say, but you need to figure out if this is the right decision for you. You need experience in both areas to make this decision, and the best way to get it is by sampling industry for two or three years before deciding if you want to go to grad school. After you finish your PhD, people mentally set a timer on your potential as an academic, and if you don’t publish during that time people will stop taking you seriously. But if you start a PhD in your mid-twenties, no one will bat an eye. Everyone can have a bad software internship; don’t let that turn you away from industry. We solve cool problems. We are more diverse, and in aggregate we give more freedom. In no sense of the word are we second-class.

They might be right. I don’t know.

Let's play a game

Ever wondered how Haskellers are magically able to figure out the implementation of functions just by looking at their type signature? Well, now you can learn this ability too. Let’s play a game.

You are an inventor, world renowned for your ability to make machines that transform things into other things. You are a proposer.

image

But there are many who would doubt your ability to invent such things. They are the verifiers.

image

The game we play goes as follows. You, the proposer, make a claim as to some wondrous machine you know how to implement, e.g. (a -> b) -> a -> b (which says given a machine which turns As into Bs, and an A, it can create a B). The verifier doubts your ability to have created such a machine, but being a fair minded skeptic, furnishes you with the inputs to your machine (the assumptions), in hopes that you can produce the goal.

image

As a proposer, you can take the inputs and machines the verifier gives you, and apply them to each other.

image

But that’s not very interesting. Sometimes, after the verifier gives you some machines, you want to make another proposal. Usually, this is because one of the machines takes a machine which you don’t have, but which you know how to make.

image

The verifier is obligated to furnish more assumptions for this new proposal, but these are placed inside the cloud of abstraction.

image

You can use assumptions that the verifier furnished previously (below the cloud of abstraction),

image

but once you’ve finished the proposal, all of the new assumptions go away. All you’re left with is a shiny new machine (which you ostensibly want to pass to another machine) which can be used for the original goal.

image

These are all the rules we need for now. (They constitute the most useful subset of what you can do in constructive logic.)

Let’s play a game.

image

Our verifier supplies the machines we need to play this game. Our goal is r.

image

That’s a lot of machines, and it doesn’t look like we can run any of them. There’s no way we can fabricate an a from scratch to run the bottom one, so maybe we can make an a -> r. (It may seem like I’ve conjured this proposal out of thin air, but if you look carefully it’s the only possible choice that will work in this circumstance.) Let’s make a new proposal for a -> r.

image

Our new goal for this sub-proposal is also r, but unlike in our original case, we can create an r with our extra ingredient, an a: just take two of the original machines and the newly furnished a. Voila, an r!

This discharges the cloud of abstraction, leaving us with a shiny new a -> r to pass to the remaining machine, fulfilling the original goal.

image

Let’s give these machines some names. I’ll pick some suggestive ones for you.

image

Oh hey, you just implemented bind for the continuation monad.

image

Here is the transformation step by step:

m a -> (a -> m b) -> m b
Cont r a -> (a -> Cont r b) -> Cont r b
((a -> r) -> r) -> (a -> ((b -> r) -> r)) -> ((b -> r) -> r)
((a -> r) -> r) -> (a -> (b -> r) -> r) -> (b -> r) -> r

The last step is perhaps the most subtle, but it can be done because arrows associate to the right.
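
If you’d like to see the machine written down, here is a minimal Haskell sketch of what we just assembled, using bare function types instead of the Cont newtype:

-- m is the original machine, f the machine-making machine,
-- k the final goal-consumer furnished by the verifier.
bind :: ((a -> r) -> r) -> (a -> (b -> r) -> r) -> (b -> r) -> r
bind m f = \k -> m (\a -> f a k)

The only way to produce an r is to run m, and the only machine that can consume the a it furnishes is f (applied together with the final continuation k).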

As an exercise, do return :: a -> (a -> r) -> r (wait, that looks kind of familiar…), fmap :: (a -> b) -> ((a -> r) -> r) -> (b -> r) -> r and callCC :: ((a -> (b -> r) -> r) -> (a -> r) -> r) -> (a -> r) -> r (important: that’s a b inside the first argument, not an a!).

This presentation is the game semantic account of intuitionistic logic, though I have elided treatment of negation and quantifiers, which are more advanced topics than the continuation monad, at least in this setting.