Inside 206-105

Existential Pontification and Generalized Abstract Digressions

The GHC scheduler

I’d like to talk about some nitty-gritty details of GHC’s thread scheduling, discovered over the course of working on stride scheduling for GHC. Most of these choices are merely implementation details and are not part of any specification. While these choices shouldn’t be relied upon, they are worth knowing, since many of these details were accreted over the course of many performance bugs, benchmark tests and other battles. In this post, I’ll attempt to give some historical insight into why many choices were made. These insights should generalize to any system that would like to implement green threads, lightweight threads that use less memory than traditional operating system threads. For space reasons, I’m not going to talk about STM or sparks (though they are also quite interesting).

Update: A large portion of this material has been incorporated into the scheduler page in the GHC commentary.

Anatomy of a thread

I’d first like to give some brief background about the runtime system and point out some perhaps nonintuitive design choices. GHC represents a thread with a TSO (thread-state object), i.e. the StgTSO struct in includes/rts/storage/TSO.h. [1] In Haskell, TSOs can be passed around as ThreadId objects. The Stg in front of the struct name indicates that TSOs are garbage collected, like other closures in Haskell. The TSO, along with the stack allocated with it (STACK), constitutes the primary memory overhead of a thread. The initial stack size, in particular, is controlled by the GC flag -ki and defaults to 1k. [2] Threads are run by Capabilities, which can be thought of as virtual cores managed by GHC. Capabilities are, in turn, mapped to true operating system threads, or Tasks, though we won’t talk about them much.
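As a quick illustration of the Haskell-level view (ordinary GHC Haskell, nothing scheduler-internal): forkIO allocates a fresh TSO and its stack, and hands back a ThreadId you can pass around like any other value. Compile with -rtsopts to be able to run it with, e.g., +RTS -ki2k -RTS and change the initial stack size.

import Control.Concurrent

main :: IO ()
main = do
  done <- newEmptyMVar
  tid  <- forkIO (putMVar done ())   -- allocates a TSO + STACK; tid refers to that TSO
  print tid                          -- prints something like "ThreadId 4"
  takeMVar done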

Being garbage collected has two major implications for TSOs. First, TSOs are not GC roots, so they will get GC'd if there is nothing holding on to them (e.g. in the case of deadlock), and their space is not automatically reclaimed when they finish executing [3]. Usually, a TSO will be retained by a Capability’s run queue (a GC root), or in the list of waiting threads of some concurrency variable, e.g. an MVar. Second, a TSO must be considered a mutable object, and is thus subject to the conventional GC write barriers necessary for any mutable object in a generational garbage collector. [4] The dirty bit tracks whether or not a TSO has been modified; it is always set when a thread is run and also when any of the pointer fields on a TSO are modified. Two fields, set by setTSOLink and setTSOPrev, are of particular interest to the scheduler.

Run queue

The run queue is at the heart of the scheduler: any runnable thread will hit the run queue before the scheduler actually pops it off the queue and runs it. There is one run queue per capability (rts/Capability.h); in the bad old days there was a single global run queue, but it performed badly for multithreaded processes. The queue is implemented as a doubly-linked list between run_queue_hd and run_queue_tl. [6] The head and tail pointers mean that the queue is actually a deque: this is important because the scheduler often has to handle threads that were interrupted in some way and should let them get back onto the queue (a toy sketch of the layout appears after the lists below). The links themselves are on the TSOs and are modified with setTSOLink and setTSOPrev, so modifying the queue dirties the TSOs involved. [7] Otherwise, the run queue is exclusively owned by the scheduler. If there are idle capabilities and we have more than one thread left in our run queue, threads will be pushed to other queues with schedulePushWork.

Threads are put in front (pushOnRunQueue) if:

  • A stack overflow occurs;
  • A heap overflow occurs; [8]
  • A task attempts to run a thread, but it is bound and the current task is the wrong one;
  • A thread is associated with a black hole (a thunk that is being evaluated), and another thread, possibly on another capability, has blocked on its evaluation (see ticket #3838);
  • In the threaded runtime, if a thread was interrupted because another Capability needed to do a stop-the-world GC (see commit 6d18141d8);
  • In the non-threaded runtime, when a thread waiting on IO unblocks.

Threads are put in back (appendToRunQueue) in the case of pre-emption, or if the thread is new; in particular, if:

  • A thread was pre-empted via the context switch flag (e.g. incoming message from another thread, the timer fired, the thread cooperatively yielded, etc; see also [8] on how this interacts with heap overflows);
  • It is a new thread (so large amounts of thread creation do not starve old threads, see conc004 and commit 05881ecab);
  • A thread becomes unblocked;
  • A thread is migrated to another capability (though, in this case, the queue was empty anyway);
  • A thread finishes, but for some reason we need to keep it around (this is related to in-calls, though I’m not 100% sure what is going on here; if you know, please tell me!)
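To make the structure concrete, here is a toy model in plain Haskell of the deque described above (my own sketch, not the RTS code, which is C): the function names mirror pushOnRunQueue and appendToRunQueue, and the link/prev fields mirror the pointers set by setTSOLink and setTSOPrev, but everything else is illustrative.

import Data.IORef

-- Each "TSO" carries its own link/prev pointers, so the queue is intrusive:
-- both ends support O(1) insertion, and any element can be unlinked in O(1).
data TSO = TSO
  { tsoId   :: Int
  , tsoLink :: IORef (Maybe TSO)  -- next thread in the queue
  , tsoPrev :: IORef (Maybe TSO)  -- previous thread in the queue
  }

data RunQueue = RunQueue
  { runQueueHd :: IORef (Maybe TSO)
  , runQueueTl :: IORef (Maybe TSO)
  }

-- Put a thread at the front (e.g. after a stack or heap overflow).
pushOnRunQueue :: RunQueue -> TSO -> IO ()
pushOnRunQueue q t = do
  oldHd <- readIORef (runQueueHd q)
  writeIORef (tsoLink t) oldHd
  writeIORef (tsoPrev t) Nothing
  case oldHd of
    Just h  -> writeIORef (tsoPrev h) (Just t)
    Nothing -> writeIORef (runQueueTl q) (Just t)
  writeIORef (runQueueHd q) (Just t)

-- Put a thread at the back (e.g. a pre-empted or newly created thread).
appendToRunQueue :: RunQueue -> TSO -> IO ()
appendToRunQueue q t = do
  oldTl <- readIORef (runQueueTl q)
  writeIORef (tsoLink t) Nothing
  writeIORef (tsoPrev t) oldTl
  case oldTl of
    Just l  -> writeIORef (tsoLink l) (Just t)
    Nothing -> writeIORef (runQueueHd q) (Just t)
  writeIORef (runQueueTl q) (Just t)

main :: IO ()
main = do
  q  <- RunQueue <$> newIORef Nothing <*> newIORef Nothing
  t1 <- TSO 1 <$> newIORef Nothing <*> newIORef Nothing
  t2 <- TSO 2 <$> newIORef Nothing <*> newIORef Nothing
  appendToRunQueue q t1
  pushOnRunQueue q t2
  Just front <- readIORef (runQueueHd q)
  print (tsoId front)  -- 2: the pushed thread runs first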

Benchmarks

Benchmarks like nofib are very important, even if they are synthetic, as they will often be construed as primary evidence of whether a change to the scheduler speeds things up or slows them down. One reason is that it is much easier to tell why a short program that torture-tests threads has slowed down than it is to tell why a large, complicated multithreaded program no longer seems very snappy. But really, the main motivation is convenience: nofib programs are easy to measure and easy to compare. Fortunately, the tests often measure something quite specific, so I’d like to describe the tests that compose the smp nofib suite here:

  • callback001 (also known as ffi014) performs a large number of incalls to Haskell from C from a large number of threads. This is a rather specific test related to how we place threads in the run queue even if they’ve finished, if they finished in an in-call.
  • callback002 measures how quickly we can perform incalls to Haskell from C.
  • chan measures how scheduling order affects memory usage: if threads are allowed to run for a bit without getting context switched, they build up data in channels. This is related to when we reset the context switch flag (see [8]).
  • sieve implements the Sieve of Eratosthenes, spawning many threads to evaluate thunks of a lazy list in parallel. It performs a bit of allocation, and is sensitive to what happens to threads after a HeapOverflow.
  • threads001 tests how quickly we can create a thread and then context switch to it.
  • threads003 tests how quickly many threads can communicate by reading and writing MVars. It is a bit sensitive to what happens to threads after they wake up from sleeping.
  • threads006 tests how quickly threads can be created and destroyed, as well as throwTo blocking performance. It is very sensitive to the number of major GCs that occur (which can be influenced if TSO size changes).
  • threads007 generates a lot of threads waiting on MVars, and then sees how shutdown behavior is affected. It exposed bad behavior in the MVar queue, which was fixed in f4692220c7.

Conclusion

The GHC scheduler is pretty complicated! Much of the current behavior was created in response to specific problems: the right choices are not obvious a priori! I hope this post will serve as a valuable reference for any future GHC hackers interested in playing around with the scheduler, as well as for anyone else who needs to implement a scheduler for their runtime system. Much of the historical data was gleaned from comments (though I found some out-of-date ones), liberal use of git blame, and cross-referencing with the bug tracker—these are all useful places to figure out, “Well, why does that code do that?” In this post, I hope I’ve answered that question, to some degree.


[1] Initialization of StgTSO is handled in createThread in rts/Threads.c; this function is in turn invoked by createGenThread, createIOThread and createStrictIOThread in rts/RtsAPI.c. These functions set up the initial stack state, which controls what the thread executes when it actually gets run. They are the ones invoked by fork# and other primops (entry points for primops are located in rts/PrimOps.cmm).

[2] Actually, your usable stack will be a little smaller than that, because this size also includes the size of the StgTSO struct. (Packing the TSO and stack together only really matters for allocating lots of threads into one block, however, as once a GC occurs the TSOs and stacks will no longer be adjacent.)

[3] Here is a sample program which demonstrates how holding onto a ThreadId via a stable pointer (which forces the object it points to never to be GC'd) can leak memory:

import Control.Concurrent
import Control.Monad
import Foreign.StablePtr

n = 400000

main = do
    -- each forkIO'd thread's ThreadId is retained via a StablePtr, so its
    -- TSO/STACK can never be garbage collected, even after the thread finishes
    ms <- replicateM n $ do
        m <- newEmptyMVar
        _ <- newStablePtr =<< forkIO (putMVar m ())
        return m
    mapM_ takeMVar ms

The heap profile of the run shows none of the TSO/STACK objects being deallocated, even when the MVars drain out as threads finish executing.

[4] The write barrier for generational GCs refers not to the memory barriers of multithreaded execution, but rather to a notification to the garbage collector when a mutable reference in the old generation changes and may now point to an object in the young generation. Write barriers are necessary because the old generation is not traversed during a minor collection, and so if an old object points to a young object, we may miss the fact that the young object is still alive even though it has no references from other young objects. In GHC, a write barrier is implemented by adding an object to the mutable list (mut_list) of a Capability if it is not in the youngest generation. (Some objects, like MutArr#, are permanently on the mutable list; in such a case, a write barrier may not be necessary. But see [5] for more details.) Objects usually track their dirty status, so that they don’t add themselves to the mutable list multiple times. (Accidentally adding an object multiple times is harmless, but it means the GC has to do extra work traversing the mutable list.) Additionally, if we can guarantee that the new reference does not point to the young generation (for instance, it is a static closure like END_TSO_QUEUE), then dirtying the object is not necessary. Getting this stuff right is tricky, to say the least!

[5] There is a bit of a sordid story here. Keeping an object permanently on the mutable list is done by scavenge_mutable_list in rts/sm/Scav.c, which will unconditionally re-add such an object to the mutable list if it sees it there. How does the object get on the mutable list in the first place? It’s not placed on the list upon creation; rather, upon the first minor GC on the youngest generation, the scavenging GC notices the object and places it on the mutable list by gct->failed_to_evac = rtsTrue. How do we end up freeing the object? The mutable list is considered a set of root pointers, but it is only scavenged, not evacuated. If an item on the mutable list ends up not being evacuated, it will be blown away regardless. (This does mean, however, that its elements will not be freed until the next GC.) Isn’t it really inefficient to always be scanning these arrays? Yes, and this used to be a problem (ticket #650), nowadays mitigated by card marking. The same story applied to TSOs (ticket #1589), but the fix here was to properly apply a write barrier and not keep the objects permanently on the mutable list; this improved performance quite a bit when there were a lot of threads (even if you don’t scavenge their pointers, traversing a huge mutable list is still a pain.) Creating a lot of small mutable arrays is apt to be painful.

[6] It used to be singly linked, but fixing ticket #3838 required the ability to remove TSOs from the run queue.

[7] Since these fields are always traversed by the GC, it’s important that they do not contain NULL pointers or garbage. Instead, we set them to the static closure END_TSO_QUEUE. Because this closure is guaranteed not to be in the young generation, you do not need to dirty the TSO after setting this field.

[8] Sometimes, a heap overflow and a context switch occur simultaneously. If the thread requested a large block, we still always push it in front (because we don’t want another thread to steal our large block); otherwise, the context switch takes precedence and the thread is booted to the end of the queue—the context switch is checked as late as possible. (See commit 05881ecab.)

  • January 28, 2013

NLP: the missing framework

So you want to make a web app. In today’s world, there is a panoply of software to assist you: you can use an all-in-one framework, or you can grab libraries to deal with the common needs of templating, database access, interactivity, etc. These libraries unify common functionality and take care of edge-cases you might otherwise not have the resources to deal with.

But there is one tool which is conspicuously absent: the natural language processing library.

“Now wait!” you may be saying, “of course there are NLP libraries, nltk and lingpipe come to mind.” Sure, but are you actually using these libraries? “Maybe not, but my application doesn’t need NLP, you see.”

The thing is, you are doing language processing in your application, even if you don’t realize it: “string concatenation” is really just a simple form of natural language generation, a subfield of NLP in its own right. [1] If you need to perform a more complicated task, such as pluralizing nouns, capitalizing sentences or changing the grammatical form of verbs, you’ll need linguistic data. [2] This data is an essential part of many traditional NLP tasks. However, if you need to pluralize something today, you’re more likely to copy-paste a list of regexes off the Internet than to think, “Hm, I should install an NLP library.” Part of this is because, while NLP libraries do contain this data, it is not publicized well.

It’s also worth considering whether your application could benefit from any traditional NLP, including keyword generation, canonicalization (when are two things written slightly differently actually the same?), language identification, full-text search, autocompletion, topic detection and clustering, content summarization, parsing human-written dates and locations, etc. While it’s rare for an application to need all of these features, most would benefit from a few of them. For example, a blog application might want keyword generation to generate tags, full-text search to search posts, content summarization for non-fullpage views of posts, and date parsing for scheduling posts. These features tend to be absent, however, because they are often difficult to implement properly. Modern approaches often require models to be trained on large corpora of data—so-called data-driven models. Most of the time, this setup cost doesn’t seem worth it; if the feature gets implemented at all (e.g. as an extension), a bag of heuristics is quicker.

Both of these problems hint at the trouble with current NLP frameworks: they assume that users are interested in building NLP systems, as opposed to using NLP systems. I shouldn’t need a PhD in computational linguistics to get my nouns to pluralize correctly or my dates to parse robustly. I shouldn’t need a PhD to get passable results on conventional, well-studied NLP applications. The default expectation should not be that users need to train a model: pre-existing models can easily be reused. Although there is an upper limit to how well an NLP algorithm can do without any tuning, the principled approach can still offer improvements over heuristics. But even more importantly, once a model is in use, developers who want to improve their results can train their own model on text from their own application, which is likely to carry domain-specific terminology and patterns. The library should be easy to use out of the box, and principled enough to be a gateway drug into the wonderful world of computational linguistics. Who knows what other applications could arise when developers recognize NLP as an accessible tool for their toolbox? [3]

Here is my call to arms: I want to see all of the current “baby-NLP” functionality collected together into a single place, where it benefits from shared linguistic data and serves as an easy-to-use feature set that initially attracts developers. I would like to see more complicated but useful NLP technology made accessible to a non-linguist audience. And I would like all of this to be based on principled NLP foundations, so that it is possible to improve on the out-of-the-box models and algorithms. NLP practitioners are often very careful not to overstate what their systems are capable of (in contrast to the irrational exuberance of the 1980s). That’s OK: sometimes, the bar really is that low.

Thanks to Gregory Price, Eric Kow and Richard Tibbetts for helping review earlier drafts of this article.


[1] As a field, natural language generation doesn’t really consider string concatenation to be a true method; instead, it is interested in how to generate text from a functional description of intent. One neat example is referring expression generation.

[2] For example, the functionality (e.g. pluralization rules) collected in the language/ folder of MediaWiki. MediaWiki is one of the most international open source projects, and I find it a fascinating source of information about linguistic oddities in foreign languages.

[3] As an example, I'd like to sketch how natural language generation can assist the internationalization of applications. Suppose that you would like to let a user know that “you have three new messages.” The most obvious way to implement this would be with printf("You have %d new message(s).", numMessages). There are a number of shortcuts taken here: we always print a numeric digit, rather than following AP style, which spells out numbers between zero and nine; and we’ve sidestepped whether or not “message” should be pluralized by tacking an (s) on the end.

If we’d like to handle those cases, the next obvious thing to do is to add a few new functions: a function apnumber to convert 3 to three, and a function pluralize to convert message into messages when numMessages is greater than one. You would end up with something like printf("You have %s new %s", apnumber(numMessages), pluralize("message", numMessages)). This is the ad hoc approach, which works reasonably well for English but gets you into trouble when you realize other languages have things like noun-adjective agreement (“nouveau message” versus “nouveaux messages”). Internationalization frameworks have long recognized and offered mechanisms for dealing with these cases; however, the average English-based project is unlikely to know about these problems until it internationalizes.
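For concreteness, here is a minimal Haskell sketch of that ad hoc approach; apnumber and pluralize are the hypothetical helpers named above, implemented only for English:

import Text.Printf (printf)

-- Spell out small numbers, roughly AP style.
apnumber :: Int -> String
apnumber n
  | n >= 0 && n <= 9 = words "zero one two three four five six seven eight nine" !! n
  | otherwise        = show n

-- Naive English-only pluralization: tack an "s" on the end.
pluralize :: String -> Int -> String
pluralize noun 1 = noun
pluralize noun _ = noun ++ "s"

main :: IO ()
main = printf "You have %s new %s.\n" (apnumber 3) (pluralize "message" 3)
-- prints: You have three new messages.

Every English-specific assumption baked into these helpers is exactly what breaks once agreement rules from other languages enter the picture.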

However, there exists a representation which is agnostic to these issues. Consider the dependency grammar of this sentence, which we have extracted with a little NLP:

nsubj(have-2, You-1)
root(ROOT-0, have-2)
num(messages-5, three-3)
amod(messages-5, new-4)
dobj(have-2, messages-5)

We might ask, “Given data of this form, can we automatically generate an appropriate sentence in some language, which conveys the information and is grammatically correct?” That is a pretty hard task: it is the fundamental question of NLG. (It's not quite equivalent to machine translation, since we might require a user to add extra information about the functional intent that would otherwise be very hard to extract from text.) While it would be cool if we had a magic black box which could crank out the resulting sentences, even today, the tools developed by NLG may help reduce translator burden and increase flexibility. I think that’s well worth investigating.

  • January 2, 2013

Google Nexus 7 setup notes

I acquired a Google Nexus 7 (Wi-Fi only) over winter break. I don’t really like getting new devices: they invariably require a lot of work to set up to my liking. Here are some notes:

  • Jailbreaking the device from Linux is still fiddly. Ultimately, it’s probably easiest to just find a Windows box and use the Nexus Root Toolkit. The tool is somewhat racy; try the detection code again if it fails the first time.
  • Transferring files to/from Linux is a pain in the ass. I have SCP over SSHDroid working; I also tried both DropBear SSH servers, but they did not come with scp binaries and were thus fairly useless for file transfer. SSHDroid didn’t work out of the box: I needed to apply comment 14 to make the real scp binaries get picked up in the path. By default, these apps are configured to accept password authentication (not even keyboard-interactive!) with extremely weak default passwords: make sure you disable that. Still looking for a good rsync implementation. On the USB side, Ubuntu/Gnome/Nautilus natively recognized the Nexus in PTP mode, but when I tried copying files it hung. MTP is fairly unsupported by Ubuntu 12.10, but go-mtpfs works decently well given a sufficiently modern libmtp. Adam Glasgall has packaged libmtp for Quantal, so go add his PPA and then follow the installation instructions for go-mtpfs. Update: Transferring files directly to removable media has also worked reasonably well.
  • The tablet really does feel like a phone, courtesy of both running the Android platform. But the lack of 3G means offline support is a lot more important, and the larger screen makes certain types of applications a lot more pleasant to use. (Update: I’ve settled on MX Player as my video player of choice, since it supports Advanced SubStation Alpha subtitling and MKV files. Unfortunately, it doesn’t support deep color, e.g. 10-bit.)
  • A Micro USB to USB OTG cable is really handy, especially for hooking up keyboards or external media; I’d dare say it’s a more essential accessory than a cover. Note that the micro-USB port isn’t able to power USB devices with high power requirements (e.g. spinning-platter external disks), so you’ll need a powered USB hub to connect them. (One symptom of this is that if you try to mount an under-powered hard drive, the directory listing will persistently come up empty. It may also make clicking noises: probably not good for the drive.) I use USB-OTG to perform mounting.
  • I tried to get my paper database on Mendeley mirrored onto my tablet, but it’s been pretty tough. I’ve been trying to use Referey, a Mendeley client for Android, but it requires me to somehow propagate my Mendeley SQLite database and all of my PDFs. Dropbox seems like a good match here, except that the official Dropbox client doesn’t support keeping entire folders synced (only favorite files). If you’re like me and don’t know exactly what papers you are going to be reading, you have to use something different, e.g. Dropsync. (BTW, if you, like me, have the clever idea of putting the SQLite database with your PDFs so they all get synced in one folder, don’t ever “Tidy Up”: Mendeley will happily delete your SQLite database as a “foreign object”.) Mendeley and Dropbox also seem to interact poorly with each other in various ways (case-sensitivity; also, Mendeley likes to make filenames that are too long, and Dropbox will stupidly and happily accept them).
  • The “open windows” button doesn’t appear to properly register when an application is closed of its own volition (i.e. through an exit button natively supported by the application). This is a bit annoying.

Oh yeah, and Happy New Year. :)

Update: I had my Nexus 7 inexplicably brick itself. Fortunately, once the device is unlocked, it is very easy to reflash the image (and I didn’t lose data, which normally happens when you first unlock a phone). I did this by running fastboot update image-nakasi-jop40d.zip while the device was in the bootloader (hold down both volume keys while powering up; the image was downloaded from Google), and then applying the last set of steps from here to get SuperSU installed again (i.e. fastbooting into ClockworkMod and then sideloading SuperSU).

  • December 31, 2012

Metro Maps of the News

Metro maps are a visual metaphor for complex, interdependent story lines developed by Dafna Shahaf. Dafna’s thesis involved techniques for automatically taking a corpus of news articles and extracting coherent narratives that cover the overall space. For our final CS448b project, we took one of the narratives Dafna had generated and created a system for displaying the maps. (The demo is best viewed on a large monitor.)

/img/metromap.png

We only had enough time to get the viewer aspect polished, but we think that it would not be too difficult to extend this framework for the construction of metro maps (in case you don’t have access to Dafna’s algorithm).

This is joint work with Russell Chou and Jacob Jensen.

  • December 13, 2012

Maildir synchronizing Sup

On the prompting of Steven Hum, I've put some finishing touches on my Sup patchset and am “releasing” it to the world (more on what I mean by “release” shortly). The overall theme of this patchset is that it integrates as much Sup metadata as it can with Maildir data. In particular:

  • It merges Damien Leone’s sync-back patchset with the latest Sup mainline. The sync-back patchset synchronizes flags such as “Read” or “Trashed” to the Maildir, which can then be propagated back to your IMAP server using OfflineIMAP.
  • Furthermore, this patchset has the ability to synchronize arbitrary labels, with a simple set of rules of what folder a message should be moved to depending on what labels it has. For example, inbox and archived messages can be kept in separate folders, so that non-Sup clients can usefully access mail you care about. (Trust me: this is really awesome.) This is coupled with a bonus OfflineIMAP patch which implements fast remote message moving.
  • It implements inotify on Maildir, so a full directory scan is no longer necessary to retrieve new messages. The bottleneck for polling is now strictly OfflineIMAP.
  • It implements the ability to save sent and draft messages to Maildir, so they show up in third-party clients.
  • Finally, it has a number of miscellaneous bugfixes and extra hooks which I have personally found useful.

There is at least a high probability the patchset will work for you, since I’ve been using it actively for a while. Sup will sometimes crash; if the crash doesn't happen reproducibly or cause data loss, I probably won’t investigate too hard. Some of my patches are a bit sketchy (especially those labeled HACK); I’ve attempted to document all the skeevy bits in commit messages and code comments. So, how supported is this version of Sup? Well:

  1. I am using this patchset; therefore, for all use cases and environments I care about, it will stay working;
  2. I will probably not fix problems I am not affected by, and definitely not problems I cannot reproduce;
  3. I do not promise a stable commit history: I’ve rebased the patchset multiple times and will continue to do so.

Some of the early patches are pretty uncontroversial though, and I’d like to see them get into mainline eventually. You can get the code here: http://gitorious.org/~ezyang/sup/ezyang/commits/maildir-sync/

New hooks

sent-save-to
  Configures where to save sent mail to. If this hook doesn't exist,
  the global sent setting will be used (possibly defaulting to sup://sent)
  Variables:
      message: RMail::Message instance of the mail to send.
      account: Account instance matching the From address
  Return value:
       Source to save mail to, nil to use default

compose-from
  Selects a default address for the From: header of a new message
  being composed.
  Variables:
    opts: a dictionary of ComposeMode options, including :from, :to,
      :cc, :bcc, :subject, :refs and :replytos
  Return value:
    A Person to be used as the default for the From: header

draft-save-to
  Selects a source to save a draft to.
  Variables:
    from_email: the email part of the From: line, or nil if empty
  Return value:
    A source to save the draft to.

Label synchronization

To use this functionality, in config.yaml, you need a new option :maildir_labels:

:maildir_labels:
  :stanford: [[:inbox, 4], [null, 6]]

The value of this option is a dictionary of "accounts" to lists of precedences. (The account label stanford doesn’t actually mean anything; it's just for documentation.) Read it as follows:

For messages belonging in source 4 or source 6 (consult sources.yaml), if the message has the :inbox tag, move it to source 4, otherwise move it to source 6.
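As a toy model of that rule (a sketch of my own in Haskell, not the patchset’s actual Ruby code): the precedence list is walked in order, and the first entry whose label the message carries wins, with null acting as a catch-all.

import Data.Maybe (listToMaybe)

type Label  = String
type Source = Int

-- [(Just "inbox", 4), (Nothing, 6)] mirrors [[:inbox, 4], [null, 6]]
targetSource :: [(Maybe Label, Source)] -> [Label] -> Maybe Source
targetSource precedences labels = listToMaybe
  [ src | (lbl, src) <- precedences, maybe True (`elem` labels) lbl ]

main :: IO ()
main = do
  let rules = [(Just "inbox", 4), (Nothing, 6)]
  print (targetSource rules ["inbox", "unread"])  -- Just 4
  print (targetSource rules ["archived"])         -- Just 6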

This will automatically start working for any new mail you change the labels of. In order to apply this to old mail, you need to run sup-sync-back-maildir. If you're going to move a lot of mail, you probably want to run this version of OfflineIMAP: https://github.com/ezyang/offlineimap

  • December 1, 2012

Why can’t I just be a little lazy?

You can. Imagine a version of Haskell where every constructor was strict, i.e. every field had a ! prefix. The semantics of this language are well defined; in fact, the fine folks at CMU have known about this for some time:

Up to this point we have frequently encountered arbitrary choices in the dynamics of various language constructs. For example, when specifying the dynamics of pairs, we must choose, rather arbitrarily, between the lazy dynamics, in which all pairs are values regardless of the value status of their components, and the eager dynamics, in which a pair is a value only if its components are both values. We could even consider a half-eager (or, equivalently, half-lazy) dynamics, in which a pair is a value only if, say, the first component is a value, but without regard to the second.

Similar questions arise with sums (all injections are values, or only injections of values are values), recursive types (all folds are values, or only folds of values are values), and function types (functions should be called by-name or by-value). Whole languages are built around adherence to one policy or another. For example, Haskell decrees that products, sums, and recursive types are to be lazy, and functions are to be called by name, whereas ML decrees the exact opposite policy. Not only are these choices arbitrary, but it is also unclear why they should be linked. For example, we could very sensibly decree that products, sums, and recursive types are lazy, yet impose a call-by-value discipline on functions. Or we could have eager products, sums, and recursive types, yet insist on call-by-name. It is not at all clear which of these points in the space of choices is right; each has its adherents, and each has its detractors.

Are we therefore stuck in a tarpit of subjectivity? No! The way out is to recognize that these distinctions should not be imposed by the language designer, but rather are choices that are to be made by the programmer. This may be achieved by recognizing that differences in dynamics reflect fundamental type distinctions that are being obscured by languages that impose one policy or another. We can have both eager and lazy pairs in the same language by simply distinguishing them as two distinct types, and similarly we can have both eager and lazy sums in the same language, and both by-name and by-value function spaces, by providing sufficient type distinctions as to make the choice available to the programmer.

This is from the Polarization chapter of Harper’s Practical Foundations for Programming Languages. Personally, I think call-by-name with eager data types by default is an under-appreciated point in the design space: with this combination, you still get the ability to implement your own control-flow structures like if (just not on data structures) and to have lazy bindings, but you no longer have to worry about a large class of space leaks. Of course, this regime doesn’t eliminate all problems: for example, if you foldl instead of foldl', you will still end up with a long chain of function applications and a stack overflow. It’s not clear to me if there is an alternative form of fix which dodges this bullet.
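For reference, here is the standard foldl pitfall in today’s (lazy) Haskell; the strict-data regime sketched above would not make it go away, since the thunk buildup lives in the fold’s accumulator, not inside a data constructor:

import Data.List (foldl')

-- foldl defers every (+), so the accumulator becomes a deep chain of thunks
-- that is only forced at the very end; foldl' forces it at each step.
lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- builds (((0+1)+2)+3)+... unevaluated
strictSum = foldl' (+) 0   -- constant-space accumulation

main :: IO ()
main = do
  print (strictSum [1..1000000])  -- fine
  print (lazySum   [1..1000000])  -- may blow the stack under a small stack limit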

  • November 26, 2012

Functional Encryption

Joe Zimmerman recently shared with me a cool new way of thinking about various encryption schemes called functional encryption. It’s expounded upon in more depth in a very accessible recent paper by Dan Boneh et al. I’ve reproduced the first paragraph of the abstract below:

We initiate the formal study of functional encryption by giving precise definitions of the concept and its security. Roughly speaking, functional encryption supports restricted secret keys that enable a key holder to learn a specific function of encrypted data, but learn nothing else about the data. For example, given an encrypted program the secret key may enable the key holder to learn the output of the program on a specific input without learning anything else about the program.

Quite notably, functional encryption generalizes many existing encryption schemes, including public-key encryption, identity-based encryption and homomorphic encryption. Unfortunately, there are some impossibility results for functional encryption in general in certain models of security (the linked paper has an impossibility result for the simulation model.) There’s no Wikipedia page for functional encryption yet; maybe you could write it!

Apropos of nothing, a math PhD friend of mine recently asked me, “So, do you think RSA works?” I said, “No, but probably no one knows how to break it at the moment.” I then asked him why the question, and he mentioned he was taking a class on cryptography, and given all of the assumptions, he was surprised any of it worked at all. To which I replied, “Yep, that sounds about right.”

  • November 25, 2012

Extremist Programming

Functions are awesome. What if we made a PL that only had functions?

Objects are awesome. What if we made a PL where everything was an object?

Lazy evaluation is awesome. What if we made a PL where every data type was lazy?

Extremist programming (no relation to extreme programming) is the act of taking some principle, elevating it above everything else and applying it everywhere. After the dust settles, people often look at this extremism and think, “Well, that was kind of interesting, but using X in Y was clearly inappropriate. You need to use the right tool for the job!”

Here’s the catch: sometimes you should use the wrong tool for the job—because it might be the right tool, and you just don’t know it yet. If you aren’t trying to use functions everywhere, you might not realize the utility of functions that take functions as arguments [1] or cheap lambdas [2]. If you aren’t trying to use objects everywhere, you might not realize that both integers [3] and the class of an object [4] are also objects. If you aren’t trying to use laziness everywhere, you might not realize that purity is an even more important language feature [5].

This leads to two recommendations:

  1. When learning a new principle, try to apply it everywhere. That way, you’ll learn more quickly where it does and doesn’t work well, even if your initial intuitions about it are wrong. (The right tool for the job, on the other hand, will lead you to missed opportunities, if you don’t realize that the principle is applicable in some situation).
  2. When trying to articulate the essence of some principle, an extremist system is clearest. If you want to know what it is like to program with lazy evaluation, you want to use Haskell, not a language with optional laziness. Even if the extremist system is less practical, it really gets to the core of the issue much more quickly.

There are a lot of situations where extremism is inappropriate, but for fun projects, small projects and research, it can really teach you a lot. One of the most memorable interactions I had in the last year was while working with Adam Chlipala. We were working on some proofs in Coq, and I had been taking the moderate route of doing proofs step-by-step first, and then with Ltac automation once I knew the shape of the proof. Adam told me: “You should automate the proofs from the very beginning, don’t bother with the manual exploration.” [6] It was sage advice that made my life a lot better: I guess I just wasn’t extremist enough!

Files are awesome. What if we made an OS where everything was a file?

Cons cells are awesome. What if we made a PL where everything was made of cons cells?

Mathematics is awesome. What if we made a PL where everything came from math?

Arrays are awesome. What if we made a PL where everything was an array?


[1] Higher-order functions and combinators: these tend to not see very much airplay because they might be very verbose to write, or because the language doesn't have a very good vocabulary for saying what the interface of a higher-order function is. (Types help a bit here.)

[2] Cheap lambdas are necessary for the convenient use of many features, including: monads, scoped allocation (and contexts in general), callbacks, higher-order functions.

[3] Consider early versions of Java prior to the autoboxing of integer and other primitive types.

[4] Smalltalk used this to good effect, as does JavaScript.

[5] This is one of my favorite narratives about Haskell, it comes from Simon Peyton Jones’ presentation Wearing the hair shirt (in this case, laziness).

[6] This is the essence of the Chlipala school of Coq proving, in recognition of how astonishingly easy it is to trick experienced computer scientists into writing the equivalents of straight-line programs by hand, without any abstractions.

  • November 20, 2012

Plan 9 mounts and dependency injection

“Everything is a file.” [1] This was the design philosophy taken to its logical extreme in Plan 9. Any interface you could imagine was represented as a file. Network ports, pixel buffers, kernel interfaces—all were unified under a common API: the file operations (open, read, write, ...). Plan 9 used this to eliminate most of its system calls: it had only thirty-nine, in contrast to modern Linux’s sprawling three hundred and twenty-six.

When I first heard of Plan 9, my first thought was, “But that’s cheating, right?” After all, they had reduced the number of syscalls but increased the number of custom files: complexity had merely been shifted around. But one of my labmates gave me a reason why this was still useful: per-process mountpoints. These mountpoints meant that I could give each process its own view of the filesystem—usually the same, but sometimes with some vital differences. Suppose that I wanted to tunnel the network connection of one of my applications: the application would be accessing the network through some file, so I could instead mount a network filesystem serving another system’s network files over it, and transparently achieve proxying without any cooperation from my application. [2]

Let’s step back for a moment and put on our programming language hats. Suppose that a file is an abstract data type, and the syscall interface for manipulating files is the interface for this data type. What are mounts, in this universe? Another friend of mine pointed out the perfectly obvious analogy:

Files : Mounts :: Abstract Data Types : Dependency Injection

In particular, the mount is a mechanism for modifying some local namespace, so that when a file is requested, it may be provided by some file system completely different to what the process might have expected. Similarly, dependency injection specifies a namespace, such that when an object is requested, the concrete implementation may be completely different to what the caller may have expected.

The overall conclusion is that when developers implemented dependency injection, they were reimplementing Plan 9’s local mounts. Is your dependency injection hierarchical? Can you replace a hierarchy (MREPL), or mount your files before (MBEFORE) or after (MAFTER) an existing file system? Support runtime changes in the mount? Support lexical references (e.g. dot-dot ..) between entities in the hierarchy? I suspect that existing dependency injection frameworks could learn a bit from the design of Plan 9. And in Haskell, where it seems that people are able to get much further without having to create a dependency injection framework, do these lessons map back to the design of a mountable file system? I wonder.
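To make the analogy concrete, here is a minimal Haskell sketch (entirely my own illustration; FileOps, Namespace and mount are invented names, not Plan 9’s API or any DI framework’s): a namespace maps paths to implementations of a small file-like interface, and “mounting” is just overriding an entry before handing the namespace to a client.

import qualified Data.Map as M

-- A tiny file-like interface: the "abstract data type" side of the analogy.
data FileOps = FileOps
  { fsRead  :: IO String
  , fsWrite :: String -> IO ()
  }

-- A per-client namespace: the "dependency injection" side of the analogy.
type Namespace = M.Map FilePath FileOps

realNet, proxiedNet :: FileOps
realNet    = FileOps (return "data from the real network") (\_ -> return ())
proxiedNet = FileOps (return "data via the tunnel")        (\_ -> return ())

-- "Mounting" replaces what a path resolves to, for one namespace only.
mount :: FilePath -> FileOps -> Namespace -> Namespace
mount = M.insert

-- The client neither knows nor cares which implementation backs /net.
client :: Namespace -> IO ()
client ns = case M.lookup "/net" ns of
  Just ops -> fsRead ops >>= putStrLn
  Nothing  -> putStrLn "no /net in this namespace"

main :: IO ()
main = do
  let base = M.fromList [("/net", realNet)]
  client base                            -- talks to the real network
  client (mount "/net" proxiedNet base)  -- transparently proxied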


[1] Functional programmers might be reminded of a similar mantra, “Everything is a function.”

[2] For the longest time, Linux did not provide per-process mount namespaces, and even today this feature is not available to unprivileged users—Plan 9, in contrast, had this feature available from the very beginning to all users. There is also the minor issue where per-process mounts are actually a big pain to work with in Linux, primarily, I dare say, due to the lack of appropriate tools to assist system administrators attempting to understand their applications.

  • November 8, 2012

hp/D3.js: an interactive heap profile viewer

I'm taking a Data Visualization course this fall, and one of our assignments was to create an interactive visualization. So I thought about the problem for a little bit, and realized, “Hey, wouldn’t it be nice if we had a version of hp2ps that was both interactive and accessible from your browser?” (hp2any fulfills this niche partially, but as a GTK application).

A week of hacking later: hp/D3.js, the interactive heap profile viewer for GHC heaps. Upload your hp files, share them with friends! Our hope is that the next time you need to share a heap profile with someone, instead of running hp2ps on it and sending your colleague the ps file, you’ll just upload the hp file here and send them the link. We’ve tested it on recent Firefox and Chrome; it will probably work on any sufficiently modern browser; it definitely won’t work with Internet Explorer.

/img/hpd3js.png

Some features:

  • You can annotate data points by clicking on the graph and filling in the text box that appears. These annotations are saved and will appear for anyone viewing the graph.
  • You can filter heap elements based on substring match by typing in the “filter” field.
  • You can drill down into more detail by clicking on one of the legend elements. If you click OTHER, it will expand to show you more information about the heap elements in that band. You can then revert your view by pressing the Back button.

Give it a spin, and let me know about any bugs or feature suggestions! (Some known bugs: sometimes Yesod 500s; just refresh until it comes up. Also, we lack backwards animations, axis changing is a little choppy, and you can’t save annotations on the OTHER band.)

  • November 2, 2012