ezyang's blog

the arc of software bends towards understanding

Posts

Changes to IntMap

As it stands, it is impossible to define certain value-strict operations on IntMaps with the current containers API. The reader is invited, for example, to try efficiently implementing map :: (a -> b) -> IntMap a -> IntMap b, in such a way that for a non-bottom and non-empty map m, Data.IntMap.map (\_ -> undefined) m == undefined.

Now, we could have just added a lot of apostrophe suffixed operations to the existing API, which would have greatly blown it up in size, but following conversation on libraries@haskell.org, we’ve decided we will be splitting up the module into two modules: Data.IntMap.Strict and Data.IntMap.Lazy. For backwards compatibility, Data.IntMap will be the lazy version of the module, and the current value-strict functions residing in this module will be deprecated.

Read more...

Food-related functional humor

Fall is coming, and with it come hoards of ravenous Freshmen arriving on MIT’s campus. I’ll be doing three food events… all of them functional programming puns. Whee!

Dumpling Hylomorphism

Anamorphism: the building up of a structure. Catamorphism: the consumption of a structure. Hylomorphism: both an anamorphism and a catamorphism. This event? A hylomorphism on dumplings. Come learn the basic fold, or just perform a metabolic reduction on food.

I’ve done this event several times in the past and it’s always good (if a little sweaty) fun. Dumpling making, as it turns out, is an extremely parallelizable task: you can have multiple people rolling out wraps, wrapping the dumplings, and a few brave cooks actually boiling or pan-frying them. (Actually, in China, no one makes their own wraps anymore, because the store bought ones are so good. Not so in the US…)

Read more...

BlockedIndefinitelyOnMVar

This post was adapted from a post I made to the glasgow-haskell-users list.

According to Control.Exception, the BlockedIndefinitelyOnMVar exception (and related exception BlockedIndefinitelyOnSTM) is thrown when “the thread is blocked on an MVar, but there are no other references to the MVar so it can’t ever continue.” The description is actually reasonably precise, but it is easy to misinterpret. Fully understanding how this exception works requires some extra documentation from Control.Concurrent as well as an intuitive feel for how garbage collection in GHC works with respects to Haskell’s green threads.

Read more...

From data type definitions to code

What do these problems have in common: recursive equality/ordering checks, printing string representations, serializing/unserializing binary protocols, hashing, generating getters/setters? They are repetitive boilerplate pieces of code that have a strong dependence on the structure of the data they operate over. Since programmers love automating things away, various schools of thought have emerged on how to do this:

  1. Let your IDE generate this boilerplate for you. You right-click on the context menu, click “Generate hashCode()”, and your IDE does the necessary program analysis for you;
  2. Create a custom metadata format (usually XML), which you then run another program on which converts this description into code;
  3. Add sufficiently strong macro/higher-order capabilities to your language, so you can write programs which generate implementations in-program;
  4. Add sufficiently strong reflective capabilities to your language, so you can write a fully generic dynamic implementation for this functionality;
  5. Be a compiler and do static analysis over abstract syntax trees in order to figure out how to implement the relevant operations.

It hadn’t struck me how prevalent the fifth option was until I ran into wide scale use of one particular facet of the camlp4 system. While it describes itself as a “macro system,” in sexplib and bin-prot, the macros being used are not those of the C tradition (which would be good for implementing 3), rather, they are in the Lisp tradition, including access to the full syntax tree of OCaml and the ability to modify OCaml’s grammar. Unlike most Lisps, however, camlp4 has access to the abstract syntax tree of data type definitions (in untyped languages these are usually implicit), which it can use to transform into code.

Read more...

Variant types and GADTs

OCaml supports anonymous variant types, of the form type a = [`Foo of int | `Bar of bool], with the appropriate subtyping relations. Subtyping is, in general, kind of tricky, so I have been using these variant types fairly conservatively. (Even if a feature gives you too much rope, it can be manageable and useful if you use discipline.) Indeed, they are remarkably handy for one particular use-case for which I would have normally deployed GADTs. This is the “Combining multiple sum types into a single sum type” use-case.

Read more...

In-program GC stats for GHC

I’ll be at this year’s Hac Phi (coming up in a week and a half), and I am planning on working on in-program garbage collector statistics for GHC. There is nothing really technically difficult about this task (we just need to expose some functions in the RTS), but it’s not been done yet and I know quite a few performance-minded and long-running-server-minded folks who would love to see this functionality.

My question for you is this: what would you like such an API to look like? What things should it offer, how would you like to interact with it?

Read more...

Synthetic Git merges

In theory, Git supports custom, low-level merge drivers with the merge configuration properties. In practice, no one actually wants to write their own merge driver from scratch. Well, for many cases where a custom merge driver would come in handy, you don’t have to write your merge driver from scratch! Consider these cases:

  • You want to merge files which have differing newline styles,
  • You want to merge files where one had lots of trailing whitespace removed,
  • You want to merge files one branch has replaced certain strings with custom strings (for example, a configuration file which instantiated PASSWORD, or a file that needs to be anonymized if there is a merge conflict),
  • You want to merge a binary file that has a stable textual format, or
  • You want to merge with knowledge about specific types of conflicts and how to resolve them (a super-smart rerere).

For all of these cases, you can instead perform a synthetic Git merge by modifying the input files (constructing synthetic merge inputs), calling Git’s git merge-file to do the actual merge, and then possibly editing the result, before handing it back off to the original invoker of your merge driver. It’s really simple. Here’s an example driver that handles files with differing newline styles by canonicalizing them to UNIX:

Read more...

Parallelism to plug space leaks

It is not too difficult (scroll to “Non sequitur”) to create a combinator which combines two folds into a single fold that operates on a single input list in one pass. This is pretty important if your input list is pretty big, since doing the folds separately could result in a space leak, as might be seen in the famous “average” space leak:

import Data.List
big = [1..10000000]
sum' = foldl' (+) 0
average xs = fromIntegral (sum' xs) / fromIntegral (length xs)
main = print (average big)

(I’ve redefined sum so we don’t stack overflow.) I used to think combining functions for folds were pretty modular, since they had a fairly regular interface, could be combined together, and really represented the core notion of when it was possible to eliminate such a space leak: obviously, if you have two functions that require random access to elements of the list, they’ll retain the entirety of it all the way through.

Read more...

Facebook support for BarnOwl

This one’s for the MIT crowd. This morning, I finished my Facebook module for BarnOwl to my satisfaction (my satisfaction being asynchronous support for Facebook API calls, i.e. no more random freezing!) Getting it to run on Linerva was a bit involved, however, so here is the recipe.

  1. Setup a local CPAN installation using the instructions at sipb.mit.edu, using local::lib. Don’t forget to add the setup code to .bashrc.mine, not .bashrc, and then source them. Don’t forget to follow prerequisites: otherwise, CPAN will give a lot of prompts.
  2. Install all of the CPAN dependencies you need. For the Facebook module, this means Facebook::Graph and AnyEvent::HTTP. I suggest using notest, since Any::Moose seems to fail a harmless test on Linerva. Facebook::Graph fails several tests, but don’t worry about it since we’ll be using a pre-packaged version. If you want to use other modules, you will need to install them in CPAN as well.
  3. Clone BarnOwl to a local directory (git clone git://github.com/ezyang/barnowl.git barnowl), ./autogen.sh, configure and make.
  4. Run using ./barnowl, and then type the command :facebook-auth and follow the instructions!

Happy Facebooking!

Read more...

Grad School, Oh My

It still feels a little strange how this happened. Not a year ago, I was pretty sure I was going to do the Masters of Engineering program at MIT, to make up for a “missed year” that was spent abroad (four years at MIT plus one at Cambridge, not a bad deal.)

But a combination of conversations with various researchers whom I greatly respect, nagging from my Dad, and an inability to really figure out who I would actually do my Master’s thesis under while at MIT meant that at some point about a month and a half ago, I decided to go for the graduate school admissions cycle this fall. Oh my. It feels like undergraduate admissions all over again (which was not really a pleasant experience), though this time around, what I need to write is the “Research Statement.”

Read more...