ezyang's blog

the arc of software bends towards understanding

2010

Spring 2010: A Random Walk

Here at the eve of Spring 2010 term, I decided to run this little experiment on my laptop: what files had I modified within the last six months?

find . \( -path '*/.*' \) -prune -o -mtime -180 -print

The result was north of one-hundred-fifty thousand modified files. Here’s the (slightly) abridged version:

  • LaTeX files for “Adventures in Three Monads”, an article that ran in the Monad Reader. Also, blackboard diagrams for my Advanced Typeclasses class I gave that IAP; I ended up not being able to get to the material I prepared for the Reader.
  • valk.txt, which contained notes for my Valkyrie Nethack character. I made my first ascension on March {24,25}th.
  • An Eclipse Java project that served as the jumpboard for my HAMT experiments, under the purview of my undergraduate research project.
  • htmlpurifier-web and htmlpurifier, courtesy of the HTML Purifier 4.1 release I pushed within the last month. (I suspect there will be another release coming soon too.) This also meant new builds of PHP 5.2.11, 5.2.12, 5.2.13, 5.3.1 and 5.3.2 for my super-amazing PHP multi-version farm. Note to self, next time, exclude the build directories from your automated backups, kthxbai.
  • A qemu checkout, in which I attempted to fix their broken DHCP code when the same MAC address requests two different IP addresses, gave up, and assigned static addresses to the virtual machines we were using to demo live process migration. Mmm… 6.828 final project.
  • A hackage-server and holumbus checkout, sprung from aborted dreams of making Holombus and Hackage cooperate to have up-to-the-minute index of all Haskell functions. I hear the Holumbus team has been making changes to make Hayoo be able to incrementally update its index.
  • Updates to tidy up extort, a membership dues tracking application written in Haskell, due to the recent change of leadership in the Assassins’ Guild. During the replacement election, one of the suggested candidate questions was “Do you know Haskell.” We’ll see how long the program lasts…
  • An abc source directory, in which I flexed my C source-diving skills and searched for information on how to use the library. I may be working closely with it in my internship at Galois. Curiously enough, this roughly coincided with the SAT solver that was to be written for 6.005, as well as the study of SAT in my computation complexity class 6.045.
  • A mit-scheme checkout, in order to analyze their red-black tree implementation to figure out how easily it could be persisted (the answer was no, and I had to write my own implementation off of Okasaki’s notes), and to figure out why --batch-mode didn’t do what it said on the tin.
  • A log4j source tree, which I used for two of my Software Engineering 6.005 projects. It was mostly painless to use, and I highly recommend it if you’re building software in Java.
  • Lots of test directories for wizard (note to self, backing those up is also a bad idea!) Some day, I’ll unleash this software on the world, but for now, it’s usage is growing within the MIT sphere.

The really abridged version:

Read more...

Refactoring Haskell code?

I have to admit, refactoring Haskell code (or perhaps even just functional code) is a bit of a mystery to me. A typical refactoring session for me might look like this: sit down in front of code, reread code. Run hlint on the code, fix the problems it gives you. Look at the code some more. Make some local transformations to make a pipeline tighter or give a local subexpression a name. Decide the code is kind of pretty and functional and go do something else.

Read more...

Nested Data Parallelism versus Creative Catamorphisms

I got to watch (unfortunately not in person) Simon Peyton Jones’ excellent talk (no really, if you haven’t seen it, you should carve out the hour necessary to watch it) on Data Parallel Haskell (slides). The talk got me thinking about a previous talk about parallelism given by Guy Steele I had seen recently.

What’s the relationship between these two talks? At first I though, “Man, Guy Steele must be advocating a discipline for programmers, while Simon Peyton Jones’ is advocating a discipline for compilers.” But this didn’t really seem to fit right: maybe you have a clever catamorphism for the problem, the overhead for fully parallelizing everything is prohibitive. As Steele notes, we need “hybrid sequential/parallel strategies,” the most simple of which is “parallelize it until it’s manageable and run the fast sequential algorithm on it,” ala flat data parallelism. Nor is nested data parallelism a silver bullet; while it has wider applicability, there are still domains it fits poorly on.

Read more...

Omnipresent Cabal

A short public service announcement: you might think you don’t need Cabal. Oh, you might be just whipping up a tiny throw-away script, or a small application that you never intend on distributing. Cabal? Isn’t that what you do if you’re planning on sticking your package on Hackage? But the Cabal always knows. The Cabal is always there. And you should embrace the Cabal, even if you think you’re too small to care. Here’s why:

Read more...

Name conflicts on Hackage

Attention Conservation Notice. Unqualified identifiers that are used the most on Hackage.

Perhaps you dread the error message:

Ambiguous occurrence `lookup'
It could refer to either `Prelude.lookup', imported from Prelude
                      or `Data.Map.lookup', imported from Data.Map

It is the message of the piper that has come to collect his dues for your unhygenic unqualified unrestricted module import style.

Or perhaps your a library writer and trying to think up of a new symbol for your funky infix combinator, but you aren’t sure what other libraries have used already.

Read more...

Design Patterns in Haskell

Attention Conservation Notice. A listing of how Gang of Four design patterns might be equivalently implemented in Haskell. A phrasebook for object-oriented programmers dealing with functional programming concepts.

In their introduction to seminal work Design Patterns, the Gang of Four say, “The choice of programming language is important because it influences one’s point of view. Our patterns assume Smalltalk/C++-level language features, and that choice determines what can and cannot be implemented easily. If we assumed procedural languages, we might have included design patterns called ‘Inheritance,’ ‘Encapsulation,’ and ‘Polymorphism.’”

Read more...

Art. Code. Math. (And mit-scheme)

I was in rehearsal today, doodling away second oboe for Saint Saens’ Organ Symphony for the nth time, and it occurred to me: I’ve listened to and played this piece of music enough times to know the full overall flow as well as a good chunk of the orchestral parts, not just mine. So when the hymnal calls give way to the triumphant entrance of the organ in the last movement, or when the tempos start shifting, simultaneously speeding up and slowing down, at the end of the piece, it’s not surprising; it’s almost inevitable. Couldn’t have it any other way.

Read more...

Inessential guide to fclabels

Last time I did an Inessential guide to data-accessor and everyone told me, “You should use fclabels instead!” So here’s the partner guide, the inessential guide to fclabels. Like data-accessor the goal is to make record access and editing not suck. However, it gives you some more useful abstractions. It uses Template Haskell on top of your records, so it is not compatible with data-accessor.

Identification. There are three tell-tale signs:

Read more...

The Problem with xUnit

Tagline: Assertions considered not ideal.

I think automated tests are great. I used two particular flavors of test, the unit test and the integration test, extensively in HTML Purifier and they’re the only reason why I feel comfortable making changes to code that I first wrote in High School. The automated tests let me hack and then figure out if I broke anything with the single stroke of a button, rather than manually shove a few inputs in and see if they “look alright.” They’re also an informal specification of “what I wanted the code to do” when I originally wrote it, by the fine tradition of an example.

Read more...

Creative catamorphisms

The bag of programming tricks that has served us so well for the last 50 years is the wrong way to think going forward and must be thrown out.

Last week, Guy Steele came in and did a guest lecture “The Future is Parallel: What’s a Programmer to Do?” for my advanced symbolic class (6.945). It’s a really excellent talk; such an excellent talk that I had seen the slides for prior to the talk. However hearing Guy Steele talk about it in person really helped set things in context for me.

Read more...