ezyang's blog

the arc of software bends towards understanding

Posts

Databases are categories

Update. The video of the talk can be found here: Galois Tech Talks on Vimeo: Categories are Databases.

On Thursday Dr. David Spivak presented Categories are Databases as a Galois tech talk. His slides are here, and are dramatically more accessible than the paper Simplicial databases. Here is a short attempt to introduce this idea to people who only have a passing knowledge of category theory.

An essential exercise when designing relational databases is the practice of object modeling using labeled graphs of objects and relationships. Visually, this involves drawing a bunch of boxes representing the objects being modeled, and then drawing arrows between the objects showing relationships they may have. We can then use this object model as the basis for a relational database schema.

Read more...

Bug boogie: Git and symlinks

Git is very careful about your files: unless you tell it to be explicitly destructive, it will refuse to write over files that it doesn’t know about, instead giving an error like:

Untracked working tree file ‘foobar’ would be overwritten by merge.

In my work with Wizard, I frequently need to perform merges on working copies that have been, well, “less than well maintained”, e.g. they untarred a new version of the the source tree on the old directory and forgot to add the newly added files. When Wizard goes in and tries to automatically upgrade them to the new version the proper way, this results in all sorts of untracked working tree file complaints, and then we have to go and manually check on the untracked files and remove them once they’re fine.

Read more...

Punt the Prelude

Conservation attention notice. What definitions from the Haskell 98 Prelude tend to get hidden? I informally take a go over the Prelude and mention some candidates.

(.) in the Prelude is function composition, that is, (b -> c) -> (a -> b) -> a -> c. But the denizens of #haskell know it can be much more than that: the function a -> b is really just the functor, so a more general type is Functor f => (b -> c) -> f b -> f c, i.e. fmap. Even more generally, (.) can indicate morphism composition, as it does in Control.Category.

Read more...

Use The Monoid: A worked example

Attention conservation notice. Equivalent Haskell and Python programs are presented for retrieving values from a data structure using state. We then refactor the Haskell program into one that has no state, just a monoid.

A pretty frequent thing a working programmer needs to do is extract some values (frequently more than one) from some data structure, possibly while keeping track of extra metadata. I found myself writing this code the other day:

Read more...

Bananas, Lenses, Envelopes and Barbed Wire <br>A Translation Guide

One of the papers I’ve been slowly rereading since summer began is “Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire”, by Erik Meijer, Maarten Fokkinga and Ross Paterson. If you want to know what {cata,ana,hylo,para}morphisms are, this is the paper to read: section 2 gives a highly readable formulation of these morphisms for the beloved linked list.

Last time, however, my eyes got a little bit glassy when they started discussing algebraic data types, despite having used and defined them in Haskell; part of me felt inundated in a sea of triangles, circles and squiggles, and by the time they reached the laws for the basic combinators, I might as well have said, “It’s all math to me!”

Read more...

Lazy exceptions and IO

Consider the following piece of code:

import Prelude hiding (catch)
import Control.Exception

main :: IO ()
main = do
    t <- safeCall
    unsafeCall t
    putStrLn "Done."

safeCall :: IO String
safeCall = do
    return alwaysFails `catch` errorHandler

--alwaysFails = throw (ErrorCall "Oh no!")
alwaysFails = error "Oh no!"

errorHandler :: SomeException -> IO String
errorHandler e = do
    putStrLn "Caught"
    return "Ok."
errorHandler_ e = errorHandler e >> return ()

unsafeCall :: String -> IO ()
unsafeCall = putStrLn

What might you expect the output to be? A straightforward transcription to Python might look like:

Read more...

Upgrading to Ubuntu Lucid

Now that term is over, I finally went an upgraded my laptop to Ubuntu 10.04 LTS, Lucid Lynx. The process went substantially more smoothly than Karmic went, but there were still a few hiccups.

Etckeeper. As always, you should set AVOID_COMMIT_BEFORE_INSTALL to 0 before attempting a release upgrade, since etckeeper hooks will be invoked multiple times and there’s nothing more annoying than getting the notice “etckeeper aborted the install due to uncommited changes, please commit them yourselves” because there is no way that’s going to work.

Read more...

Class Reflections

Last February, I posted about classes that I was going to be taking. Here are some reflections, now that final projects and examinations are over.

6.005: Software Construction. Teaching students how to engineer large software projects is one of the oddest paradoxes that you might encounter in academic life. The institute is certainly capable of teaching concepts, tools and methodologies, but, to actually be capable of constructing a system from scratch? It’s not really something you can learn, it something that you have to do. And the amount of work you have to put in to start getting the taste of real code as opposed to school code (which gets thrown away at the end of the term) doesn’t fit into one term. We’ve joked that MIT ought to have a two part series, where the second part you are asked to go modify some code you wrote a year ago in the face of shifting requirements. (Unfortunately, I suspect a large number of people will rewrite the thing: one of the problems of not actually being able to do large systems.)

Read more...

I Hate Patches: <br>Confessions of an Open-Source Developer

It is a truth universally acknowledged that if you really want a change put into an open source project, you submit a patch along with the bug report. Sure, you might complain that the average user doesn’t have any programming experience and that it’s unreasonable to expect them to learn some complex system and then figure out how to make the change they’re looking for, but you’re just not in the secret society of hacker enthusiasts who fix their own damn software and give back to the projects they use.

Read more...

Spring 2010: A Random Walk

Here at the eve of Spring 2010 term, I decided to run this little experiment on my laptop: what files had I modified within the last six months?

find . \( -path '*/.*' \) -prune -o -mtime -180 -print

The result was north of one-hundred-fifty thousand modified files. Here’s the (slightly) abridged version:

  • LaTeX files for “Adventures in Three Monads”, an article that ran in the Monad Reader. Also, blackboard diagrams for my Advanced Typeclasses class I gave that IAP; I ended up not being able to get to the material I prepared for the Reader.
  • valk.txt, which contained notes for my Valkyrie Nethack character. I made my first ascension on March {24,25}th.
  • An Eclipse Java project that served as the jumpboard for my HAMT experiments, under the purview of my undergraduate research project.
  • htmlpurifier-web and htmlpurifier, courtesy of the HTML Purifier 4.1 release I pushed within the last month. (I suspect there will be another release coming soon too.) This also meant new builds of PHP 5.2.11, 5.2.12, 5.2.13, 5.3.1 and 5.3.2 for my super-amazing PHP multi-version farm. Note to self, next time, exclude the build directories from your automated backups, kthxbai.
  • A qemu checkout, in which I attempted to fix their broken DHCP code when the same MAC address requests two different IP addresses, gave up, and assigned static addresses to the virtual machines we were using to demo live process migration. Mmm… 6.828 final project.
  • A hackage-server and holumbus checkout, sprung from aborted dreams of making Holombus and Hackage cooperate to have up-to-the-minute index of all Haskell functions. I hear the Holumbus team has been making changes to make Hayoo be able to incrementally update its index.
  • Updates to tidy up extort, a membership dues tracking application written in Haskell, due to the recent change of leadership in the Assassins’ Guild. During the replacement election, one of the suggested candidate questions was “Do you know Haskell.” We’ll see how long the program lasts…
  • An abc source directory, in which I flexed my C source-diving skills and searched for information on how to use the library. I may be working closely with it in my internship at Galois. Curiously enough, this roughly coincided with the SAT solver that was to be written for 6.005, as well as the study of SAT in my computation complexity class 6.045.
  • A mit-scheme checkout, in order to analyze their red-black tree implementation to figure out how easily it could be persisted (the answer was no, and I had to write my own implementation off of Okasaki’s notes), and to figure out why --batch-mode didn’t do what it said on the tin.
  • A log4j source tree, which I used for two of my Software Engineering 6.005 projects. It was mostly painless to use, and I highly recommend it if you’re building software in Java.
  • Lots of test directories for wizard (note to self, backing those up is also a bad idea!) Some day, I’ll unleash this software on the world, but for now, it’s usage is growing within the MIT sphere.

The really abridged version:

Read more...