ezyang's blog

the arc of software bends towards understanding

2011

Why you shouldn't do a PhD in systems

The opinions presented in this post are not necessarily mine. I’m just one very confused undergraduate senior with a lot of soul searching to do.

When I tell my friends, “I’m going to get a PhD,” I sometimes get the response, “Good for you!” But other times, I get the response, “Why would you want to do that?” as if I was some poor, misguided soul, brainwashed by a society which views research as the “highest calling”, the thing that the best go to, while the rest go join industry. “If you’re a smart hacker and you join industry,” they say, “you will have more fun immediately, making more impact and more money.”

Read more...

Let's play a game

Ever wondered how Haskellers are magically able to figure out the implementation of functions just by looking at their type signature? Well, now you can learn this ability too. Let’s play a game.

You are an inventor, world renowned for your ability to make machines that transform things into other things. You are a proposer.

image

But there are many who would doubt your ability to invent such things. They are the verifiers.

Read more...

8 ways to report errors in Haskell revisited

In 2007, Eric Kidd wrote a quite popular article named 8 ways to report errors in Haskell. However, it has been four years since the original publication of the article. Does this affect the veracity of the original article? Some names have changed, and some of the original advice given may have been a bit… dodgy. We’ll take a look at each of the recommendations from the original article, and also propose a new way of conceptualizing all of Haskell’s error reporting mechanisms.

Read more...

Joseph and the Amazing Technicolor Box

Consider the following data type in Haskell:

data Box a = B a

How many computable functions of type Box a -> Box a are there? If we strictly use denotational semantics, there are seven:

image

But if we furthermore distinguish the source of the bottom (a very operational notion), some functions with the same denotation have more implementations…

  1. Irrefutable pattern match: f ~(B x) = B x. No extras.
  2. Identity: f b = b. No extras.
  3. Strict: f (B !x) = B x. No extras.
  4. Constant boxed bottom: Three possibilities: f _ = B (error "1"); f b = B (case b of B _ -> error "2"); and f b = B (case b of B !x -> error "3").
  5. Absent: Two possibilities: f (B _) = B (error "4"); and f (B x) = B (x `seq` error "5").
  6. Strict constant boxed bottom: f (B !x) = B (error "6").
  7. Bottom: Three possibilities: f _ = error "7"; f (B _) = error "8"; and f (B !x) = error "9".

List was ordered by colors of the rainbow. If this was hieroglyphics to you, may I interest you in this blog post?

Read more...

Diskless Paxos crash recovery

This is an edited version of an email I sent last week. Unfortunately, it does require you to be familiar with the original Paxos correctness proof, so I haven’t even tried to expand it into something appropriate for a lay audience. The algorithm is probably too simple to be in the literature, except maybe informally mentioned—however, if it is wrong, I would love to know, since real code depends on it.

Read more...

First impressions of module programming

During my time at Jane Street, I’ve done a fair bit of programming involving modules. I’ve touched functors, type and module constraints, modules in modules, even first class modules (though only peripherally). Unfortunately, the chapter on modules in Advanced Topics in Types and Programming Languages made my eyes glaze over, so I can’t really call myself knowledgeable in module systems yet, but I think I have used them enough to have a few remarks about them. (All remarks about convention should be taken to be indicative of Jane Street style. Note: they’ve open sourced a bit of their software, if you actually want to look at some of the stuff I’m talking about.)

Read more...

In-program GC stats redux

Hac Phi was quite productive (since I managed to get two blog posts out of it!) On Saturday I committed a new module GHC.Stats to base which implemented a modified subset of the API I proposed previously. Here is the API; to use it you’ll need to compile GHC from Git. Please test and let me know if things should get changed or clarified!

-- | Global garbage collection and memory statistics.
data GCStats = GCStats
    { bytes_allocated :: Int64 -- ^ Total number of bytes allocated
    , num_gcs :: Int64 -- ^ Number of garbage collections performed
    , max_bytes_used :: Int64 -- ^ Maximum number of live bytes seen so far
    , num_byte_usage_samples :: Int64 -- ^ Number of byte usage samples taken
    -- | Sum of all byte usage samples, can be used with
    -- 'num_byte_usage_samples' to calculate averages with
    -- arbitrary weighting (if you are sampling this record multiple
    -- times).
    , cumulative_bytes_used :: Int64
    , bytes_copied :: Int64 -- ^ Number of bytes copied during GC
    , current_bytes_used :: Int64 -- ^ Current number of live bytes
    , current_bytes_slop :: Int64 -- ^ Current number of bytes lost to slop
    , max_bytes_slop :: Int64 -- ^ Maximum number of bytes lost to slop at any one time so far
    , peak_megabytes_allocated :: Int64 -- ^ Maximum number of megabytes allocated
    -- | CPU time spent running mutator threads.  This does not include
    -- any profiling overhead or initialization.
    , mutator_cpu_seconds :: Double
    -- | Wall clock time spent running mutator threads.  This does not
    -- include initialization.
    , mutator_wall_seconds :: Double
    , gc_cpu_seconds :: Double -- ^ CPU time spent running GC
    , gc_wall_seconds :: Double -- ^ Wall clock time spent running GC
    -- | Number of bytes copied during GC, minus space held by mutable
    -- lists held by the capabilities.  Can be used with
    -- 'par_max_bytes_copied' to determine how well parallel GC utilized
    -- all cores.
    , par_avg_bytes_copied :: Int64
    -- | Sum of number of bytes copied each GC by the most active GC
    -- thread each GC.  The ratio of 'par_avg_bytes_copied' divided by
    -- 'par_max_bytes_copied' approaches 1 for a maximally sequential
    -- run and approaches the number of threads (set by the RTS flag
    -- @-N@) for a maximally parallel run.
    , par_max_bytes_copied :: Int64
    } deriving (Show, Read)

-- | Retrieves garbage collection and memory statistics as of the last
-- garbage collection.  If you would like your statistics as recent as
-- possible, first run a 'performGC' from "System.Mem".
getGCStats :: IO GCStats

Changes to IntMap

As it stands, it is impossible to define certain value-strict operations on IntMaps with the current containers API. The reader is invited, for example, to try efficiently implementing map :: (a -> b) -> IntMap a -> IntMap b, in such a way that for a non-bottom and non-empty map m, Data.IntMap.map (\_ -> undefined) m == undefined.

Now, we could have just added a lot of apostrophe suffixed operations to the existing API, which would have greatly blown it up in size, but following conversation on libraries@haskell.org, we’ve decided we will be splitting up the module into two modules: Data.IntMap.Strict and Data.IntMap.Lazy. For backwards compatibility, Data.IntMap will be the lazy version of the module, and the current value-strict functions residing in this module will be deprecated.

Read more...

Food-related functional humor

Fall is coming, and with it come hoards of ravenous Freshmen arriving on MIT’s campus. I’ll be doing three food events… all of them functional programming puns. Whee!

Dumpling Hylomorphism

Anamorphism: the building up of a structure. Catamorphism: the consumption of a structure. Hylomorphism: both an anamorphism and a catamorphism. This event? A hylomorphism on dumplings. Come learn the basic fold, or just perform a metabolic reduction on food.

I’ve done this event several times in the past and it’s always good (if a little sweaty) fun. Dumpling making, as it turns out, is an extremely parallelizable task: you can have multiple people rolling out wraps, wrapping the dumplings, and a few brave cooks actually boiling or pan-frying them. (Actually, in China, no one makes their own wraps anymore, because the store bought ones are so good. Not so in the US…)

Read more...

BlockedIndefinitelyOnMVar

This post was adapted from a post I made to the glasgow-haskell-users list.

According to Control.Exception, the BlockedIndefinitelyOnMVar exception (and related exception BlockedIndefinitelyOnSTM) is thrown when “the thread is blocked on an MVar, but there are no other references to the MVar so it can’t ever continue.” The description is actually reasonably precise, but it is easy to misinterpret. Fully understanding how this exception works requires some extra documentation from Control.Concurrent as well as an intuitive feel for how garbage collection in GHC works with respects to Haskell’s green threads.

Read more...