ezyang's blog

the arc of software bends towards understanding

Diskless Paxos crash recovery

This is an edited version of an email I sent last week. Unfortunately, it does require you to be familiar with the original Paxos correctness proof, so I haven’t even tried to expand it into something appropriate for a lay audience. The algorithm is probably too simple to appear in the literature, except perhaps as an informal mention; however, if it is wrong, I would love to know, since real code depends on it.

I would like to describe an algorithm for Paxos crash recovery that does not require persistent storage, by utilizing synchronized clocks and a lattice-based epoch numbering. The basic idea is to increase the ballot/proposal number to one for which it is impossible for the crashed node to have made any promises. Such an algorithm, as noted in Paxos made Live, is useful in the case of disk corruption, where persistent storage is lost. (Unfortunately, the algorithm they describe in the paper for recovering from this situation is incorrect. The reason is left as an exercise for the reader.) It is inspired by van Renesse’s remark about an “epoch-based system”, and the epoch-based crash-recovery algorithm described in JPaxos: State Machine Replication Based on the Paxos Protocol. However, in correspondence with Nuno, I discovered that proofs of the correctness of their algorithm had not been published, so I set out to convince myself of its correctness, and in the process discovered a simpler version. It may be the case that this algorithm is already in the community folklore, in which case all the better, since my primary interest is implementation.

First, let’s extend proposal numbers from a single, namespaced value n to a tuple (e, n), where n is a namespaced proposal number as before, and e is an epoch vector, with length equal to the number of nodes in the Paxos cluster, and the usual Cartesian product lattice structure imposed upon it.

Let’s establish what behavior we’d like from a node during a crash:

KNOWN-UNKNOWNS. An acceptor knows a value e* such that, for all e with e* ≤ e (under the lattice ordering), the acceptor knows whether it has responded to prepare requests of the form (e, n) (for all n).

That is to say, the acceptor knows which set of proposal numbers it is guaranteed not to have made any promises for.

How can we establish this invariant? We might write a value to persistent storage and increment it upon recovery from a crash; monotonicity then establishes the invariant. It turns out we have other convenient sources of monotonic numbers: synchronized clocks (which are useful for Paxos in other contexts) have this behavior. So instead of using a vector of integers, we use a vector of timestamps. Upon a crash, a process sets its epoch to be the zero vector, except for its own entry, which is set to its current timestamp.
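To make this concrete, here is a minimal Haskell sketch of the epoch machinery (the names and representation are mine, purely illustrative): epochs are vectors of timestamps, with the pointwise lattice order.

import Data.Time.Clock.POSIX (getPOSIXTime)

type NodeId = Int
type Epoch  = [Integer] -- one timestamp entry per node in the cluster

-- Cartesian product lattice: pointwise comparison and least upper bound.
epochLeq :: Epoch -> Epoch -> Bool
epochLeq e1 e2 = and (zipWith (<=) e1 e2)

epochLub :: Epoch -> Epoch -> Epoch
epochLub = zipWith max

-- On recovery from a crash, a node's epoch is all zeros except its own
-- entry, which holds its current (synchronized) clock reading.
recoveryEpoch :: Int -> NodeId -> IO Epoch
recoveryEpoch clusterSize me = do
    now <- round `fmap` getPOSIXTime
    return [if i == me then now else 0 | i <- [0 .. clusterSize - 1]]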

In Paxos made Simple, Lamport presents the following invariant on the operation of acceptors:

P1a. An acceptor can accept a proposal numbered n iff it has not responded to a prepare request numbered greater than n.

We can modify this invariant to the following:

P1b. An acceptor can accept a proposal numbered (e, n) iff e* ≤ e and it has not responded to a prepare request (_, n') with n' > n.

Notice that this invariant “strengthens” P1a in the sense that an acceptor accepts a proposal in strictly fewer cases (namely, it refuses proposals when e* ≰ e). Thus, safety is preserved, but progress is now suspect.

When establishing progress for Paxos, we require that there exist a stable leader, and that this leader eventually pick a proposal number that is “high enough”. So the question is: can the leader eventually pick a proposal number that is “high enough”? Yes: define this number to be (lub{e}, max{n} + 1), that is, the least upper bound of all epochs seen, paired with one more than the largest proposal number seen. Does this epoch violate KNOWN-UNKNOWNS? No: a zero vector with a single later timestamp in the crashed node’s entry is always incomparable with any epoch the existing system may have converged upon, so the leader must take the least upper bound to obtain an epoch at or above every acceptor’s e*.
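Continuing the sketch above (again, the names are mine and this is only an illustration, not a full implementation), the acceptor-side check and the leader’s choice of a “high enough” number look like this:

type Proposal = (Epoch, Int)

-- P1b: an acceptor may accept (e, n) iff e* <= e and it has made no
-- promise for any n' > n (highestPromised is the largest promised n').
canAccept :: Epoch -> Int -> Proposal -> Bool
canAccept eStar highestPromised (e, n) =
    epochLeq eStar e && n >= highestPromised

-- The leader picks the least upper bound of all epochs it has seen,
-- paired with one more than the largest proposal number it has seen.
pickProposal :: [Proposal] -> Proposal
pickProposal seen =
    (foldr1 epochLub (map fst seen), maximum (map snd seen) + 1)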

Thus, the modifications to the Paxos algorithm are as follows:

  • Extend ballot numbers to include epoch numbers;
  • On initial startup, set e* to be the zero vector, with the current timestamp in this node’s entry;
  • Additionally reject accept requests whose epoch numbers are not greater than or equal to e*;
  • When selecting a new proposal number to propose, take the least upper bound of all epoch numbers.

One optimization: on a non-crash start, initialize e* to be just the zero vector; this eliminates the need to establish an epoch in the first round of prepare requests. Cloning state from a snapshot is an orthogonal problem, and can be addressed using the same mechanisms that fix lagging replicas. We also recommend implementing the optimization in which a leader only sends accept messages to a known good quorum, so that a recovered node does not immediately force a view change.

I would be remiss if I did not mention some prior work in this area. In particular, in Failure Detection and Consensus in the Crash-Recovery Model, the authors present a remarkable algorithm that, without stable storage, can handle more than a majority of nodes failing simultaneously (under some conditions, which you can find in the paper). Unfortunately, their solution is dramatically more complicated than the solution I have described above, and I do not know of any implementations of it. Additionally, an alternate mechanism for handling crashed nodes with no memory is a group membership mechanism. However, group membership is notoriously subtle to implement correctly.

First impressions of module programming

During my time at Jane Street, I’ve done a fair bit of programming involving modules. I’ve touched functors, type and module constraints, modules in modules, even first class modules (though only peripherally). Unfortunately, the chapter on modules in Advanced Topics in Types and Programming Languages made my eyes glaze over, so I can’t really call myself knowledgeable in module systems yet, but I think I have used them enough to have a few remarks about them. (All remarks about convention should be taken to be indicative of Jane Street style. Note: they’ve open sourced a bit of their software, if you actually want to look at some of the stuff I’m talking about.)

The good news is that they basically work the way you expect them to. In fact, they’re quite nifty. The most basic idiom you notice when you begin working in a codebase that uses modules heavily is this:

module Sexp = struct
  type t = ...
  ...
end

There is in fact a place where I have seen this style before: Henning Thielemann’s code on Hackage, in particular data-accessor, which I have covered previously. Unlike in Haskell, this style actually makes sense in OCaml, because you never include Sexp (an unqualified import, in Haskell lingo) in the conventional sense; you usually refer to the type as Sexp.t. So the basic unit of abstraction can be thought of as a type—and most simple modules are exactly this—but you can also bundle auxiliary types and functions that operate on that type. This is pretty simple to understand, and you can mostly parse the module system as a convenient namespacing mechanism.

Then things get fun.

When you use Haskell type classes, each function individually specifies the constraints on its arguments. OCaml doesn’t have type classes, so if you want to do that, you have to manually pass the dictionary to a function. You can do that, but it’s annoying, and OCaml programmers think bigger. So instead of passing a dictionary to a function, you pass a module to a functor, and you specialize all of your “generic” functions at once. It’s more powerful, and this power makes up for the annoyance of having to explicitly specify which module you’re using at any given time. Constraints and modules-in-modules fall out naturally from this basic idea when you actually try to use the module system in practice.
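Here is a tiny illustration of the idea (the module names are made up): rather than passing a compare function to every generic operation, we hand a module to a functor once and get all of the specialized functions together.

module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module Make_interval (O : ORDERED) = struct
  type t = O.t * O.t
  let create lo hi = (lo, hi)
  let contains (lo, hi) x = O.compare lo x <= 0 && O.compare x hi <= 0
end

(* All the "generic" functions get specialized at once: *)
module Int_interval = Make_interval (struct
  type t = int
  let compare = compare
end)

let () = assert (Int_interval.contains (Int_interval.create 1 10) 5)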

Probably the hardest thing (for me) to understand about the module system is how type inference and checking operate over it. Part of this is the impedance mismatch with how type classes work. When I have a function:

f :: Monoid m => m -> Int -> m

m is a polymorphic type variable that can get unified with any specific type. So if I write f 5 2 + 2, that’s completely fair game if I have an appropriate Monoid instance defined for Int (even though + is not a Monoid method).

However, if I do the same trick with modules, I have to be careful about adding extra type constraints to teach the compiler that some types are, indeed, the same. Here is an example of an extra type restriction that feels like it should get unified away, but doesn’t:

module type SIG = sig
    type t
    val t_of_string : string -> t
end

module N : SIG = struct
    type t = string
    let t_of_string x = x
end

let () = print_endline (N.t_of_string "foo")

Actually, you have to specify that t and string are the same when you add that SIG declaration:

module N : SIG with type t = string = struct

Funny! (Actually, it gets more annoying when you’re specifying constraints for large numbers of types, not just one.) It’s also tricky to get right when functors are involved, and there were some bugs in pre-3.12 OCaml which meant that you had to do some ugly things to ensure you could actually write the type constraints you wanted (with type t = t… those ts are different…)

There are some times, however, when you feel like you would really, really like typeclasses in OCaml. Heavily polymorphic functionality tends to be the big one: if you have something like Sexpable (types that can be converted into S-expressions), using the module system feels very much like duck typing: if it has a sexp_of_t function, and it’s typed right, it’s “sexpable.” Goodness, most of the hairy functors in our base library are because we need to handle the moral equivalent of multiparameter type classes.

Monadic bind is, of course, hopeless. Well, it works OK if you’re only using one monad in your program (then you just specialize your >>= to that module’s implementation by opening the module). But in most applications you’re usually in one specific monad, and if you want to quickly drop into the option monad you’re out of luck. Or you could redefine the operator to be >>=~ and hope no one stabs you. :-)
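For the record, here is what the workaround looks like in miniature (a sketch with made-up names, not Jane Street code): define bind in a module and open it locally wherever you want that monad’s >>=.

module Option_monad = struct
  let ( >>= ) o f = match o with None -> None | Some x -> f x
  let return x = Some x
end

let safe_div_succ x y =
  let open Option_monad in (* requires OCaml >= 3.12 *)
  (if y = 0 then None else return (x / y)) >>= fun q ->
  return (q + 1)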

In-program GC stats redux

Hac Phi was quite productive (I managed to get two blog posts out of it!). On Saturday I committed a new module GHC.Stats to base, which implements a modified subset of the API I proposed previously. Here is the API; to use it, you’ll need to compile GHC from Git. Please test it and let me know if things should be changed or clarified!

-- | Global garbage collection and memory statistics.
data GCStats = GCStats
    { bytes_allocated :: Int64 -- ^ Total number of bytes allocated
    , num_gcs :: Int64 -- ^ Number of garbage collections performed
    , max_bytes_used :: Int64 -- ^ Maximum number of live bytes seen so far
    , num_byte_usage_samples :: Int64 -- ^ Number of byte usage samples taken
    -- | Sum of all byte usage samples, can be used with
    -- 'num_byte_usage_samples' to calculate averages with
    -- arbitrary weighting (if you are sampling this record multiple
    -- times).
    , cumulative_bytes_used :: Int64
    , bytes_copied :: Int64 -- ^ Number of bytes copied during GC
    , current_bytes_used :: Int64 -- ^ Current number of live bytes
    , current_bytes_slop :: Int64 -- ^ Current number of bytes lost to slop
    , max_bytes_slop :: Int64 -- ^ Maximum number of bytes lost to slop at any one time so far
    , peak_megabytes_allocated :: Int64 -- ^ Maximum number of megabytes allocated
    -- | CPU time spent running mutator threads.  This does not include
    -- any profiling overhead or initialization.
    , mutator_cpu_seconds :: Double
    -- | Wall clock time spent running mutator threads.  This does not
    -- include initialization.
    , mutator_wall_seconds :: Double
    , gc_cpu_seconds :: Double -- ^ CPU time spent running GC
    , gc_wall_seconds :: Double -- ^ Wall clock time spent running GC
    -- | Number of bytes copied during GC, minus space held by mutable
    -- lists held by the capabilities.  Can be used with
    -- 'par_max_bytes_copied' to determine how well parallel GC utilized
    -- all cores.
    , par_avg_bytes_copied :: Int64
    -- | Sum of number of bytes copied each GC by the most active GC
    -- thread each GC.  The ratio of 'par_avg_bytes_copied' divided by
    -- 'par_max_bytes_copied' approaches 1 for a maximally sequential
    -- run and approaches the number of threads (set by the RTS flag
    -- @-N@) for a maximally parallel run.
    , par_max_bytes_copied :: Int64
    } deriving (Show, Read)

-- | Retrieves garbage collection and memory statistics as of the last
-- garbage collection.  If you would like your statistics as recent as
-- possible, first run a 'performGC' from "System.Mem".
getGCStats :: IO GCStats
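Here is a quick usage sketch (with a GHC built from Git, as above; if getGCStats complains that statistics aren’t being collected, you may need to enable collection with an RTS flag such as -T—check the final documentation):

import GHC.Stats
import System.Mem (performGC)

main :: IO ()
main = do
    print (sum [1 .. 1000000 :: Int])
    performGC -- statistics are only as fresh as the last GC
    s <- getGCStats
    putStrLn ("allocated: " ++ show (bytes_allocated s))
    putStrLn ("GCs:       " ++ show (num_gcs s))
    putStrLn ("max live:  " ++ show (max_bytes_used s))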

Changes to IntMap

As it stands, it is impossible to define certain value-strict operations on IntMap with the current containers API. The reader is invited, for example, to try efficiently implementing map :: (a -> b) -> IntMap a -> IntMap b in such a way that, for any non-bottom and non-empty map m, Data.IntMap.map (\_ -> undefined) m == undefined.

Now, we could have just added a lot of apostrophe-suffixed operations to the existing API, which would have greatly blown it up in size; but following discussion on libraries@haskell.org, we’ve decided to split the module in two: Data.IntMap.Strict and Data.IntMap.Lazy. For backwards compatibility, Data.IntMap will be the lazy version of the module, and the value-strict functions currently residing in it will be deprecated.

The details of what happened are a little subtle. Here is the reader’s digest version:

  • The IntMap in Data.IntMap.Strict and the IntMap in Data.IntMap.Lazy are exactly the same map; there is no runtime or type level difference between the two. The user can swap between “implementations” by importing one module or another, but we won’t prevent you from using lazy functions on strict maps. You can convert lazy maps to strict ones using seqFoldable.
  • Similarly, if you pass a map with lazy values to a strict function, the function will do the maximally lazy operation on the map that still results in correct strict behavior. Usually, this means that a pre-existing lazy value won’t get evaluated unless the operation actually forces it.
  • Most type class instances remain valid for both strict and lazy maps; however, Functor and Traversable do not have valid “strict” versions that obey the appropriate laws, so we’ve selected the lazy implementation for them.
  • The lazy and strict folds remain, because whether or not a fold is strict is independent of whether or not the data structure is value strict or spine strict.
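To make the semantics concrete, here is a small illustration of the difference, assuming the module layout described above:

import qualified Data.IntMap.Lazy as L
import qualified Data.IntMap.Strict as S

main :: IO ()
main = do
    let m = L.fromList [(1, 'a'), (2, 'b')]
    -- Lazy map: the undefined values are never forced, so this is fine.
    print (L.size (L.map (\_ -> undefined) m))
    -- Strict map: it is the very same IntMap type, but S.map forces the
    -- new values, so demanding the result throws (uncomment to see):
    -- print (S.size (S.map (\_ -> undefined) m))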

I hacked up a first version of the strict module at Hac Phi on Sunday; you can see it here. The full implementation can be found here.

Food-related functional humor

Fall is coming, and with it come hordes of ravenous freshmen arriving on MIT’s campus. I’ll be doing three food events… all of them functional programming puns. Whee!

Dumpling Hylomorphism

Anamorphism: the building up of a structure. Catamorphism: the consumption of a structure. Hylomorphism: both an anamorphism and a catamorphism. This event? A hylomorphism on dumplings. Come learn the basic fold, or just perform a metabolic reduction on food.

I’ve done this event several times in the past and it’s always good (if a little sweaty) fun. Dumpling making, as it turns out, is an extremely parallelizable task: you can have multiple people rolling out wraps, wrapping the dumplings, and a few brave cooks actually boiling or pan-frying them. (Actually, in China, no one makes their own wraps anymore, because the store bought ones are so good. Not so in the US…)

The WOK Combinator

The combinator is familiar to computer scientists, but less well known to food scientists. A closely guarded secret among the Chinese is the WOK combinator, guaranteed to combine vegetable and meat in record runtime. (Vegan-friendly too.)

Dumplings are a little impractical for an MIT student in the thick of term; they usually tend to get reserved for special events near the beginning of term, if at all. However, stir fry is quick, cheap and easy and an essential healthy component to any college student’s diet. My personal mainstay is broccoli and chicken (which is nearly impossible to get wrong), but I’ve diversified and I’m quite a fan of bell peppers and chorizo these days too (see below). One difficulty with running this event is making sure there are enough rice cookers… running out of rice is no fun, since it takes so long to cook up a new batch!

Polyroastism

Roast(X) where X = {Broccoli, Garlic, Pork, Bell Peppers, Chorizo, Carrots, Onions, Asparagus, Sweet Potatoes}.

This is a new one. Really, I just wanted to do roast broccoli with garlic. It’s sooooo good. I’ve never roasted chorizo before, but there didn’t seem to be enough meat on the menu, so I tossed it in. I did a lot of roasting when I was in Portland, because I’d frequently buy random vegetables at the farmers markets, get home, and then have to figure out how to cook them. Roasting is a pretty good bet, in many cases! I forgot to put beets in the event description; maybe I’ll get some…

BlockedIndefinitelyOnMVar

This post was adapted from a post I made to the glasgow-haskell-users list.

According to Control.Exception, the BlockedIndefinitelyOnMVar exception (and the related exception BlockedIndefinitelyOnSTM) is thrown when “the thread is blocked on an MVar, but there are no other references to the MVar so it can’t ever continue.” The description is actually reasonably precise, but it is easy to misinterpret. Fully understanding how this exception works requires some extra documentation from Control.Concurrent, as well as an intuitive feel for how garbage collection in GHC works with respect to Haskell’s green threads.

Here’s the litmus test: can you predict what these three programs will do?

import Control.Concurrent
import Control.Exception
import System.Mem (performGC)

main1 = do
    lock <- newMVar ()
    forkIO $ takeMVar lock
    forkIO $ takeMVar lock
    threadDelay 1000 -- let threads run
    performGC -- trigger exception
    threadDelay 1000

main2 = do
    lock <- newEmptyMVar
    complete <- newEmptyMVar
    forkIO $ takeMVar lock `finally` putMVar complete ()
    takeMVar complete

main3 = do
    lock <- newEmptyMVar
    forkIO $ takeMVar lock `finally` putMVar lock ()
    let loop = do
            b <- isEmptyMVar lock
            if b
                then yield >> performGC >> loop
                else return ()
    loop

Try not to peek. For a hint, check the documentation for forkIO.


The first program gives no output, even though the threadDelay ostensibly lets both forked threads get scheduled, run, and deadlock. In fact, BlockedIndefinitelyOnMVar is raised, and the reason you don’t see it is that forkIO installs an exception handler that mutes this exception, along with BlockedIndefinitelyOnSTM and ThreadKilled. You can install your own exception handler using catch and co.

There is an interesting extra set of incantations at the end of this program that ensures, with high probability, that the threads get scheduled and the BlockedIndefinitelyOnMVar exception gets thrown. Notice that the exception only gets thrown when “no references are left to the MVar.” Since Haskell is a garbage-collected language, the only time it finds out that references are gone is when a garbage collection happens, so you need to make sure one occurs before you see one of these errors.

One implication of this is that GHC does not magically know which thread to throw the exception at in order to “unwedge” the program: instead, it just throws BlockedIndefinitelyOnMVar at all of the deadlocked threads, including (if applicable) the main thread. This behavior is demonstrated by the second program, which terminates with BlockedIndefinitelyOnMVar because the main thread gets a copy of the exception, even though the finally handler of the child thread would have resolved the deadlock. Try replacing the last line with takeMVar complete `catch` \BlockedIndefinitelyOnMVar -> takeMVar complete >> putStrLn "done". It’s pretty hilarious.
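Spelled out, the suggested modification looks like this (the child thread also receives the exception, so its finally handler fills complete and the retry succeeds):

main2' = do
    lock <- newEmptyMVar
    complete <- newEmptyMVar
    forkIO $ takeMVar lock `finally` putMVar complete ()
    takeMVar complete `catch` \BlockedIndefinitelyOnMVar ->
        takeMVar complete >> putStrLn "done"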

The last program considers what it means for an MVar to be “reachable.” Since it deadlocks silently, the MVar must have stayed reachable; and indeed, our call to isEmptyMVar prevents the MVar from ever going dead, and thus we loop infinitely, even though there is no possibility of the MVar ever getting filled in. GHC only considers a thread garbage (which is what results in the exception being thrown) if there are no references to the thread. Who is holding a reference to the thread? The MVar, as the thread is blocked on it and has added itself to that MVar’s list of blocked threads. Who is keeping the MVar alive? Why, our closure that contains a call to isEmptyMVar. So the thread stays. The general rule is as follows: if a thread is blocked on an MVar that is reachable from a non-blocked thread, the blocked thread sticks around. There are some cases where the MVar is obviously dead even though references to it remain, but GHC makes no attempt to handle them; figuring this out in general is undecidable. (Exercise: write a program that would solve the halting problem if GHC could figure this out in general.)

To conclude: without a bit of extra work (which would, by the way, be quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection. Instead, you are invited to think of it as a way of garbage collecting threads that would otherwise have languished around forever: by default, a deadlocked thread is silent (except in its memory usage). The fact that an exception shows up at all is convenient, operationally speaking, but should not be relied upon.

From data type definitions to code

What do these problems have in common: recursive equality/ordering checks, printing string representations, serializing/unserializing binary protocols, hashing, generating getters/setters? They are repetitive boilerplate pieces of code that have a strong dependence on the structure of the data they operate over. Since programmers love automating things away, various schools of thought have emerged on how to do this:

  1. Let your IDE generate this boilerplate for you. You right-click on the context menu, click “Generate hashCode()”, and your IDE does the necessary program analysis for you;
  2. Create a custom metadata format (usually XML), which you then run another program on which converts this description into code;
  3. Add sufficiently strong macro/higher-order capabilities to your language, so you can write programs which generate implementations in-program;
  4. Add sufficiently strong reflective capabilities to your language, so you can write a fully generic dynamic implementation for this functionality;
  5. Be a compiler and do static analysis over abstract syntax trees in order to figure out how to implement the relevant operations.

It hadn’t struck me how prevalent the fifth option was until I ran into wide-scale use of one particular facet of the camlp4 system. While it describes itself as a “macro system,” in sexplib and bin-prot the macros being used are not those of the C tradition (which would be good for implementing option 3); rather, they are in the Lisp tradition, including access to the full syntax tree of OCaml and the ability to modify OCaml’s grammar. Unlike most Lisps, however, camlp4 has access to the abstract syntax tree of data type definitions (in untyped languages these are usually implicit), which it can transform into code.

One question that I’m interested in is whether or not this sort of metaprogramming can be made popular with casual users of languages. If I write code to convert a data structure into a Lisp-like version, is a logical next step to generalize this code into metaprogramming code, or is that a really large leap only to be done by extreme power users? At least from a user standpoint, camlp4 is extremely unobtrusive. In fact, I didn’t even realize I was using it until a month later! Using sexplib, for example, is a simple matter of writing:

type t = Bar | Baz of int * int
  with sexp

Almost magically, sexp_of_t and t_of_sexp spring into existence.
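Using them is equally unceremonious; for instance (a small sketch, assuming sexplib is set up in your build):

let () =
  print_endline (Sexplib.Sexp.to_string_hum (sexp_of_t (Baz (1, 2))))
  (* prints something like (Baz 1 2) *)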

But defining new transformations is considerably more involved. Part of the problem is the fact that the abstract-syntax tree you are operating over is quite complex, the unavoidable side-effect of making a language nice to program in. I could theoretically define all of the types I cared about using sums and products, but real OCaml programs use labeled constructors, records, anonymous types, anonymous variants, mutable fields, etc. So I have to write cases for all of these, and that’s difficult if I’m not already a language expert.

A possible solution for this is to define a simpler, core language on which to operate over, much the same way GHC Haskell compiles down to Core prior to code generation. You can then make the extra information available through an annotations system (which is desirable even when you have access to the full AST.) If the idea is fundamentally simple, don’t force the end-user to have to handle all of the incidental complexity that comes along with making a nice programming language. Unless, of course, they want to.

Postscript. One of the things I’m absolutely terrible at is literature search. As with most ideas, it’s a pretty safe bet to assume someone else has done this already. But I couldn’t find any prior art here. Maybe I need a better search query than “intermediate language for metaprogramming.”

Variant types and GADTs

OCaml supports anonymous variant types, of the form type a = [`Foo of int | `Bar of bool], with the appropriate subtyping relations. Subtyping is, in general, kind of tricky, so I have been using these variant types fairly conservatively. (Even if a feature gives you too much rope, it can be manageable and useful if you use discipline.) Indeed, they are remarkably handy for one particular use-case for which I would have normally deployed GADTs. This is the “Combining multiple sum types into a single sum type” use-case.

Consider the following program in Haskell:

data A = Foo Int | Bar Bool
data B = Baz Char | Qux

If one would like to define the moral equivalent of A plus B, the most naive way to do this is:

data AorB = A A | B B

But this kind of sucks: I would have preferred some kind of flat namespace by which I could refer to the constructors of A and B (also, this encoding is not equivalent to data AorB = Foo Int | Bar Bool | Baz Char | Qux in the presence of laziness). If you use normal sum types in OCaml, you’re similarly out of luck. However, you can handily manage this if you use variant types:

type a = [`Foo of int | `Bar of bool]
type b = [`Baz of char | `Qux]
type a_or_b = [a | b]

Sweet! Note that we’re not using the full generality of variant types: I will only ever refer to these variant constructors in the context of a, b or a_or_b: anonymous variant types are right out. This prevents coercion messes.

I can actually pull this off in Haskell with GADTs, although it’s certainly not obvious for a beginning programmer:

data A
data B
data AorB t where
  Foo :: Int -> AorB A
  Bar :: Bool -> AorB A
  Baz :: Char -> AorB B
  Qux :: AorB B

To pattern match against all constructors, I specify the type AorB t; to only do A I use AorB A, and to only do B I use AorB B. Don’t ask me how to specify arbitrary subsets of more than two combined sum types. (Solutions in the comment section welcome, though they will be graded on clarity.)
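For instance, continuing the definitions above (with GADTs enabled), a function over all constructors is typed at AorB t, while an A-only function is typed at AorB A, and GHC knows the B cases are impossible there:

describe :: AorB t -> String
describe (Foo n) = "Foo " ++ show n
describe (Bar b) = "Bar " ++ show b
describe (Baz c) = "Baz " ++ [c]
describe Qux     = "Qux"

fromA :: AorB A -> Either Int Bool
fromA (Foo n) = Left n
fromA (Bar b) = Right b -- no Baz/Qux cases needed: they have type AorB B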

The Haskell approach does have one advantage, which is that the sum type is still closed. Since OCaml can make no such guarantee, things like bin-prot need to use up a full honking four bytes to specify which variant it is (they hash the constructor name and use that as a unique identifier), rather than the two bits (but more likely, one byte) needed here. The closed type also makes for more efficient generated code.

In-program GC stats for GHC

I’ll be at this year’s Hac Phi (coming up in a week and a half), and I am planning on working on in-program garbage collector statistics for GHC. There is nothing really technically difficult about this task (we just need to expose some functions in the RTS), but it’s not been done yet and I know quite a few performance-minded and long-running-server-minded folks who would love to see this functionality.

My question for you is this: what would you like such an API to look like? What things should it offer, how would you like to interact with it?

Here’s one sample API to get the ball rolling:

module GHC.RTS.Stats where

-- Info is not collected unless you run with certain RTS options.  If
-- you are planning to use this on a long-running server, it would be
-- good to know the costs of those options (we also probably need to add
-- extra options which record, but have no outwardly visible effect).

-- Read out static parameters that were provided via +RTS

generations :: IO Int

---------------------------------------------------------------------
-- Full statistics

-- Many stats are internally collected as words. Should we publish
-- words?

-- Names are based off of the machine-readable formats

bytesAllocated :: IO Int64
numGCs :: IO Int64
numByteUsageSamples :: IO Int64
averageBytesUsed :: IO Int64 -- cumulativeBytesUsed / numByteUsageSamples
maxBytesUsed :: IO Int64
-- peakMemoryBlocksAllocated :: IO Int64
peakMegabytesAllocated :: IO Int64
initCpuSeconds :: IO Double
initWallSeconds :: IO Double
mutatorCpuSeconds :: IO Double
mutatorWallSeconds :: IO Double
gcCpuSeconds :: IO Double
gcWallSeconds :: IO Double

-- Wouldn't be too unreasonable to offer a data structure with all of
-- this?  Unclear.  At least, it would prevent related data from
-- desynchronizing.

data GlobalStats = GlobalStats
    { g_bytes_allocated :: Int64
    , g_num_GCs :: Int64
    , g_num_byte_usage_samples :: Int64
    , g_average_bytes_used :: Int64
    , g_max_bytes_used :: Int64
    , g_peak_megabytes_allocated :: Int64
    , g_init_cpu_seconds :: Double
    , g_init_wall_seconds :: Double
    , g_mutator_cpu_seconds :: Double
    , g_mutator_wall_seconds :: Double
    , g_gc_cpu_seconds :: Double
    , g_gc_wall_seconds :: Double
    }
globalStats :: IO GlobalStats
generationStats :: Int -> IO GlobalStats

---------------------------------------------------------------------
-- GC statistics

-- We can't offer a realtime stream of GC events, because they come
-- too fast. (Test this? The eventlog comes too fast; maybe GC events
-- are manageable, but you don't want to trigger a GC in your handler.)

data GCStats = GCStats
    { gc_alloc :: Int64
    , gc_live :: Int64
    , gc_copied :: Int64
    , gc_gen :: Int
    , gc_max_copied :: Int64
    , gc_avg_copied :: Int64
    , gc_slop :: Int64
    , gc_wall_seconds :: Int64
    , gc_cpu_seconds :: Int64
    , gc_faults :: Int
    }
lastGC :: IO GCStats
lastMajorGC :: IO GCStats
allocationRate :: IO Double

---------------------------------------------------------------------
-- Parallel GC statistics

data ParGCStats = ParGCStats
    { par_avg_copied :: Int64
    , par_max_copied :: Int64
    }
parGCStats :: IO ParGCStats
parGCNodes :: IO Int64


---------------------------------------------------------------------
-- Threaded runtime statistics
data TaskStats = TaskStats
    -- Inconsistent naming convention here: mut_time or mut_cpu_seconds?
    -- mut_etime or mut_wall_seconds? Hmm...
    { task_mut_time :: Int64
    , task_mut_etime :: Int64
    , task_gc_time :: Int64
    , task_gc_etime :: Int64
    }

---------------------------------------------------------------------
-- Spark statistics

data SparkStats = SparkStats
    { s_created :: Int64
    , s_dud :: Int64
    , s_overflowed :: Int64
    , s_converted :: Int64
    , s_gcd :: Int64
    , s_fizzled :: Int64
    }
sparkStats :: IO SparkStats
sparkStatsCapability :: Int -> IO SparkStats

Synthetic Git merges

In theory, Git supports custom, low-level merge drivers via the merge configuration properties. In practice, no one actually wants to write their own merge driver from scratch. Well, for many cases where a custom merge driver would come in handy, you don’t have to write it from scratch! Consider these cases:

  • You want to merge files which have differing newline styles,
  • You want to merge files where one had lots of trailing whitespace removed,
  • You want to merge files where one branch has replaced certain strings with custom strings (for example, a configuration file in which a PASSWORD placeholder was instantiated, or a file that needs to be anonymized if there is a merge conflict),
  • You want to merge a binary file that has a stable textual format, or
  • You want to merge with knowledge about specific types of conflicts and how to resolve them (a super-smart rerere).

For all of these cases, you can instead perform a synthetic Git merge by modifying the input files (constructing synthetic merge inputs), calling Git’s git merge-file to do the actual merge, and then possibly editing the result, before handing it back off to the original invoker of your merge driver. It’s really simple. Here’s an example driver that handles files with differing newline styles by canonicalizing them to UNIX:

#!/bin/sh
CURRENT="$1"
ANCESTOR="$2"
OTHER="$3"
dos2unix "$CURRENT"
dos2unix "$ANCESTOR"
dos2unix "$OTHER"
exec git merge-file "$CURRENT" "$ANCESTOR" "$OTHER"

You can then set it up by frobbing your .git/config:

[merge.nl]
        name = Newline driver
        driver = /home/ezyang/merge-fixnewline.sh %A %O %B

And your .git/info/attributes:

*.txt merge=nl

In Wizard, we implemented (more clever) newline canonicalization, configuration value de-substitution (this reduces the diff between upstream and downstream, reducing the number of conflicts due to proximity), and custom rerere behavior. I’ve also seen a coworker of mine use this technique manually to handle merge conflicts involving trailing whitespace (in Mercurial, no less!)

Actually, we took this concept further: rather than only create synthetic files, we create entirely synthetic trees, and then call git merge on them proper. This has several benefits:

  • We can now pick an arbitrary ancestor commit to perform the merge from (this, surprisingly enough, really comes in handy for our use-case),
  • Git has an easier time detecting when files moved and changed newline style, etc, and
  • It’s a bit easier to use, since you just call a custom command rather than have to remember how to setup your Git config and attributes properly (and keep them up to date!)

Merges are just metadata: multiple parent commits. Git doesn’t care how you get the contents of your merge commit. Happy merging!
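For example, a synthetic merge can be glued together by hand along these lines (a sketch: $BRANCH_A and $BRANCH_B are whatever commits you want as parents, and the index holds the merged contents, however you produced them):

TREE=$(git write-tree)
COMMIT=$(echo "Synthetic merge" | git commit-tree $TREE -p $BRANCH_A -p $BRANCH_B)
git update-ref refs/heads/master $COMMIT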