ezyang's blog

the arc of software bends towards understanding

Two ways of representing perfect binary trees

A common simplification when discussing many divide-and-conquer algorithms is the assumption that the input list has a size which is a power of two. As such, one might wonder: how do we encode lists whose size is a power of two, in such a way that lists without this property are unrepresentable? One observation is that such lists are perfect binary trees, so if we have an encoding for perfect binary trees, we also have an encoding for power-of-two lists. Here are two well-known ways to do such an encoding in Haskell: one using GADTs and one using nested data-types. We claim that the nested data-types solution is superior.

This post is literate, but you will need some type system features:

{-# LANGUAGE ScopedTypeVariables, GADTs, ImpredicativeTypes #-}

GADTs

One approach is to encode the size of the tree into the type, and then assert that the sizes of two trees are the same. This is pretty easy to do with GADTs:

data Z
data S n

data L i a where
    L :: a -> L Z a
    N :: L i a -> L i a -> L (S i) a

By reusing the type variable i, the N constructor ensures that any two trees we combine must have the same size. These trees can be destructed like normal binary trees:

exampleL = N (N (L 1) (L 2)) (N (L 3) (L 4))

toListL :: L i a -> [a] -- type signature is necessary!
toListL (L x) = [x]
toListL (N l r) = toListL l ++ toListL r

Creating these trees from ordinary lists is a little delicate, since the i type variable needs to be handled with care. Existentials over lists work fairly well:

data L' a = forall i. L' { unL' :: L i a }
data Ex a = forall i. Ex [L i a]

fromListL :: [a] -> L' a
fromListL xs = g (Ex (map L xs))
  where
    g (Ex [x]) = L' x
    g (Ex xs)  = g (Ex (f xs))
    f (x:y:xs) = (N x y) : f xs
    f _ = []
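
Unpacking the existential recovers an ordinary list, and we can check the round trip end to end. Below is a self-contained version of the code so far (definitions repeated verbatim, minus the record selector, so the snippet compiles on its own); roundTripL is our own illustrative helper:

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}

data Z
data S n

data L i a where
    L :: a -> L Z a
    N :: L i a -> L i a -> L (S i) a

data L' a = forall i. L' (L i a)
data Ex a = forall i. Ex [L i a]

toListL :: L i a -> [a]
toListL (L x)   = [x]
toListL (N l r) = toListL l ++ toListL r

fromListL :: [a] -> L' a
fromListL xs = g (Ex (map L xs))
  where
    g (Ex [x]) = L' x
    g (Ex ys)  = g (Ex (h ys))
    h (x:y:zs) = N x y : h zs
    h _        = []

-- The size index i never escapes the case branch; only the list does.
roundTripL :: [a] -> [a]
roundTripL xs = case fromListL xs of L' t -> toListL t
```

Note that fromListL silently truncates inputs whose length is not a power of two, so the round trip is only the identity on power-of-two lists.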

Nested data-types

Another approach is to literally build up a type isomorphic to a 2^n size tuple (modulo laziness). For example, in the case of a 4-tuple, we’d like to just say ((1, 2), (3, 4)). There is still, however, the pesky question of how one does recursion over such a structure. The technique to use here is bootstrapping, described by Adam Buchsbaum in his thesis and popularized by Chris Okasaki:

data B a = Two (B (a, a)) | One a
    deriving Show

Notice how the recursive mention of B does not hold a, but (a, a): this is so-called “non-uniform” recursion. Every time we apply a Two constructor, the size of our tuple doubles, until we top it off:

exampleB = Two (Two (One ((1,2), (3,4))))

fromListB :: [a] -> B a
fromListB [x] = One x
fromListB xs = Two (fromListB (pairs xs))
    where pairs (x:y:xs) = (x,y) : pairs xs
          pairs _ = []

toListB :: B a -> [a]
toListB (One x) = [x]
toListB (Two c) = concatMap (\(x,y) -> [x,y]) (toListB c)
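
For lists whose length is a power of two, fromListB and toListB are inverses. A self-contained check (definitions repeated from above; roundTripB is our own illustrative helper):

```haskell
data B a = Two (B (a, a)) | One a

fromListB :: [a] -> B a
fromListB [x] = One x
fromListB xs  = Two (fromListB (pairs xs))
  where pairs (x:y:ys) = (x, y) : pairs ys
        pairs _        = []

toListB :: B a -> [a]
toListB (One x) = [x]
toListB (Two c) = concatMap (\(x, y) -> [x, y]) (toListB c)

-- Identity on lists whose length is a power of two.
roundTripB :: [a] -> [a]
roundTripB = toListB . fromListB
```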

Which is better?

At first glance, the GADT approach seems more appealing, since when destructing it, the data type looks and feels a lot like an ordinary binary tree. However, it is much easier to parse user data into nested data types than GADTs (due to the fact that Haskell is not a dependently typed language). Ralf Hinze, in his paper Perfect Trees and Bit-reversal Permutations, gives another argument in favor of nested datatypes:

Comparing [perfect trees and the usual definition of binary trees] it is fairly obvious that the first representation is more concise than the second one. If we estimate the space usage of a k-ary constructor at k+1 cells, we have that a perfect tree of rank n consumes (2^n − 1)·3 + (n+1)·2 cells with the first and (2^n − 1)·3 + 2^n·2 with the second. [The difference coming from all of the extra leaf nodes.]

Nevertheless, destructing the nested data type tree is very weird, and we might feel better about the “exotic” nested data type if there were an efficient transformation from the catamorphism (n :: t a -> t a -> t a, z :: a -> t a) on traditional trees:

cataL :: (t a -> t a -> t a, a -> t a) -> L i a -> t a
cataL (n,z) (N l r) = n (cataL (n,z) l) (cataL (n,z) r)
cataL (n,z) (L x) = z x

to a catamorphism (f :: a -> t a, g :: t (a, a) -> t a) on our nested data-type tree:

cataB :: (forall a. a -> t a, forall a. t (a, a) -> t a) -> B a -> t a
cataB (f,g) (One a) = f a
cataB (f,g) (Two t) = g (cataB (f,g) t)

This conversion is possible, though, alas, it is not a catamorphism:

cataLB :: forall t a. (t a -> t a -> t a, a -> t a) -> B a -> t a
cataLB (n,z) t = f t z
  where
    f :: forall b. B b -> (b -> t a) -> t a
    f (One a) z = z a
    f (Two t) z = f t (\(l,r) -> n (z l) (z r))
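
As a sanity check, instantiating t to the list type constructor recovers toListB as an ordinary tree fold. This is a sketch of ours, not from the original text (cataLB and B are repeated so it compiles on its own):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

data B a = Two (B (a, a)) | One a

cataLB :: forall t a. (t a -> t a -> t a, a -> t a) -> B a -> t a
cataLB (n, z) t0 = f t0 z
  where
    f :: forall b. B b -> (b -> t a) -> t a
    f (One a) k = k a
    f (Two t) k = f t (\(l, r) -> n (k l) (k r))

-- Instantiating t = []: node = (++), leaf = singleton.
toListViaCata :: B a -> [a]
toListViaCata = cataLB ((++), \x -> [x])
```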

The idea is to create a function (a -> t a) -> t a, to which we then pass z in order to get the final result. This is the time-honored difference list/continuation passing trick, where we build up a chain of function invocations rather than attempt to build up the result directly, since ordinarily the catamorphism on nested data-type trees proceeds in the wrong direction. But now we can easily perform any fold we would have done on ordinary trees on our nested data-type trees, which resolves any lingering concerns we may have had. Nested data types are superior… from a representation size perspective, in any case. (See Jeremy’s comment for another take on the issue, though.)

For further reading, check out Generalised folds for nested datatypes (Richard Bird, Ross Paterson).

Polymorphic variants in Ur/Web

This document explains how polymorphic variants in Ur/Web work. It was written because the official tutorial has no mention of them, the manual only devotes a paragraph to the topic, and there are some useful tricks for dealing with them that I picked up while wrangling with them in Logitext.

What are polymorphic variants?

Polymorphic variants may be familiar to OCaml users: they permit you to use the tags of variants in multiple types, and not just in the original algebraic data type a constructor was defined in. Instead of having to keep names unique:

datatype yn = Yes1 | No1
datatype ynm = Yes2 | No2 | Maybe2

We can just reuse them:

con yn = variant [Yes = unit, No = unit]
con ynm = variant [Yes = unit, No = unit, Maybe = unit]

This is extremely convenient if you have a lot of constructors which you want to share between multiple logical types. Unfortunately, they have a number of nasty effects, largely stemming from the fact that they introduce subtyping to the language. In Ur, this inherent subtyping is modulated by the same row types that power Ur’s record system, so handling polymorphic variants is quite like handling records, and both are based off of Ur/Web’s type-level records.

How do I make a polymorphic variant?

To make a polymorphic variant type, apply variant (rather than the $ operator) to a type-level record. So:

$[A = int, B = bool]

generates a record with two fields, A containing an int and B containing a bool, whereas:

variant [A = int, B = bool]

generates a variant with two constructors, A containing just an int, or B containing just a bool.

To make a polymorphic variant value, use the make function, which requires a label (indicating the constructor) and the value:

make [#A] 2

Technically, during the construction of a polymorphic variant, you also need to know the full set of constructors this value will be used with. Normally Ur/Web will infer this for you, but this is an important restriction which will affect code that operates on variants. The full signature of make is this:

val make : nm :: Name
        -> t ::: Type
        -> ts ::: {Type}
        -> [[nm] ~ ts]
        => t -> variant ([nm = t] ++ ts)

The function of nm and t should be self-evident, and ts is the type-level record for the rest of the values in the variant, concatenated with [nm = t] to produce a type-level record which is guaranteed to contain nm.

How do I destruct a polymorphic variant?

Use the match function, which takes a variant and a record of functions which indicate how to process each possible constructor of the variant:

match t { A = fn a => a + 2,
          B = fn b => if b then 3 else 6 }

Indeed, the variant and the record use the same type-level record, though the types of the record are a little different, as seen in the type of match:

val match : ts ::: {Type}                (* the type-level record *)
         -> t ::: Type
         -> variant ts                   (* the variant *)
         -> $(map (fn t' => t' -> t) ts) (* the record *)
         -> t

What other operations can I perform on variants?

make and match are the only primitives you need: everything else can be derived. However, the meta library has a Variant module which contains useful derived functions for operating with variants. For example, this pair of functions:

val read : r ::: {Unit} -> t ::: Type -> folder r
        -> $(mapU t r) -> variant (mapU {} r) -> t
val write : r ::: {Unit} -> t ::: Type -> folder r
        -> $(mapU t r) -> variant (mapU {} r) -> t -> $(mapU t r)

These allow you to use variants as labels, to project and edit values in homogeneously typed records. The signatures are not too difficult to read: r is the type-level record which defines the variant, t is the type of the homogeneous record, folder r is a folder for the record (which usually will get inferred), $(mapU t r) is the type of a homogeneous record (we didn’t write $r because that would be a record containing only unit) and variant (mapU {} r) is the variant serving as a “label”. Here are some example uses of some of the simpler functions in this library:

read {A = 1, B = 2} (make [#A] ())
== 1

write {A = 1, B = 2} (make [#B] ()) 3
== {A = 1, B = 3}

search (fn v => match v {A = fn () => None, B = fn () => Some 2})
== Some 2

find (fn v => match v {A = fn () => True, B = fn () => False})
== Some (make [#A] ())

test [#A] (make [#A] 2)
== Some 2

weaken (make [#A] 2 : variant [A = int])
== make [#A] 2 : variant [A = int, B = int]

eq (make [#A] ()) (make [#B] ())
== False

mp (fn v => match v {A = fn () => 2, B = fn () => True})
== {A = 2, B = True}

fold (fn v i => match v {A = fn () => i + 1, B = fn () => i + 2}) 0
== 3

mapR (fn v x => match v {A = fn i => i * 2, B = fn i => i * 3}) { A = 2 , B = 3 }
== { A = 4, B = 9 }

What does destrR do?

This function has a somewhat formidable type:

val destrR : K --> f :: (K -> Type) -> fr :: (K -> Type) -> t ::: Type
          -> (p :: K -> f p -> fr p -> t)
          -> r ::: {K} -> folder r -> variant (map f r) -> $(map fr r) -> t

But really, it’s just a more general match. match can easily be implemented in terms of destrR:

match [ts] [t] v fs =
  destrR [ident] [fn p => p -> t] (fn [p ::_] f x => f x) v fs

destrR affords more flexibility when the record is not quite a function, but a record containing a function, or a type class, or even if the variant is the function and the record the data.

Is there a more concise way to match over multiple constructors?

Polymorphic variants frequently have a lot of constructors, all of which look basically the same:

con tactic a =
  [Cut = logic * a * a,
   LExact = int,
   LConj = int * a,
   LDisj = int * a * a,
   LImp = int * a * a,
   LIff = int * a,
   LBot = int,
   LTop = int * a,
   LNot = int * a,
   LForall = int * universe * a,
   LExists = int * a,
   LContract = int * a,
   LWeaken = int * a,
   RExact = int,
   RConj = int * a * a,
   RDisj = int * a,
   RImp = int * a,
   RIff = int * a * a,
   RTop = int,
   RBot = int * a,
   RNot = int * a,
   RForall = int * a,
   RExists = int * universe * a,
   RWeaken = int * a,
   RContract = int * a]

Filling out records to match against quickly gets old, especially if the functionality for any two constructors with the same type of data is the same:

let fun empty _ = True
    fun single (_, a) = proofComplete a
    fun singleQ (_, _, a) = proofComplete a
    fun double (_, a, b) = andB (proofComplete a) (proofComplete b)
in match t {Cut       = fn (_, a, b) => andB (proofComplete a) (proofComplete b),
            LExact    = empty,
            LBot      = empty,
            RExact    = empty,
            RTop      = empty,
            LConj     = single,
            LNot      = single,
            LExists   = single,
            LContract = single,
            LWeaken   = single,
            LTop      = single,
            RDisj     = single,
            RImp      = single,
            LIff      = single,
            RNot      = single,
            RBot      = single,
            RForall   = single,
            RContract = single,
            RWeaken   = single,
            LForall   = singleQ,
            RExists   = singleQ,
            LDisj     = double,
            LImp      = double,
            RIff      = double,
            RConj     = double
            }
end

Adam Chlipala and I developed a nice method for reducing this boilerplate by abusing local type classes, which allows us to lean on Ur/Web’s inference engine to automatically fill in the function to handle an element of a particular type. Here is that recursive traversal again, using our new method:

let val empty   = declareCase (fn _ (_ : int) => True)
    val single  = declareCase (fn _ (_ : int, a) => proofComplete a)
    val singleQ = declareCase (fn _ (_ : int, _ : Universe.r, a) => proofComplete a)
    val double  = declareCase (fn _ (_ : int, a, b) => andB (proofComplete a) (proofComplete b))
    val cut     = declareCase (fn _ (_ : Logic.r, a, b) => andB (proofComplete a) (proofComplete b))
in typeCase t end

For every “type” in the variant, you write a declareCase which takes that type and reduces it to the desired return type. (You also get, as the first argument, a constructor function for creating the original constructor; e.g. declareCase (fn f x => f x) is the identity transformation.) Then you run typeCase and watch the magic happen. There are more detailed usage instructions in variant.urs.

How do I make the type of my variant bigger?

When writing metaprograms that create variant types, a common problem is that the variant you just created is too narrow: that is, the ts in variant ts doesn’t have enough entries in it. This is especially a problem when ts is the record you are folding over. Consider a simple example where we would like to write this function, which generates a record of constructors for each constructor of the variant:

fun ctors : ts ::: {Type} -> fl : folder ts -> $(map (fn t => t -> variant ts) ts)

Ur/Web is not clever enough to figure out the naive approach:

fun ctors [ts] fl =
  @fold [fn ts' => $(map (fn t => t -> variant ts) ts')]
      (fn [nm ::_] [v ::_] [r ::_] [[nm] ~ r] n => n ++ {nm = make [nm]})
      {} fl

because it has no idea that nm is a member of the type-level record ts (Ur/Web doesn’t directly have an encoding of field inclusion.)

The way to fix this problem is to use a trick that shows up again and again in variant metaprograms: make the accumulator polymorphic in the fields that are already processed. It is the same trick that is employed in value-level programs when you reverse a list with foldr, by observing that you want to make the accumulator a function:

con accum r = s :: {Type} -> [r ~ s] => $(map (fn t => t -> variant (r ++ s)) r)
fun ctors [ts] fl =
  @fold [accum]
        (fn [nm::_] [v::_] [r::_] [[nm] ~ r]
            (k : accum r)
            [s::_] [[nm = v] ++ r ~ s] => k [[nm = v] ++ s] ++ {nm = make [nm]})
        (fn [s::_] [[] ~ s] => {}) fl [[]] !

accum is the type of the accumulator, and we can see it has a new type argument s :: {Type}. This argument is concatenated with the already-processed fields r and the current field nm in order to provide the full set of fields ts. During a fold over a record like [A = int, B = bool, C = string], we see:

r = [],                  nm = A, s = [B = bool, C = string]
r = [A = int],           nm = B, s = [C = string]
r = [A = int, B = bool], nm = C, s = []

r builds up fields as usual in a fold, but s builds up its fields in reverse because, like list reverse, s is not determined until we’ve folded over the entire structure, at which point the pile of type functions is evaluated outside-in. Thus, it’s easy to see that k [[nm = v] ++ s] will always have the correct type.
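
For comparison, here is the value-level version of that trick, in Haskell rather than Ur: reversing a list with foldr by making the accumulator a function (reverseViaFoldr is our own name for it):

```haskell
-- Each foldr step extends a *function* that still awaits the elements
-- seen so far, just as accum r abstracts over the fields s that are
-- not yet determined during the fold.
reverseViaFoldr :: [a] -> [a]
reverseViaFoldr xs = foldr (\x k acc -> k (x : acc)) id xs []
```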

Conclusion

Polymorphic variants in Ur/Web are quite useful, and avoid many of the problems associated with unrestricted subtyping. We didn’t originally intend to use polymorphic variants in Logitext, but we adopted them when they turned out to be the most reliable way to quickly implement JSON serialization via metaprogramming, and we’ve come to appreciate their metaprogrammability in a variety of other contexts too. Probably their biggest downside relative to traditional algebraic data types is the lack of recursion, but that too can be simulated by manually implementing the mu operator using Ur/Web’s module system. I hope this tutorial has given you enough meat to use polymorphic variants on your own, and maybe do a little metaprogramming with them too.

Managing the server/client split in Ur/Web

The holy grail of web application development is a single language which runs on both the server side and the client side. The reasons for this are multifarious: a single language promotes reuse of components that no longer need to be reimplemented in two languages and allows for much more facile communication between the server and the client. Web frameworks that explicitly strive to handle both the server and client include Meteor, Ur/Web, Opa and Google Web Toolkit.

One of the biggest implementation difficulties facing anyone wishing to build such a system is the fact that there are multiple runtimes: the server runtime and the browser runtime, each with an accordingly different set of primitives and APIs available. Furthermore, we might wish for some code to live only on the server, and never be sent to the client. When a language feature can be implemented on both runtimes, we maintain the illusion that client and server are indistinguishable; when it cannot, the illusion is broken.

Thus, in order to support runtime-specific FFI calls in such an integrated language, the following questions must be answered:

  1. When is code sent to the client and when is code kept on the server? This information must be exposed to the user (rather than be kept as an “implementation detail”).
  2. How do I force execution on the server?
  3. How do I force execution on the client?

In this blog post, I’ll discuss how Ur/Web addresses these questions. The answers are rather simple (and with any luck generalize to other similar systems), but they are rather difficult to come by if you treat the compiler as a black box.

1. Client/server split

An obvious solution to the client/server division problem is to label entry points (e.g. the main function or an onClick handler) as starting from the server (main) or the client (onClick), and then conduct reachability analysis to label all other functions. Thus, in the following Ur/Web code, txn : transaction unit would execute on the server:

fun main () =
  txn;
  return <xml><body>Done!</body></xml>

while it would execute on the client here:

fun main () =
  return <xml><body><a onclick={txn}>Click me!</a></body></xml>

When given a fragment like this:

fun foo body =
  r <- txn body;
  return <xml>{r}</xml>

it is not possible to know whether txn will be needed on the client side or the server side without analyzing all of the callers and checking whether they are client side or server side. Situations like this are the most important for forcing server-side or client-side behavior.

2. Forcing server-side

Suppose that we wanted to force txn to be executed on the server-side. If we’re already on the server, there is nothing more to do. However, if we’re on the client, we need to make an RPC call back to the server. In Ur/Web, this is easily done:

fun fooClient body =
  r <- rpc (txn body);
  return <xml>{r}</xml>

However, as rpc is a client-side-only function in Ur/Web, we can no longer use this function for server-side computation. One consequence of this choice is that it forces us to be explicit about when an RPC occurs, which is good news for understanding and security.

3. Forcing client-side

Suppose we wanted to force txn to be executed on the client-side. This is tricky: if we’re already on the client we can go ahead as normal, but if we’re executing in the server side, what does it mean to execute some code on the client?

One interpretation is this: since we are building some HTML that is to be shown to the client, txn should be run when the client actually displays the HTML. Ur/Web recently added the active tag which achieves just this effect:

fun fooServer body =
  return <xml><active code={txn body} /></xml>

The code attribute acts much like onclick and other similar attributes, in that it defines an entry point which happens to automatically get run when shown in the browser. It is still an event handler, in the sense that if someone invokes fooServer but then doesn’t end up using the HTML, txn never gets called: active can be thought of as a sort of lazy execution.

If we would truly like the client to execute some code immediately, our best bet is to shove an active tag down a source which is hooked up to an active dyn element:

fun fooServer body source =
  set source <xml><active code={txn body} /></xml>;
  return <xml>???</xml>

but in this case it is not really possible to ask the client what the result of the computation was (the server is not permitted to block!). This method of mobile code delivery can even be done asynchronously, using channels:

fun fooServerAsync body channel =
  send channel <xml><active code={txn body} /></xml>;
  return <xml>???</xml>

4. Interaction with the optimizer

The code within HTML event handlers (e.g. onclick={...} and active code={...}) can have free variables which bind to variables in their lexical scope, which may have been calculated on the server. As such, you might expect that in this case, foo : int -> xbody would be executed on the server:

fun main n =
  let val x = foo n
  in return <xml><body><active code={txn; return x} /></body></xml>

However, Ur/Web’s optimizer is too smart for its own good: since foo is pure and thus referentially transparent, it can always be safely inlined (especially when there is only one use-site):

fun main n =
  return <xml><body><active code={txn; return (foo n)} /></body></xml>

Written this way, it is clear that foo is run from the client. Thus, an innocent transformation can break your code, if foo was a server FFI call that was unimplemented on the client.

The troubling conclusion is that variable substitution can make valid programs invalid. Of course, in an eagerly evaluated, impure language, variable substitution is not valid. But we might expect it to be true for a pure language like Ur/Web. In any case, we can teach Ur/Web not to inline by marking foo as being benignEffectful in our urp file.

5. Conclusions

In general, when writing Ur/Web applications, here are some useful guidelines:

  1. Always mark server/client-only identifiers with serverOnly and clientOnly in your urp files. Ur/Web will generally handle one-sided FFI functions appropriately, but if you have code that takes advantage of language features that are only implemented on one side (e.g. closures on the server side), be sure to mark those functions appropriately.
  2. Use rpc to move from client to server, and active to move from server to client. Because of the “rpc must refer to a named function” invariant, the general structure of Ur/Web applications will be blobs of server code with client-side code embedded internally.
  3. If you are interested in generating client code that includes pure server-computed data, make sure the functions computing that data are marked benignEffectful.
  4. In general, don’t worry about the server/client split! Ur/Web will warn you if you need to move things around, but for the most part, things should just work.

One last word about security in a shared server/client model: how does Ur/Web ensure that input validation ends up on the server-side and not the client-side? It is rather simple: the only parts of your program that care about input validation are ones that involve persistent data, and all of these functions are server only. Thus, any user-data that makes it to any of these functions necessarily passed through a top-level page handler or an rpc, making it a relatively simple job to ensure that the validation is on the “right” side of the pond. If you use a data-structure that is correct-by-construction, you’re automatically done!

Secure multiparty Bitcoin anonymization

Abstract. We describe how secure multi-party sorting can serve as the basis for a Bitcoin anonymization protocol which improves over current centralized “mixing” designs.

Bitcoin is a pseudonymous protocol: while Bitcoin addresses are in principle completely anonymous, all traffic into and out of a wallet is publicly visible. With some simple network analysis, collections of addresses can be linked together and identified.

The current state of the art for anonymizing Bitcoins is a mixing service: a trusted third-party wallet which accepts incoming transactions and, in random increments scheduled at random times in the future, transfers a corresponding quantity to a new wallet of your choice. The result is that, for any Bitcoin distributed from this service, there exists a large number of identities from whom the Bitcoin may have originated.

Mixing services of this kind have a number of notable problems. First, the mixing service must be trusted not to keep logs or otherwise monitor the mixing: if they are compromised, the path of any given Bitcoin can be fully traced. The usual advice for this scenario is to use multiple mixing services, so that all of these services must be compromised before anonymity is lost. Second, the mixing service must be trusted not to turn around and refuse to give you back your funds; this makes such mixing services risky for anonymizing large quantities of Bitcoins. Finally, most mixing services charge a processing fee on top of the usual transaction fee one might expect to pay out for arranging for a Bitcoin transfer.

We propose a decentralized, secure multiparty protocol for implementing a mixing protocol. Such a system has been proposed in abstract (also here); in this post, we describe precisely how to implement it, in particular showing that multi-party sorting (a relatively well-studied algorithmic problem) is sufficient to implement this protocol. This protocol does not require a trusted third party (except to assist in discovery of participants interested in performing the mixing protocol), does not require you to reveal your input-output addresses beyond the ultimate transaction, and can be performed atomically using Bitcoin’s transaction language.

Protocol description

First some preliminaries: the multi-party sorting problem is usually formulated as follows: each party i contributes an input element A[i]. After the protocol is carried out, all parties learn the sorted sequence A in secret shared form, but do not learn who contributed any particular element A[i] of the sequence. Referring to the work of Jónsson, Kreitz and Uddin (2011), we assume this protocol as a primitive: the essential idea behind any multi-party sorting is to construct a fixed size sorting circuit for the inputs, and then use the general framework of multi-party computation on the resulting circuit description.

We now describe the mixing problem. Assume that some number of parties are assembled to mix 1 BTC of coins among themselves. (For now, we assume that every participant has the same number of Bitcoins; we will generalize this later.) In particular, each party i has an input wallet A[i] (with balance of at least 1 BTC) and an output wallet B[i], and will only sign a transaction in which 1 BTC is transferred to its output wallet B[i]. Any adversary participating in this transaction should not be able to learn the B[i] corresponding to A[i], except that it is among the set of output wallets taking part in the transaction.

The protocol proceeds as follows:

  1. Every participant declares the input wallet it will be participating in the protocol with, and produces a signature to show that they own the wallet. These wallets are publicly sorted into A.
  2. With ordering defined as the numeric value of the public key of each participating output wallet, conduct a secure multi-party sort of all of the output wallets. We now have a sorted list of output wallets B, with no member of the transaction having learned who contributed any given output wallet. Each participant should check if their output wallet is contained in this list (to protect against Byzantine failure); if it is not, they should abort the protocol and destroy their output wallet (its identity has been leaked).
  3. Construct a transaction transferring 1 BTC from A[0] to B[0] (from the sorted lists), A[1] to B[1], and so forth and broadcast it to all participants. Clients sign the transaction with their input wallet.
  4. Once all signatures have arrived, the transaction is valid and is broadcast for incorporation into the block chain.
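
The pairing at the heart of steps 2–4 can be sketched as follows. This is a toy model of ours, not real Bitcoin tooling: addresses are plain strings, amounts are fixed at 1 BTC, and the secure multi-party sort is stood in for by an ordinary local sort:

```haskell
import Data.List (sort)

type Wallet = String

-- Pair the i-th publicly sorted input wallet with the i-th output
-- wallet from the (here simulated) multi-party sort; each pair
-- corresponds to a 1 BTC transfer in the joint transaction.
mixTransfers :: [Wallet] -> [Wallet] -> [(Wallet, Wallet)]
mixTransfers inputs outputs = zip (sort inputs) (sort outputs)
```

The point of the real protocol is that the sorted list of outputs is computed without anyone learning which participant contributed which element; the pairing itself is deterministic and public.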

Mixing pools only work if all participants are attempting to mix identical amounts of Bitcoins. In order to manage participants who would like to mix larger amounts of Bitcoins, we suggest maintaining discovery channels for power of two sizes of Bitcoins, e.g. …1/4 BTC, 1/2 BTC, 1 BTC, 2 BTC, 4 BTC…

Analysis

Step (1) does not mention output wallets, and thus cannot leak information about the input-output correspondence. By definition, step (2) does not leak information about who contributed the output wallet. Furthermore, we don’t even require the sorted result to be secret shared: this sorted list will become public information once the transaction is published in the block chain. The case of aborting the transaction when your output wallet is not present in the result (in the case of Byzantine failure) is delicate: aborting does leak information, and thus you must not use the output wallet in any further transactions. In step (3), assuming that an attacker knows that a mixing transaction is taking place, the deterministic mapping of input to output wallets gives all participants no further bits of information. Thus, this protocol clearly fulfills its security requirements.

One odd thing about this protocol is that no random permutation between participants is explicitly constructed. This might seem unusual, since a natural security property one might expect is for an output wallet to receive its 1 BTC from a uniformly randomly chosen input wallet. However, this condition, while sufficient for anonymity, is not necessary. Just as adding a random permutation destroys all information about the original permutation, replacing the original permutation with a new constant permutation also destroys all information about the original permutation. Furthermore, honest participants will have picked their addresses uniformly at random, so the process of sorting automatically constructs a random permutation between these participants (dishonest participants must be excluded, since they can generate addresses from a skewed probability distribution).

What is the amount of anonymity granted by one round of this protocol? Consider the case where I have 1 BTC in an input wallet tied to my identity, and I participate in a mixing round with n honest participants with a fresh output wallet. Since no information about the source of this output wallet was leaked to the participants, an adversary in the transaction would have a 1/(n−1) probability of guessing which output wallet was mine: call this the anonymity factor. In the case of a Sybil attack, the amount of anonymity conferred decreases. If the attackers make up less than some fraction of the participants (for many secret sharing protocols, the magic numbers are 1/2 for passive attackers and 1/3 for active attackers), then the anonymity factor is still 1/(n−1), where n is the number of honest participants; but n is smaller than the number of visible participants in the protocol: the size of the transaction is not necessarily correlated with how anonymous it is! If the fraction of attackers is above this threshold, then the secret sharing scheme may leak information and no anonymity is gained. This allows for a denial-of-service attack against a mixing protocol; we describe some mitigation strategies against this attack later. (Note, however, that no matter how many attackers there are, you are guaranteed not to lose any Bitcoins, due to the verification step in (2).)

In practice

As with traditional mixing, a number of precautions should be taken in order to avoid accidentally revealing information about the source of Bitcoins via a side-channel. Consider the case where Alice has 2 BTC tied to her real-world identity, and she would like to anonymously pay 3/4 BTC to Bob.

Alice first prepares by creating a pool of empty wallets, which she will use to carry the anonymized Bitcoins. Alice connects to a tracker for 1 BTC mixing over Tor. (1 BTC is the amount she would like to pay to Bob, rounded up.) She waits for the time window for the next mixing to expire, and then, as the protocol takes place, submits her (public) input wallet and her (anonymous) output wallet. If the protocol fails, she throws out her output wallet and submits a new one next time, and blacklists the misbehaving node if she can figure out who it is.

Once Alice has successfully carried out a mixing, she flips a coin (which comes up tails with probability 1/m). If the coin comes up heads, she waits for another mixing. The number of mixing transactions she is expected to perform is m (obeying the geometric distribution, so selected because it makes all output wallets behave identically with regards to remixing or exiting). Once Alice exits mixing, she now has a wallet containing an anonymous Bitcoin (more precisely, this Bitcoin could be attributable with equal probability to any of the other wallets that she participated in mixes with). She transfers 3/4 BTC to Bob, leaving 1/4 BTC in her wallet.
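The claim that Alice performs m mixes in expectation follows from the mean of the geometric distribution: exiting with probability p = 1/m after each mix gives an expected count of 1/p = m. A small numerical check of this (our own sketch, not protocol code):

```haskell
-- If Alice exits after each mix with probability p ("tails"), the number
-- of mixes is geometrically distributed:
--   E = sum_{k>=1} k * p * (1-p)^(k-1) = 1/p.
-- With p = 1/m this is m. We verify by truncating the series, whose tail
-- is negligible for any reasonable number of terms.
expectedMixes :: Double -> Int -> Double
expectedMixes p terms =
  sum [ fromIntegral k * p * (1 - p) ^ (k - 1) | k <- [1 .. terms] ]
```

The memorylessness of the geometric distribution is what makes all output wallets behave identically with regard to remixing or exiting, regardless of how many mixes they have already been through.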

The remaining Bitcoins in the wallet should now be considered tainted (as they now have a direct relationship to Bob, who may have a public wallet). These Bitcoins should be split into mixable amounts and reanonymized before being used for any other purpose. Even after anonymization, these coins must still be used with care: in particular, they must not be transferred back to the original, public Bitcoin account. In such a situation, the graph structure of mixing transactions looks like this:

[figure: graph structure of mixing transactions, with a cycle between the public and anonymous nodes]

(The green node is your public node, the red node is your anonymous node.) Network analysis that looks for cycles in Bitcoin transfers will be able to identify any such transactions, even if the anonymizing pool has a large amount of turnover (though, amusingly enough, if many participants generate cycles, this attack is harder to carry out). To assist in the tracking of these coins, we suggest the development of wallet management software that can handle thousands of private keys, sort by “size of coin”, and track the easily traceable transaction history associated with any given coin.

In order to protect herself against Sybil attacks, Alice may wish to select her mixing tracker with care. Some mixing trackers could charge fees for listing: with sufficient volume, these charges would make it more expensive to carry out a sustained Sybil attack. (The fees can then be turned around and used to pay for the processing of the complicated mixing transactions, which have the social expectation of being accompanied with a transaction fee.) Every mixing should be conducted with a different IP address; if Alice is using Tor for anonymity she needs to reanonymize her connection each time.

Conclusion

Secure multi-party computation has always been in the eye of users of Bitcoin seeking anonymization, but to date there has not been any plausible implementation strategy for any such computation. We hope that this document describes such an implementation strategy and leads the way to a better ecosystem of Bitcoin anonymizers. As the fiascos at Bitcoinica and other exchanges have demonstrated, relying on third party wallets is dangerous. Fortunately, they are also unnecessary.

Acknowledgments

I would like to thank Eric Price for playing an instrumental role in the formulation of this protocol.

Why verification results in higher quality code

Correctness is overrated. After all, no one knows what it means for any reasonably complicated system to be “correct”, and even when we do, the mileposts move around on a daily basis. With the raison d’être of formal verification stripped away, can we still consider it a worthy goal?

Perhaps verification results in higher quality code. But this is not obviously true: correctness is not quality. We might hope that high quality code is readable and easily understood, that it is as self-contained and independent from the rest of the system as possible, and that it is efficient and economical. There is no a priori reason to believe that verification would grant us any of these properties. No matter how horrible some code is, as long as it is correct, there exists a proof which vouches for its correctness.

But as anyone who has gone through the sweat and tears of verifying a program can tell you, formal verification really does make your code better. Here’s the secret: proving theorems is really hard. If we want any hope of successfully proving something about a program, we must make reasoning about the code as easy as possible. A program under verification irresistibly gravitates towards its most “reasonable” form, because otherwise the proofs are simply too arduous to carry out. And in this form, the tenets of high quality code follow.

Take for example Bedrock, a system for building verified low-level programs which manipulate pointers and registers. These are programs that deal with mutable state, a feature which is well known to dramatically increase the difficulty of reasoning. Bedrock, and many systems like it, would be dead in the water if not for the development of an important system called separation logic. The central idea behind it is so obvious to any experienced practitioner it is barely worth stating: private local state is easier to reason about than public global state; modularity is good. It enforces this through a clever formalism, the star operator, which combines two assertions about two regions of memory while ensuring that the regions are disjoint. Regardless, the end result is this: if your components are independent, the theorem proving is easy; if your components are tangled together, the theorem proving is hard. You do the math.
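The star operator can be modeled in a few lines. The following Haskell sketch is our own toy model (not Bedrock’s actual Coq formalism); it shows how the disjointness requirement is what makes the conjunction “separating”: the same heap cell cannot be used to satisfy both sides.

```haskell
import qualified Data.Map as M

type Heap = M.Map Int Int          -- addresses mapped to values
type Assertion = Heap -> Bool

-- "address a points to value v, and the heap contains nothing else"
pointsTo :: Int -> Int -> Assertion
pointsTo a v h = h == M.singleton a v

-- P `star` Q holds of a heap iff it splits into two DISJOINT pieces,
-- one satisfying P, the other satisfying Q.
star :: Assertion -> Assertion -> Assertion
star p q h = any (\(h1, h2) -> p h1 && q h2) (splits (M.toList h))
  where
    -- enumerate all ways to divide the heap cells into two disjoint halves
    splits []         = [(M.empty, M.empty)]
    splits (kv : kvs) = concat
      [ [(uncurry M.insert kv h1, h2), (h1, uncurry M.insert kv h2)]
      | (h1, h2) <- splits kvs ]
```

With this model, `pointsTo 1 10 `star` pointsTo 2 20` holds of a two-cell heap, but `pointsTo 1 10 `star` pointsTo 1 10` holds of no heap at all: the two sides would have to share the cell at address 1, which disjointness forbids.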

But it doesn’t stop there. When different components do interact, the principle of encapsulation says that I do not want to know all of the gory details of a component, just its high-level interface. In theorem prover land, “all of the gory details” means unmanageably large facts about many pointers, and a “high-level interface” is an abstract predicate which rolls up all of these facts into a single, cohesive logical fact (“this is a linked list.”) Developing these predicates is critical to keeping your theorem statements concise and understandable, and in higher-order provers like Bedrock, they can apply not only to data but also to code, i.e. function pointers.

The tenets of high quality code speak for code that is written for humans to understand, and not just for machines to execute. But code that is written for machines to understand has many of the same properties that are valued by humans, for if it did not, getting the machine to “understand” would be an impossible task. Computers may be simple-minded, but all that means is that code a computer can understand is code that you can understand too. And that is high quality code.

Thoughts on gamifying textbooks

Earlier this year, Woodie Flowers wrote this criticism of MITx:

We seem to have decided to offer “courses” rather than participate in the exciting new process of replacing textbooks with more effective training tools.

Logitext, true to its name, was intended to explore what a chapter from a next-generation textbook on formal logic might look like. But if you asked anyone what subjects the most important textbooks of this century would be about, I doubt logic would be particularly high on anyone’s list. In terms of relevance, Logitext misses the mark. But I do think there are some design principles that Logitext helps elucidate.

On interactivity

It is a good thing that the quality of the Metropolitan Opera’s productions is not correlated with the quality of their website pages. Here, we have an egregious example of interactivity for the sake of interactivity, with no real benefit to the presentation of information.

There are many things that interactivity can bring to the table. However, interactive textbooks should still look like textbooks. The conventional textbook is a masterwork of design for the static medium: it is skimmable, supports random access, and is easily searched. You cannot beat this format at its own game, no matter how well designed your video game levels may be.

In order to apply interactivity tastefully, you must consider what the limitations of the static medium were: it is precisely here where interactivity can bring something to the table. Here are some examples:

  • Every field of study has its jargon, which assists people versed in the language but impedes those who are not. In a static medium, you can only define jargon a limited number of times: it clutters up a text to redefine a term every time it shows up, even if you know your students frequently forget what the term means. In a dynamic medium, the fix is trivial: tooltips. Logitext did not start off with tooltips, but I quickly learned readers were getting confused about the meanings of “conjunction”, “disjunction” and “turnstile”. Tooltips let us easily extend the possible audience of a single work.
  • The written medium demands a linear narrative, only sparingly stopping for questions or elaborations. Too many waypoints, and you risk losing the reader. In an interactive text, the system can give context-sensitive information only when it is relevant. In Logitext, when a reader clicks on a clause which requires an instantiation of a variable, the system explains how this inference rule works. This explanation is given elsewhere in a more general explanation of how quantifiers work, but the system also knows how to offer this information in a timely and useful manner.
  • Sometimes, the information you are trying to convey should also be given in another form. It’s the difference between describing a piece of music and actually hearing it, the difference between giving someone a map and letting them wander around for a few hours. Whenever possible, show, don’t tell. And if possible, show in different ways—different intuitions work for different people. I can explain what the “no free occurrence” rule is until the cows come home, but the unexpected renaming of variables when you click “forall right” immediately introduces the intuitive idea (even though it still needs to be explained for full understanding.)

It is telling that each of these enhancements has been abused by countless systems in the past. Many people have a dim view of tooltips and Clippy, myself included. I think one way to limit the damage of any such abuse is to demand that the textbook gracefully degrade without interactivity. (For technological reasons, Logitext doesn’t render correctly without JavaScript, but I consider this a bug.)

On exercise design

In order to truly learn something, you must solve some problems with it. Gamification of exercises has done well at supplying extrinsic motivation, but up until recently, the state of the art in online exercise design has been something like this. I find this depressing: there is no indication whether the student has really learned the underlying concepts, or has just constructed an elaborate private system which happens to also be wrong. Remember when you were asked to show your work? We should be demanding this online too.

This is easier said than done. It was no accident that I picked the sequent calculus for Logitext: while logic holds a very special place in my heart, I also knew that it would be easy to automate. The road to a system for something as simple as High School Algebra will be long and stony. Logitext sidesteps so many important questions, even ones as simple as “How do we get students’ answers (with work) onto the computer?”, let alone thorny issues such as the one addressed by a recent PhD thesis: “How do we tell if the student needs to provide more work?”

I think I came away with two important ideas from Logitext. The first is a strong conviction that theorem provers are the right foundational technology for interesting exercises in mathematics. Building the Logitext system was a bit of work, but once the platform was designed, defining exercises was simple, literally a line of code per exercise. If every exercise had to be implemented from scratch, the cost would have been prohibitively expensive, and the tutorial would have looked a lot different. We know that, in principle, we can formalize all of mathematics; in the case of elementary mathematics, we may not even have to solve open research questions to get there. Theorem provers also know when you’ve gotten the answer right, and I suspect from a gamification perspective that is all you need.

The second important idea is that computers can assist exploration of high-level concepts, even when the foundations are shaky. For some people, copying down a string of mathematical symbols quickly and accurately is an ordeal: a system like Logitext abstracts that away and allows them to see the higher order structure of these proofs. It is true that it is better if students have a strong foundation, but if we had a system which could prevent them from getting left behind, I think the world would be strictly better for it. The solution to a curriculum which relies on a freakish knack for manipulating abstract symbols should not be eliminating symbols, but making it easier to manipulate them. Educational systems should have what I call adjustable affordance: you should have the option to do the low level manipulations, or have the system do them for you.

Conclusion

I have articulated the following design principles for the gamification of textbooks:

  • An interactive textbook should still look like a textbook.
  • Use interactivity to extend the possible audience of a textbook by assisting those with less starting knowledge.
  • Use interactivity to offer context-sensitive information only when relevant.
  • Use interactivity to show; but be sure to explain afterwards.
  • (Heavily modified) theorem provers will be the fundamental technology that will lie beneath any nontrivial exercise engine.
  • One of the most important contributions of computers to exercises will not be automated grading, but assisted exploration.

I’ve asserted these very confidently, but the truth is that they are all drawn from a sample size of one. While the Logitext project was a very informative exercise, I am struck by how little I know about K12 education. As an ace student, I have a rather unrepresentative set of memories, and they are probably all unreliable by now anyway. Education is hard, and while I think improved textbooks will help, I don’t know if they will really change the game. I am hopeful, but I have the nagging suspicion that I may end up waiting a long time.

An Interactive Tutorial of the Sequent Calculus

You can view it here: An Interactive Tutorial of the Sequent Calculus. This is the “system in three languages” that I was referring to in this blog post. You can also use the system in a more open ended fashion from this page. Here’s the blurb:

This interactive tutorial will teach you how to use the sequent calculus, a simple set of rules you can use to show the truth of statements in first order logic. It is geared towards anyone with some background in writing software for computers, with knowledge of basic boolean logic.

Developing this system has been quite a fascinating foray into user interface and game design. While similar systems have been built in the past, my goal was a little different: I wanted something simple enough and accessible enough that anyone with a vague interest in the topic could pick it up, work through it in an hour, and learn something interesting about formal logic. I don’t think this demo will be particularly successful among the math-phobic, but perhaps it will be more successful with people who have an intrinsic interest in this sort of thing. I must have incorporated dozens of comments from my many friends at MIT who put up with my repeated badgering about the system. The first version looked very different. I give my superlative thanks to my beta testers.

There is a lot of hubbub about the next generation of online teaching systems (edX), and this demo (because, really, that’s what it is) is intended to explore how to blur the line between textbooks and video games. It doesn’t really go far enough: it is still too much like a textbook, and there is not enough creative latitude in the early exercises. I don’t feel like I have captured the feel of a video game which progressively layers concepts (e.g. Portal). On the other hand, I do feel I have done a good job of making the text skimmable, and there are a lot of little touches which I think do enhance the experience. I am embarrassed to admit that there are some features which are not included because they were technically too annoying to implement.

If there is one important design principle behind this demo, it is that there is a difference between giving a person a map and letting a person wander around in a city for a few hours. But, fair reader, you probably don’t have a few hours, and I am probably demanding too much of your attention. Nevertheless, forgive my impoliteness and please, take it out for a spin.

Ubuntu Precise upgrade (Thinkpad/Xmonad)

It is once again time for Ubuntu upgrades. I upgraded from Ubuntu Oneiric Ocelot to Ubuntu Precise Pangolin (12.04), which is an LTS release. Very few things broke (hooray!)

  • The Monospace font changed to something new, with very wide glyph size. The old font was DejaVuSansMono, which I switched back to.
  • Xournal stopped compiling; somehow the linker behavior changed and you need to specify the linker flags manually.
  • gnome-keyring isn’t properly starting up for us non-Unity folks. The underlying problem appears to be packaging errors by Gnome, but adding eval `gnome-keyring-daemon -s` to my .xsession cleared things up.
  • The battery icon went away! I assume some daemon is failing to get run, but since I have a very nice xmobar display I’m not mourning its loss.
  • Default GHC is GHC 7.4.1! Time to rebuild; no Haskell Platform yet. (Note that GHC 7.4.1 doesn’t support the gold linker; this is the chunk-size error.)

I also upgraded my desktop from the previous LTS Lucid Lynx.

  • I had a lot of invalid signature errors, which prevented the release upgrade script from running. I fixed it by uninstalling almost all of my PPAs.
  • Offlineimap needed to be updated because some Python libraries it depended on had backwards incompatible changes (namely, the imap library.)
  • VirtualBox messed up its revision numbers, which contained an underscore which is forbidden. Manually editing it out of the file seems to fix it.

What happens when you mix three research programming languages together

“…so that’s what we’re going to build!”

“Cool! What language are you going to write it in?”

“Well, we were thinking we were going to need three programming languages…”

“…three?”

“…and they’ll be research programming languages too…”

“Are you out of your mind?”


This was the conversation streaming through my head when I decided that I would be writing my latest software project in Coq, Haskell and Ur/Web. I had reasonably good reasons for the choice: I wanted Coq because I didn’t actually want to implement a theorem prover from scratch, I wanted Ur/Web because I didn’t actually want to hand write JavaScript to get an AJAX interface, and I wanted Haskell because I didn’t want to write a bucket of C to get Ur/Web and Coq to talk to each other. But taken altogether the whole thing seemed a bit ludicrous, like an unholy fusion of a trinity of research programming languages.

In the end, it worked out quite well. Now, what this means depends on your expectations: it was not the case that “everything worked out of the box and had very nice instructions attached.” However, if it was the case that:

  • No single issue ended up requiring an unbounded amount of time and yak shaving,
  • Any patches written made it into upstream, improving the situation of the software for future developers, and
  • The time spent on engineering grease is less than the time it would have taken to build the system with inferior languages,
  • Everyone involved in the project is willing to learn all of the languages involved (easy if it’s only one person),

then yes, it worked “quite well”. In this post, I’d like to describe in a little more detail what happened when I put these three languages together and speculate wildly about general maxims that might apply when someone is doing something similar.

Coq

While Coq is a research language, it is also in very wide use among academics, and most of its instability lies in advanced features that I did not use in my project. So the primary issues I encountered with Coq were not bugs, but problems integrating it with the rest of the system (namely, making it talk to Haskell).

Maxim 1. Interchange formats will be undocumented and just good enough to get the job done.

Coq is already designed to allow for communication between processes (this is how the Proof General/Emacs and Coq talk to each other), but the format between coqtop and Proof General was undocumented, ad hoc, and didn’t transmit enough information for my application. In the face of such a situation, there are two ways to proceed: grit your teeth and implement the bad protocol or patch the compiler to make a better one. I chose the latter, and learned something very interesting:

Maxim 2. In ML-like languages, it’s very easy to make simple but far reaching changes to a codebase, due to the assistance of the typechecker.

Making the changes to the frontend was very simple; there was nothing deep about the change, and a combination of the typechecker and grep allowed me to pull off the patch with zero debugging. With a few XML tags at a few key spots, I got output reasonable enough to build the rest of the system with.

Aside. Later, I learned that coqide in recent versions of Coq (8.4 and later) has another interchange format. Moving forward, it is probably the correct mechanism for interacting with Coq interactively, though this is made somewhat more difficult by the fact that the interchange format is undocumented; however, I’ve filed a bug. With any luck, it will do better than my patch. My patch was originally intended to be a partial implementation of PGIP, a generic interchange format for interacting with theorem provers, but the Coq developers and I later discovered that the PGIP project is inactive, and the other user, Isabelle, has discontinued using its PGIP backend. (Sometimes standards don’t help!)

Ur/Web

Ur/Web is comparatively less used, and accordingly we ran into a variety of bugs and other infelicities spanning all parts of the system, from the frontend to the compiler. Were they blockers? No!

Maxim 3. A deterministically reproducible bug in some core functionality will get fixed very quickly by an active original author of the code.

This maxim doesn’t apply to fundamental limitations in design (where the fix will take a lot of elbow grease, though the author will usually have a good idea when that’s the case), but other bugs of this type, I found I could get freakishly quick turnaround times for fixes. While I may attribute part of this to the fact that my advisor was the one who wrote the compiler, I don’t think that’s all there is to it. There is a certain pride that comes with an interesting, tricky bit of code you wrote, that makes it an irresistible little puzzle when someone shows you a bug. And we love little puzzles.

There’s also a corollary:

Maxim 4. The less interesting a problem is to the academic, the more likely it is you’ll be able to fix it yourself.

Academics are somewhat allergic to problems that they’re not interested in and which aren’t vital for their research. This means they don’t like working on these bits, but it also means that they’ve probably kept them simple, which means you’re more likely to be able to figure them out. (A good typechecker also really helps! See maxim 2.) There was a simple bug with serving 404s from FastCGIs compiled by Ur/Web, which had a very simple fix; I also made some modifications to Ur/Web that made it runnable without having to make install first. Maintainers of active research software tend to be quite receptive to these “engineering” patches, which serve no direct research purpose. I consider these contributions to be a vital component of being a good citizen of the open source community.

Haskell

OK, Haskell is not really “just” a research language anymore; it is also a very flexible general purpose language which has seen quite a bit of real world use and can be treated as an “ordinary” language in that respect. This made it a good choice for gluing the two other languages together; it can do just about anything, and has very good FFI support for calling into and out of Haskell. This brings us to our next maxim:

Maxim 5. An FFI is a crucial feature for any DSL, and should be a top priority among tasks involved in preparing a language for general usage.

Having Haskell and Ur/Web talk to each other through their FFIs was key for making this all work. Ur/Web is a domain specific language for writing web applications, and among other things it does not include robust systems libraries (e.g. executing external processes and interfacing with them). Most languages will have this problem, since library support takes a bit of work to add, but Ur/Web has a second problem: all side-effectful transactions need to also be able to be rolled back, and this is rather hard to achieve for general input-output. However, with an FFI, we can implement any code which needs this library support in a more suitable language (Haskell), wrap it up in an API which gives the appropriate transactional guarantees, and let Ur/Web use it. Without it, we would not have been able to use Ur/Web: it’s an extremely flexible escape hatch.
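One way to give such an API the required transactional guarantee is to defer the effect: queue it during the transaction and perform it only on commit, so a rollback simply discards the queue. The following Haskell sketch is our own illustration of that idea (not Ur/Web’s actual FFI machinery; the names are invented):

```haskell
import Data.IORef

-- A transaction log of deferred effects. Foreign code registers effects
-- with deferEffect; nothing runs until commit, and rollback discards the
-- queue, which is how a side-effecting operation can appear "rollbackable"
-- to the transactional host language.
newtype Txn = Txn (IORef [IO ()])

beginTxn :: IO Txn
beginTxn = Txn <$> newIORef []

deferEffect :: Txn -> IO () -> IO ()
deferEffect (Txn q) act = modifyIORef q (act :)

commit :: Txn -> IO ()
commit (Txn q) = readIORef q >>= sequence_ . reverse  -- run in FIFO order

rollback :: Txn -> IO ()
rollback (Txn q) = writeIORef q []
```

Deferral only works for effects whose results are not needed inside the transaction; effects that must return answers need a different strategy (for example, performing them optimistically and supplying a compensating action).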

Specifying an FFI also is a good way of demonstrating how your language is different from C: it forces you to think about what invariants you expect foreign functions to have (referential transparency? thread-safety?): these invariants are exactly the ones that get automatically fulfilled by code written in your language. That’s pretty cool!

However, because functions which manipulate C pointers are non-transactional, Ur/Web is limited to FFI functions which handle basic C types, e.g. integers and strings. Thus the question of parsing becomes one of utmost importance for Ur/Web, as strings are the preferred interchange format for complex structures. While different languages will have different situations, in general:

Maxim 6. Make sure you know how to do parsing in all of the languages involved.
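As a concrete instance of the maxim, here is a tiny recursive-descent parser in Haskell for a toy parenthesized encoding like "(1 (2 3) 4)". The wire format is invented for this example (Ur/Web interchange encodings are application-specific); the point is that a parser for such strings is a small, self-contained job in each language involved.

```haskell
-- When strings are the interchange format, every language needs a parser
-- for the wire encoding. A minimal s-expression reader:
data SExp = Atom String | List [SExp] deriving (Eq, Show)

parseSExp :: String -> Maybe SExp
parseSExp s = case parse (dropWhile (== ' ') s) of
  Just (e, rest) | all (== ' ') rest -> Just e
  _                                  -> Nothing
  where
    -- an expression is either a parenthesized list or a bare atom
    parse ('(' : t) = parseList t []
    parse t = case span (`notElem` " ()") t of
      ("", _)   -> Nothing
      (tok, t') -> Just (Atom tok, t')
    -- accumulate list elements until the closing parenthesis
    parseList t acc = case dropWhile (== ' ') t of
      ')' : t' -> Just (List (reverse acc), t')
      t'       -> do (e, t'') <- parse t'
                     parseList t'' (e : acc)
```

Malformed input (for example, a missing closing parenthesis) yields Nothing rather than a crash, which matters when the string is arriving from another process.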

Conclusion

I’ve presented six maxims of research polyglottism:

  1. Interchange formats will be undocumented and just good enough to get the job done.
  2. In ML-like languages, it’s very easy to make simple but far reaching changes to a codebase, due to the assistance of the typechecker.
  3. A deterministically reproducible bug in some core functionality will get fixed very quickly by an active original author of the code.
  4. The less interesting a problem is to the academic, the more likely it is you’ll be able to fix it yourself.
  5. An FFI is a crucial feature for any DSL, and should be a top priority among tasks involved in preparing a language for general usage.
  6. Make sure you know how to do parsing in all of the languages involved.

If you keep all of these maxims in mind, I believe that the tradeoff between some extra bugfixing and yak shaving for the benefits of the research programming language is a compelling one, and one that should be considered seriously. Yes, you have to be willing to muck around with the innards of all the tools you use, but for any sufficiently important tool, this is inevitably true. And what is a more important tool than your compiler?

Postscript. The application in question is Logitext.

Some thoughts about literature review

While working on my senior thesis, I had to write a prior work section, which ended up being a minisurvey for the particular subfield my topic was in. In the process, a little bird told me some things…

  • If you can, ask someone who might know a little bit about the subject to give you the rundown: there’s a lot of knowledge in people’s heads which never got written down. But also be aware that they will probably have their blind spots.
  • It is better to do the literature review later rather than earlier, after you have started digging into the topic. I have been told that if you read the literature too early, you will get spoiled and stop thinking novel thoughts. But I also think there is a little bit of “you’ll understand the literature better” if you’ve already thought about the topic on your own. Plus, it’s easy to think that everything has been done before: it’s simply not true! (But if you think this, you will get needlessly discouraged.)
  • Don’t indiscriminately add papers to your database. You should have something you want to do with it: is it an important paper that you have to cite because everyone knows about it? Is it directly addressing the issue you’re dealing with? Does it happen to be particularly well written? Is it something that you could see yourself reading more carefully later? Don’t be afraid to toss the paper out; if it actually was important, you’ll run into it again later.
  • Every researcher is a historian. When you look at a paper, you’re not just looking at what is written inside it, but its social context. There’s a reason why “prior work” is so important to academics. If you don’t understand a paper’s context, it’s unlikely you’ll understand the paper.
  • Researchers don’t necessarily talk to each other. Pay attention to who they cite; it says a little bit about what community they’re in.
  • Researchers are happy to send you copies of papers they have written (so fear not the paywall that your university hasn’t subscribed to). They may even volunteer extra information which may come in handy.
  • Be methodical. You’re doing a search, and this means carefully noting down which papers you skimmed, and what you got out of them, and keeping track of other veins of research that you need to follow up on. It’s like chasing a rabbit down a hole, but if you have some clearly defined search criteria, eventually you’ll bottom out. You can prune the uninteresting papers later; the point here is to avoid duplicating work.
  • Read papers critically. Not everything that is published is good; that’s the point of research!

What are your favorite maxims to keep in mind while you’re surveying the literature?