ezyang's blog

Practical Foundations for Programming Languages (first impressions)

August 15, 2012

Robert Harper has (somewhat) recently released a pre-print of a book (PDF) that he has been working on, Practical Foundations for Programming Languages. I downloaded a copy when it initially came out, but I was guilty of putting off actually digging into the book’s 590-some pages. It was only until Harper successfully baited me with one of his most recent blog posts that I finally sat down and skimmed it a bit more thoroughly.

The immediate temptation is to compare PFPL to Benjamin Pierce’s seminal Types and Programming Languages. At first glance, there would seem to be quite a bit of overlap, both in terms of content and in terms of presentation. Both books starting with a very simple programming language and successively add features to explain successively more complex topics in programming languages design. But PFPL consciously differs from TAPL in many aspects. For ideological reasons, Harper completely skips the untyped language, jumping straight to a typed language with variable let-bindings, in order to immediately introduce types, contexts and safety. This presentation is substantially more terse, and newcomers with no programming languages experience may perceive that PFPL feels more like a reference manual than a textbook—`one commenter <http://existentialtype.wordpress.com/2012/08/06/there-and-back-again/#comment-949>`_ likened it to a math textbook. (Apropos of nothing, Harper’s introductory class 15-312 Principles of Programming Languages, which uses PFPL, does start with the untyped lambda calculus.)

Nevertheless, this terseness is an asset for PFPL; for one thing, it permits Harper to cover a lot of ground, covering topics that TAPL did not handle at all. Nor does the terseness mean that Harper has “left anything out”, each chapter is self-contained and comprehensive on the topics it chooses to cover. It also makes it a fun read for people like me who do have familiarity with the topics discussed but benefit from seeing and thinking about a different treatment.

Harper has been blogging about his book, and I think his blog posts are a good indication of what parts of the book are particularly worth looking at. Harper has taken the style of going “all intuition” in his blog posts, and relegating most of the formalism to his book. I think this is a shame, since the formalisms he defines are quite accessible and would make things a lot clearer for many in his audience (judging from the comments section, at least!) Here are some of the pairings:

Dynamic Languages are Static Languages is a companion to Chapter 18, “Dynamic Typing”. There, he develops Dynamic PCF (essentially the core of Lisp) and shows how the usual concrete syntax masks the tagging that occurs, and the usual dynamics masks the wasteful and repetitive checks that are endemic to dynamically typed languages. There is always a temptation, in these holy wars, to expand the scope of the argument, but if you accept Dynamic PCF as a valid way of framing one aspect of the debate, it is extremely precise.
Referential transparency is a companion to Chapter 32, “Symbols”. Symbols are a little weird, because most languages don’t even have a way of even acknowledging this concept exists. You might think of it as an identifier for a “generalized mutable cell” apart from how you actually access it, but really you should just read the formal treatment, since it is very simple.
Words matter is a companion to Chapter 36, “Assignable References”. It’s a simple terminology split, motivated by how Harper defines the term “variable”, way in Chapter 1 of his book.
Haskell is Exceptionally Unsafe is a companion to Chapter 34, “Dynamic Classification”. It argues that it is important to be able to generate exception classes at runtime (the term “class” here has a very precise meaning, namely, it is an index of a finite sum, in this case the exception type; this is discussed in Chapter 12). At least in the Haskell community, this is not a particularly common usage of the term “dynamic” (though I agree with Harper that it is a correct usage), and PFPL spells exactly what it means, no more, no less.

All-in-all, Practical Foundations for Programming Languages is well worth checking out. It is a not too widely kept secret that no one really reads textbooks from tip to tail, but if you found yourself reading one of Harper’s blog posts and being puzzled, do give the companion chapter a chance. Even with just the small bits of the book I have read, PFPL has taught me new things and clarified my thinking.

Is Haskell liberal or conservative?

August 10, 2012

Steve Yegge has posted a fun article attempting to apply the liberal and conservative labels to software engineering. It is, of course, a gross oversimplification (which Yegge admits). For example, he concludes that Haskell must be “extreme conservative”, mostly pointing at its extreme emphasis on safety. This completely misses one of the best things about Haskell, which is that we do crazy shit that no one in their right mind would do without Haskell’s safety features.

So I thought I’d channel some Yegge and take a walk through the criteria proposed for assessing how conservative a user of a language is, and try to answer them to the best of my ability with my ”Haskell hat” on:

Software should aim to be bug free before it launches. Yes. Though, “Beware of bugs in the above code; I have only proved it correct, not tried it.”
Programmers should be protected from errors. Yes. But, Yegge then adds the sentence: “Many language features are inherently error-prone and dangerous, and should be disallowed for all the code we write.” This is not the approach that Haskell takes: if you want continuations with mutable state, Haskell will give them to you. (Try doing that in Python.) It doesn’t disallow language features, just make them more wordy (unsafePerformIO) or harder to use. Haskell has a healthy belief in escape hatches.
Programmers have difficulty learning new syntax. No. Haskell is completely on the wrong side of the fence here, with arbitrary infix operators; and even more extremist languages (e.g. Coq) go even further with arbitrary grammar productions. Of course, the reason for this is not syntax for its own sake, but syntax for the sake of closely modeling existing syntax that mathematicians and other practitioners already use. So we allow operator overloading, but only when it is backed up by algebraic laws. We allow metaprogramming, though I suspect it’s currently used sparingly only because it’s so unwieldy (but culturally, I think the Haskell community is very open to the idea of metaprogramming).
Production code must be safety-checked by a compiler. Yes. But, anyone who has used a dependently typed language has a much higher standard of what “safety-checked” means, and we regularly play fast and loose with invariants that we decided would be too annoying to statically encode. Note that Yegge claims the opposite of compiler safety-checking is succinctness, which is a completely false myth perpetuated by non-Hindley Milner type systems with their lack of type inference.
Data stores must adhere to a well-defined, published schema. Well-defined? Yes. Published? No. The emphasis that Haskell has on static checking mean that people writing data types are a lot more willing to update them as the needs of the application change, and don’t really mind global refactoring of the database because it’s so damn easy to get right.
Public interfaces should be rigorously modeled. Yes. (though cough “ideally object oriented” cough)
Production systems should never have dangerous or risky back-doors. Accidental. The lack of tooling here means that it’s pretty difficult to snoop into a running compiled executable and fiddle around with internal data: this is a big sore point for the current Haskell ecosystem. But in the abstract, we’re pretty flexible: XMonad, for example, can be restarted to run arbitrary new code while preserving the entirety of your working state.
If there is ANY doubt as to the safety of a component, it cannot be allowed in production. This is something of a personal question, and really depends on your project, and not so much on the language itself. Haskell is great for safety critical projects, but I also use it for one-off scripts.
Fast is better than slow. No. Haskell code has the opportunity to be really fast, and it tends to be quite zippy from the get go. But we’ve emphasized features (laziness and abstraction) which are known to cause performance problems, and most Haskellers take the approach of only optimizing when our (very awesome) profiler yells at us. Some Haskellers reflexively add ! {-# UNPACK #-} to their data types, but I don’t—at least, not until I decide my code is too slow.

Haskell has a lot of features which show up in Yegge’s “Liberal Stuff”. Here are some of them:

Eval: We love coding up interpreters, which are like type-safe evals.
Metaprogramming: Template Haskell.
Dynamic scoping: Reader monad.
all-errors-are-warnings: We can delay type errors to runtime!.
Reflection and dynamic invocation: class Data.
RTTI: I hear it’s called a “dictionary”.
The C preprocessor: Indispensable, begrudgingly.
Lisp macros: Why use macros when you can do it properly in Template Haskell!
Domain-specific languages: Haskell eats EDSLs for lunch.
Optional parameters: It’s called combinator libraries.
Extensible syntax: Fuck yeah infix!
Auto-casting: Numeric literals, anyone?
Automatic stringification: class Show and deriving.
Sixty-pass compilers: GHC does a lot of passes.
Whole-namespace imports: Yep (and it’s both convenient and kind of annoying).

The feeling I get from this conversation is that most people think “Haskell” and “static typing” and while thinking about how horrible it is to write traditional dynamically typed code in Haskell, forget that Haskell is actually a surprisingly liberal language prizing understandability, succinctness and risk-taking. Is Haskell liberal or conservative? I think of it as an interesting point in the design space which treats some conservative viewpoints as foundational, and then sees how far it can run from there. It’s folded so far right, it came around left again.

Two ways of representing perfect binary trees

August 4, 2012

A common simplification when discussing many divide and conquer algorithms is the assumption that the input list has a size which is a power of two. As such, one might wonder: how do we encode lists that have power of two sizes, in a way that lists that don’t have this property are unrepresentable? One observation is that such lists are perfect binary trees, so if we have an encoding for perfect binary trees, we also have an encoding for power of two lists. Here are two well-known ways to do such an encoding in Haskell: one using GADTs and one using nested data-types. We claim that the nested data-types solution is superior.

This post is literate, but you will need some type system features:

{-# LANGUAGE ScopedTypeVariables, GADTs, ImpredicativeTypes #-}

GADTs¶

One approach is to encode the size of the tree into the type, and then assert that the sizes of two trees are the same. This is pretty easy to do with GADTs:

data Z
data S n

data L i a where
    L :: a -> L Z a
    N :: L i a -> L i a -> L (S i) a

By reusing the type variable i, the constructor of N ensures that we any two trees we combine must have the same size. These trees can be destructed like normal binary trees:

exampleL = N (N (L 1) (L 2)) (N (L 3) (L 4))

toListL :: L i a -> [a] -- type signature is necessary!
toListL (L x) = [x]
toListL (N l r) = toListL l ++ toListL r

Creating these trees from ordinary lists is a little delicate, since the i type variable needs to be handled with care. Existentials over lists work fairly well:

data L' a = forall i. L' { unL' :: L i a }
data Ex a = forall i. Ex [L i a]

fromListL :: [a] -> L' a
fromListL xs = g (Ex (map L xs))
  where
    g (Ex [x]) = L' x
    g (Ex xs)  = g (Ex (f xs))
    f (x:y:xs) = (N x y) : f xs
    f _ = []

Nested data-types¶

Another approach is to literally build up a type isomorphic to a 2^n size tuple (modulo laziness). For example, in the case of a 4-tuple, we’d like to just say ((1, 2), (3, 4)). There is still, however, the pesky question of how one does recursion over such a structure. The technique to use here is bootstrapping, described in Adam Buchsbaum in his thesis and popularized by Chris Okasaki:

data B a = Two (B (a, a)) | One a
    deriving Show

Notice how the recursive mention of B does not hold a, but (a, a): this is so-called “non-uniform” recursion. Every time we apply a Two constructor, the size of our tuple doubles, until we top it off:

exampleB = Two (Two (One ((1,2), (3,4))))

fromListB :: [a] -> B a
fromListB [x] = One x
fromListB xs = Two (fromListB (pairs xs))
    where pairs (x:y:xs) = (x,y) : pairs xs
          pairs _ = []

toListB :: B a -> [a]
toListB (One x) = [x]
toListB (Two c) = concatMap (\(x,y) -> [x,y]) (toListB c)

Which is better?¶

At first glance, the GADT approach seems more appealing, since when destructing it, the data type looks and feels a lot like an ordinary binary tree. However, it is much easier to parse user data into nested data types than GADTs (due to the fact that Haskell is not a dependently typed language). Ralf Hinze, in his paper Perfect Trees and Bit-reversal Permutations, gives another argument in favor of nested datatypes:

Comparing [perfect trees and the usual definition of binary trees] it is fairly obvious that the first representation is more concise than the second one. If we estimate the space usage of an k-ary constructor at k+1 cells, we have that a perfect tree of rank n consumes (2^n-1)3+(n+1)2 cells with the first and (2^n-1)3+22^n* with the second. [The difference coming from all of the extra leaf nodes.]

Nevertheless, destructing the nested data type tree is very weird, and we might feel better about the “exotic” nested data type if there was an efficient transformation from the catamorphism (n :: t a -> t a -> t a , z :: a -> t a) on traditional trees:

cataL :: (t a -> t a -> t a, a -> t a) -> L i a -> t a
cataL (n,z) (N l r) = n (cataL (n,z) l) (cataL (n,z) r)
cataL (n,z) (L x) = z x

to a catamorphism (f :: a -> t a, g :: t (a, a) -> t a) on our nested data-type tree:

cataB :: (forall a. a -> t a, forall a. t (a, a) -> t a) -> B a -> t a
cataB (f,g) (One a) = f a
cataB (f,g) (Two t) = g (cataB (f,g) t)

This conversion is possible, though, alas, it is not a catamorphism:

cataLB :: forall t a. (t a -> t a -> t a, a -> t a) -> B a -> t a
cataLB (n,z) t = f t z
  where
    f :: forall b. B b -> (b -> t a) -> t a
    f (One a) z = z a
    f (Two t) z = f t (\(l,r) -> n (z l) (z r))

The idea is to create a function (a -> t a) -> t a, which we then pass z in order to get the final result. This is the time honored difference list/continuation passing trick, where we build up a chain of function invocations rather than attempt to build up the result directly, since ordinarily the catamorphism on nested data-type trees proceeds in the wrong direction. But now, we can easily perform any fold we would have done on ordinary trees on our nested data-type trees, which resolves any lingering concerns we may have had. Nested data types are superior… from a representation size perspective, in any case. (See Jeremy’s comment for another take on the issue, though.)

For further reading, check out Generalised folds for nested datatypes (Richard Bird, Ross Paterson).

Polymorphic variants in Ur/Web

July 29, 2012

This document explains how polymorphic variants in Ur/Web work. It was written because the official tutorial has no mention of them, the manual only devotes a paragraph to the topic, and there are some useful tricks for dealing with them that I picked up while wrangling with them in Logitext.

What are polymorphic variants?¶

Polymorphic variants may be familiar to OCaml users: they permit you to use the tags of variants in multiple types, and not just in the original algebraic data type a constructor was defined in. Instead having to keep names unique:

datatype yn = Yes1 | No1
datatype ynm = Yes2 | No2 | Maybe2

We can just reuse them:

con yn = variant [Yes = unit, No = unit]
con ynm = variant [Yes = unit, No = unit, Maybe = unit]

This is extremely convenient if you have a lot of constructors which you want to share between multiple logical types. Unfortunately, they have a number of nasty effects, largely stemming from the fact that they introduce subtyping to the language. In Ur, this inherent subtyping is modulated by the same row types that power Ur’s record system, so handling polymorphic variants is quite like handling records, and both are based off of Ur/Web’s type-level records.

How do I make a polymorphic variant?¶

To make a polymorphic variant type, instead of applying the $ operator, instead apply the variant to a type-level record. So:

$[A = int, B = bool]

generates a record with two fields, A containing an int and B containing a bool, whereas:

variant [A = int, B = bool]

generates a variant with two constructors, A containing just an int, or B containing just a bool.

To make a polymorphic variant value, use the make function, which requires a label (indicating the constructor) and the value:

make [#A] 2

Technically, during the construction of a polymorphic variant, you also need to know what the full set of constructors this value will be used with are. Normally Ur/Web will infer this for you, but this is an important restriction which will affect code that operates on variants. The full signature of make is this:

val make : nm :: Name
        -> t ::: Type
        -> ts ::: {Type}
        -> [[nm] ~ ts]
        => t -> variant ([nm = t] ++ ts)

The function of nm and t should be self-evident, and ts is the type-level record for the rest of the values in the variant, concatenated with [nm = t] to produce a type-level record which is guaranteed to contain nm.

How do I destruct a polymorphic variant?¶

Use the match function, which takes a variant and a record of functions which indicate how to process each possible constructor of the variant:

match t { A = fn a => a + 2,
          B = fn b => if b then 3 else 6 }

Indeed, the variant and the record use the same type-level record, though the types of the record are a little different, as seen in the type of match:

val match : ts ::: {Type}                (* the type-level record *)
         -> t ::: Type
         -> variant ts                   (* the variant *)
         -> $(map (fn t' => t' -> t) ts) (* the record *)
         -> t

What other operations can I perform on variants?¶

make and match are the only primitives you need: everything else can derived. However, the meta library has a Variant module which contains of useful derived functions for operating with variants. For example, this pair of functions:

val read : r ::: {Unit} -> t ::: Type -> folder r
        -> $(mapU t r) -> variant (mapU {} r) -> t
val write : r ::: {Unit} -> t ::: Type -> folder r
        -> $(mapU t r) -> variant (mapU {} r) -> t -> $(mapU t r)

Allow you to use variants as labels, to project and edit values from homogeneously typed records. The signatures are not too difficult to read: r is the type-level record which defines the variant, t is the type of the homogeneous record, folder r is a folder for the record (which usually will get inferred), $(mapU t r) is the type of a homogeneous record (we didn’t write $r because that would be a record containing only unit) and variant (mapU {} r) is the variant serving as a “label”. Here are some example uses of some of the simpler functions in this library:

read {A = 1, B = 2} (make [#A] ())
== 1

write {A = 1, B = 2} (make [#B] ()) 3
== {A = 1, B = 3}

search (fn v => match v {A = fn () => None, B = fn () => Some 2})
== Some 2

find (fn v => match v {A = fn () => True, B = fn () => False})
== Some (make [#A] ())

test [#A] (make [#A] 2)
== Some 2

weaken (make [#A] 2 : variant [A = int])
== make [#A] 2 : variant [A = int, B = int]

eq (make [#A] ()) (make [#B] ())
== False

mp (fn v => match v {A = fn () => 2, B = fn () => True})
== {A = 2, B = True}

fold (fn v i => match v {A = fn () => i + 1, B = fn () => i + 2}) 0
== 3

mapR (fn v x => match v {A = fn i => i * 2, B = fn i => i * 3}) { A = 2 , B = 3 }
== { A = 4, B = 9 }

What does destrR do?¶

This function has a somewhat formidable type:

val destrR : K --> f :: (K -> Type) -> fr :: (K -> Type) -> t ::: Type
          -> (p :: K -> f p -> fr p -> t)
          -> r ::: {K} -> folder r -> variant (map f r) -> $(map fr r) -> t

But really, it’s just a more general match. match can easily be implemented in terms of destrR:

match [ts] [t] v fs =
  destrR [ident] [fn p => p -> t] (fn [p ::_] f x => f x) v fs

destrR affords more flexibility when the record is not quite a function, but a record containing a function, or a type class, or even if the variant is the function and the record the data.

Is there a more concise way to match over multiple constructors?¶

Polymorphic variants frequently have a lot of constructors, all of which look basically the same:

con tactic a =
  [Cut = logic * a * a,
   LExact = int,
   LConj = int * a,
   LDisj = int * a * a,
   LImp = int * a * a,
   LIff = int * a,
   LBot = int,
   LTop = int * a,
   LNot = int * a,
   LForall = int * universe * a,
   LExists = int * a,
   LContract = int * a,
   LWeaken = int * a,
   RExact = int,
   RConj = int * a * a,
   RDisj = int * a,
   RImp = int * a,
   RIff = int * a * a,
   RTop = int,
   RBot = int * a,
   RNot = int * a,
   RForall = int * a,
   RExists = int * universe * a,
   RWeaken = int * a,
   RContract = int * a]

Filling out records to match against quickly gets old, especially if the functionality for any two constructors with the same type of data is the same:

let fun empty _ = True
    fun single (_, a) = proofComplete a
    fun singleQ (_, _, a) = proofComplete a
    fun double (_, a, b) = andB (proofComplete a) (proofComplete b)
in match t {Cut       = fn (_, a, b) => andB (proofComplete a) (proofComplete b),
            LExact    = empty,
            LBot      = empty,
            RExact    = empty,
            RTop      = empty,
            LConj     = single,
            LNot      = single,
            LExists   = single,
            LContract = single,
            LWeaken   = single,
            LTop      = single,
            RDisj     = single,
            RImp      = single,
            LIff      = single,
            RNot      = single,
            RBot      = single,
            RForall   = single,
            RContract = single,
            RWeaken   = single,
            LForall   = singleQ,
            RExists   = singleQ,
            LDisj     = double,
            LImp      = double,
            RIff      = double,
            RConj     = double
            }
end

Adam Chlipala and I developed a nice method for reducing this boilerplate by abusing local type classes, which allow us to lean of Ur/Web’s inference engine to automatically fill in the function to handle an element of a particular type. Here is that recursive traversal again, using our new method:

let val empty   = declareCase (fn _ (_ : int) => True)
    val single  = declareCase (fn _ (_ : int, a) => proofComplete a)
    val singleQ = declareCase (fn _ (_ : int, _ : Universe.r, a) => proofComplete a)
    val double  = declareCase (fn _ (_ : int, a, b) => andB (proofComplete a) (proofComplete b))
    val cut     = declareCase (fn _ (_ : Logic.r, a, b) => andB (proofComplete a) (proofComplete b))
in typeCase t end

For every “type” in the variant, you write a declareCase which takes that type and reduces it to the desired return type. (You also get, as the first constructor, a constructor function to create the original constructor; e.g. declareCase (fn f x => f x) is the identity transformation. Then you run typeCase and watch the magic happen. There are more detailed usage instructions in variant.urs.

How do I make the type of my variant bigger?¶

When writing metaprograms that create variant types, a common problem is that the variant you just created is too narrow: that is, the ts in variant ts doesn’t have enough entries in it. This is especially a problem when ts is the record you are folding over. Consider a simple example where we would like to write this function, which generates a record of constructors for each constructor of the variant:

fun ctors : ts ::: {Type} -> fl : folder ts -> $(map (fn t => t -> variant ts) ts)

Ur/Web is not clever enough to figure out the naive approach:

fun ctors [ts] fl =
  @fold [fn ts' => $(map (fn t => t -> variant ts) ts')]
      (fn [nm ::_] [v ::_] [r ::_] [[nm] ~ r] n => n ++ {nm = make [nm]})
      {} fl

because it has no idea that nm is a member of the type-level record ts (Ur/Web doesn’t directly have an encoding of field inclusion.)

The way to fix this problem is to use a trick that shows up again and again in variant metaprograms: make the accumulator polymorphic in the fields that are already processed. It is the same trick that is employed in value level programs, when you reverse a list by foldr by observing that you want to make the accumulator a function:

con accum r = s :: {Type} -> [r ~ s] => $(map (fn t => t -> variant (r ++ s)) r)
fun ctors [ts] fl =
  @fold [accum]
        (fn [nm::_] [v::_] [r::_] [[nm] ~ r]
            (k : accum r)
            [s::_] [[nm = v] ++ r ~ s] => k [[nm = v] ++ s] ++ {nm = make [nm]})
        (fn [s::_] [[] ~ s] => {}) fl [[]] !

accum is the type of the accumulator, and we can see it has a new type argument s :: {Type}. This argument is concatenated with the fields that are to be processed r and the current field nm in order to provide the full set of fields ts. During a fold over a record like [A = int, B = bool, C = string], we see:

r = [],                  nm = A, s = [B = bool, C = string]
r = [A = int],           nm = B, s = [C = string]
r = [A = int, B = bool], nm = C, s = []

r builds up fields as usual in a fold, but s builds up its fields in reverse, because, like list reverse, s is not determined until we’ve folded over the entire structure, and now evaluate the pile of type functions outside-in. Thus, it’s easy to see k [[nm = v] ++ s] will always have the correct type.

Conclusion¶

Polymorphic variants in Ur/Web are quite useful, and avoid many of the problems associated with unrestricted subtyping. Logitext wasn’t originally intending on using polymorphic variants, but we adopted them when they were found to be the most reliable method by which we could quickly implement JSON serialization via metaprogramming, and we’ve come to appreciate their metaprogrammability in a variety of other contexts too. Probably their biggest downside over traditional algebraic data types is the lack of recursion, but that too can be simulated by manually implementing the mu operator using Ur/Web’s module system. I hope this tutorial has given you enough meat to use polymorphic variants on your own, and maybe do a little metaprogramming with them too.

Managing the server/client split in Ur/Web

July 25, 2012

The holy grail of web application development is a single language which runs on both the server side and the client side. The reasons for this are multifarious: a single language promotes reuse of components that no longer need to be reimplemented in two languages and allows for much more facile communication between the server and the client. Web frameworks that explicitly strive to handle both the server and client include Meteor, Ur/Web, Opa and Google Web Toolkit.

One of the biggest implementation difficulties facing anyone wishing to build such a system is the fact that there are multiple runtimes: the server runtime and the browser runtime, each with an accordingly different set of primitives and APIs available. Furthermore, some code we might wish to only live on the server, and never be sent to the client. When a language feature can be implemented on both runtimes, we maintain the illusion that client and server are indistinguishable; when it cannot, the illusion is broken.

Thus, in order to support runtime-specific FFI calls in such an integrated language, the following questions must be answered:

When is code sent to the client and when is code kept on the server? This information must be exposed to the user (rather than be kept as an “implementation detail”).
How do I force execution on the server?
How do I force execution on the client?

In this blog post, I’ll discuss how Ur/Web addresses these questions. The answers are rather simple (any with any luck generalize to other similar systems), but they are rather difficult to come by if you treat the compiler as a black box.

1. Client/server split¶

An obvious solution to the client/server division problem is to label entry points (e.g. the main function or an onClick handler) as starting from the server (main) or client (onClick), and then conducting reachability analysis to label all other functions. Thus, in the following Ur/Web code, txn : transaction unit would execute on the server here:

fun main () =
  txn;
  return <xml><body>Done!</body></xml>

while it would execute on the client here:

fun main () =
  return <xml><body><a onclick={txn}>Click me!</a></body></xml>

When given a fragment like this:

fun foo body =
  r <- txn body;
  return <xml>{r}</xml>

it is not possible to know whether or not txn will be needed on the client side or server side without analyzing all of the callers and checking if they are client side or server side. Situations like this are the most important for forcing server-side or client-side behavior.

2. Forcing server-side¶

Suppose that we wanted to force txn to be executed on the server-side. If we’re already on the server, there is nothing more to do. However, if we’re on the client, we need to make an RPC call back to the server. In Ur/Web, this is easily done:

fun fooClient body =
  r <- rpc (txn body);
  return <xml>{r}</xml>

However, as rpc is a client-side only function in Ur/Web, we can no longer use this function for server-side computation. One consequence of this choice is it forces us to be explicit about when an RPC occurs, which is good news for understanding and security.

3. Forcing client-side¶

Suppose we wanted to force txn to be executed on the client-side. This is tricky: if we’re already on the client we can go ahead as normal, but if we’re executing in the server side, what does it mean to execute some code on the client?

One interpretation is this: since we are building some HTML that is to be shown to the client, txn should be run when the client actually displays the HTML. Ur/Web recently added the active tag which achieves just this effect:

fun fooServer body =
  return <xml><active code={txn body} /></xml>

The code attribute acts much like onclick and other similar attributes, in that it defines an entry point which happens to automatically get run when shown in the browser. It is still an event handler, in the sense that if someone invokes fooServer, but then doesn’t end up using the HTML, txn never gets called: active can be thought of a sort of lazy execution.

If we would truly like the client to execute some code immediately, our best bet is to shove an active tag down a source which is hooked up to an active dyn element:

fun fooServer body source =
  set source <xml><active code={txn body} /></xml>;
  return <xml>???</xml>

but in this case it is not really possible to ask the client what the result of the computation was (the server is not permitted to block!) This method of mobile code delivery can even be done asynchronously, using channels:

fun fooServerAsync body channel =
  send channel <xml><active code={txn body} /></xml>;
  return <xml>???</xml>

4. Interaction with the optimizer¶

The code within HTML event handlers (e.g. onclick={...} and active code={...}) can have free variables which bind to variables in their lexical scope, which may have been calculated from the server. As such, you might expect in this case, foo : int -> xbody would be executed on the server:

fun main n =
  let val x = foo n
  in return <xml><body><active code={txn; return x} /></body></xml>

However, Ur/Web’s optimizer is too smart for it’s own good: since foo is pure and thus referentially transparent, it can always be safely inlined (especially when there is only one use-site):

fun main n =
  return <xml><body><active code={txn; return (foo n)} /></body></xml>

Written this way, it is clear that foo is run from the client. Thus, an innocent transformation can break your code, if foo was a server FFI called that was unimplemented on the client.

The troubling conclusion is that variable substitution can make valid programs invalid. Of course, in an eagerly evaluated, impure language, variable substitution is not valid. But we might expect it to be true for a pure language like Ur/Web. In any case, we can teach Ur/Web not to inline by marking foo as being benignEffectful in our urp file.

5. Conclusions¶

In general, when writing Ur/Web applications, here are some useful guidelines:

Always mark server/client-only identifiers with serverOnly and clientOnly in your urp files. Ur/Web will generally handle one-sided FFI functions appropriately, but if you have code that takes advantage of language features that are only implemented on one side (e.g. closures on the server side), be sure to mark those functions appropriately.
Use rpc to move from client-to-server, and active to move from server-to-client. Because of the “rpc must refer to named function” invariant, the general structure of Ur/Web applications will be blobs of server code with client-side code embedded internally.
If you are interested in generating client code that includes pure server-computed data, make sure the functions computing that data are marked benignEffectful.
In general, don’t worry about the server/client split! Ur/Web will warn you if you need to move things around, but for the most part, things should just work.

One last word about security in a shared server/client model: how does Ur/Web ensure that input validation ends up on the server-side and not the client-side? It is rather simple: the only parts of your program that care about input validation are ones that involve persistent data, and all of these functions are server only. Thus, any user-data that makes it to any of these functions necessarily passed through a top-level page handler or an rpc, making it a relatively simple job to ensure that the validation is on the “right” side of the pond. If you use a data-structure that is correct-by-construction, you’re automatically done!

Secure multiparty Bitcoin anonymization

July 20, 2012

Abstract. We describe how secure multi-party sorting can serve as the basis for a Bitcoin anonymization protocol which improves over current centralized “mixing” designs.

Bitcoin is a pseudonymous protocol: while Bitcoin addresses are in principle completely anonymous, all traffic into and out of a wallet is publicly visible. With some simple network analysis collections of addresses can be linked together and identified.

The current state of the art for anonymizing Bitcoins is a mixing service, which is trusted third-party wallet which accepts incoming transactions, and in random increments scheduled at random times in the future, transfers a corresponding quantity to a new wallet of your choice. The result is given any Bitcoin that is distributed from this service, there exist a large number of identities from whom the Bitcoin may have originated.

Mixing services of this kind have a number of notable problems. First, the mixing service must be trusted not to keep logs or otherwise monitor the mixing: if they are compromised, the path of any given Bitcoin can be fully traced. The usual advice for this scenario is to use multiple mixing services, so that all of these services must be compromised before anonymity is lost. Second, the mixing service must be trusted not to turn around and refuse to give you back your funds; this makes such mixing services risky for anonymizing large quantities of Bitcoins. Finally, most mixing services charge a processing fee on top of the usual transaction fee one might expect to pay out for arranging for a Bitcoin transfer.

We propose a decentralized, secure multiparty protocol for implementing a mixing protocol. Such a system has been proposed in abstract (also here); in this post, we describe precisely how to implement it, in particular showing that multi-party sorting (a relatively well-studied algorithmic problem) is sufficient to implement this protocol. This protocol does not require a trusted third party (except to assist in discovery of participants interested in performing the mixing protocol), does not require you to reveal your input-output addresses beyond the ultimate transaction, and can be performed atomically using Bitcoin’s transaction language.

Protocol description¶

First some preliminaries: the multi-party sorting problem is usually formulated as follows: each party i contributes an input element A[i]. After the protocol is carried out, all parties learn the sorted sequence A in secret shared form, but do not learn who contributed any particular element A[i] of the sequence. Referring to the work of Jónsson, Kreitz and Uddin (2011), we assume this protocol as a primitive: the essential idea behind any multi-party sorting is to construct a fixed size sorting circuit for the inputs, and then use the general framework of multi-party computation on the resulting circuit description.

We now describe the mixing problem. Assume that some number of parties are assembled to mix 1 BTC of coins among themselves. (For now, we assume that every participant has the same number of Bitcoins; we will generalize this later.) In particular, each party i has a input wallet A[i] (with balance of at least 1 BTC) and an output wallet B[i], and will only sign a transaction in which 1 BTC is transferred to its output wallet B[i]. Any adversary participating in this transaction should not be able to learn the B[i] corresponding to A[i], except that it is among the set of output wallets taking part in the transaction.

The protocol proceeds as follows:

Every participant declares the input wallet it will be participating in the protocol with, and produces a signature to show that they own the wallet. These wallets are publically sorted into A.
With ordering defined as the numeric value of the public key of each participating output wallet, conduct a secure multi-party sort of all of the output wallets. We now have a sorted list of output wallets B, with no member of the transaction having learned who contributed any given output wallet. Each participant should check if their output wallet is contained in this list (to protect against Byzantine failure); if it is not, they should abort the protocol and destroy their output wallet (its identity has been leaked).
Construct a transaction transferring 1 BTC from A[0] to B[0] (from the sorted lists), A[1] to B[1], and so forth and broadcast it to all participants. Clients sign the transaction with their input wallet.
Once all signatures have arrived, the transaction is valid and is broadcast for incorporation into the block chain.

Mixing pools only work if all participants are attempting to mix identical amounts of Bitcoins. In order to manage participants who would like to mix larger amounts of Bitcoins, we suggest maintaining discovery channels for power of two sizes of Bitcoins, e.g. …1/4 BTC, 1/2 BTC, 1 BTC, 2 BTC, 4 BTC…

Analysis¶

Step (1) does not mention output wallets, and thus cannot leak information about the input-output correspondence. By definition, step (2) does not leak information about who contributed the output wallet. Furthermore, we don’t even require the sorted result to be secret shared: this sorted list will become public information once the transaction is published in the block chain. The case of aborting the transaction when your output wallet is not present in the result (in the case of Byzantine failure) is delicate: aborting does leak information, and thus you must not use the output wallet in any further transactions. In step (3), assuming that an attacker knows that a mixing transaction is taking place, the deterministic mapping of input to output wallets gives all participants no further bits of information. Thus, this protocol clearly fulfills its security requirements.

One odd thing about this protocol is that no random permutation between participants is explicitly constructed. This might seem unusual, since a natural security property one might expect is for an output wallet to receive its 1 BTC from a uniformly randomly chosen input wallet. However, this condition, while sufficient for anonymity, is not necessary. Just as adding a random permutation destroys all information about the original permutation, replacing the original permutation with a new constant permutation also destroys all information about the original permutation. Furthermore, honest participants will have picked their addresses uniformly at random, so the process of sorting automatically constructs a random permutation between these participants (dishonest participants must be excluded, since they can generate addresses from a skewed probability distributions).

What is the amount of anonymity granted by one round of this protocol? Consider the case where I have 1 BTC in an input wallet tied to my identity, and I participate in a mixing round with n honest participants with a fresh output wallet. Since no information about the source of this output wallet was leaked to the participants, an adversary in the transaction would have a 1/n-1 probability of guessing which output wallet was mine: call this the anonymity factor. In the case of a Sybil attack, the amount of anonymity conferred decreases. If the fraction of attackers is less than some fraction of the participants (for many secret sharing protocols, the magic numbers are 1/2 for passive attackers, and 1/3 for active attackers), then the anonymity factor is still 1/n-1, where n is the number of honest participants; but n is smaller than the number of visible participants in the protocol: the size of the transaction is not necessarily correlated with how anonymous it is! If the number of attackers is above this fraction, then the secret sharing scheme may leak information and no anonymity is gained. This allows for a denial of service attack against a mixing protocol; we describe some mitigation strategies against this attack later. (Note, however, that no matter how many attackers there are, you are guaranteed to not lose any Bitcoins, due to the verification step in (2).)

In practice¶

As with traditional mixing, a number of precautions should be taken in order to avoid accidentally revealing information about the source of Bitcoins via a side-channel. Consider the case where Alice has 2 BTC tied to her real-world identity, and she would like to anonymously pay 3/4 BTC to Bob.

Alice first prepare by creating a pool of empty wallets, which she will use to carry the anonymized Bitcoins. Alice connects to a tracker for 1 BTC mixing over Tor. (1 BTC is the amount she would like to pay to Bob, rounded up.) She waits a time window to expire for the next mixing, and then as the protocol takes place submits her (public) input wallet and her (anonymous) output wallet. If the protocol fails, she throws out her output wallet and submits a new one next time, and blacklists the misbehaving node if she can figure out who it is.

Once Alice has successfully carried out a mixing, she flips a coin (which comes up tails with probability 1/m). If the coin comes up heads, she waits for another mixing. The number of mixing transactions she is expected to perform is m (obeying the geometric distribution, so selected because it makes all output wallets behave identically with regards to remixing or exiting). Once Alice exits mixing, she now has a wallet containing an anonymous Bitcoin (more precisely, this Bitcoin could be attributable with equal probability to any of the other wallets that she participated in mixes with). She transfers 3/4 BTC to Bob, leaving 1/4 BTC in her wallet.

The remaining Bitcoins in the wallet should now be considered tainted (as they now have a direct relationship to Bob, who may have a public wallet). These Bitcoins should be split into mixable amounts and reanonymized, before used for any other purposes. Even after anonymization, these coins must still be used with care: in particular, they must not be transferred back to the original, public Bitcoin account. In such a situation, the graph structure of mixing transactions looks like this:

(The green node is your public node, the red node is your anonymous node). Network analysis that looks for cycles in Bitcoin transfers will be able to identify any transactions, even if the anonymizing pool has a large amount of turnover (though, amusingly enough, if many participants generate cycles, this attack is harder to carry out). To assist in the tracking of these coins, we suggest the development of wallet management software that can handle thousands of private keys, sort by “size of coin”, and track the easily traceable transaction history associated with any given coin.

In order to protect herself against Sybil attacks, Alice may wish to select her mixing tracker with care. Some mixing trackers could charge fees for listing: with sufficient volume, these charges would make it more expensive to carry out a sustained Sybil attack. (The fees can then be turned around and used to pay for the processing of the complicated mixing transactions, which have the social expectation of being accompanied with a transaction fee.) Every mixing should be conducted with a different IP address; if Alice is using Tor for anonymity she needs to reanonymize her connection each time.

Conclusion¶

Secure multi-party computation has always been in the eye of users of Bitcoin seeking anonymization, but to date there has not been any plausible implementation strategy for any such computation. We hope that this document describes such an implementation strategy and leads the way to a better ecosystem of Bitcoin anonymizers. As the fiascos at Bitcoinica and other exchanges have demonstrated, relying on third party wallets is dangerous. Fortunately, they are also unnecessary.

Acknowledgments¶

I would like to thank Eric Price for playing an instrumental role in the formulation of this protocol.

Why verification results in higher quality code

June 23, 2012

Correctness is overrated. After all, no one knows what it means for any reasonably complicated system to be “correct”, and even when we do, the mileposts move around on a daily basis. With the raison d’être of formal verification stripped away, can we still consider it a worthy goal?

Perhaps verification results in higher quality code. But this is not obviously true: correctness is not quality. We might hope that high quality code is readable and easily understood, that it should be as self-contained and independent from the rest of the system, that it is efficient and economical. There is no a priori reason to believe that verification would grant us any of these properties. No matter how horrible some code is, as long as it is correct, there exists a proof which vouches for its correctness.

But as anyone who has gone through the sweat and tears of verifying a program can tell you, formal verification really does make your code better. Here’s the secret: proving theorems is really hard. If we want any hope to successfully prove something about a program, we must to make reasoning about the code as easy as possible. A program under verification irresistibly gravitates towards it’s most “reasonable” form, because otherwise the proofs are simply too arduous to carry out. And in this form, the tenets of high quality code follow.

Take for example Bedrock, a system for building verified low-level programs which manipulate pointers and registers. These are programs that deal with mutable state, a feature which is well known to dramatically increase the difficulty of reasoning. Bedrock, and many systems like it, would be dead out of the water if not for the development of an important system called separation logic. The central idea behind it is so obvious to any experienced practitioner it is barely worth stating: private local state is easier to reason about than public global state—**modularity** is good. It enforces this through a clever formalism, the star operator, which combines two assertions about two regions of memory while assuring that the regions are disjoint. Regardless, the end result is this: if your components are independent, the theorem proving is easy; if your components are tangled together, the theorem proving is hard. You do the math.

But it doesn’t stop there. When different components do interact, the principle of encapsulation says that I do not want to know all of the gory details of a component, just its high-level interface. In theorem prover land, “all of the gory details” means unmanageably large facts about many pointers, and a “high-level interface” is an abstract predicate which rolls up all of these facts into a single, cohesive logical fact (“this is a linked list.”) Developing these predicates is critical to keeping your theorem statements concise and understandable, and in higher-order provers like Bedrock, they can apply not only to data but also to code, i.e. function pointers.

The tenets of high quality code speak for code that is written for humans to understand, and not just machines to execute. But code that is written for machines to understand have many of the same properties that are valued by humans, for if they do not, getting the machine to “understand” becomes an impossible task. Computers may be simple-minded, but all that means is code a computer can understand is code that you can understand too. And that is high quality code.

Thoughts on gamifying textbooks

May 24, 2012

Earlier this year, Woodie Flowers wrote this criticism of MITx:

We seem to have decided to offer “courses” rather than participate in the exciting new process of replacing textbooks with more effective training tools.

Logitext, true to its name, was intended to explore what a chapter from a next-generation textbook on formal logic might look like. But if you asked anyone what subjects the most important textbooks of this century would be about, I doubt logic would be particularly high on anyone’s list. In terms of relevance, Logitext misses the mark. But I do think there are some design principles that Logitext helps elucidate.

On interactivity¶

It is a good thing that the quality of the Metropolitan Opera’s productions is not correlated with the quality of their website pages. Here, we have an egregious example of interactivity for the sake of interactivity, with no real benefit to the presentation of information.

There are many things that interactivity can bring the table. However, interactive textbooks should still look like textbooks. The conventional textbook is a masterwork of design for the static medium: it is skimmable, supports random access and easily searched. You cannot beat this format at its own game, no matter how well designed your video game levels may be.

In order to apply interactivity tastefully, you must consider what the limitations of the static medium were: it is precisely here where interactivity can bring something to the table. Here are some examples:

Every field of study has its jargon, which assists people versed with the language but impedes those who are not. In a static medium, you can only define jargon a limited number of times: it clutters up a text to redefine it a term every time it shows up in the text, even if you know your students frequently forget what a term means. In an a dynamic medium, the fix is trivial: tooltips. Logitext did not start off with tooltips, but I quickly learned readers were getting confused about the meanings of “conjunction”, “disjunction” and “turnstile”. Tooltips let us easily extend the possible audience of a single work.
The written medium demands a linear narrative, only sparingly stopping for questions or elaborations. Too many waypoints, and you risk losing the reader. In an interactive text, the system can give context-sensitive information only when it is relevant. In Logitext, when a reader clicks on a clause which requires an instantiation of a variable, the system explains how this inference rule works. This explanation is given elsewhere in a more general explanation of how quantifiers work, but the system also knows how to offer this information a timely and useful manner.
Sometimes, the information you are trying to convey should also be given in another form. It’s the difference between describing a piece of music or actually hearing it, the difference between giving someone a map or letting them wander around for a few hours. Whenever possible, show, don’t tell. And if possible, show in different ways—different intuitions work for different people. I can explain what the “no free occurrence” rule is until the chickens come home, but the unexpected renaming of variables when you click “forall right” immediately introduces the intuitive idea (even though it still needs to be explained for full understanding.)

It is telling that each of these enhancements have been abused by countless systems in the past. Many people have a dim view of tooltips and Clippy, myself included. I think one way to limit the damage of any such abuse is to demand that the textbook gracefully degrade without interactivity. (For technological reasons, Logitext doesn’t render correctly without JavaScript, but I consider this a bug.)

On exercise design¶

In order to truly learn something, you must solve some problems with it. Gamification of exercises has done well at supplying extrinsic motivation, but up until recently, the state of the art in online exercise design has been something like this. I find this depressing: there is no indication the student is really learned the underlying concepts, or has just constructed an elaborate private system which happens also to be wrong. Remember when you were asked to show your work? We should be demanding this online too.

This is easier said than done. It was no accident that I picked the sequent calculus for Logitext: while logic holds a very special place in my heart, I also knew that it would be easy to automate. The road to a system for something as simple as High School Algebra will be long and stony. Logitext sidesteps so many important questions, even ones as simple as “How do we get student’s answers (with work) onto the computer?” let alone thorny issues such as one addressed by a recent PhD thesis: “How do we tell if the student needs to provide more work?”

I think I came away with two important ideas from Logitext. The first is a strong conviction that theorem provers are the right foundational technology for interesting exercises in mathematics. Building the Logitext system was a bit of work, but once the platform was designed, defining exercises was simple, literally a line of code per exercise. If every exercise had to be implemented from scratch, the cost would have been prohibitively expensive, and the tutorial would have looked a lot different. We know that, in principle, we can formalize all of mathematics; in the case of elementary mathematics, we may not even have to solve open research questions to get there. Theorem provers also know when you’ve gotten the answer right, and I suspect from a gamification perspective that is all you need.

The second important idea is that computers can assist exploration of high-level concepts, even when the foundations are shaky. For some people, copying down a string of mathematical symbols quickly and accurately is an ordeal: a system like Logitext abstracts that away and allows them to see the higher order structure of these proofs. It is true that it is better of students have a strong foundation, but if we had a system which could prevent them from getting left behind, I think the world would be strictly better for it. The solution to a curriculum which relies on a freakish knack for manipulating abstract symbols should not be eliminating symbols, but making it easier to manipulate them. Educational systems should have what I call adjustable affordance: you should have the option to do the low level manipulations, or have the system do them for you.

Conclusion¶

I have articulated the following design principles for the gamification of textbooks:

An interactive textbook should still look like a textbook.
Use interactivity to extend the possible audience of a textbook by assisting those with less starting knowledge.
Use interactivity to offer context-sensitive information only when relevant.
Use interactivity to show; but be sure to explain afterwards.
(Heavily modified) theorem provers will be the fundamental technology that will lie beneath any nontrivial exercise engine.
One of the most important contributions of computers to exercises will not be automated grading, but assisted exploration.

I’ve asserted these very confidently, but the truth is that they are all drawn from a sample size of one. While the Logitext project was a very informative exercise, I am struck by how little I know about K12 education. As an ace student, I have a rather unrepresentative set of memories, and they are probably all unreliable by now anyway. Education is hard, and while I think improved textbooks will help, I don’t really know if they will really change the game. I am hopeful, but I have the nagging suspicion that I may end up waiting a long time.

An Interactive Tutorial of the Sequent Calculus

May 22, 2012

You can view it here: An Interactive Tutorial of the Sequent Calculus. This is the “system in three languages” that I was referring to in this blog post. You can also use the system in a more open ended fashion from this page. Here’s the blurb:

This interactive tutorial will teach you how to use the sequent calculus, a simple set of rules with which you can use to show the truth of statements in first order logic. It is geared towards anyone with some background in writing software for computers, with knowledge of basic boolean logic.

Developing this system has been quite a fascinating foray into user interface and game design. While similar systems have been built in the past, my goal was a little different: I wanted something simple enough and accessible enough that anyone with a vague interest in the topic could pick it up, work through it in an hour, and learn something interesting about formal logic. I don’t think this demo will be particularly successful among the math-phobic, but perhaps it will be more successful with people who have an intrinsic interest in this sort of thing. I must have incorporated dozens of comments from my many friends at MIT who put up with my repeated badgering about the system. The first version looked very different. I give my superlative thanks to my beta testers.

There is a lot of hubbub about the next generation of online teaching systems (edX), and this demo (because, really, that’s what it is) is intended to explore how to blur the line between textbooks and video games. It doesn’t really go far enough: it is still too much like a textbook, there is not enough creative latitude in the early exercises. I don’t feel like I have gotten the right feel that a video game which progressively layers concepts (e.g. Portal). On the other hand, I do feel I have done a good job of making the text skimmable, and there are a lot of little touches which I think do enhance the experience. I am embarrassed to admit that there are some features which are not included because they were technically too annoying to implement.

If there is one important design principle behind this demo, it is that there is a difference between giving a person a map and letting a person wander around in a city for a few hours. But, fair reader, you probably don’t have a few hours, and I am probably demanding too much of your attention. Nevertheless, forgive my impoliteness and please, take it out for a spin.

Ubuntu Precise upgrade (Thinkpad/Xmonad)

May 18, 2012

It is once again time for Ubuntu upgrades. I upgraded from Ubuntu Oneiric Ocelot to Ubuntu Precise Pangolin (12.04), which is an LTS release. Very few things broke (hooray!)

The Monospace font changed to something new, with very wide glyph size. The old font was DejaVuSansMono, which I switched back to.
Xournal stopped compiling; somehow the linker behavior changed and you need to specify the linker flags manually.
gnome-keyring isn’t properly starting up for us non-Unity folks. The underlying problem appears to be packaging errors by Gnome, but adding eval `gnome-keyring-daemon -s to my .xsession` cleared things up.
The battery icon went away! I assume some daemon is failing to get run, but since I have a very nice xmobar display I’m not mourning its loss.
Default GHC is GHC 7.4.1! Time to rebuild; no Haskell Platform yet. (Note that GHC 7.4.1 doesn’t support the gold linker; this is the chunk-size error.)

I also upgraded my desktop from the previous LTS Lucid Lynx.

I had a lot of invalid signature errors, which prevented the release upgrade script from running. I fixed it by uninstalling almost all of my PPAs.
Offlineimap needed to be updated because some Python libraries it depended on had backwards incompatible changes (namely, the imap library.)
VirtualBox messed up its revision numbers, which contained an underscore which is forbidden. Manually editing it out of the file seems to fix it.

« Newer Posts Older Posts »