ezyang's blog

Generalizing the programmable semicolon

October 3, 2012

Caveat emptor: half-baked research ideas ahead.

What is a monad? One answer is that it is a way of sequencing actions in a non-strict language, a way of saying “this should be executed before that.” But another answer is that it is programmable semicolon, a way of implementing custom side-effects when doing computation. These include bread and butter effects like state, control flow and nondeterminism, to more exotic ones such as labeled IO. Such functionality is useful, even if you don’t need monads for sequencing!

Let’s flip this on its head: what does a programmable semicolon look like for a call-by-need language? That is, can we get this extensibility without sequencing our computation?

At first glance, the answer is no. Most call-by-value languages are unable to resist the siren’s song of side effects, but in call-by-need side effects are sufficiently painful that Haskell has managed to avoid them (for the most part!) Anyone who has worked with unsafePerformIO with NOINLINE pragma can attest to this: depending on optimizations, the effect may be performed once, or it may be performed many times! As Paul Levy says, “A third method of evaluation, call-by-need, is useful for implementation purposes. but it lacks a clean denotational semantics—at least for effects other than divergence and erratic choice whose special properties are exploited in [Hen80] to provide a call-by-need model. So we shall not consider call-by-need.” Paul Levy is not saying that for pure call-by-need, there are no denotational semantics (these semantics coincide exactly with call-by-name, call-by-need’s non-memoizing cousin), but that when you add side-effects, things go out the kazoo.

But there’s a hint of an angle of attack here: Levy goes on to show how to discuss side effects in call-by-name, and has no trouble specifying the denotational semantics here. Intuitively, the reason for this is that in call-by-name, all uses (e.g. case-matches) on lazy values with an effect attached cause the effect to manifest. Some effects may be dropped (on account of their values never being used), but otherwise, the occurrence of effects is completely deterministic.

Hmm!

Of course, we could easily achieve this by giving up memoization, but that is a bitter pill to swallow. So our new problem is this: How can we recover effectful call-by-name semantics while preserving sharing?

In the case of the Writer monad, we can do this with all of the original sharing. The procedure is very simple: every thunk a now has type (w, a) (for some fixed monoidal w). This tuple can be shared just as the original a was shared, but now it also has an effect w embedded with it. Whenever a would be forced, we simply append effect to the w of the resulting thunk. Here is a simple interpreter which implements this:

{-# LANGUAGE GADTs #-}
import Control.Monad.Writer

data Expr a where
    Abs :: (Expr a -> Expr b) -> Expr (a -> b)
    App :: Expr (a -> b) -> Expr a -> Expr b
    Int :: Int -> Expr Int
    Add :: Expr Int -> Expr Int -> Expr Int
    Print :: String -> Expr a -> Expr a

instance Show (Expr a) where
    show (Abs _) = "Abs"
    show (App _ _) = "App"
    show (Int i) = show i
    show (Add _ _) = "Add"
    show (Print _ _) = "Print"

type M a = Writer String a
cbneed :: Expr a -> M (Expr a)
cbneed e@(Abs _) = return e
cbneed (App (Abs f) e) =
    let ~(x,w) = run (cbneed e)
    in cbneed (f (Print w x))
cbneed e@(Int _) = return e
cbneed (Add e1 e2) = do
    Int e1' <- cbneed e1
    Int e2' <- cbneed e2
    return (Int (e1' + e2'))
cbneed (Print s e) = do
    tell s
    cbneed e

sample = App (Abs (\x -> Print "1" (Add x x))) (Add (Print "2" (Int 2)) (Int 3))
run = runWriter

Though the final print out is "122" (the two shows up twice), the actual addition of 2 to 3 only occurs once (which you should feel free to verify by adding an appropriate tracing call). You can do something similar for Maybe, by cheating a little: since in the case of Nothing we have no value for x, we offer bottom instead. We will never get called out on it, since we always short-circuit before anyone gets to the value.

There is a resemblance here to applicative functors, except that we require even more stringent conditions: not only is the control flow of the computation required to be fixed, but the value of the computation must be fixed too! It should be pretty clear that we won’t be able to do this for most monads. Yesterday on Twitter, I proposed the following signature and law (reminiscent of inverses), which would need to be implementable by any monad you would want to do this procedure on (actually, you don’t even need a monad; a functor will do):

extract :: Functor f => f a -> (a, f ())
s.t.  m == let (x,y) = extract m in fmap (const x) y

but it seemed only Writer had the proper structure to pull this off properly (being both a monad and a comonad). This is a shame, because the application I had in mind for this bit of theorizing needs the ability to allocate cells.

Not all is lost, however. Even if full sharing is not possible, you might be able to pull off partial sharing: a sort of bastard mix of full laziness and partial evaluation. Unfortunately, this would require substantial and invasive changes to your runtime (and I am not sure how you would do it if you wanted to CPS your code), and so at this point I put away the problem, and wrote up this blog post instead.

Template project for GHC plugins

September 28, 2012

There is a bit of scaffolding involved with making Core-to-Core transforming GHC plugins, so I made a little project, based off of Max Bolingbroke’s examples, which is a nice, clean template project which you can use to create your own GHC plugins. In particular, it has documentation and pointers to the GHC source as well as a handy-dandy shell script rename.sh MyProjectName which will let you easily rename the template project into whatever name you want. You can find it on GitHub. I’ll probably be adding more to it as I go along; let me know about any bugs too.

"This is really the End."

September 24, 2012

Done. This adjective rarely describes any sort of software project—there are always more bugs to fix, more features to add. But on September 20th, early one afternoon in France, Georges Gonthier did just that: he slotted in the last component of a six year project, with the laconic commit message, “This is really the End.”

It was complete: the formalization of the Feit-Thompson theorem.

If you’re not following the developments in interactive theorem proving or the formalization of mathematics, this achievement may leave you scratching your head a little. What is the Feit-Thompson theorem? What does it mean for it to have been formalized? What was the point of this exercise? Unlike the four coloring theorem which Gonthier and his team tackled previously in 2005, the Feit-Thompson theorem (also known as the odd order theorem) is not easily understandable by non-mathematician without a background in group theory (I shall not attempt to explain it). Nor are there many working mathematicians whose lives will be materially impacted by the formalization of this particular theorem. But there is a point; one that can be found between the broad motivation behind the computer-assisted theorem proving and the fascinating social context surrounding the Feit-Thompson theorem.

While the original vision of automated theorem proving, articulated by David Hilbert, was one where computers completely replaced mathematicians in the work of proving theorems (mathematicians would be consigned to dreaming up interesting theorem statements to prove), modern researchers in the formalization of mathematics consider the more modest question of whether or not a proof is true. That is to say, humans are fallible and can make mistakes while reviewing the proofs of their colleagues, while the computer is an idiot-savant who, once spoon-fed the contents of a proof, is essentially infallible in its judgment of correctness. The proofs of undergraduate mathematics classes don’t really need this treatment (after all, they have been inflicted on countless math majors over the centuries), but there are a few areas where this extra boost of confidence is really appreciated:

In theorems which are found in conventional software systems, due to the large number of incidental details making proofs superficial but large,
In extremely tricky, consistently unintuitive areas of mathematics, including questions of provability and soundness, and
In extremely long and involved proofs, for which traditional verification by human mathematicians is a Herculean task.

It is in this last area that we can really place both the Four Color Theorem and the Feit-Thompson Theorem. The Four Color Theorem’s original proof was controversial because of its reliance on a computer to solve 1936 special cases which the proof depended on. The Feit-Thompson Theorem is a bit of an annoyance to many mathematicians, due to its length: 255 pages, the global structure of which has resisted over half a century of attempts at simplification. And it itself is just a foothill on the way to the monster theorem that is the classification of finite simple groups (comprising tens of thousands of pages of proof.)

Formalizing the entirety of the Feit-Thompson is a technical achievement which applies computer-assisted theorem proving to a tremendously nontrivial mathematical proof, and does so in a setting where the ability to state “Feit-Thompson is True” has non-zero information content. It demonstrates that theorem proving in Coq does scale (at least, well enough pull off a proof of this scale) and it has generated a large toolbox for handling features of real mathematics including heavily overloaded notation and handing the composition of many, disparate theories (all of which the proof draws upon).

Of course, there is one niggling concern: what if there is a bug, and what Gonthier and his team have proved is not actually the Feit-Thompson Theorem? A story I once heard while I was at Galois was of a research intern who had been tasked with proving some theorems. He had some trouble at first, but eventually pulled off the first theorem and moved to the next, gradually gaining speed. Eventually, his advisor took a look at these proofs and realized that he had been taking advantage of a soundness problem in the proof checker to solve the proofs. I wonder if this is a story that wizened PhD students tell the first years to give them nightmares.

But I think the risk of this is very low. The mechanized proof follows the original quite closely, and so a wrong definition or a soundness bug is likely only to pose a local problem for the proof that can be easily solved. Indeed, if there is a problem with the proof that is so great that the proof must be thrown out and done again, that would be very interesting—it would say something about the original proof as well. That would be surprising, and worthy of attention, for it could mean the end of the proof of the Feit-Thompson Theorem as we know it. But I don’t think we live in that world.

Pop the champagne, and congratulations to Gonthier and his team!

Unintuitive facts about Safe Haskell

September 17, 2012

Safe Haskell is a new language pragma for GHC which allows you to run untrusted code on top of a trusted code base. There are some common misconceptions about how Safe Haskell works in practice. In this post, I’d like to help correct some of these misunderstandings.

[`system 'rm -Rf /' :: IO ExitCode`] is accepted by Safe Haskell¶

Although an IO action here is certainly unsafe, it is not rejected by Safe Haskell per se, because the type of this expression clearly expresses the fact that the operation may have arbitrary side effects. Your obligation in the trusted code base is to not run untrusted code in the IO monad! If you need to allow limited input/output, you must define a restricted IO monad, which is described in the manual.

Safe Haskell programs can hang¶

Even with killThread, it is all to easy to permanently tie up a capability by creating a non-allocating infinite loop. This bug has been open for seven years now, but we consider this a major deficiency in Safe Haskell, and are looking for ways to prevent this from occurring. But as things are now, Safe Haskell programs need to be kept under check using operating system level measures, rather than just Haskell’s thread management protocols.

Users may mistrust `Trustworthy` modules¶

The Trustworthy keyword is used to mark modules which use unsafe language features and/or modules in a “safe” way. The safety of this is vouched for by the maintainer, who inserts this pragma into the top of the module file. Caveat emptor! After all, there is no reason that you should necessarily believe a maintainer who makes such a claim. So, separately, you can trust a package, via the ghc-pkg database or the -trust flag. But GHC also allows you to take the package maintainer at their word, and in fact does so by default; to make it distrustful, you have to pass -fpackage-trust. The upshot is this:

Module trusted?	(no flags)	`-fpackage-trust`
Package untrusted	Yes	No
Package trusted	Yes	Yes

If you are serious about using Safe Haskell to run untrusted code, you should always run with -fpackage-trust, and carefully confer trusted status to packages in your database. If you’re just using Safe Haskell as a way of enforcing code style, the default are pretty good.

Explicit untrust is important for maintaining encapsulation¶

Safe Haskell offers safety inference, which automatically determines if a module is safe by checking if it would compile with the -XSafe flag. Safe inferred modules can then be freely used by untrusted code. Now, suppose that this module (inferred safe) was actually Data.HTML.Internal, which exported constructors to the inner data type which allowed a user to violate internal invariants of the data structure (e.g. escaping). That doesn’t seem very safe!

The sense in which this is safe is subtle: the correctness of the trusted code base cannot rely on any invariants supplied by the untrusted code. For example, if the untrusted code defines its own buggy implementation of a binary tree, catching the bugginess of the untrusted code is out of scope for Safe Haskell’s mission. But if our TCB expects a properly escaped HTML value with no embedded JavaScript, the violation of encapsulation of this type could mean the untrusted code could inject XSS.

David Terei and I have some ideas for making the expression of trust more flexible with regards to package boundaries, but we still haven’t quite come to agreement on the right design. (Hopefully we will soon!)

Conclusion¶

Safe Haskell is at its heart a very simple idea, but there are some sharp edges, especially when Safe Haskell asks you to make distinctions that aren’t normally made in Haskell programs. Still, Safe Haskell is rather unique: while there certainly are widely used sandboxed programming languages (Java and JavaScript come to mind), Safe Haskell goes even further, and allows you to specify your own, custom security policies. Combine that with a massive ecosystem of libraries that play well with this feature, and you have a system that you really can’t find anywhere else in the programming languages universe.

The Y Combinator and strict positivity

September 12, 2012

One of the most mind-bending features of the untyped lambda calculus is the fixed-point combinator, which is a function fix with the property that fix f == f (fix f). Writing these combinators requires nothing besides lambdas; one of the most famous of which is the Y combinator λf.(λx.f (x x)) (λx.f (x x)).

Now, if you’re like me, you saw this and tried to implement it in a typed functional programming language like Haskell:

Prelude> let y = \f -> (\x -> f (x x)) (\x -> f (x x))

<interactive>:2:43:
    Occurs check: cannot construct the infinite type: t1 = t1 -> t0
    In the first argument of `x', namely `x'
    In the first argument of `f', namely `(x x)'
    In the expression: f (x x)

Oops! It doesn’t typecheck.

There is a solution floating around, which you might have encountered via a Wikipedia article or Russell O’Connor’s blog, which works by breaking the infinite type by defining a newtype:

Prelude> newtype Rec a = In { out :: Rec a -> a }
Prelude> let y = \f -> (\x -> f (out x x)) (In (\x -> f (out x x)))
Prelude> :t y
y :: (a -> a) -> a

There is something very strange going on here, which Russell alludes to when he refers to Rec as “non-monotonic”. Indeed, any reasonable dependently typed language will reject this definition (here it is in Coq):

Inductive Rec (A : Type) :=
  In : (Rec A -> A) -> Rec A.

(* Error: Non strictly positive occurrence of "Rec" in "(Rec A -> A) -> Rec A". *)

What is a “non strictly positive occurrence”? It is reminiscent to “covariance” and “contravariance” from subtyping, but more stringent (it is strict, after all!) Essentially, a recursive occurrence of the type (e.g. Rec) may not occur to the left of a function arrow of a constructor argument. newtype Rec a = In (Rec a) would have been OK, but Rec a -> a is not. ((Rec a -> a) -> a is not OK either, despite Rec a being in a positive position.)

There are good reasons for rejecting such definitions. The most important of these is excluding the possibility of defining the Y Combinator (party poopers!) which would allow us to create a non-terminating term without explicitly using a fixpoint. This is not a big deal in Haskell (where non-termination abounds), but in a language for theorem proving, everything is expected to be terminating, since non-terminating terms are valid proofs (via the Curry-Howard isomorphism) for any proposition! Thus, adding a way to sneak in non-termination with the Y Combinator would make the type system very unsound. Additionally, there is a sense in which types that are non-strictly positive are “too big”, in that they do not have set theoretic interpretations (a set cannot contain its own powerset, which is essentially what newtype Rec = In (Rec -> Bool) claims).

To conclude, types like newtype Rec a = In { out :: Rec a -> a } look quite innocuous, but they’re actually quite nasty and should be used with some care. This is a bit of a bother for proponents of higher-order abstract syntax (HOAS), who want to write types like:

data Term = Lambda (Term -> Term)
          | App Term Term

Eek! Non-positive occurrence of Term in Lambda strikes again! (One can feel the Pittsburgh-trained type theorists in the audience tensing up.) Fortunately, we have things like parametric higher-order abstract syntax (PHOAS) to save the day. But that’s another post…

Thanks to Adam Chlipala for first introducing me to the positivity condition way back last fall during his Coq class, Conor McBride for making the offhand comment which made me actually understand what was going on here, and Dan Doel for telling me non-strictly positive data types don’t have set theoretic models.

So you want to hack on IMAP...

August 31, 2012

(Last IMAP themed post for a while, I promise!)

Well, first off, you’re horribly misinformed: you do not actually want to hack on IMAP. But supposing, for some masochistic reason, you need to dig in the guts of your mail synchronizer and fix a bug or add some features. There are a few useful things to know before you start your journey…

Read your RFCs. RFC 3501 is the actual specification, while RFC 2683 gives a lot of helpful tips for working around the hairy bits of the many IMAP servers out there in practice. You should also know about the UIDPLUS extension, RFC 4315, which is fairly well supported and makes a client implementor’s life a lot easier.
IMAP is fortunately a text-based protocol, so you can and should play around with it on the command line. A great tool to use for this is imtest, which has all sorts of fancy features such as SASL authentication. (Don’t forget to rlwrap it!) Make sure you prefix your commands with an identifier (UID is a valid identifier, so typing UID FETCH ... will not do what you want.)
It is generally a better idea to use UIDs over sequence numbers, since they are more stable, but be careful: as per the specification, UID prefixed commands never fail, so you will need to check untagged data in the response to see if anything actually happened. (If you have a shitty IMAP library, it may not clear out untagged data between requests, so watch out for stale data!) Oh, and look up UIDVALIDITY.
There exist a lot of software that interfaces with IMAP, all of which has accreted special cases for buggy IMAP servers over the years. It is well worth sourcediving a few to get a sense for what kinds of things you will need to handle.

OfflineIMAP sucks

August 30, 2012

I am going to share a dirty little secret with you, a secret that only someone who uses and hacks on OfflineIMAP could reasonably know: OfflineIMAP sucks. Of course, you can still use software that sucks (I do all the time), but it’s useful to know what some of its deficiencies are, so that you can decide if you’re willing to put up with the suckage. So why does OfflineIMAP suck?

This is not really a constructive post. If I were actually trying to be constructive, I would go and fix all of these problems. But all of these are big problems that require substantial amounts of effort to fix… and unfortunately I don’t care about this software enough.

Project health is anemic¶

The original author, John Goerzen, has moved on to greener, more Haskell-like pastures, and the current maintainership is having difficulty finding the time and expertise to do proper upkeep on the software. Here is one of the most recent calls for maintainers, as both of the two co-maintainers who were maintaining OfflineIMAP have failed to have enough free time to properly keep track of all submitted patches. There still seem to be enough people with a vested interest in seeing OfflineIMAP not bitrot that the project should continue to keep working for the foreseeable future, but one should not expect any dramatic new features or intensive work to be carried out on the codebase.

Nearly no tests¶

For most of OfflineIMAP’s history, there were no tests. While there is now a dinky little test suite, it has nowhere near the coverage that you would want out of such a data-critical program. Developers are not in the habit of adding new regression tests when they fix bugs in OfflineIMAP. But perhaps most perniciously, there is no infrastructure for testing OfflineIMAP against as wide a range of IMAP servers as possible. Here are where the really bad bugs can show up, and the project has none of the relevant infrastructure.

Over-reliance on UIDs¶

OfflineIMAP uses UIDs as its sole basis for determining whether or not two messages correspond to each other. This works almost most of the time, except when it doesn’t. When it doesn’t, you’re in for a world of hurt. OfflineIMAP does not support doing consistency checks with the Message-Id header or the checksum of the file, and it’s X-OfflineIMAP hack for servers that don’t support UIDPLUS ought to be taken out back and shot. To it’s credit, however, it has accreted most of the special casing that makes it work properly in all of the weird cases that show up when you have UIDs.

Poor space complexity¶

The memory usage of OfflineIMAP is linear with the number of messages in your inbox. For large mailboxes, this effectively means loading hundreds of thousands of elements into a set and doing expensive operations on it (OfflineIMAP consistently pegs my CPU when I run it). OfflineIMAP should be able to run in constant space, but zero algorithmic thought has been put into this problem space. It also has an extremely stupid default status folder implementation (think repeatedly writing 100MB to disk for every single file you upload), though you can fix that fairly easily by setting status_backend = sqlite. Why is it not default? Because it’s still experimental. Hm…

Unoptimized critical path¶

OfflineIMAP was never really designed for speed. This shows up in the synchronization time it takes, even in the common cases of no changes or just downloading a few messages. If one’s goal is to download your new messages as quickly as possible, a lot of adjustments could be made, including reducing the number of IMAP commands (esp. redundant selects and expunges), reducing the number of times we touch the filesystem, asynchronous filesystem access, not loading the entirety of a downloaded message in memory, etc. A corollary is that OfflineIMAP doesn’t really seem to understand what data it is allowed to lose, and what data it must fsync before carrying on to the next operation: “safety” operations are merely sprinkled through the code without any well-defined discipline. Oh, and how about some inotify?

Brain-dead IMAP library¶

OK, this one is not really OfflineIMAP’s fault, but imaplib2 really doesn’t protect you from the pointy-edged bits of the IMAP protocol (and how it is implemented in the real-world) at all. You have to do it all yourself. This is dumb, and a recipe for disaster when you forget to check UIDVALIDITY in that new IMAP code you were writing. Additionally, it encodes almost no knowledge of the IMAP RFC, with respect to responses to commands. Here is one place where some more type safety would really come in handy: it would help force people to think about all of the error cases and all of the data that could occur when handling any given command.

Algorithmica obscura¶

OfflineIMAP has a fair bit of debugging output and UI updating code interspersed throughout its core algorithms, and the overall effect is that it’s really hard to tell what the overall shape of the algorithm being employed is. This is not good if the algorithm is kind of subtle, and relies on some global properties of the entire execution to ensure its correctness. There is far too much boilerplate.

Conclusion¶

In conclusion, if you would like to use OfflineIMAP on a well-behaved, popular, open-source IMAP server which a maintainer also happens to use with a relatively small number of messages in your INBOX and are willing to put up with OfflineIMAP being an immutable black box that consumes some non-zero amount of time synchronizing in a wholly mysterious way, and never want to hack on OfflineIMAP, there is no finer choice. For everyone else, well, good luck! Maybe it will work out for you! (It mostly does for me.)

How OfflineIMAP works

August 27, 2012

As software engineers, we are trained to be a little distrustful of marketing copy like this:

OfflineIMAP is SAFE; it uses an algorithm designed to prevent mail loss at all costs. Because of the design of this algorithm, even programming errors should not result in loss of mail. I am so confident in the algorithm that I use my own personal and work accounts for testing of OfflineIMAP pre-release, development, and beta releases.

What is this algorithm? Why does it work? Where is the correctness proof? Unfortunately, no where in OfflineIMAP’s end user documentation is the algorithm described in any detail that would permit a software engineer to convince himself of OfflineIMAP’s correctness. Fortunately for us, OfflineIMAP is open source, so we can find out what this mysterious algorithm is. In fact, OfflineIMAP’s synchronization algorithm is very simple and elegant. (Nota bene: for simplicity’s sake, we don’t consider message flag synchronization.)

Preliminaries¶

Define our local and remote repositories (Maildir and IMAP, respectively) to consist of sets over messages L and R. In a no-delete synchronization scheme, we would like to perform some set of operations such that end states of the repositories L’ and R’ are L ∪ R.

However, no-delete synchronization schemes work poorly for email, where we would like the ability to delete messages and have those changes be propagated too. To this end, OfflineIMAP defines a third repository called the status repository, also a set over messages, which says whether or not a message has been synchronized in the past without an intervening synchronized delete. There are now seven possible states for a message to have, based on which repositories it is a member:

Considering all possible combinations:

Synchronized (L,R,S): The message is fully synchronized and needs no further processing.
New Local (L): The message was newly added to the local repository and needs to be uploaded.
New Remote (R): The message was newly added to the remote repository and needs to be downloaded.
Status Missing (L,R): The message is synchronized but our status is out-of-date.
Remote Removed (L,S): The message was synchronized, but since then was removed from the remote; it should now be removed from local.
Local Removed (R,S): The message was synchronized, but since then was removed from the local; it should now be removed from remote.
Missing (S): The message has been deleted everywhere and our status has a stale entry for it.

The green-shaded region of the Venn diagram is what we would like L, R and S to cover at the end of synchronization.

Algorithm¶

Define a synchronization operation on a source, destination and status repository syncto(src, dst, status) to be these two steps:

Calculate the set difference src - status, and copy these messages to dst and status.
Calculate the set difference status - src, and delete these messages from dst and status.

The full synchronization algorithm is then:

syncto(R, L, S) (download changes)
syncto(L, R, S) (upload changes)

How it works¶

In the absence of crashes, the correctness proof only involves verifying that the status repository invariant (that messages in status have been synchronized in the past without an intervening synchronized delete) is preserved over all four operations, and that the set differences are, in fact, precisely the sets of messages we want to copy and delete. However, we can also try and look at how the local, remote and status repositories change as the algorithm progresses. In particular, the contents of the status repository in the first syncto is slightly surprising as it evolves differently from local, despite having the same operations applied to it (it then evolves in lockstep with remote).

Another important correctness claim is that OfflineIMAP never “loses mail”. Under what conditions is mail deleted? When it is present in status repository, but not in the local or remote repository. So it is easy to see that when the status repository is “lost” (either corrupted, or deleted as the instructions tell you to if you delete the contents of your local folders), OfflineIMAP will conservatively perform a full, no-delete synchronization between the two sources. So long as the status repository never contains data for more messages than it ought to, OfflineIMAP will not delete your mail.

Variations¶

Suppose that I have more disk space available on local disk for Maildir than my remote IMAP server. Eventually, you will end up in the awkward position of wanting to delete messages from your remote IMAP server without correspondingly nuking them from your local mail store. OfflineIMAP provides the maxage option (in which OfflineIMAP refuses to acknowledge the existence of messages older than some sliding window), but what if we really wanted to be sure that OfflineIMAP would never ever delete messages from my local repository?

Simple: Skip step 1-2.

Conclusion¶

By utilizing a third repository, for which data loss results in a conservative action on the part of the program, OfflineIMAP achieves its claims of an algorithm designed to prevent mail loss at all costs. It is also a simple algorithm, and I hope that any computer scientist or software engineer using this software will take the time to convince themselves of its correctness, rather than relying on the hearsay of some marketing material.

The Monad Reader: Issue 20

August 25, 2012

After a long delay and a lot of editing, Issue 20 of The Monad Reader is finally out. Check it out!

Applicative functors

August 16, 2012

On the importance of primary sources.

(Introductory material ahead.) Most readers of this blog should have at least a passing familiarity with applicative functors:

class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

This interface is quite convenient for day-to-day programming (in particular, it makes for the nice f <$> a <*> b <*> c idiom), but the laws it obeys are quite atrocious:

[identity] pure id <*> v = v
[composition] pure (.) <*> u <*> v <*> w = u <*> (v <*> w) 
[homomorphism] pure f <*> pure x = pure (f x) 
[interchange] u <*> pure y = pure ($ y) <*> u

So, if you (like me twenty-four hours ago) haven’t seen it already, you should show that this interface is equivalent to Applicative:

class Functor f => Monoidal f where
  unit :: f ()
  (**) :: f a -> f b -> f (a,b)

(By the way, if you haven’t shown that join :: m (m a) -> m a for monads is equivalent to bind :: m a -> (a -> m b) -> m b, you should do that too.) The laws for this formulation are much nicer:

[naturality] fmap (f *** g) (u ** v) = fmap f u ** fmap g v
[left identity] unit ** v ≅ v
[right identity] u ** unit ≅ u
[associativity] u ** (v ** w) ≅ (u ** v) ** w

Where f *** g = \(x,y) -> (f x, g y). I’ve prettied things up a bit by using “is isomorphic to” in order to suppress the differences between ((), a) and a, as well as (a,(b,c)) and ((a,b),c), for strict equalities you’ll need some extra functions to massage the results into the right types. It seems that there is a general pattern where the API which has nice formulations of laws is not convenient to program with, and the formulation which is nice to program with does not have nice laws. C’est la vie… but at least they’re equivalent!

With this formulation, it becomes trivial to state what laws commutative applicative functors obey:

[commutativity] u ** v ≅ v ** u

The original paper Applicative Programming With Effects is well worth a read. Check it out! That concludes this public service announcement.

« Newer Posts Older Posts »