July 8, 2011
I recently concluded a year-long study-abroad program at the University of Cambridge. You can read my original reasons and first impressions here.

It is the Sunday before the beginning of exams, and the weather is spectacular. Most students (except perhaps the pure historians) are dourly inside, too busy revising to take advantage of it. But I am thrust out of my den, constructed of piles of notes and past exam questions, in order to go to one final revision with our history supervisor, Mirjam. I cycle down Grange Road, park my bicycle outside Ridley Hall, and am greeted by a pleasant surprise: Mirjam has elected to hold the revision outside on a cluster of park benches flanked everywhere by grass and trees, and has brought a wicker basket containing fresh fruit, small cupcakes and other treats, and also sparkling wine and beer (perhaps not the best drink for what is ostensibly a revision, but we avail ourselves of it anyway.)

History and Philosophy of Science is a course made of two strands. They start off relatively distinct from each other: History of Science began with Galileo and the dawn of the 17th century, while Philosophy of Science began by analyzing the nature of causality. But by the end of the course, philosophical questions about quantification and quantum mechanics are couched heavily in terms of the histories of these sciences, and prevailing attitudes about the origins of western civilization being rooted in Greece and the mathematics of ancient Babylonia are analyzed with increasing philosophical flavor. The mixture of the two subjects works, and taking this course has been an incredibly rich experience. (At times, I even wish we had been asked to write more than one essay a week—something that would be pretty unheard of for an MIT humanities class.)
It was also hard work, especially in history, where my initial essays got off to a rocky start because I misunderstood that I was supposed to be synthesizing material from a broader range of sources than just the survey texts that were required reading (I might also have been a “little” negligent in doing all of the required reading.) Writing HPS essays required me to block out large chunks of my weekends for them; I rarely stayed up late doing assigned computer science work, but did so more often than I would like the night an HPS essay was due. I always attempted to pay attention in lectures (sometimes more successfully than others—to this day I still don’t really understand the rise of the modern research university system.)

A few professors stand out in my memories. I will never forget Hasok Chang’s (pictured above) introductory lecture, where, in a digression, he explained that he originally majored in physics at Caltech, but switched to philosophy after his professors got annoyed when he asked questions like “What happened before the Big Bang?” It was his first year teaching the Philosophy of the Physical Sciences series of lectures (much to the chagrin of many students, who did not have any past exam questions to study against), but he went on to deliver an extremely solid series of engaging and informative lectures (I think they also appealed a bit to physicists who were tired of such abstract questions as the nature of causation and induction). He even invited us to a round of drinks at the nearby pub after his last lecture, where we had a lively debate about a recent topic over some beers.
Stephen John delivered an extremely lively and thought-provoking series of lectures on ethics in science. Most of his lectures left us with more questions than answers, but they were an elucidating journey into the realms of informed consent, the precautionary principle and cost-benefit analysis. (Informed consent is one of my new personal favorites: it’s an awesome example of how an enshrined medical practice is both too conservative and too liberal.) And I will always remember our supervisor, Richard Jennings (pictured below), as the perpetually smiling, moustached man whose supervisions weren’t really supervisions but rather just conversations about the topics which had been recently covered in lecture.

Among history lecturers, I have to tip my hat to Eleanor Robson, who delivered the last four lectures in history of science. I will admit: I wasn’t too keen on learning about ancient Babylonia when I first read the syllabus, but the lectures ended up being a discussion of historiographical issues: how do historians come to know the “dry facts” that anyone who studied history in high school knows too well? In many ways, it’s astounding that we know as much as we do. It’s one of those lessons you wish you had learned earlier, even though, in the back of your head, you know you wouldn’t have fully appreciated it earlier on.

Simon Schaffer (pictured above) was also quite a character, a rather forceful individual who delivered our first set of history lectures. You can perhaps get a flavor of his style from the short BBC series The Light Fantastic, though he was perhaps a smidge more vulgar (in a good way!) when you saw him lecture in person. (“Fat. Irascible. Brilliant. Definitely a role model,” Schaffer, on Tycho Brahe.) And of course, Mirjam, our history supervisor, persistently encouraged us to improve our essay writing.
HPS was pretty amazing. (I even tagged along to the Part II orientation event, even though I wouldn’t be eligible for it.) If you are a scientist with any interest at all in philosophy or history (I first dipped my toe into philosophy taking “Logic and Reasoning” at CTY), I cannot recommend this program more highly. Readers may have noticed various attempts to discuss these issues on my blog—they are a bit harder to write about than some of the more technical things I discuss, but I think they are also very rewarding (and occasionally, some of my most popular writings—though in retrospect this one seems a bit Whiggish now.) It’s also fun: I managed to write an essay about Kuhnian scientific revolutions by framing it in the context of the MIT Mystery Hunt. I won’t deny it: for a brief period of time, I felt like a liberal arts student. It has made me a better person, in many, many ways.
July 6, 2011
I use Mendeley because it lets me easily search for papers I care about. Unfortunately, that seems to be all Mendeley has been doing for me… and that’s a damn shame. Maybe it’s because I’m an undergraduate, still dipping my toe into an ocean of academic research. Mendeley is aimed at practicing researchers, not people like me, who are still aiming for breadth, not depth. I can count on two hands the number of technical papers I’ve really dug into—I’m still trying to figure out what it is exactly that I want to specialize in.
From this perspective, there are many, many things that Mendeley could be doing for me that it simply isn’t doing right now. And this is not purely a selfish perspective: the world of academics is a relatively small one, and I’d like to think a move towards my preferences is a move towards a larger body of potential users and customers. My request/plea/vision can be summed up as follows: Mendeley needs better metadata.
Metadata extraction has been a long-standing complaint of many, many Mendeley users. One might at least partially chalk it up to the obsessive-compulsive nature of researchers: we want our metadata to be right, and having to manually fix essentially every paper we import into our database imposes a huge extra cost on using Mendeley.
How correct should this metadata be?
- If I search by author name, the list of papers I get should be equal in quality to that author’s listing of publications on his personal website. (Name de-duplication is pretty terrifying: Mendeley gives no indication of which permutation of a name is likely to be the right one, and no advice about whether initials or full names should be preferred.) As a prospective graduate student with some degree of specialization, knowing which authors I have the most papers from gives hints as to what departments I may be interested in.
- If I search by conference name, the list of papers I get should be equal in quality to the table of contents for that conference’s proceedings. Furthermore, there should be continuity over the years, so that I can search for all-time ICFP, not just 2011’s proceedings. Conferences have a notoriously large number of permutations of spellings (is it ICFP or International Conference on Functional Programming or…), and with no canonicalization, keeping these straight is hopeless.
Here is something that would be really interesting to extract from papers, that is not readily available anywhere on the Internet: who funded the piece of work, and under what program! Researchers are obligated to acknowledge their funding source in a footnote or a section at the end of their paper, and access to this metadata would let me know “Who funds most of the research in this area” or “What grants should I be looking at for the next funding cycle.” This information is, of course, currently passed down as folklore from advisor to advisee. Extracting, polishing and publishing this data would be an interesting endeavor in its own right.
Papers don’t exist in a vacuum. On the trivial level, any paper is frequently accompanied by a slide deck or a recorded video of a talk—Mendeley should track that. But on a deeper level, I’m sure many academics would die from loneliness if they were the only ones working in their field. There is a person behind the paper, and there is more information about them than just what papers they have published. What organization are they part of? (A prospective grad student would love to know if they are faculty at a particular university.) What organization were they a part of when they published the paper? Who tends to collaborate with whom—are there cliques of academics? Who was a member of the panel for what conferences?
One critical component of this is the citations of papers. Existing paper databases have concluded that this curation problem is just too hard: they simply publish plain text extracts of the reference sections. But here is a perfect opportunity to harness the power of the social Internet. Wikify the damn metadata, and let academics with an interest in this sector of academia create the mythical hyperlinked web of papers. The day I can browse papers the way I browse Wikipedia will be a happy day. No paper exists in a vacuum.
Mendeley does a great job of letting you attach metadata to a specific paper, which is searchable but not much else. But there are plenty of small chunks of information that could be profitably converted into generalized schemes. For example, Mendeley currently stores a bit indicating whether a paper is “read” or not. This is woefully insufficient: one does not simply read an academic paper (nor does one simply walk into Mordor). Maybe you’ve read the abstract, maybe you’ve skimmed the paper, maybe the paper is one that you are actively attempting to understand as part of your research. This workflow should be made first-class and the relevant user interface for it exposed. This information is also important if you’re trying to draw inferences about your paper reading habits from Mendeley’s database: papers you’ve imported but never looked at probably shouldn’t be counted.
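To make this concrete, a first-class reading state could be as simple as replacing that bit with an enumeration. This is a hypothetical sketch in Haskell; the type and constructor names are my invention, not anything from Mendeley's actual schema:

```haskell
-- Hypothetical sketch: a richer "read" status than a single bit.
-- None of these names come from Mendeley; they only illustrate the idea.
data ReadState
  = Imported      -- in the library, but never opened
  | AbstractOnly  -- read the abstract
  | Skimmed       -- skimmed the body
  | ReadFully     -- read in full
  | Studying      -- actively working through it for research
  deriving (Show, Eq, Ord, Enum, Bounded)

main :: IO ()
main = print [minBound .. maxBound :: ReadState]
```

The `Ord` and `Enum` instances come for free, which is exactly what you would want when drawing inferences over a database of these states.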
Conclusion
In many ways, this is a call for the equivalent of the “Semantic Web” for academic papers. By and large, we are still attempting to realize this vision for the Internet at large. But there are good reasons to believe that the world of academic papers may be different. For one thing, the web of academic papers has not even reached the point of the traditional Internet: it doesn’t have hyperlinks! Additionally, academic papers are much slower moving than traditional web content and far more permanent. I want there to be no reason for my friends in industry to complain that they’d much rather a researcher publish a blog post than a paper. It’s ambitious, I know. But that’s my vision for Mendeley.
July 4, 2011
The first thing to convince yourself of is that there actually is a problem with the code I posted last week. Since this is a memory leak, we need to keep track of creations and accesses of IVars. An IVar allocation occurs in the following cases for our example:
1. Invocation of return, which returns a full IVar,
2. Invocation of tick, which returns an empty IVar and schedules a thread to fill this IVar, and
3. Invocation of >>=, which returns an empty IVar and a reference to this IVar in the callback attached to the left-hand IVar.
An IVar access occurs when we dereference an IORef, add a callback or fill the IVar. This occurs in these cases:
1. Invocation of >>=, which dereferences the left IVar and adds a callback,
2. Invocation of the callback on the left argument of >>=, which adds a callback to the result of f x,
3. Invocation of the callback on the result of f x (from the above callback), which fills the original IVar allocated in (3), and
4. Invocation of the scheduled thread by tick, which fills the empty IVar it was scheduled with.
We can now trace the life-cycle of an IVar allocated by the >>= in the code loop = tick >>= loop.
1. IVar allocated by >>=. Two references are generated: one in the callback attached to tick and one returned.

2. Scheduler runs the thread that fills in the IVar from tick, and its callback is run. The IVar is reachable via the newly allocated callback attached to f x. Note that f in this case is \() -> loop, so at this point the recursive call occurs.

3. Scheduler runs the thread that fills in the IVar from f x, and its callback is run. The IVar is filled in, and the reference to it in the callback chain is now dead. The life of the IVar now depends solely on the reference that we returned to the client.

Notice that across the first and second scheduler rounds, the bind-allocated IVar is kept alive by means other than the reference we returned to the client. In the first case it’s kept alive by the callback to tick (which is in turn kept alive by its place in the execution schedule); in the second case it’s kept alive by the callback to f x. If we can get to the third case, everything will manage to get GC’d, but that’s a big if: in our infinite loop, f x is never filled in.

Even if it does eventually get filled in, we build up a chain of IVars whose depth is the length of the recursion, whereas if we had some sort of “tail call optimization”, we could immediately throw these IVars away.
July 1, 2011
One downside to the stupid scheduler I mentioned in the previous IVar monad post was that it would easily stack overflow, since it stored all pending operations on the stack. We can explicitly move all of these pending callbacks to the heap by reifying the execution schedule. This involves adding Schedule state to our monad (I’ve done so with IORef Schedule). Here is an only slightly more clever scheduler (I’ve also simplified some bits of code, and added a new addCallback function):
import Data.IORef

data IVarContents a =
    Blocking [a -> IO ()]
  | Full a

type Schedule = [IO ()]
type IVar a = IORef (IVarContents a)

newtype T a = T { runT :: IORef Schedule -> IO (IVar a) }

instance Monad T where
    return x = T (\_ -> newIORef (Full x))
    m >>= f = T $ \sched ->
        do xref <- runT m sched
           mx <- readIORef xref
           case mx of
             Full x -> runT (f x) sched
             Blocking cs -> do
                r <- newIORef (Blocking [])
                let callback x = do
                        y <- runT (f x) sched
                        addCallback y (fillIVar sched r)
                addCallback xref callback
                return r

addCallback :: IVar a -> (a -> IO ()) -> IO ()
addCallback r c = do
    rc <- readIORef r
    case rc of
      Full x -> c x
      Blocking cs -> writeIORef r (Blocking (c:cs))

fillIVar :: IORef Schedule -> IVar a -> a -> IO ()
fillIVar sched ref x = do
    r <- readIORef ref
    writeIORef ref (Full x)
    case r of
      Blocking cs -> schedule sched (map ($x) cs)
      Full _ -> error "fillIVar: Cannot write twice"

-- FIFO scheduler
schedule :: IORef Schedule -> [IO ()] -> IO ()
schedule sched to_sched = do
    cur <- readIORef sched
    writeIORef sched (cur ++ to_sched)

run :: T () -> IO ()
run initial_job = do
    sched <- newIORef []
    writeIORef sched [runT initial_job sched >> return ()]
    let go = do
            jobs <- readIORef sched
            case jobs of
              [] -> return ()
              (job:rest) -> writeIORef sched rest >> job >> go
    go
Here is some sample code that demonstrates the basic idea:
-- Does more work than return (), but semantically the same
tick :: T ()
tick = T $ \sched ->
    do r <- newIORef (Blocking [])
       schedule sched [fillIVar sched r ()]
       return r

main = run loop
loop = tick >> loop
Actually, this simple infinite loop leaks space. (The reader is invited to try it out themselves.) This is precisely the problem the authors of LWT ran into. I hate chopping blog posts into little pieces, but getting this code right took a little longer than I expected and I ran out of time—so please wait till next time for more treatment!
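For what it's worth, the scheduler itself does work when the loop is bounded. Here is a self-contained variant of the code above that terminates: it adds the Functor and Applicative instances modern GHC requires (derived mechanically from liftM and ap), plus a small liftIO' helper of my own invention so we can observe the program finishing; everything else is the same machinery.

```haskell
import Control.Monad (ap, liftM)
import Data.IORef

data IVarContents a = Blocking [a -> IO ()] | Full a
type Schedule = [IO ()]
type IVar a = IORef (IVarContents a)
newtype T a = T { runT :: IORef Schedule -> IO (IVar a) }

instance Functor T where fmap = liftM
instance Applicative T where
    pure x = T (\_ -> newIORef (Full x))
    (<*>)  = ap
instance Monad T where
    m >>= f = T $ \sched -> do
        xref <- runT m sched
        mx <- readIORef xref
        case mx of
          Full x -> runT (f x) sched
          Blocking _ -> do
            r <- newIORef (Blocking [])
            let callback x = do
                    y <- runT (f x) sched
                    addCallback y (fillIVar sched r)
            addCallback xref callback
            return r

addCallback :: IVar a -> (a -> IO ()) -> IO ()
addCallback r c = do
    rc <- readIORef r
    case rc of
      Full x      -> c x
      Blocking cs -> writeIORef r (Blocking (c:cs))

fillIVar :: IORef Schedule -> IVar a -> a -> IO ()
fillIVar sched ref x = do
    r <- readIORef ref
    writeIORef ref (Full x)
    case r of
      Blocking cs -> schedule sched (map ($x) cs)
      Full _      -> error "fillIVar: Cannot write twice"

schedule :: IORef Schedule -> [IO ()] -> IO ()
schedule sched to_sched = do
    cur <- readIORef sched
    writeIORef sched (cur ++ to_sched)

run :: T () -> IO ()
run initial_job = do
    sched <- newIORef []
    writeIORef sched [runT initial_job sched >> return ()]
    let go = do
            jobs <- readIORef sched
            case jobs of
              []         -> return ()
              (job:rest) -> writeIORef sched rest >> job >> go
    go

tick :: T ()
tick = T $ \sched -> do
    r <- newIORef (Blocking [])
    schedule sched [fillIVar sched r ()]
    return r

-- liftIO' is my own addition, so the program can report completion
liftIO' :: IO a -> T a
liftIO' io = T (\_ -> io >>= newIORef . Full)

-- a bounded loop, so the schedule eventually drains
loopN :: Int -> T ()
loopN 0 = liftIO' (putStrLn "done")
loopN n = tick >> loopN (n - 1)

main :: IO ()
main = run (loopN 3)  -- prints "done"
```

Each iteration still allocates the bind-chain of IVars described in the previous post; bounding the loop merely guarantees the chain eventually collapses.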
June 29, 2011
An IVar is an immutable variable; you write once, and read many times. In the Par monad framework, we use a prompt monad style construction in order to encode various operations on IVars, which deterministic parallel code in this framework might use. The question I’m interested in for this post is an alternative encoding of this functionality, which supports nondeterministic concurrency and shows up in other contexts such as Python Twisted, node.js, any JavaScript UI library and LWT. Numerous bloggers have commented on this. But despite all of the monad mania surrounding what are essentially glorified callbacks, no one actually uses this monad when it comes to Haskell. Why not? For one reason, Haskell has cheap and cheerful preemptive green threads, so we can write our IO in synchronous style in lots of threads. But another reason, which I will be exploring in a later blog post, is that naively implementing bind in this model space leaks! (Most event libraries have worked around this bug in some way or another, which we will also be investigating.)
First things first, though. We start by implementing the IVar monad in Haskell. We build it incrementally, starting by demonstrating that IO (IORef a) is a monad. It’s not particularly interesting: we could get all of its features using IO. Our main interest in it is demonstrating the basic structure by which we will present a nondeterministic IVar monad:
import Data.IORef

newtype R a = R { runR :: IO (IORef a) }

instance Functor R where
    fmap f m = R $ do xref <- runR m
                      x <- readIORef xref
                      newIORef (f x)

instance Monad R where
    return x = R (newIORef x)
    m >>= f = R $ do xref <- runR m
                     x <- readIORef xref
                     runR (f x)
We never ever pass around values: rather, we put them inside IORef boxes. The bind operation involves reading out the contents of a box, and then getting a new box out of f, the function we’re binding to. We always know what the contents of a box are: we never call writeIORef. Also notice that retrieving the reference happens in IO, so arbitrary other side effects could occur while this is happening. When we have an actual IVar, those side effects could involve spinning off another thread of execution, which will eventually fill the IVar. Pay attention to these “boxes”: we’ll be interested in their usage properties for performance purposes.
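As a quick sanity check, here is the R monad as a complete program. Modern GHC additionally requires Functor and Applicative instances, which I derive mechanically from liftM and ap; the little computation in main is just an arbitrary example of my own:

```haskell
import Control.Monad (ap, liftM)
import Data.IORef

newtype R a = R { runR :: IO (IORef a) }

-- Functor/Applicative are required by modern GHC; derived from the Monad
instance Functor R where fmap = liftM
instance Applicative R where
    pure x = R (newIORef x)
    (<*>)  = ap
instance Monad R where
    m >>= f = R $ do xref <- runR m
                     x <- readIORef xref
                     runR (f x)

main :: IO ()
main = do
    -- each return allocates a fresh box; bind reads one box, makes another
    ref <- runR (return (2 :: Int) >>= \x -> return (x * 21))
    x <- readIORef ref
    print x  -- prints 42
```

Every value really does travel through a box: the run produces an IORef, which we must read to see the result.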
In the case of an IVar, we now would like to have “empty” boxes, which may only get filled in at some later date. We might be tempted to implement this using IO (IORef (Maybe a)):
newtype S a = S { runS :: IO (IORef (Maybe a)) }

instance Monad S where
    return x = S (newIORef (Just x))
    m >>= f = S $ do xref <- runS m
                     mx <- readIORef xref
                     case mx of
                       Just x -> runS (f x)
                       Nothing -> ???
But we’re in a bit of a bind (ahem): we don’t actually know what value we need to pass to f if the box is still empty. What do we do?
The traditional solution is to save f away for another time, when the value truly does become available, at which point we invoke all of the blocking callbacks with the new value. Since our monad admits arbitrary side effects, these callbacks can still do useful work. (By the way, IORef (IVarContents a) is essentially what the Par monad uses to encode IVars.)
data IVarContents a =
    Empty
  | Blocking [a -> IO ()]
  | Full a

newtype T a = T { runT :: IO (IORef (IVarContents a)) }
Now we can implement that last case:
instance Monad T where
    return x = T (newIORef (Full x))
    m >>= f = T $ do xref <- runT m
                     mx <- readIORef xref
                     r <- newIORef Empty
                     let callback x = runT (f x >>= fillIVar r) >> return ()
                     case mx of
                       Full x -> callback x
                       Empty -> writeIORef xref (Blocking [callback])
                       Blocking cs -> writeIORef xref (Blocking (callback:cs))
                     return r
fillIVar is some magical function which fills an empty IVar and reschedules anyone who was waiting for that value for execution. One possible (and a little silly) implementation, which assumes single-threading, could be:
fillIVar :: IORef (IVarContents a) -> a -> T ()
fillIVar ref x = T $ do
    r <- readIORef ref
    writeIORef ref (Full x)
    case r of
      Empty -> newIORef (Full ())
      Blocking cs -> mapM_ ($x) cs >> newIORef (Full ())
      Full _ -> error "fillIVar: Cannot write twice"
This is all fairly straightforward, and some variant of this has been reimplemented by basically every cooperative nonblocking async library. In my next post, I’d like to explicate some problems with this naive monadic encoding, as explained by the authors of LWT, and put a finger on precisely what kinds of variations of this pattern we actually see in the wild.
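To see the callback machinery fire end to end, here is a self-contained program using the definitions above (plus the Functor and Applicative instances modern GHC requires, and a small driver of my own). We bind against a still-empty IVar, so a callback gets registered; filling the IVar then runs the callback chain, which fills the result box:

```haskell
import Control.Monad (ap, liftM)
import Data.IORef

data IVarContents a = Empty | Blocking [a -> IO ()] | Full a
newtype T a = T { runT :: IO (IORef (IVarContents a)) }

instance Functor T where fmap = liftM
instance Applicative T where
    pure x = T (newIORef (Full x))
    (<*>)  = ap
instance Monad T where
    m >>= f = T $ do xref <- runT m
                     mx <- readIORef xref
                     r <- newIORef Empty
                     let callback x = runT (f x >>= fillIVar r) >> return ()
                     case mx of
                       Full x      -> callback x
                       Empty       -> writeIORef xref (Blocking [callback])
                       Blocking cs -> writeIORef xref (Blocking (callback:cs))
                     return r

fillIVar :: IORef (IVarContents a) -> a -> T ()
fillIVar ref x = T $ do
    r <- readIORef ref
    writeIORef ref (Full x)
    case r of
      Empty       -> newIORef (Full ())
      Blocking cs -> mapM_ ($x) cs >> newIORef (Full ())
      Full _      -> error "fillIVar: Cannot write twice"

main :: IO ()
main = do
    ref <- newIORef (Blocking [])            -- an as-yet-empty IVar
    let iv = T (return ref)
    r <- runT (iv >>= \x -> return (x + 1))  -- blocks: callback registered
    _ <- runT (fillIVar ref (41 :: Int))     -- filling runs the callback
    contents <- readIORef r
    case contents of
      Full y -> print y                      -- prints 42
      _      -> putStrLn "still empty"
```

Notice that the bind returns its result box immediately; the 42 only appears inside it after fillIVar runs.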
June 27, 2011
Today I would like to describe how I pin down compiler bugs, specifically, bugs tickled by optimizations, using a neat feature that Hoopl has called optimization fuel. Unfortunately, this isn’t a particularly Googleable term, so hopefully this post will get some Google juice too. Optimization fuel was originally introduced by David Whalley in 1994 in the paper Automatic isolation of compiler errors. The basic idea is that every optimization the compiler performs consumes a unit of fuel, and the total amount of fuel can be limited; when we suspect the optimizer is misbehaving, we binary search to find the maximum amount of fuel we can give the compiler before it introduces the bug. We can then inspect the offending optimization and fix the bug. Optimization fuel is a feature of the new code generator, and is only available if you pass -fuse-new-codegen to GHC.
The bug
The bug shows up when I attempt to build GHC itself with the new code generator. Building GHC is a great way to ferret out bugs: it has so much code in it that it manages to cover a lot of cases:
"inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 7.1 for i386-unknown-linux):
RegAllocLinear.makeRegMovementGraph
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
We quickly grep the codebase to find the relevant error, which is in compiler/nativeGen/RegAlloc/Linear/JoinToTargets.hs:
-- | Construct a graph of register\/spill movements.
--
-- Cyclic components seem to occur only very rarely.
--
-- We cut some corners by not handling memory-to-memory moves.
-- This shouldn't happen because every temporary gets its own stack slot.
--
makeRegMovementGraph :: RegMap Loc -> RegMap Loc -> [(Unique, Loc, [Loc])]
makeRegMovementGraph adjusted_assig dest_assig
= let
mkNodes src vreg
= expandNode vreg src
$ lookupWithDefaultUFM_Directly
dest_assig
(panic "RegAllocLinear.makeRegMovementGraph")
vreg
in [ node | (vreg, src) <- ufmToList adjusted_assig
, node <- mkNodes src vreg ]
But the source code doesn’t particularly suggest what the problem might be. Time to start using optimization fuel!
Binary search
We can modify the amount of optimization fuel GHC has for running optimizations by changing the value of -dopt-fuel. The first thing we do is see if the bug is present with zero optimization fuel:
$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=0
Great, it worked! We pick some large number to start our binary search at (and pass -fforce-recomp, so GHC actually compiles the program.)
$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=1000 -fforce-recomp
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 7.1 for i386-unknown-linux):
RegAllocLinear.makeRegMovementGraph
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
I then binary search (test 500; if that compiles cleanly, test 750, and so on), until I find the point at which adding one to the fuel causes the failure.
$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=709 -fforce-recomp
$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=710 -fforce-recomp
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 7.1 for i386-unknown-linux):
RegAllocLinear.makeRegMovementGraph
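Mechanically, this search is just a bisection over fuel values. A minimal Haskell sketch of it, where the (< 710) predicate is a pure stand-in for "compiles without panicking at this fuel" rather than an actual GHC invocation:

```haskell
-- Find the smallest fuel value at which the bug appears, given a
-- predicate reporting whether compilation succeeds at a given fuel.
-- Invariant: ok lo holds and ok hi fails.
bisect :: (Int -> Bool) -> Int -> Int -> Int
bisect ok lo hi
  | hi - lo <= 1 = hi
  | ok mid       = bisect ok mid hi   -- bug is above mid
  | otherwise    = bisect ok lo mid   -- bug is at or below mid
  where mid = (lo + hi) `div` 2

main :: IO ()
main = print (bisect (< 710) 0 1000)  -- prints 710
```

In practice each probe is one (slow) run of GHC, so the logarithmic number of probes matters.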
Viewing the culprit
How do we convince GHC to tell us what optimization it did with the 710th unit of fuel? My favorite method is to dump out the optimized C– from both runs, and then do a diff. We can dump the C– to a file using -ddump-opt-cmm -ddump-to-file, and then diff reveals:
@@ -10059,7 +10059,6 @@
}
c45T:
_s3es::I32 = I32[Sp + 4];
- _s3eu::I32 = I32[Sp + 0];
// deleted: if (0) goto c460;
// outOfLine should follow:
_s3er::I32 = 0;
@@ -10093,1354 +10092,3 @@
jump (I32[Sp + 0]) ();
}
The optimization is deleting an assignment. Is this valid? Here is the full code, with some annotations:
FastString.$whashStr_entry()
    { [const 131081;, const 0;, const 15;]
    }
  c45T:
      _s3es::I32 = I32[Sp + 4];
      _s3eu::I32 = I32[Sp + 0]; // deleted assignment
      _s3er::I32 = 0;
      _s3ex::I32 = 0;
      goto c463;
  c460:
      R1 = FastString.$whashStr_closure;
      jump (I32[BaseReg - 4]) ();
  c463:
      if (I32[GHC.Types.Bool_closure_tbl + ((_s3er::I32 == _s3es::I32) << 2)] & 3 >= 2) goto c46d;
      // uh oh, assignment used here
      _s3IC::I32 = %MO_S_Rem_W32(%MO_UU_Conv_W8_W32(I8[_s3eu::I32 + (_s3er::I32 << 0)]) + _s3ex::I32 * 128,
                                 4091);
      _s3er::I32 = _s3er::I32 + 1;
      _s3ex::I32 = _s3IC::I32;
      goto c463;
  c46d:
      R1 = _s3ex::I32;
      Sp = Sp + 8;
      jump (I32[Sp + 0]) ();
}
Seems not: the variable is used in the %MO_S_Rem_W32 expression; that’s no good. We conclude that the bug is in an optimization pass, and not that the register allocator failed to handle a case our optimization is now tickling.
Fixing the bug
With this information, we can also extract the program fragment that caused this bug:
hashStr :: Ptr Word8 -> Int -> Int
hashStr (Ptr a#) (I# len#) = loop 0# 0#
  where
    loop h n | n GHC.Exts.==# len# = I# h
             | otherwise = loop h2 (n GHC.Exts.+# 1#)
      where !c = ord# (indexCharOffAddr# a# n)
            !h2 = (c GHC.Exts.+# (h GHC.Exts.*# 128#)) `remInt#` 4091#
We can also see how our pipeline is processing the program, and observe precisely where in the process the bad optimization was made:
==================== Post Proc Points Added ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      _s3eu::I32 = I32[(old + 12)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;

==================== Post spills and reloads ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      _s3eu::I32 = I32[(old + 12)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;

==================== Post rewrite assignments ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;
Since this is a spurious instance of code removal, we look for all mentions of emptyGraph in the rewrite assignments optimization pass:
usageRewrite :: BwdRewrite FuelUniqSM (WithRegUsage CmmNode) UsageMap
usageRewrite = mkBRewrite3 first middle last
    where first _ _ = return Nothing
          middle :: Monad m => WithRegUsage CmmNode O O -> UsageMap -> m (Maybe (Graph (WithRegUsage CmmNode) O O))
          middle (Plain (CmmAssign (CmmLocal l) e)) f
              = return . Just
              $ case lookupUFM f l of
                  Nothing -> emptyGraph
                  Just usage -> mkMiddle (AssignLocal l e usage)
          middle _ _ = return Nothing
          last _ _ = return Nothing
This looks like it should be an unobjectionable case of dead assignment elimination coupled with liveness analysis, but for some reason, the backwards facts are not being propagated properly. In fact, the problem is that I attempted to optimize the Hoopl dataflow function, and got it wrong. (Fixpoint analysis is tricky!) After reverting my changes, the unsound optimization goes away. Phew.
June 24, 2011
This post describes how to change which monitor xmobar shows up on in a multi-monitor setup. This had always been an annoyance for me: on an initial switch to multi-monitor, xmobar would be living on the correct monitor, but if I ever restarted XMonad thereafter, it would migrate to my other monitor. Note that a monitor is different from an X screen, which can be directly configured from xmobar using the -x command line option.
How does xmobar pick what screen to use? It selects the “primary” monitor, which by default is the first entry in your xrandr list:
Screen 0: minimum 320 x 200, current 2464 x 900, maximum 8192 x 8192
VGA1 connected 1440x900+1024+0 (normal left inverted right x axis y axis) 408mm x 255mm
1440x900 59.9*+ 75.0
1280x1024 75.0 60.0
1280x960 60.0
1152x864 75.0
1024x768 75.1 70.1 66.0 60.0
832x624 74.6
800x600 72.2 75.0 60.3 56.2
640x480 72.8 75.0 66.7 66.0 60.0
720x400 70.1
LVDS1 connected 1024x768+0+0 (normal left inverted right x axis y axis) 245mm x 184mm
1024x768 60.0*+
800x600 60.3 56.2
640x480 59.9
We can switch the primary monitor using the xrandr --output $MONITOR --primary command. However, this change is not persistent; you’d have to run this command every time you add a new monitor.
Fortunately, it turns out gnome-settings-daemon records information about monitors it has seen in order to configure them properly. This information is in .config/monitors.xml:
<monitors version="1">
<configuration>
<clone>no</clone>
<output name="VGA1">
<vendor>HSD</vendor>
<product>0x8991</product>
<serial>0x01010101</serial>
<width>1440</width>
<height>900</height>
<rate>60</rate>
<x>0</x>
<y>0</y>
<rotation>normal</rotation>
<reflect_x>no</reflect_x>
<reflect_y>no</reflect_y>
<primary>no</primary>
</output>
<output name="LVDS1">
<vendor>LEN</vendor>
<product>0x4002</product>
<serial>0x00000000</serial>
<width>1024</width>
<height>768</height>
<rate>60</rate>
<x>1440</x>
<y>0</y>
<rotation>normal</rotation>
<reflect_x>no</reflect_x>
<reflect_y>no</reflect_y>
<primary>no</primary>
</output>
</configuration>
</monitors>
So all we need to do is tweak primary to be yes on the appropriate monitor.
Hat tip to David Benjamin and Evan Broder for letting me know how to do this.
June 22, 2011
This is notebook two.
Max Schäfer: Refactoring Java
Most Java refactoring tools built into IDEs like Eclipse are little more than glorified text manipulation macros. There are no guarantees that the result of your refactoring will have the same behavior as the original: you can even refactor code that doesn’t compile! To prevent breakage, most refactorings require heavy and hard-to-understand preconditions. Max brings two ideas to the table:
- Rather than attempt to write a complex precondition that may or may not accurately reflect safety concerns, we instead do the transformation, and then verify that the refactoring did not break anything. We can do this with a dependency-based description of the program’s behavior, which overspecifies the original semantics (otherwise, such analysis wouldn’t be tractable, as the general problem is undecidable.)
- Rather than attempt to write a monster refactoring that attempts to handle all possible cases, we decompose refactorings into microrefactorings on simpler versions of the source language. For example, moving a chunk of code into a method would involve closure conversion (control), lambda lifting (data), and then the actual outwards motion, which at this point is trivial. We can then resugar into the original source language. This allows us to abstract over corner cases.
Modularity is a very powerful philosophy, and one that Hoopl takes too. (One might wonder whether Hoopl could be useful for refactoring; one big problem I observe is that Hoopl’s representation is too low-level, and really the point of a high-level language is that you don’t need complex dataflow analysis.)
The approach makes some strong assumptions, however. It assumes we know the whole program, that it’s written in only one language (no XML manifest files), and that it’s statically typed and class based. Of course, all real programs fail these assumptions, so if we actually want people to adopt this workflow, we need a story for them. Refactoring is a transition from sloppy to structured code. (I have scribbled at the bottom of the page responsibility calculus, but I have no idea what that means now.)
Semantic overspecification reminds me of SLAM’s approach to iterative approximations of program behavior.
Mike Dodds: Modular Reasoning for Deterministic Parallelism
A frequent problem with separation logics is that you need to introduce a new primitive for every sort of concurrency construct you may want to model, and so you end up with a bazillion different logics, one for each model of concurrency. Mike presents a way of incrementally building up any sort of semantics you could possibly want, using “concurrent abstract predicates.” You use the abstract predicates to describe variables satisfying various conditions, and then require those conditions in the specifications of other functions.
The majority of this particular talk was working through an odd concurrency construct called wait/grant, described in “Quasi-Static Scheduling for Safe Futures.” It’s a barrier that preserves “necessary” sequential dependencies. A bit of time was spent actually clarifying what this construct did, and why it was different from a buffered channel. Tony Hoare remarked that it resembled a “production line” in real-life, which was not very common in programming, though it happened to be quite natural for the problem the original paper authors were tackling (parallelizing sequential programs).
My notes invite me to “implement this in Haskell”, and also have a remark about “tree shaped bit vector propagation”, which is apparently the optimized version. There are also a bunch of code snippets but I’m sure I can find those in the slides.
Tomas Petricek: Joinads
Joinads for F# are a system for pattern matching on computations (as opposed to just values). How is this any different from monads? Joinads support extra operations: merge :: m a -> m b -> m (a, b) and choose :: [m (Maybe (m a))] -> m a, which can implement special scheduling properties. This can be useful for futures (Manticore), events (FRP) or the join calculus (joins execute when a channel contains value: it’s cheap and cheerful multiplexing.) It turns out you get a Joinad for free from commutative monads, which might say something about what kind of syntax extensions would be useful for those monads.
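To make the two extra operations concrete, here is a hypothetical Python rendering with futures standing in for the monad — this is not Petricek’s F# encoding, and `merge` here collapses `m (a, b)` to a plain pair for brevity:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def merge(fa, fb):
    # merge :: m a -> m b -> m (a, b): wait for both results and pair them
    return (fa.result(), fb.result())

def choose(futures):
    # choose: commit to whichever in-flight computation produces a value first
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done)).result()

with ThreadPoolExecutor() as pool:
    a = pool.submit(lambda: 1)
    b = pool.submit(lambda: 2)
    assert merge(a, b) == (1, 2)
    assert choose([pool.submit(lambda: 3)]) == 3
```

The scheduling flavor is visible even in this toy: `merge` expresses a join on two channels, while `choose` is the cheap-and-cheerful multiplexing mentioned above.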
I wasn’t particularly impressed by this talk, for whatever reason. I guess my reason was I didn’t feel any deep theoretical reason why Joinads might be a particularly interesting construct to study, nor did it seem that they were in any sense minimal. Also, in Haskell, we implement multiplexing by just forking off extra green threads, which is a much nicer model, in my opinion.
Computing with Real Numbers
Dan Piponi has written on this topic before, but this talk really helped put things in perspective for me.
There are many types of numbers that we can easily compute with: the field of two elements (booleans), integers (well, discounting overflow), and rationals (pairs of integers). But real numbers pose some difficulties. They are the metric closure of the rationals Q, i.e. everything that can arise as the limit of a Cauchy sequence in Q. This is reminiscent of a decimal expansion.
We now want to consider functions over the reals. How do we know if one is computable? We might try saying it is a relation from strings of digits to strings of digits, where any finite prefix of F(p) can be uniformly computed from some finite prefix of p of sufficient length. But this is clearly insufficient for something like 0.3333…, where we would need to read infinitely many digits to see whether or not this is truly one third. (I remark that infinity is a bit like oracles in Turing Machine land.)
Instead, we say that a sequence of rationals q_i (with i ranging over the naturals) represents a real number x if |x - q_i| < 2^-i for all i (the base doesn’t really matter). Writing p(q) = x for the real that q represents, f is computable if there exists an F, operating on finite prefixes, such that p(F(q)) = f(p(q)). With this definition, it turns out that addition, multiplication, division, trigonometric functions, exponentiation and logarithms are all computable.
An interesting restriction of function is that of continuity, familiar from any treatment in a Calculus textbook: a function f : R -> R is continuous if for all x in the domain of f, and for all ε > 0, there exists a δ > 0 such that for all y in the domain of f, |x - y| < δ implies |f(x) - f(y)| < ε. It has been proposed that every computable function is continuous: thus, computability stems from our ability to approximate things, which we can’t do for discontinuous functions (which side of the discontinuity are we on?) We can further restrict functions to C(R,R), the set of continuous functions which can also be approximated by infinite sequences (e.g. polynomials.)
Consider the monotone intermediate value theorem, which states that if f is monotone, f(0) < 0 and f(1) > 0, there exists some x such that f(x) = 0. Can we compute this number? Bisection doesn’t work, since determining whether one number is less than or greater than another is in general incomputable. (The general intermediate value theorem is not computable, since the function could get infinitely close to the zero line.) We can use trisection instead. Compute f(0.3) and f(0.7), and concurrently perform the following comparisons:
- If f(0.3) < 0, the new range is [0.3, 1]
- If f(0.3) > 0, the new range is [0, 0.3]
- If f(0.7) < 0, the new range is [0.7, 1]
- If f(0.7) > 0, the new range is [0, 0.7]
You’ll have to do a little work to get a formal proof of correctness, but it’s easy to see why this intuitively might work: comparisons take infinite time only if the number is close to the point of comparison, but then our other comparison is guaranteed to succeed, since it is some non-infinitesimal distance away. (I have a little margin note: can we use this to do cryptography?)
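A serialized sketch of the trisection loop (assuming a monotone f with f(0) < 0 < f(1); exact rationals stand in for lazily computed reals, and the two comparisons that would run concurrently in the real construction are simply run in sequence here):

```python
from fractions import Fraction

def trisect(f, lo, hi, iters):
    # Invariant: f(lo) <= 0 <= f(hi), so the root stays inside [lo, hi].
    for _ in range(iters):
        a = lo + (hi - lo) * Fraction(3, 10)
        b = lo + (hi - lo) * Fraction(7, 10)
        # In the real construction these two sign tests race concurrently;
        # at least one is guaranteed to terminate, being bounded away from 0.
        if f(a) > 0:
            hi = a            # root lies left of a
        elif f(b) < 0:
            lo = b            # root lies right of b
        else:
            lo, hi = a, b     # f(a) <= 0 <= f(b): root lies between them
    return lo, hi

lo, hi = trisect(lambda x: x - Fraction(1, 3), Fraction(0), Fraction(1), 40)
assert lo <= Fraction(1, 3) <= hi
```

Every iteration shrinks the interval by a factor of at least 0.4, so convergence is geometric.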
It turns out there are various schools on the computability of real numbers. The perspective I have just described is the Polish school. The Russian school, however, says that uncomputable points don’t exist: the code fragment is the real number. Thus, effective analysis is not classical analysis. There is a correspondence between constructive and classical mathematics. Something about Banach spaces (a linear operator has to be bounded), so differentiation is not computable in general, though integration is! (This is quite different from the symbolic evaluation world.) For further reading, see Weihrauch’s Computable Analysis (Springer, 2000).
Martin Escardó: Selection Functions Everywhere
I didn’t understand this talk. I don’t understand most of Martin’s work, really. But I have a bunch of quotes:
- “I don’t know why it’s called the continuation monad; there is no control, this talk is completely out of control.”
- “I’m not going to explain what this means, but I’m going to explain how this works.”
- “[Incomprehensible statement.] Which is not too deep.”
- “And the reason is, uh, probably in the next slide.”
- “Every game is of length 2: you have the first move, and then all the others.”
- “You’ll get to use the monoidal monad structure to calculate these two people.”
Selection functions select individuals with the “highest” truth value. Max-value theorem, min-value theorem and universal-value theorem (Drinker’s paradox) are all uncomputable. Selection functions are a two stage process. K A -> A is double negation elimination, J A -> A is Peirce’s law. Bekic’s lemma gives a fixpoint. Bar recursion and do induction on p (continuous function) with p as a tree. We can calculate the optimal strategy. System T plus an equation for J-shift is strongly normalizing. We can encode the axiom of countable choice (but it’s not choice anymore!). Dependent choice: (AC classical choice, Tychonoff theory, etc. See that paper.) Why is J a monad?
Sir Roger Penrose: Twistor Theory
Penrose… doesn’t give very comprehensible talks. Well, he explained the Riemann sphere (a stereographic projection) quite well, but then things got very smooshy. Some quotes: “And that’s all you have to do in mathematics: imagine it.” “Small does not apply to distance.” “So my best way to do that is draw it like a sausage.” “I don’t think I should explain this picture.” “First I’ll confuse you more.” “It can be done in cricket, and we hope it can be done in Twistor theory too.” There was an interesting aside about cohomology: a precise nonlocal measure of the degree of impossibility; I think I perked up when I saw the Escher picture. Not even the physicists think Twistor theory reflects reality. It’s a mathematical plaything.
On a completely unrelated note, that night RAG blind date was going on.
Conor McBride: The Kleisli Arrows of Outrageous Fortune
“Consider the following program, which is, alas, Poor!” With a slide of code containing variables b and c: “b or not b, that is the question! … or to take arms against a c of troubles.” “The world’s most successful dependently typed language: Haskell.” Wave hello to Simon Peyton-Jones in the room.
I’ve been meaning to write a blog post about this talk, and maybe still will, although the talk is no longer as fresh in my mind. Conor describes the technical machinery necessary to simulate dependent types even when you can’t obviously push values into the types. Programs should instead be strategy trees, which cover all possible responses (even though reality will select one.) We don’t need dependent types: we can use parametrized monads to encode the pre and post conditions of Hoare logic. (Resulting in a predictable world.) These are the so called “braces of upwards mobility” (upwards mobility referring to values becoming types.) This talk made me wonder if the Strathclyde Haskell Extension would be a good platform for session types, which suffer tremendously from the lack of true dependent types. (It also made me wonder about efficient type-level computation.)
The devil is ∀ (see the horns), e.g. a universal quantifier in the wrong place (I thought maybe you could view them as skolem variables). Terms hold evidence on types. (Typical Conor to make a joke about that.) Kleisli Arrow is the Hoare Triple (think about the programmable semicolon), and with bind you don’t get to choose what state you end up in (there’s an existential.) However, we can use an alternative bind which forces the demon to give us the value: Atkey parametrized monads. We also need to make sure that our naive free monad doesn’t blow up our runtime. This is not very coherent, and cleaning up this story was what I needed to do before making a real post.
Conor’s main message was that data are witness to the proofs! “I’m in the witness protection program.” If we have some piece of data, we have discharged some proof obligations. (But I wonder, what about polymorphism?)
(Untitled talk on routing algebras)
Have you ever wondered why the matrix multiplication trick works for Floyd-Warshall all-pairs shortest paths? It’s because the relevant operations form a semiring. The traditional semiring is (N, +, *, 1, 0). For classical routing algorithms like Dijkstra and Bellman-Ford it’s (N_inf, min, +, 0, inf). All the same semiring properties hold, so we can use them interchangeably.
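A minimal sketch of that interchangeability (the encoding and names are my own): the same matrix product, parametrized by semiring operations, computes shortest paths when instantiated with (min, +):

```python
INF = float("inf")

def semiring_matmul(A, B, add=min, mul=lambda x, y: x + y, zero=INF):
    # Ordinary matrix product with (+, *) replaced by the semiring (add, mul).
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = zero
            for k in range(n):
                acc = add(acc, mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C

# Adjacency matrix: 0->1 costs 5, 1->2 costs 2, 0->2 costs 9; the diagonal
# is 0, the multiplicative unit of (min, +).
A = [[0, 5, 9], [INF, 0, 2], [INF, INF, 0]]
A2 = semiring_matmul(A, A)    # shortest paths using at most two edges
assert A2[0][2] == 7          # 0 -> 1 -> 2 beats the direct edge of cost 9
```

Repeated squaring of this product converges to the all-pairs shortest path matrix, exactly because (min, +) obeys the same laws as (+, *).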
However, real protocols do not obey the semiring axioms (distributivity versus economics.) So we’re not actually solving the shortest path problem but rather stable path problems: we’re finding the “best path according to neighbor choice.” This talk looks at methods for converting properties of these protocols into decision procedures for constructors (a bottom-up inference method.) If we have a particular set of axioms, we want to see if we can get a counter-example when things go a bit wonky. An example given was the lexicographic pair of length and then bandwidth, which is distributive. If we modify these to be add1 minplus * add0 maxMin, the one/zero elements are in the wrong place and we lose distributivity.
It turns out that when the axioms fail, what goes wrong depends on which algorithm is run. We want protocols that work in all possible configurations.
Petri nets for concurrency
We want performance nondeterminism, so we use Petri nets to encode the nondeterminism contracts, and then we push a flow through the graph to figure out what kind of performance we need. Scheduling and placement are easy to handle, because they are just modifications on the Petri net. But I wasn’t so sure this approach would work, because Petri nets can get pretty big and you may not get the performance you need to do interesting analysis.
Grant Passmore: Strategy in Computer Algebra
Automated reasoning problems are undecidable, and user-guided. Computer algebra problems are infeasible, but the algorithms tend to be black box, secret sauce solutions (e.g. Mathematica) with all sorts of tricks like Groebner Bases, Quantifier Elimination, SAT for nonlinear complex arithmetic (Hilbert’s Weak Nullstellensatz) and reduction to rewriting.
Grant wants to put the choice back in computer algebra. There are many choice points: quadratic preprocessing, choice of pairs, growth of basis, forward simplification. He gives a comparison of ATP and GB. With LCF, functional parameters are key. Bag of tricks is OK, but the proof procedure is a strategy.
- What about an algorithm can be changed while preserving correctness?
- What arbitrary choices are my algorithm making?
(Margin note: can we understand fusion by understanding the mutable state it creates?)
(Untitled Software Engineering talk)
A persistent problem with formal methods is that they operate in a domain of assumptions and requirements, and there is no way to tell if these assumptions or requirements are right! It’s a rather philosophical question: can we have “recognized assurance deficits”: a known unknown? These are potential sources of counter-evidence. What are the worst implications of not knowing? What claims have you made? (It would be interesting to see what elements of philosophy are relevant here.) Sometimes safety arguments are just assertions “because I say so”: there is epistemic uncertainty. The speaker argues we should replace probabilistic integrity arguments with qualitative confidence arguments: it’s a question of safety versus confidence. We should explicitly acknowledge assurance deficits, and stop taking too much on trust.
I ask the question: is there incentive for this debate to happen? (Engineers want to look good.) Giving it to evaluators is certainly worse, since they are not in a proper position to assess the system. He didn’t have an answer, but said, “Not doing so is unacceptable.”
I somewhat amusedly note that near the end he pushed some technical software for handling this, ostensibly a product of his own research program. I remain unsure about a technical solution to this particular problem.
Approximating Markov Chains
Labeled Markov processes encode continuous state spaces, and for the longest time, we’ve noted that bisimulation works for discrete state spaces, but not continuous ones. We view the final state as a probability distribution, and we press buttons on our machine (labels) to change the distribution: each label is a stochastic kernel (a generalized binary relation.) Of course, reasoning about continuous state spaces is important: they cover things like Brownian motion, performance, probabilistic process algebras with recursion, hybrid control systems (flight management systems), as well as complex discrete phenomena such as population growth and stock prices.
I didn’t understand most of the technical details, but the main point was that co-bisimulation was the correct way to do this: it is only a coincidence that bisimulation works for discrete systems. The projective limit is exactly the smallest bisimilar process. Also some material on metric spaces. This certainly seems like a field where we’re just importing well-known results from esoteric fields of mathematics.
Financial Cryptography
Unbeknownst to Americans, whose financial system is still stuck on magnetic stripe readers, bank cards in Europe have moved on to EMV, in which a mini-processor embedded in your card can do true challenge-response authentication. This talk looks at how we might bootstrap a P2P transaction system that bypasses banks, using the existing EMV hardware.
How might we do this? We can use a device called the CAP, which has a small display that, when given a PIN, produces an authentication code (two factor.) We associate transactions with a CAP code, so the merchant has agreed to receive money. But you still need collaboration from the banks to do this. (This is the SDA method.) So we get rid of the bank altogether and use Diffie-Hellman (DDA). The cards are simply convenient existing infrastructure to get name authentication. There are some technical details, since we can only sign 32 bits at a time, and we usually need a bit more than that. (Margin note: “the threat model was privacy advocates.”)
The talk was short, so afterwards we had a discussion of why this scheme wouldn’t actually work. My objection was that banks would simply make it illegal to use their bank cards in this way. There was a discussion whether or not they could technically enforce this: Markus Kuhn used passports as an example, where you can’t read passports if you don’t have Internet access, since the passport itself has an embedded monotonic clock that refuses to give information to the scanner if the scanner software is not up to date. (How does the passport know what the latest version is? Its clock gets updated when it sees a new scanner.) Passport security technology is pretty interesting! They invented a block cipher over the alphabet for this purpose.
Verifying the validity of QBF
SAT is exponential, and when you add quantifiers to the mix, you get “another” exponential, but this time in the certificate. How do you verify that a universally quantified formula is in fact true? The certificate in this case is extension variables and witnesses: we provide concrete implementations for all existentially quantified variables, which can then be substituted in to give a traditional SAT problem. So once we have a certificate, what was a PSPACE problem is now “just” an NP problem.
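A toy version of the certificate check (the formula and witness are invented for illustration; a real checker would hand the substituted matrix to a SAT solver rather than enumerate assignments):

```python
def matrix(x, y):
    # Matrix of the toy QBF: forall x . exists y . (x or y) and (not x or not y)
    return (x or y) and (not x or not y)

def witness(x):
    # Certificate: a concrete implementation of the existential variable y
    return not x

# Substituting the witness leaves a formula over the universals only, which
# we can now check by ordinary propositional means -- the "just NP" step.
assert all(matrix(x, witness(x)) for x in (False, True))
```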
(Technical note: We want to eliminate hypotheses in topological order (using Refl, Gen and Exists). Get the quantifiers in the right order, witnesses depend on existential variables, extension variables depend on it.)
The talk described how he hooked into the internals of Squolem to actually extract these certificates. It turned out that de Bruijn indexing was faster than name carrying (which is different from typical QBF invalidity checking.) He even found a bug in the non-LCF style validator (due to the lack of a cyclicity check.)
Applications: model checking (bounded and unbounded), PSPACE problems. (Margin note: “comparison of BDD and QBF?”)
This concludes notebook two.
June 20, 2011Over the year, I’ve accumulated three notebooks worth of miscellaneous notes and musings. Since these notebooks are falling apart, I’ve decided to transfer their contents here. Warning: they might be slightly incoherent! This is the first of three notebooks. I recommend skimming the section headers and seeing if any of them pop out.
Tony Hoare: Abstract Separation Algebra
Tony Hoare wants to leverage “the hard work [that] is already solved” by placing the formalism of separation logic (e.g. Hoare triples) into an abstract algebra. The idea is that by encoding things in pairs, not triples, we can take advantage of the numerous results in algebra. The basic idea is we take a traditional triple {p} q {r} and convert it into an ordered semigroup relation p; q <= r, where ; is a monoidal operation. In the end we end up with a separation algebra, which is a monoidal lattice with an extra star operator. The choice of axioms is everything: “This is abstract algebra, so you should be willing to take these axioms without having any model in mind.” (Scribbled here: “Inception as a metaphor for mathematical multi-level thinking.”) We have a homomorphism (not isomorphism) between implementations and specifications (the right direction is simplification, the left direction is a Galois connection.) In fact, as a commenter in the audience points out, this is known as Stone duality—something like how two points determine a line—with contravariant points and properties. I believe Tony’s been thinking about this topic a bit since I went to this talk at the very beginning of the year, so it’s likely some or all of this has been superseded by later discoveries. C’est la vie!
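Here is a toy model of the encoding, with predicates as sets of states and programs as relations (all names are mine, and this ignores the star operator entirely): the triple {p} q {r} becomes the claim p; q <= r, where ; is relational image and <= is set inclusion:

```python
def seq(p, q):
    # p ; q -- the set of states reachable from a state in p by running q
    return {t for (s, t) in q if s in p}

def triple(p, q, r):
    # {p} q {r} holds exactly when p ; q <= r (subset inclusion on sets)
    return seq(p, q) <= r

incr = {(s, s + 1) for s in range(10)}   # relation of a program adding one
assert triple({0, 1}, incr, {1, 2})      # {x in {0,1}} x += 1 {x in {1,2}}
assert not triple({5}, incr, {5})
```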
Satnam Singh: Multi-target parallel processing
Can we write parallel code that can execute on multiple types of hardware: e.g. vectorized operations on a traditional CPU, a GPU or an FPGA? He presents an EDSL that can be embedded in any language (well, for this particular representation, C#), with constructs like newFloatParallelArray, dx9Target.toArray1D(z) and overloaded operators. In my notes, I remark: can this representation be implemented taglessly, or do we always pay the cost of building a description of the system before we can execute it? Pushing software to hardware is especially important in the face of heterogeneous processors (e.g. Metropolis). Satnam was a very engaging speaker, and many of the quotes here are attributed to him—though one quote I do have is “I hope this is not going to be quoted” (don’t worry, I haven’t quoted that sentence). Dancing is a metaphor for parallel processing (though I don’t remember what the metaphor was.) What about self-modifying hardware: we mmap the circuit description and let the hardware reprogram the FPGA!
Higher level information is crucial to optimization: thus we may want a symbolic evaluator with just in time compilation (except we can’t do that for FPGAs.) Memory access fusion is important: we want to get rid of accidental semicolons. Array -> Stream / Shift = Delay. Research idea: geometric encoding of common concurrency problems. Matrix inversion is a problem (so don’t invert the matrix, silly), local memory bounds GPU versus FPGA, and scheduling problem of energy.
Building “edit automata” with one very easy, simple idea: use strace to generate trees, and then perform regular expression matching on the system calls to look for viral behavior. It’s an approach similar in spirit to SELinux, which runs the program in order to determine an appropriate policy for it (and then notifies you when it does something outside of normal.) But I remark that an strace might not be so useful in the face of poorly written Gnome applications (which touch every file and their mom); nor can you strace the kernel. What about a virus that notices when it’s being traced? It seems this strategy would need a lot of concrete pouring to become solid. Would an strace puppeteer be useful: that is, we get to fake responses to all syscalls a program makes? Might be kind of interesting, if hard to instrument.
Streambase
Streambase is a company that implements a visual, event stream processing language. I interviewed with them to possibly work on their compiler; while I ended up turning them down, it was a very interesting interview and I think I would have had a lot of fun working for them (though working in Java, not so much!) The interview was very fun: one question was, “Explain monads to me.” Man, I still don’t know how to do that properly. (Side note: people really, really like side effects. The programming folklore around writing performant programs that work on persistent data is very new, perhaps I’d say even more so than the folklore around lazy evaluation).
Skepticism
One of the things Alexander Bird’s book Philosophy of Science taught me was how to identify unproductive skepticism, even when it is not obvious, in things such as Hume’s problem of induction. My essay on Reliabilism was quite good; it was the only essay I managed to get a first on in my philosophy supervisions during the year. Like type theory, justification is stratified, with layers upon layers.
Simon Peyton Jones: Let Should Not Be Generalized
Programmers in Hindley-Milner type systems have long enjoyed the benefits of practical type inference: we can generally expect the most general type to be inferred, and syntactic substitution of expressions at their use-sites will always typecheck. Of course, type inference algorithms are in general EXPTIME-complete, but type theorists can’t complain too much about that, since for more powerful logics inference is usually undecidable. (BTW: dictionaries constitute runtime evidence. Good way of thinking about it.) Curiously enough, in the presence of more advanced type features, writing type signatures can actually make the type checker’s job harder, because they add local equality assumptions that need to be handled by the constraint solver. Generalized let means that all of these constraints cannot be solved until we reach the call site. Can we work around this problem by doing on-the-fly solving of equality constraints? The French have a paper about this, but Peyton Jones recommends carrying along a jar of aspirin if you decide to read the paper. After his talk, one of the grad students remarked that Hindley-Milner is, in many ways, an anomaly: users of Agda expect to specify all type signatures, except in some special cases where they can eliminate them, whereas users of Haskell expect to specify no type signatures, except in special cases.
Stream processing of trees
A long-standing problem with the Document Object Model is that it requires the entire document to be loaded up in memory. In cases like PHP’s documentation manual, the memory usage can be over a gigabyte large. Unfortunately, the mental model for manipulating a DOM is much more natural than that for manipulating a stream of XML tag events. Is there a way to automatically project changes to the DOM into changes on a stream? We’d like to construct an isomorphism between the two. I’m seeking a functional representation of the DOM, for manipulation (you’re still going to need mutability for a DOM-style event programming model.) “Clowns to the left of me, jokers to the right” emphasizes the difference between local and global analysis. One way you might look at traversal of a token stream is simply traversal, with a zipper to keep track of where you are. Of course, the zipper takes up memory (in effect, it forms something like a stack, which is exactly how you would convert a token stream into a tree.) So we can efficiently build the tree representation without mutation, but we still end up with a tree representation. At this point, I have written down, “Stop hitting yourself.” Indeed. Can we take advantage of domain specific knowledge, a claim that I promise not to go beyond this point? The idea of projecting DOM operations into XML stream operations, and using this as a sort of measurement for how costly something is may be profitable. Of course, now I should do a literature search.
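The zipper-as-stack observation can be sketched as follows (the SAX-style event and node shapes are invented for illustration):

```python
def build(events):
    # The stack is the zipper: it records the path of currently-open
    # elements, which is exactly the state needed to turn a flat token
    # stream back into a tree, without any mutation beyond the stack itself.
    stack = [("root", [])]
    for ev in events:
        if ev[0] == "open":
            stack.append((ev[1], []))
        elif ev[0] == "text":
            stack[-1][1].append(ev[1])
        elif ev[0] == "close":
            tag, children = stack.pop()
            stack[-1][1].append((tag, children))
    return stack[0][1]

assert build([("open", "p"), ("text", "hi"), ("close",)]) == [("p", ["hi"])]
```

The memory cost of the stack is proportional to the tree’s depth, not its size — which is precisely why streaming wins for wide, shallow documents.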
Regular expression edit distance
Given a regular expression and a non-matching string, what is the minimum number of edits necessary to make the string match? There may be multiple answers, and the algorithm should allow you to weight different changes differently.
Frank Tip: Test Generation and Fault Localization for Web Applications
Apollo takes a hybrid approach to testing web applications, combining concrete and symbolic execution. The idea is that most web applications have diffuse, early conditionalization, with no complex state transformations or loops. So, we generate path constraints on the controller and solve them, and then generate inputs which let us exercise all of the control paths. Data is code: we want to describe the data. I must not have been paying very close attention to the presentation, because I have all sorts of other things scribbled in: “Stacks are the wrong debugging mechanism for STGs” (well, yes, because we want to know where we came from; unfortunately, knowing where we are going isn’t very useful either) and “Can we automatically generate QuickCheck shrink implementations using execution traces?” (a sort of automated test-case minimization) and a final musing, “Haskell is not a good language for runtime inspection or fault localization.”
Benjamin Pierce: Types and Programming Languages
It would be very cool if someone made an interactive visualization of how type systems grow and are extended as you add new features to them, a sort of visual diff for typing rules and operational semantics.
Smoothed trending
As a blogger, my page view counts tend to be very spiky, corresponding to when my post hits a popular news site and gets picked up (to date, my Bitcoin post has had 22k views. Not bad!) But this doesn’t help me figure out long term trends for my website. Is there a way to smooth the trends so that spikes become simply “popular points” on a longer term trend?
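One cheap answer, sketched here with an invented smoothing parameter: an exponentially weighted moving average damps one-day spikes while still tracking the long-term level of traffic:

```python
def ewma(views, alpha=0.1):
    # Exponentially weighted moving average; alpha is a made-up smoothing
    # parameter controlling how quickly old traffic is forgotten.
    smoothed, level = [], None
    for v in views:
        level = v if level is None else alpha * v + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# A flat baseline of 100 views with a single 22k spike barely moves the trend:
trend = ewma([100] * 10 + [22000] + [100] * 10)
assert trend[0] == 100 and trend[-1] < 1000
```

The spike still shows up as a bump (a “popular point”), but the tail of the series settles back toward the baseline instead of being dominated by one day.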
User interfaces
I want a bible of minimal-technical-effort, best-practice user interfaces: patterns that are easy to implement and won’t confuse users too much. UI design is a bit too tweaky for my tastes. In the case of intelligent interfaces, how do we not piss off the user (e.g. Google Instant)? We have a user expectation that the computer will not guess what we want. That’s just strange.
Page 32
In big letters I have: “Prove locality theorems. No action at a distance.” Almost precisely these same words Norman Ramsey would tell me while I was working with Hoopl. I think this is a pretty powerful idea.
Separation Logic and Graphical Models
I have some mathematical definitions written down, but they’re incomplete. I don’t think I wrote anything particularly insightful. This says something about the note-taking enterprise: you should record things that you would not be able to get later, but you should also make sure you follow up with all the complete information you said you’d look up later.
Jane Street
I have two pages of scribblings from solving problems over a telephone interview. I quite enjoyed them. One was a dynamic programming question (I muffed the recurrence at first but eventually got it); the second was implementing a functional programming feature in OCaml. Actually, I wanted to write a blog post about the latter, but it’s so far been consigned to my drafts bin, awaiting a day of resurrection. Later in my notes (page 74) I have recorded the on-site interview questions; unfortunately, I can’t share them with you.
Quote
“It’s like hiring an attorney to drive you across town.” I don’t remember what the context was, though.
Mohan Ganesalingam: Language of Mathematics
I really enjoyed this talk. Mohan looks at applying NLP to a domain which is likely to be more tractable than the unrestricted human corpus: the domain of mathematical language. Why is it tractable? Math defines its lexicon in text (mathematical words must be explicitly defined), we mix symbols and natural language, and the grammar is restricted. Montague Grammars are in correspondence with Denotational Semantics. Of course, like normal language, mathematical language is heavily ambiguous. We have lexical ambiguity (“prime” can describe numbers, ideals, etc.), structural ambiguity (p is normal if p generates the splitting field of some polynomial over F_0—does “over F_0” attach to the generating or to the polynomial?), symbolic ambiguity (d(x + y), and this is not just operator overloading because parse trees can change: take for example (A+B)=C versus λ+(M=N)), and combined symbolic and textual ambiguity. It turns out the linguistic type system of maths, which is necessary to get the correct parse trees, is not mathematical at all: integers, reals and friends are all lumped into one big category of numbers and types are not extensional (objects have different types depending on contents.) We need a dynamic type system, not a structural or nominal one, and we need to infer types while parsing.
Writing notes
From December 1st. I seem to need to write concluding paragraphs that are more concluding, and to use shorter sentences. Summarize parts of my arguments, give more detail about experiments, and don’t forget that a large part of historic mathematics was geometry. Aim for more in fewer sentences. Amen!
Another set of notes: all questions are traps: the examiner wants you to think about what is asked. Think about the broader context around events. You may not have enough time to compare with contemporary perspectives. Put guideposts in your essay. Be careful about non sequiturs. Colons are good: they add emphasis (but use them carefully.) Short sentences!
Principia Mathematica
What a wonderful conference! There were a lot of talks and I should have taken more notes, but this is what I have: some quotes and sketches.
The algebraist versus the analyst. “Four riding in on a bicycle and then riding off again.” Numbers as moments, not objects (though it doesn’t lose generality.) “Cantor is hopeless at it.” (on zero.) “Do [numbers] start with 0 or 1? Yes and yes.” Frege and Russell finally give proper status to zero. The misreading of counting: does arithmetic start from counting? The number sequence is already in place; rather, we construct an isomorphism. There is a mistaken belief that we count from one. Isomorphisms avoid counting, give proper status to zero, and sidestep the issue of how counting actually works (“count” is a transitive verb: pre-counting, we must decide what to count.) Contrary to the popular depiction in Logicomix, Gödel and Russell did meet. Quine logic and Church logic. “It is not true that the square root of two is not irrational” requires every number to be rational or irrational.
Why do we care about old people? How do we make progress in philosophy? Orders were syntactic rather than semantic: Kripke and Tarski developed a hierarchy of truths. Free variable reasoning helped resolve nominal and typical ambiguity: a scientific approach to a philosophical problem. “What nowadays constitutes research—namely, Googling it.” Nominal ambiguity: assert “x is even”, actually “forall x, x is even.” Quote: “It’s clear from the letter that he didn’t get past page five [of the Principia].” The word variable is deeply misleading; it’s not a variable name (progress!) “There are no undetermined men.” Anaphoric pronoun. We can’t express rules of inference this way.
Types: variables must have ranges. Hardly any theorems (all of the statements were schemas): we wanted to prove things about all types, but couldn’t on pain of contradiction. So all of the variables are typically ambiguous. There is an argument in volume 2 respecting infinity, but a small world gives you the wrong mathematics (positivism.) But there was a bright idea: even if there aren’t enough things in the world, if there are k things, there are 2^k classes of things, etc. Go up the hierarchy. This is the interpretation for typical ambiguity. Whitehead thought theories were meaningless strings without types (a sort of macro in theory-land). ST made the language/metalanguage distinction!! “Seeing” is determining the type. The logocentric predicament is that you’re supposed to use reasoning, but this reasoning is outside the formal system. Puns of operators on higher types, decorating all operators with a type label. The stratification of types.
Free variable reasoning is the same as typically ambiguous reasoning. Abbreviation for quantified reasoning (needs messy rules inside quantifiers), indefinite names (can’t be the variable name, can’t lead to indefinite things), schematic names (lambdas: correct for variables, and modern for types.) But such arguments don’t convince someone unless they already believe the conclusion (skepticism): if the correct logic is type theoretic and we stand outside it, then we don’t have a position outside reasoning. (It’s a one way direction.) I think there is a way of talking about a system from within it. We have a weakened sense of truth: if you already believe it, it’s OK, but there is no convincing power.
The next lecture came from the computer science world. “Arguably, the more [programming languages] took from formal logic, the better it is.” Otherwise, it is the “ad-hoc creation of electricians”. Computers allow easy formal manipulation and correctness checks. But for the mathematics? There isn’t very much of it. Proofs can be checked algorithmically (with formal inference rules). “Because there are many philosophers here, I hope I can answer questions in a suitably ambiguous manner.” Symbolism allows us to do “easy” things mechanically (Whitehead quote.) Do we need formal methods? In 1994, the Pentium was found to have an error in floating point division. Robin’s conjecture was incorrectly proved. Different proof systems: de Bruijn generates proofs which are checked by a separate checker; LCF reduces all rules to primitive inferences checked by a logical kernel. After all, why don’t we prove that our proof assistants work? HOL Light (Principia) is only 430 lines of code. Schaffer’s joke: Ramseyfied Types. Right now, formal logic is on the edge of 20th century research mathematics, with proofs needing “only 10k lines of code.” Maintenance of formal proofs is a big problem: we need intermediate declarative abstract models. Check out Flyspeck.
I had some scribblings in the margins: “references in logic?” (I think that’s linear logic), how about performance proofs (guaranteed to run in such-and-such time, realtime proofs), or probabilistically checkable proofs. Maybe complexity theory has something to say here.
Turing Machines
Their method of efficient access is… a zipper. Oh man!
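I no longer remember the details from the talk, but the construction is easy to sketch: a tape zipper stores the cell under the head together with the cells to its left (reversed) and to its right, so reading, writing, and moving the head are all O(1). The names and the blank-symbol handling below are my own guesses, not the talk’s:

```haskell
-- A hypothetical sketch of a Turing machine tape as a zipper: the cell
-- under the head, plus the cells to its left (stored reversed, nearest
-- first) and the cells to its right.
data Tape a = Tape [a] a [a] deriving Show

-- Moving the head is O(1); walking off either end extends the tape with
-- the blank symbol, modelling the unbounded tape.
moveLeft, moveRight :: a -> Tape a -> Tape a
moveLeft blank (Tape [] c rs)     = Tape [] blank (c:rs)
moveLeft _     (Tape (l:ls) c rs) = Tape ls l (c:rs)
moveRight blank (Tape ls c [])     = Tape (c:ls) blank []
moveRight _     (Tape ls c (r:rs)) = Tape (c:ls) r rs

-- Reading and writing the focused cell are also O(1).
readCell :: Tape a -> a
readCell (Tape _ c _) = c

writeCell :: a -> Tape a -> Tape a
writeCell x (Tape ls _ rs) = Tape ls x rs
```

All head operations touch only the spine near the focus, which is exactly the access pattern a Turing machine needs.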
GHC
My scribblings here are largely illegible, but it seems a few concepts initially gave me trouble:
- Stack layout, keeping up and down straight, info tables, and the motion of the stack pointer. I have a pretty good idea how this all works now, but in the beginning it was quite mysterious.
- CmmNode constructors have a lot of fields, and constructing a correspondence with printed C– is nontrivial.
- Sizes of variables.
- Headers, payloads and code.
- Pointer tagging, esp. with respect to values living in registers, on the stack, and what the tag bits mean depending on context (functions or data). I never did figure out how the compacting GC worked.
This concludes the first notebook.
June 17, 2011

I recently encountered the following pattern while writing some Haskell code, and was surprised to find there was not really any support for it in the standard libraries. I don’t know what it’s called (neither did Simon Peyton Jones, when I mentioned it to him), so if someone does know, please shout out. The pattern is this: many times an endomorphic map (the map function is a -> a) will not make very many changes to the underlying data structure. If we implement the map straightforwardly, we will have to reconstruct the entire spine of the recursive data structure. However, if we instead use a function a -> Maybe a, we can reuse old pieces of the structure if there were no changes to them. (Regular readers of my blog may recognize this situation from this post.) So what is such an alternative map (a -> Maybe a) -> f a -> Maybe (f a) called?
One guess it might be the traverse function in Data.Traversable: it certainly has a very similar type signature: Applicative f => (a -> f b) -> t a -> f (t b). However, the semantics are subtly different, as you can see from this example:
Data.Traversable> traverse (\x -> if x == 2 then Just 3 else Nothing) [1,2,3]
Nothing
Recall that our function only returns Nothing in the event of no change. Thus, we should have gotten the result Just [1,3,3]: the first and third elements of the list unchanged, and the second element of the list with its new value.
How would we implement such a function for lists? Here’s a simple implementation:
nonSharingMap :: (a -> Maybe a) -> [a] -> Maybe [a]
nonSharingMap f xs = let (b, r) = foldr g (False, []) (zip xs (map f xs))
                     in if b then Just r else Nothing
  where g (y, Nothing) (b, ys) = (b, y:ys)
        g (_, Just y)  (_, ys) = (True, y:ys)
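To sanity-check that this gives the semantics we wanted from the traverse example above, here is a standalone run (the definition is repeated so the snippet compiles on its own):

```haskell
-- Repeated from above so this snippet is self-contained.
nonSharingMap :: (a -> Maybe a) -> [a] -> Maybe [a]
nonSharingMap f xs = let (b, r) = foldr g (False, []) (zip xs (map f xs))
                     in if b then Just r else Nothing
  where g (y, Nothing) (b, ys) = (b, y:ys)
        g (_, Just y)  (_, ys) = (True, y:ys)

main :: IO ()
main = do
  -- Only the 2 changes; unchanged elements are passed through:
  print (nonSharingMap (\x -> if x == 2 then Just 3 else Nothing) [1,2,3 :: Int])
  -- prints Just [1,3,3]
  -- No element changes at all, so we get Nothing back:
  print (nonSharingMap (\x -> if x == 2 then Just 3 else Nothing) [1,1 :: Int])
  -- prints Nothing
```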
But we can do better than this. Consider a situation where all elements of the list except the head stay the same:

We would like to share the tail of the list between the old and new versions. With a little head-scratching, and the realization that tails shares, we can write this version:
import Data.List (tails)

sharingMap :: (a -> Maybe a) -> [a] -> Maybe [a]
sharingMap f xs = let (b, r) = foldr g (False, []) (zip3 (tails xs) xs (map f xs))
                  in if b then Just r else Nothing
  where g (_, y, Nothing)   (True, ys)  = (True, y:ys)
        g (_, _, Just y)    (True, ys)  = (True, y:ys)
        g (ys', _, Nothing) (False, _)  = (False, ys')
        g (_, _, Just y)    (False, ys) = (True, y:ys)
Open questions: what is this pattern called? Why doesn’t it follow the usual applicative structure? Does it fulfill some higher order pattern? Also, this scheme isn’t fully compositional: if I pass you a Nothing, you have no access to the original version in case there was a change elsewhere in the structure; (Bool, a) might be a little more compositional. Does this mean this is an example of the state monad? What about sharing?
Update. Anders Kaseorg writes in with a much more straight-forward, directly recursive version of the function:
sharingMap f [] = Nothing
sharingMap f (x : xs) = case (f x, sharingMap f xs) of
  (Nothing, Nothing) -> Nothing
  (y, ys) -> Just (fromMaybe x y : fromMaybe xs ys)
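For completeness, here is Anders’s version made self-contained (note that fromMaybe comes from Data.Maybe), together with a check that it gives the behavior we wanted from the traverse example:

```haskell
import Data.Maybe (fromMaybe)

sharingMap :: (a -> Maybe a) -> [a] -> Maybe [a]
sharingMap f [] = Nothing
sharingMap f (x : xs) = case (f x, sharingMap f xs) of
  (Nothing, Nothing) -> Nothing
  -- On any change, reuse the untouched original pieces via fromMaybe;
  -- in particular an unchanged tail xs is shared with the input list.
  (y, ys) -> Just (fromMaybe x y : fromMaybe xs ys)

main :: IO ()
main = do
  print (sharingMap (\x -> if x == 2 then Just 3 else Nothing) [1,2,3 :: Int])
  -- prints Just [1,3,3]
  print (sharingMap (\x -> if x == 2 then Just 3 else Nothing) [1,1 :: Int])
  -- prints Nothing
```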
I haven’t checked, but one hope of expressing the function in terms of foldr and zip3 is that it may be possible to get it to fuse. Of course, for actual recursive spine-strict data types, you usually won’t be able to fuse, and so a more straightforward presentation is more natural.