In his classic essay Reflections on Trusting Trust, Ken Thompson describes a self-replicating compiler bug which is undetectable by source code inspection. The self-replication is made possible by the fact that most compilers are self-compiling: old versions of a compiler are used to compile new ones, and if the old version is malicious, it can slip the same bug into the new compiler when it detects that it is compiling itself.
A new trend is precisely this self-hosting process, but for self-certifying typecheckers: typecheckers which are used to prove their own correctness. (Note that these are powerful typecheckers, close to being able to check arbitrary theorems about code.) This may seem a little odd, since I could write a trivial typechecker which always claimed it was correct. In order to work around this, we must bootstrap the correctness proof by proving the typechecker correct in another language (in the case of F*, this language is Coq). Once this has been done, we can then use this verified typechecker to check a specification of itself. This process is illustrated below.
Read more...
What do automatic memory management, static types and purity have in common? They are all methods for making programs obviously correct (for some partial definition of correctness) upon visual inspection. Code using automatic memory management is obviously correct for a class of memory bugs. Code using static types is obviously correct for a class of type bugs. Code using purity (no mutable references or side effects) is obviously correct for a class of concurrency bugs. When I take advantage of any of these techniques, I don’t have to prove my code is free of those bugs: it just is, automatically!
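As a concrete (and purely hypothetical) Java sketch of the purity case, with an invented class name: every field below is final, so there is no state for two threads to race on, and that guarantee is visible right in the declaration rather than in any proof.

```java
// A hypothetical sketch: because every field is final and nothing is ever
// mutated, instances can be shared between threads with no synchronization.
// "Obviously correct" here means a whole class of data races simply cannot occur.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // "Mutation" returns a new value instead of changing this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```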
Read more...
Being back in town over MIT’s Independent Activities Period is making me think about what kind of short lecture I want to try teaching. I’ve been turning over the idea of a polyglot programming class in my head: the premise is that while most people feel really comfortable with only one or two programming languages, you can learn how to make this knowledge work for you in almost any programming language you could possibly encounter.
Read more...
The opinions presented in this post are not necessarily mine. I’m just one very confused undergraduate senior with a lot of soul searching to do.
When I tell my friends, “I’m going to get a PhD,” I sometimes get the response, “Good for you!” But other times, I get the response, “Why would you want to do that?” as if I were some poor, misguided soul, brainwashed by a society which views research as the “highest calling”, the thing that the best go to, while the rest go join industry. “If you’re a smart hacker and you join industry,” they say, “you will have more fun immediately, making more impact and more money.”
Read more...
This is notebook two.
Max Schäfer: Refactoring Java
Most Java refactoring tools built into IDEs like Eclipse are little more than glorified text manipulation macros. There are no guarantees that the result of your refactoring will have the same behavior as the original: you can even refactor code that doesn’t compile! To prevent behavior-changing surprises, refactorings end up guarded by heavyweight and hard-to-understand preconditions. Max brings two ideas to the table:
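(An aside before those ideas, with invented names and not taken from the talk: here is the kind of silent behavior change a purely textual rename can introduce.)

```java
// Hypothetical illustration: a purely textual "rename local variable y to x"
// turns Before into After. Both versions compile, but getX() now returns 2
// instead of 1, because the renamed local shadows the field. A binding-aware
// tool would instead rewrite the field access to "this.x" to preserve behavior.
class Before {
    int x = 1;
    int getX() {
        int y = 2;
        return x;      // refers to the field: returns 1
    }
}

class After {
    int x = 1;
    int getX() {
        int x = 2;     // the renamed local now shadows the field
        return x;      // refers to the local: returns 2
    }
}
```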
Read more...
I recently attended a talk which discussed extending proof assistants with diagrammatic reasoning support, helping to break the hegemony of the symbolic systems that dominate this field. While the work is certainly novel in some respects, I can’t help but think that we’ve come back full circle to the Ancient Greeks, who were big fans of geometry and its correspondingly visual form of reasoning. The thought came up again while I was reading a mathematics text and marveling at the multiple methods of presenting a single concept. In this essay, I’d like to look at this return to older, more “intuitive” forms of reasoning: I’ve called it “Hellenistic reasoning” because geometry and the Socratic method nicely sum up the visual and interactive reasoning I’d like to discuss. I argue that this resurgence is a good thing, and that though these forms of reasoning may not be as powerful or general as symbolic reasoning, they will be critical to the application and communication of abstract mathematical results.
Read more...
Books are expensive, but by the power of higher education (also expensive, but differently so), vast oceans of books are available to an enterprising compsci. Here’s my reading list for the spring break lending period (many of these were recommended on #haskell):

- Concepts, Techniques, and Models of Computer Programming by Peter Van Roy and Seif Haridi. Wonderfully iconoclastic book, and probably one of the easier reads on the list.
- Types and Programming Languages by Benjamin Pierce. I’ve been working on this one for a while; this break I’m focusing on the proof strategies for preservation, progress and safety, and also using it to complement a self-study course summed up by the next book.
- Lectures on the Curry-Howard Isomorphism by M.H. Sørensen and P. Urzyczyn. Very good; I’ve skimmed the first three chapters and I’m working on the exercises in chapter 2. I’ve been prone to making silly mis-assertions about the Curry-Howard Isomorphism (or is it?), so I’m looking forward to more firmly grounding my understanding of this correspondence. The sections on intuitionistic logic have already been very enlightening.
- Type Theory and Functional Programming by Simon Thompson. Haven’t looked at it yet, but fits into the general course of the previous two books.
- Purely Functional Data Structures by Chris Okasaki. Also one I’ve been working on for a while. Working on compressing all the information mentally.
- Basic Category Theory for Computer Scientists by Benjamin Pierce. I’ve got two items on category theory; I got this one on a whim. Haven’t looked at it yet.
- Pearls of Functional Algorithm Design by Richard Bird. Something like a collection of puzzles. I think I will enjoy reading through them and working out the subtleties. I probably won’t get to the information compression stage this time around.
- Category Theory by Steve Awodey. I was working on the exercises in this textbook, and think I might get past the first chapter.
I was flipping through An Introduction to Non-Classical Logic by Graham Priest and the section on many-valued logics caught my eye. Many-valued logics are logics with more than the usual two truth values true and false. The (strong) Kleene 3-valued logic sets up the following truth tables over 0, 1 and x (which is thought of as some value that is neither true nor false):
NOT:

| p | NOT p |
|---|-------|
| 1 | 0     |
| x | x     |
| 0 | 1     |

AND (rows give the left operand, columns the right):

| AND | 1 | x | 0 |
|-----|---|---|---|
| 1   | 1 | x | 0 |
| x   | x | x | 0 |
| 0   | 0 | 0 | 0 |

OR:

| OR | 1 | x | 0 |
|----|---|---|---|
| 1  | 1 | 1 | 1 |
| x  | 1 | x | x |
| 0  | 1 | x | 0 |

IMPLICATION (rows give the antecedent):

| IMPLICATION | 1 | x | 0 |
|-------------|---|---|---|
| 1           | 1 | x | 0 |
| x           | 1 | x | x |
| 0           | 1 | 1 | 1 |
I’ve always thought many-valued logics were a bit of a “hack” to deal with the self-referentiality paradoxes, but in fact, Kleene invented his logic by thinking about what happens when partial functions are applied to values they are not defined for: a sort of denotation failure. So it’s not surprising that these truth tables correspond to the parallel-or and parallel-and operators predicted by denotational semantics.
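To make the tables concrete, here is a minimal Java sketch of my own (not from the post) of the strong Kleene connectives as an enum, with X standing in for the value x above.

```java
// A minimal sketch of strong Kleene 3-valued logic; X is the "neither true
// nor false" value from the tables above.
enum K3 {
    FALSE, X, TRUE;

    K3 not() {
        switch (this) {
            case TRUE:  return FALSE;
            case FALSE: return TRUE;
            default:    return X;
        }
    }

    K3 and(K3 other) {
        // 0 dominates: anything AND false is false, even if the other side is x.
        if (this == FALSE || other == FALSE) return FALSE;
        if (this == X || other == X) return X;
        return TRUE;
    }

    K3 or(K3 other) {
        // 1 dominates: anything OR true is true, even if the other side is x.
        if (this == TRUE || other == TRUE) return TRUE;
        if (this == X || other == X) return X;
        return FALSE;
    }

    K3 implies(K3 other) {
        // p -> q defined as (not p) or q, which reproduces the IMPLICATION table.
        return this.not().or(other);
    }
}
```

The “dominating” cases in and and or are exactly the parallel-operator behavior: a definite answer on either side settles the result, even if the other side is still undefined.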
Read more...
This is a collection of WTFs due to misuse of mutable state.
We’ll start off with some Java. What do you expect this snippet of code to do?
Sensor Accel = sm.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
Sensor Orient = sm.getDefaultSensor(Sensor.TYPE_ORIENTATION);
sm.registerListener((SensorEventListener) this, Accel, sm.SENSOR_DELAY_FASTEST);
Ostensibly, it registers the current object to receive just accelerometer updates. But what if I told you getDefaultSensor was implemented like this:
public Sensor getDefaultSensor(int type) {
    if (sensors == null) {
        sensors = new Sensor(mContext, type);
        return sensors;
    } else if (sensors.checkList(type)) {
        sensors.addSensor(type);
        return sensors;
    } else {
        sensors.removeSensor(type);
        return sensors;
    }
}
This code completely fails to implement the expected semantics: there is a single sm-wide Sensor object (stored in sensors) that accumulates sensor types as getDefaultSensor is called. So in fact, this listener will receive events from both the accelerometer and the orientation sensor. The only saving grace is that when we register event listeners, we usually do want them to receive all events, so we might not notice if we weren’t looking too closely. This is real code from OpenIntents SensorSimulator.
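For contrast, a minimal sketch of the semantics the caller actually expects, assuming the same (hypothetical) Sensor(mContext, type) constructor used in the snippet above: each call returns its own single-type Sensor, so no shared object accumulates types behind the caller’s back.

```java
// A sketch only, reusing the hypothetical Sensor(mContext, type) constructor
// from the snippet above: return a fresh, single-type Sensor per call instead
// of mutating one shared object.
public Sensor getDefaultSensor(int type) {
    return new Sensor(mContext, type);
}
```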
Read more...
What semantics has to say about specifications
Conventional wisdom is that premature generalization is bad (architecture astronauts) and vague specifications are appropriate for top-down engineering but not bottom-up. Can we say something a little more precise about this?
Semantics are formal specifications of programming languages. They are perhaps some of the most well-studied forms of specifications, because computer scientists love tinkering with the tools they use. They also love having lots of semantics to pick from: the more the merrier. We have small-step and big-step operational semantics; we have axiomatic semantics and denotational semantics; we have game semantics, algebraic semantics and concurrency semantics. Describing the programs we actually write is difficult business, and it helps to have as many different explanations as possible.
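As an entirely hypothetical illustration (not from the post) of the first two items on that list, here is a Java sketch of big-step versus small-step operational semantics for a toy language of integer literals and addition: the big-step version maps an expression straight to its value, while the small-step version performs one reduction at a time.

```java
// A hypothetical sketch: big-step vs. small-step operational semantics for a
// toy expression language (integer literals and addition). Requires Java 16+.
interface Expr {}
record Lit(int n) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

class ToySemantics {
    // Big-step: relate an expression directly to the value it evaluates to.
    static int bigStep(Expr e) {
        if (e instanceof Lit lit) return lit.n();
        Add add = (Add) e;
        return bigStep(add.left()) + bigStep(add.right());
    }

    // Small-step: perform a single left-to-right reduction, returning a
    // slightly simpler expression; iterating until a Lit remains gives the
    // same answer as bigStep.
    static Expr smallStep(Expr e) {
        if (e instanceof Add add) {
            if (add.left() instanceof Lit l && add.right() instanceof Lit r) {
                return new Lit(l.n() + r.n());                       // both sides are values
            } else if (add.left() instanceof Lit) {
                return new Add(add.left(), smallStep(add.right()));  // reduce the right side
            } else {
                return new Add(smallStep(add.left()), add.right());  // reduce the left side
            }
        }
        return e; // literals are already values
    }
}
```

Iterating smallStep on new Add(new Lit(1), new Lit(2)) reaches new Lit(3) in one step, the same value bigStep computes directly; the two styles describe the same language at different granularities.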
Read more...