ezyang's blog

the arc of software bends towards understanding

Posts

Someone is wrong on the Internet

The perverse incentive structure of the Internet

Suppose you have a Random Question™ that you would like answered. For example, “Did Sir Francis Galton have any children?” It’s the type of thing you’d type into Google.

image

Answers.com is not a terribly good source, but it is something to at least see if you can scrounge up some extra information on. So suppose you dig a little deeper and discover that Francis Galton only married once and that this marriage was barren. So unless some historian’s account of Galton suffering from venereal disease (Kevles 12) indicates that he knocked up some ladies in Syria and managed to pop out four offspring, who happened to conveniently be named “Francis, Harriet, Matilda and Mark,” Answers.com is wrong.

Read more...

Many-valued logics and bottom

I was flipping through An Introduction to Non-Classical Logic by Graham Priest and the section on many-valued logics caught my eye. Many-valued logics are logics with more than the usual two truth values true and false. The (strong) Kleene 3-valued logic, sets up the following truth table with 0, 1 and x (which is thought to be some value that is neither true nor false):

NOT
    1 0
    x x
    0 1

AND
      1 x 0
    1 1 x 0
    x x x 0
    0 0 0 0

OR
      1 x 0
    1 1 1 1
    x 1 x x
    0 1 x 0

IMPLICATION
      1 x 0
    1 1 x 0
    x 1 x x
    0 1 1 1

I’ve always thought many-valued logics were a bit of a “hack” to deal with the self-referentiality paradoxes, but in fact, Kleene invented his logic by thinking about what happened with partial functions where applied with values that they were not defined for: a sort of denotation failure. So it’s not surprising that these truth tables correspond to the parallel-or and and operators predicted by denotational semantics.

Read more...

Killer mutants attack (mutation gone wrong)

This is a collection of WTFs due to misuse of mutable state.


We’ll start off with some Java. What do you expect this snippet of code to do? :

Sensor Accel = sm.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
Sensor Orient = sm.getDefaultSensor(Sensor.TYPE_ORIENTATION);
sm.registerListener((SensorEventListener) this, Accel, sm.SENSOR_DELAY_FASTEST);

Ostensibly, it registers the current object to receive just accelerometer updates. But what if I told you getDefaultSensor was implemented like this:

public Sensor getDefaultSensor (int type){
        if(sensors == null) {
                sensors = new Sensor(mContext, type);
                return sensors;                 
        }else if(sensors.checkList(type)){
                sensors.addSensor(type);
                return sensors;
        }else{
                sensors.removeSensor(type);
                return sensors;
        }
}

This code completely fails to manage the expected semantics: there is a single sm wide Sensor object (stored in sensors) that accumulates sensor values as getDefaultSensor is called. So in fact, this will receive events from both the accelerometer and the magnetometer. The only saving grace is that when we register event listeners, we usually do want them to receive all events, so we might not notice if we weren’t looking too closely. This is real code from OpenIntents SensorSimulator.

Read more...

Type Technology Tree

They say that one doesn’t discover advanced type system extensions: rather, the type system extensions discover you! Nevertheless, it’s worthwhile to know what the tech tree for GHC’s type extensions are, so you can decide how much power (and the correspondingly headache inducing error messages) you need. I’ve organized the relations in the following diagram with the following criterion in mind:

  1. Some extensions automatically enable other extensions (implies);
  2. Some extensions offer all the features another extension offers (subsumes);
  3. Some extensions work really nicely with other extensions (synergy);
  4. Some extensions offer equivalent (but differently formulated) functionality to another extension (equiv).

It’s also worth noting that the GHC manual divides these extensions into “Extensions to data types and type synonyms”, “Class and instances declarations”, “Type families” and “Other type system extensions”. I have them organized here a little differently.

Read more...

Petri net concurrency

A petri net is a curious little graphical modeling language for control flow in concurrency. They came up in this talk a few weeks ago: Petri-nets as an Intermediate Representation for Heterogeneous Architectures, but what I found interesting was how I could describe some common concurrency structures using this modeling language.

Here is, for example, the well venerated lock:

image

The way to interpret the graph is thus: each circle is a “petri dish” (place) that may contain some number of tokens. The square boxes (transitions) are actions that would like to fire, but in order to do so all of the petri dishes feeding into them must have tokens. It’s the sort of representation that you could make into a board game of sorts!

Read more...

The creation of a statically-typed functional programmer

The bug bit me in early 2009, during MIT’s independent activities period; really, it was two bugs. The first was 6.184, the re-animated introductory computer science class taught in Scheme—for obvious reasons. But I don’t think that was sufficient: I seemed to recall thinking Scheme was interesting but not a language I actually wanted to code in. The second was a comment made by Anders Kaseorg after I finished delivering a talk Introduction to Web Application Security (one of the few things that, as a freshman at MIT, I thought I knew well enough to give a lecture on). One of the emphases of the talk was all about types: that is, the fact that “string” doesn’t adequately represent the semantic content of most bits of text that float around in our applications these days. Haskell came up as a way of making your compiler make sure you didn’t mix up HTML with plain text.

Read more...

On cargo culting and hacking

two inflammatory vignettes

The term to cargo cult is one with derogatory connotations: it indicates the act of imitating the superficial exterior without actually understanding the underlying causal structure of the situation. The implication is that one should try to understand what one is doing, before doing it. There is, however, an ounce of truth in the practice of cargo culting: when you are in a situation in which you legitimately do not know what’s going on (e.g. the context of an experiment), it is safest to preserve as many superficial traits as possible, in case a “superficial” trait in fact has a deep, non-obvious connection to the system being studied. But in this regard, beneficial “cargo culting” is nothing like the islanders throwing up airstrips in hopes of attracting planes—understanding what conditions are applicable for this treatment is often the mark of experience: the novice ignores conditions that should be preserved and does not know how to probe deeper.

Read more...

Multi-day debugging

Most of my hacking cycles right now are going towards debugging the new code generator for GHC. The code generation stage of GHC takes the Spineless Tagless G-machine (STG) intermediate representation (IR) to the C– high-level assembly representation; the old code generator essentially performed this step in one big bang. The new code generator is many things. It is a more modular, understandable and flexible codebase. It is a client of cutting edge research in higher-order frameworks for control-flow optimization.

Read more...

Ad hoc approximations

In his book Against Method, Paul Feyerabend writes the following provocative passage about ‘ad hoc approximations’, familiar to anyone whose taken a physics course and thought, “Now where did they get that approximation from…”

The perihelion of Mercury moves along at a rate of about 5600" per century. Of this value, 5026" are geometric, having to do with the movement of the reference system, while 531" are dynamical, due to the perturbations in the solar system. Of these perturbations all but the famous 43" are accounted for by classical mechanics. This is how the situation is usually explained.

Read more...

Semi-automatic testing

When programmers automate something, we often want to go whole-hog and automate everything. But it’s good to remember there’s still a place for manual testing with machine assistance: instead of expending exponential effort to automate everything, automate the easy bits and hard-code answers to the hard research problems. When I was compiling the following graph of sources of test data, I noticed a striking polarization at the ends of “automated” and “non-automated.”

Read more...