Killer mutants attack (mutation gone wrong) : ezyang’s blog

Killer mutants attack (mutation gone wrong)

This is a collection of WTFs due to misuse of mutable state.

We'll start off with some Java. What do you expect this snippet of code to do?

Sensor Accel = sm.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
Sensor Orient = sm.getDefaultSensor(Sensor.TYPE_ORIENTATION);
sm.registerListener((SensorEventListener) this, Accel, sm.SENSOR_DELAY_FASTEST);

Ostensibly, it registers the current object to receive just accelerometer updates. But what if I told you getDefaultSensor was implemented like this:

public Sensor getDefaultSensor (int type){
        if(sensors == null) {
                sensors = new Sensor(mContext, type);
                return sensors;
        }else if(sensors.checkList(type)){
                sensors.addSensor(type);
                return sensors;
        }else{
                sensors.removeSensor(type);
                return sensors;
        }
}

This code completely fails to manage the expected semantics: there is a single sm wide Sensor object (stored in sensors) that accumulates sensor values as getDefaultSensor is called. So in fact, this will receive events from both the accelerometer and the magnetometer. The only saving grace is that when we register event listeners, we usually do want them to receive all events, so we might not notice if we weren't looking too closely. This is real code from OpenIntents SensorSimulator.

Lest you think I only make fun of other people's code, here is a diff from a project of my own:

@@ -197,13 +197,7 @@ def upload_all(tree, ftp, base):

     ftp.cwd(base)
     for blob in tree.blobs:
-        logging.info('Uploading ' + '/'.join((base, blob.name)))
-        try:
-            ftp.delete(blob.name)
-        except ftplib.error_perm:
-            pass
-        ftp.storbinary('STOR ' + blob.name, blob.data_stream)
-        ftp.voidcmd('SITE CHMOD ' + format_mode(blob.mode) + ' ' + blob.name)
+        upload_blob(blob, ftp, base)

@@ -260,11 +254,25 @@ def upload_diff(diff, tree, ftp, base):
             node = subtree/components[-1]
             assert isinstance(node, Blob)

-            logging.info('Uploading ' + full_path)
-            ftp.storbinary('STOR ' + file, node.data_stream)
-            ftp.voidcmd('SITE CHMOD ' + format_mode(node.mode) + ' ' + file)
-            # Don't do anything if there isn't any item; maybe it
-            # was deleted.
+            upload_blob(node, ftp, base)

It looks fairly plausible: I’ve factored out some common storebinary logic. Can you tell what the bug is? Here’s a hint.

The problem is that the upload_all changes the current working directory on the FTP connection (mutable state!), while upload_diff does not (working entirely from the base working directory). The upload function assumed upload_all style working directory changes, and so all upload_diff uploads were dumped in the base directory. Mutability hurts modularity! The fix was to get rid of this mutation and manually calculate the full path; this also removed some delicate invariant preservation in the original upload_all implementation.

Paradoxically enough, though Haskell encourages you not to use mutation, when you do use it, Haskell expressive static type system gives you the unusual ability to statically encode complicated invariants about your mutation—invariants that would not have been necessary if you hadn’t used mutation. A small example of this is ST monad, which uses rank-2 types to ensure that references to mutable memory cannot escape runST, the isolated “mutation thread.”

To the limit, you may find yourself knee deep in advanced type system features if you try to statically rule out incorrect usages of a mutable API. I found this out when I worked with abcBridge, and tried very hard to use types to prevent improper use of underlying C library. Here is one relevant code quote:

-- | When you duplicate a network, node indexes may change, so if you
-- would like to use old references, they need to be translated first.
-- You can read this type as @Node n -> NT n2 (Node n2)@ or @Node n ->
-- NQ n2 (Node n2)@, with the condition that the @n2@ index was one
-- generated by 'branch' or 'speculate'.
translate :: (G.NetworkMonad m AIG (Dup n n2)) => Node n -> m AIG (Dup n n2) (Node (Dup n n2))

This truly is some WTF, rank-2-phantom-types code, but it grew out of a very specific bug I stumbled onto and was unconvinced that I’d remember to avoid in the future (can you guess what it was?) A curious reader may ask, why do I need to duplicate networks in the first place? Because some operations that the underlying library provides are destructive, and the only way I can provide the illusion of persistent networks is duplicating before destruction.

In summary:

Mutation is frequently not what people expect,
Mutation is not modular, and
Mutation is complicated.

Avoid it when you can!

Robert Massaioli says:

March 13, 2011 at 11:41 pm

Agreed Mutation sucks in all regards. Just writing some Java now and if it was not for the fact that I need some better performance then everything would be immutable.

Behrang says:

March 15, 2011 at 7:06 am

You can not conclude from the first example that mutation is bad. That code is bad, because you have a method that looks like an accessor but does unrelated things as well.

It’s like having a method named start() that terminates a program and then concluding that terminating programs is bad.

March 15, 2011 at 7:08 am

Plus, in Java programs the convention is to start variables with a lower case character.

Achilleas Margaritis says:

March 15, 2011 at 7:22 am

As if the same logical mistake cannot be programmed by Haskell. You people are amazing. Haskell is the new Religion, isn’t it?

Greg Young says:

March 15, 2011 at 12:05 pm

On this first one an ever small dose of CQS would have identified that. Great thing to check on your build server …

Just bad code.

ryaner says:

March 15, 2011 at 4:18 pm

Indeed it is just a problem of bad code.. The point of the post I think is that if you are working with a language like Haskell, this sort of bad code is harder to write.. because of purity, the type system helping you and all.

who? says:

March 18, 2011 at 4:42 pm

Terminating programs is bad.

> It’s like having a method named start() that terminates a program […]

Terminating clearly is a side effect, so it is bad from a “pure” point of view.
For example, in Java you can terminate using the System.exit(…) hack [which does not even unwind the stack], by throwing an Exception [abusing it exceptions for control flow] or using nonmodular return/break/continue statements (in which case you would need extra code in the caller of your start() method to check whether the caller should return/terminate or not).
None of these options is purely functional.

(It is far too easy to use mutable state accidentially in Java, especially in comparison with Haskell.)