Why verification results in higher quality code

by Edward Z. Yang

Correctness is overrated. After all, no one knows what it means for any reasonably complicated system to be "correct", and even when we do, the mileposts move around on a daily basis. With the raison d'être of formal verification stripped away, can we still consider it a worthy goal?

Perhaps verification results in higher quality code. But this is not obviously true: correctness is not quality. We might hope that high quality code is readable and easily understood, that it should be as self-contained and independent from the rest of the system, that it is efficient and economical. There is no a priori reason to believe that verification would grant us any of these properties. No matter how horrible some code is, as long as it is correct, there exists a proof which vouches for its correctness.

But as anyone who has gone through the sweat and tears of verifying a program can tell you, formal verification really does make your code better. Here's the secret: proving theorems is really hard. If we want any hope to successfully prove something about a program, we must to make reasoning about the code as easy as possible. A program under verification irresistibly gravitates towards it’s most “reasonable” form, because otherwise the proofs are simply too arduous to carry out. And in this form, the tenets of high quality code follow.

Take for example Bedrock, a system for building verified low-level programs which manipulate pointers and registers. These are programs that deal with mutable state, a feature which is well known to dramatically increase the difficulty of reasoning. Bedrock, and many systems like it, would be dead out of the water if not for the development of an important system called separation logic. The central idea behind it is so obvious to any experienced practitioner it is barely worth stating: private local state is easier to reason about than public global state—modularity is good. It enforces this through a clever formalism, the star operator, which combines two assertions about two regions of memory while assuring that the regions are disjoint. Regardless, the end result is this: if your components are independent, the theorem proving is easy; if your components are tangled together, the theorem proving is hard. You do the math.

But it doesn’t stop there. When different components do interact, the principle of encapsulation says that I do not want to know all of the gory details of a component, just its high-level interface. In theorem prover land, “all of the gory details” means unmanageably large facts about many pointers, and a "high-level interface" is an abstract predicate which rolls up all of these facts into a single, cohesive logical fact (“this is a linked list.”) Developing these predicates is critical to keeping your theorem statements concise and understandable, and in higher-order provers like Bedrock, they can apply not only to data but also to code, i.e. function pointers.

The tenets of high quality code speak for code that is written for humans to understand, and not just machines to execute. But code that is written for machines to understand have many of the same properties that are valued by humans, for if they do not, getting the machine to “understand” becomes an impossible task. Computers may be simple-minded, but all that means is code a computer can understand is code that you can understand too. And that is high quality code.