ezyang's blog

the arc of software bends towards understanding

Galois Tech Talk

Tour of a distributed Erlang application

Bonus post today! Last Tuesday, John Erickson gave a Galois tech talk entitled “Industrial Strength Distributed Explicit Model Checking” (video), in which he describe PReach, an open-source model checker based on Murphi that Intel uses to look for bugs in its models. It is intended as a simpler alternative to Murphi’s built-in distributed capabilities, leveraging Erlang to achieve much simpler network communication code.

image

First question. Why do you care?

  • Model checking is cool. Imagine you have a complicated set of interacting parallel processes that evolve nondeterministically over time, using some protocol to communicate with each other. You think the code is correct, but just to be sure, you add some assertions that check for invariants: perhaps some configurations of states should never be seen, perhaps you want to ensure that your protocol never deadlocks. One way to test this is to run it in the field for a while and report when the invariants fail. Model checking lets you comprehensively test all of the possible state evolutions of the system for deadlocks or violated invariants. With this, you can find subtle bugs and you can find out precisely the inputs that lead to that event.
  • Distributed applications are cool. As you might imagine, the number of states that need to be checked explodes exponentially. Model checkers apply algorithms to coalesce common states and reduce the state space, but at some point, if you want to test larger models you will need more machines. PReach has allowed Intel to run the underlying model checker Murphi fifty times faster (with a hundred machines).

This talk was oriented more towards to the challenges that the PReach team encountered when making the core Murphi algorithm distributed than how to model check your application (although I’m sure some Galwegians would have been interested in that aspect too.) I think it gave an excellent high level overview of how you might design a distributed system in Erlang. Since the software is open source, I’ll link to relevant source code lines as we step through the high level implementation of this system.

Read more...

Maximum matching deadlock solution

Last Monday, I presented a parallel algorithm for computing maximum weighted matching, and noted that on real hardware, a naive implementation would deadlock.

Several readers correctly identified that sorting the nodes on their most weighted vertex only once was insufficient: when a node becomes paired as is removed from the pool of unpaired nodes, it could drastically affect the sort. Keeping the nodes in a priority queue was suggested as an answer, which is certainly a good answer, though not the one that Feo ended up using.

Read more...

Graphs not grids: How caches are corrupting young algorithms designers and how to fix it

Subtitle: Massively multithreaded processors make your undergraduate CS education relevant again.

Quicksort. Divide and conquer. Search trees. These and other algorithms form the basis for a classic undergraduate algorithms class, where the big ideas of algorithm design are laid bare for all to see, and the performance model is one instruction, one time unit. “One instruction, one time unit? How quaint!” proclaim the cache-oblivious algorithm researchers and real world engineers. They know that the traditional curriculum, while not wrong, is quite misleading. It’s simply not enough to look at some theoretical computing machine: the next-generation of high performance algorithms need to be in tune with the hardware they run on. They couldn’t be more right.

Read more...

Well-founded recursion in Agda

Last Tuesday, Eric Mertens gave the Galois tech talk Introducing Well-Founded Recursion. I have to admit, most of this went over my head the first time I heard it. Here are some notes that I ended up writing for myself as I stepped through the code again. I suggest reading the slides first to get a feel for the presentation. These notes are oriented towards a Haskell programmer who feels comfortable with the type system, but not Curry-Howard comfortable with the type system.

Read more...

Static Analysis for everyday (not-PhD) man

Bjarne Stroustrup once boasted, “C++ is a multi-paradigm programming language that supports Object-Oriented and other useful styles of programming. If what you are looking for is something that forces you to do things in exactly one way, C++ isn’t it.” But as Taras Glek wryly notes, most of the static analysis and coding standards for C++ are mostly to make sure developers don’t use the other paradigms.

image

On Tuesday, Taras Glek presented Large-Scale Static Analysis at Mozilla. You can watch the video at Galois’s video stream. The guiding topic of the talk is pretty straightforward: how does Mozilla use static analysis to manage the millions of lines of code in C++ and JavaScript that it has? But there was another underlying topic: static analysis is not just for the formal-methods people and the PhDs; anyone can and should tap into the power afforded by static analysis.

Read more...

Databases are categories

Update. The video of the talk can be found here: Galois Tech Talks on Vimeo: Categories are Databases.

On Thursday Dr. David Spivak presented Categories are Databases as a Galois tech talk. His slides are here, and are dramatically more accessible than the paper Simplicial databases. Here is a short attempt to introduce this idea to people who only have a passing knowledge of category theory.

An essential exercise when designing relational databases is the practice of object modeling using labeled graphs of objects and relationships. Visually, this involves drawing a bunch of boxes representing the objects being modeled, and then drawing arrows between the objects showing relationships they may have. We can then use this object model as the basis for a relational database schema.

Read more...