ezyang's blog

the arc of software bends towards understanding

Posts

Pipelines and continuations

Attention conservation notice. Function pipelines offer an intuitive way to think about continuations: continuation-passing style merely reifies the pipeline. If you know continuations, this post probably won’t give you much; otherwise, I hope this is an interesting new way to look at them. Why do you care about continuations? They are frequently an extremely fast way to implement algorithms, since they are essentially pure (pipeline) flow control.

In Real World Haskell, an interesting pattern that recurs in functional programs that use function composition (.) is named: pipelining. It comes in several guises: Lispers may know it as the “how many closing parentheses did I need?” syndrome:

Read more...

System.Posix.Redirect

System.Posix.Redirect is a Haskell implementation of a well-known, clever and effective POSIX hack. It’s also completely fails software engineering standards. About a week ago, I excised this failed experiment from my work code and uploaded it to Hackage for strictly academic purposes.

What does it do? When you run a command in a shell script, you have the option of redirecting its output to another file or program:

$ echo "foo\n" > foo-file
$ cat foo-file
foo
$ cat foo-file | grep oo
foo

Many APIs for creating new processes which allow custom stdin/stdout/stderr handles exist; what System.Posix.Redirect lets you do is redirect stdout/stderr without having to create a new process:

Read more...

Maximum matching deadlock solution

Last Monday, I presented a parallel algorithm for computing maximum weighted matching, and noted that on real hardware, a naive implementation would deadlock.

Several readers correctly identified that sorting the nodes on their most weighted vertex only once was insufficient: when a node becomes paired as is removed from the pool of unpaired nodes, it could drastically affect the sort. Keeping the nodes in a priority queue was suggested as an answer, which is certainly a good answer, though not the one that Feo ended up using.

Read more...

Flipping arrows in coBurger King

Category theory crash course for the working Haskell programmer.

A frequent question that comes up when discussing the dual data structures—most frequently comonad—is “What does the co- mean?” The snippy category theory answer is: “Because you flip the arrows around.” This is confusing, because if you look at one variant of the monad and comonad typeclasses:

class Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  return :: a -> m a

class Comonad w where
  (=>>) :: w a -> (w a -> b) -> w b
  extract :: w a -> a

there are a lot of “arrows”, and only a few of them flipped (specifically, the arrow inside the second argument of the >>= and =>> functions, and the arrow in return/extract). This article will make precise what it means to “flip arrows” and use the “dual category”, even if you don’t know a lick of category theory.

Read more...

Graphs not grids: How caches are corrupting young algorithms designers and how to fix it

Subtitle: Massively multithreaded processors make your undergraduate CS education relevant again.

Quicksort. Divide and conquer. Search trees. These and other algorithms form the basis for a classic undergraduate algorithms class, where the big ideas of algorithm design are laid bare for all to see, and the performance model is one instruction, one time unit. “One instruction, one time unit? How quaint!” proclaim the cache-oblivious algorithm researchers and real world engineers. They know that the traditional curriculum, while not wrong, is quite misleading. It’s simply not enough to look at some theoretical computing machine: the next-generation of high performance algorithms need to be in tune with the hardware they run on. They couldn’t be more right.

Read more...

Safety first: FFI and threading

Update. While this blog post presents two true facts, it gets the causal relationship between the two facts wrong. Here is the correction.

Attention conservation notice. Don’t use unsafe in your FFI imports! We really mean it!

Consider the following example in from an old version of Haskellwiki’s FFI introduction:

{-# INCLUDE <math.h> #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module FfiExample where
import Foreign.C -- get the C types

-- pure function
-- "unsafe" means it's slightly faster but can't callback to haskell
foreign import ccall unsafe "sin" c_sin :: CDouble -> CDouble
sin :: Double -> Double
sin d = realToFrac (c_sin (realToFrac d))

The comment blithely notes that the function can’t “callback to Haskell.” Someone first learning about the FFI might think, “Oh, that means I can put most unsafe on most of my FFI declarations, since I’m not going to do anything advanced like call back to Haskell.”

Read more...

Groom: human readable Show for Haskell

Tapping away at a complex datastructure, I find myself facing a veritable wall of Babel.

image

“Zounds!” I exclaim, “The GHC gods have cursed me once again with a derived Show instance with no whitespace!” I mutter discontently to myself, and begin pairing up parentheses and brackets, scanning the sheet of text for some discernible feature that may tell me of the data I am looking for.

But then, a thought comes to me: “Show is specified to be a valid Haskell expression without whitespace. What if I parsed it and then pretty-printed the resulting AST?”

Read more...

Little’s law

A short thought from standing in line at the World Expo: Little’s law is a remarkable result that relates the number of people in a queue, the arrival rate of people to the queue, and the time spent waiting in the queue. It seems that it could be easily applied to a most universal feature of theme parks: waiting queues. Instead of such unreliable methods as giving visitors tokens to test how long it takes to traverse some portion of the line and then eyeballing the wait time from there, it would be a simple matter to install two gates: one to count incoming visitors and one to count outgoing visitors, and with this data derive an instantaneous “wait time in queue” figure based on a smoothed running average of queue size and arrival rate. Added benefit for being electronic, which means you can easily beam it to information boards across the park!

Read more...

MVC and Purity

Attention conservation notice. Purely functional programming demonstrates the same practices recommended by object-oriented MVC practice.

Model-View-Controller is a widely used object-oriented design pattern for organizing functionality in an application with a user interface. I first ran across it in my early days programming web applications. The Model/View separation made deep intuitive sense to me as a PHP programmer: without it, you’d end up with spaghetti templates with HTML print statements interleaved with MySQL queries. But Controller was always a little wishy-washy. What exactly did it do? It was some sort of “glue” code, the kind of stuff that bound together the Model and View and gave them orders. But this was always a sort of half-hearted answer for me (where should input validation go?), and soon I left the world of web applications, my questions unanswered.

Read more...

Call and fun: marshalling redux

This part six of a six part introduction to c2hs. We finally talk about what ostensibly is the point of c2hs: calling C functions from Haskell. c2hs, due to its knowledge of the C headers, can already do the work for generating FFI imports. The call hook simply tells c2hs to generate the FFI import, while the fun hook generates another Haskell function which performs marshalling.

Call. The format of call is quite simple, because like get and set, it is meant to be interleaved with other Haskell code. If I would like to invoke the readline function from readline/readline.h, a {#call readline #} would suffice; c2hs will then generate the FFI import with the correct signature and transform the call directive into the name of the FFI import.

Read more...