OCaml gotchas
February 4, 2011I spent some time fleshing out my count min sketch implementation for OCaml (to be the subject of another blog post), and along the way, I noticed a few more quirks about the OCaml language (from a Haskell viewpoint).
Unlike Haskell’s
Int, which is 32-bit/64-bit, the built-in OCamlinttype is only 31-bit/63-bit. Bit twiddlers beware! (There is anativeinttype which gives full machine precision, but it less efficient than theinttype).Semicolons have quite different precedence from the “programmable semicolon” of a Haskell do-block. In particular:
let rec repeat n thunk = if n == 0 then () else thunk (); repeat (n-1) thunkdoesn’t do what you’d expect similarly phrased Haskell. (I hear I’m supposed to use
beginandend.)You can only get 30-bits of randomness from the Random module (an positive integer using Random.bits), even when you’re on a 64-bit platform, so you have to manually stitch multiple invocations to the generator together.
I don’t like a marching staircase of indentation, so I hang my “in”s after their statements—however, when they’re placed there, they’re easy to forget (since a
letin a do-block does not require aninin Haskell).Keyword arguments are quite useful, but they gunk up the type system a little and make it a little more difficult to interop keyword functions and non-keyword functions in a higher-order context. (This is especially evident when you’re using keyword arguments for documentation purposes, not because your function takes two ints and you really do need to disambiguate them.)
One observation about purity and randomness: I think one of the things people frequently find annoying in Haskell is the fact that randomness involves mutation of state, and thus be wrapped in a monad. This makes building probabilistic data structures a little clunkier, since you can no longer expose pure interfaces. OCaml is not pure, and as such you can query the random number generator whenever you want.
However, I think Haskell may get the last laugh in certain circumstances. In particular, if you are using a random number generator in order to generate random test cases for your code, you need to be able to reproduce a particular set of random tests. Usually, this is done by providing a seed which you can then feed back to the testing script, for deterministic behavior. But because OCaml’s random number generator manipulates global state, it’s very easy to accidentally break determinism by asking for a random number for something unrelated. You can work around it by manually bracketing the global state, but explicitly handling the randomness state means providing determinism is much more natural.
If you want to use different generators in parallel, avoiding interferences, you may use the Random.State module which is intended for this purpose: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Random.State.html
Random.State provides a reification of the generator state that can be used independently – instead of a global generator state. Of course, you then have to pass those local states around, either by hand or using a monadic style.
Luke, that’s quite a good way of thinking about it. It makes me wonder now if we can design a random monad that supports symbolic evaluation and simulation.
gasche and Jean-Denis, good points.
Two alternate ways of putting ‘repeat’. I’m not sure if I prefer mine, though.
let rec repeat thunk = function | 0 -> () | n -> thunk (); repeat thunk (pred n)
let rec repeat thunk n = if n > 0 then begin thunk (); repeat thunk (pred n) end if n = 0 then () else thunk (); repeat (n-1) thunk
My big hassle about Haskell’s random monad is that it can’t guarantee that you don’t run it twice with the same initial seed, an easy error to make that could be fatal to some crypto applications. I had a chat about this on irc and the conclusion seems to be that doing it the right way would require linear types in the language.
May I ask what possessed you to start using Ocaml? How do you like it? I’m annoyed and disillusioned with Haskell in several ways, but Ocaml seems pretty crude by comparison. I haven’t yet felt pioneerish enough to try DDC.