ezyang's blog

the arc of software bends towards understanding

OCaml

First impressions of module programming

During my time at Jane Street, I’ve done a fair bit of programming involving modules. I’ve touched functors, type and module constraints, modules in modules, even first class modules (though only peripherally). Unfortunately, the chapter on modules in Advanced Topics in Types and Programming Languages made my eyes glaze over, so I can’t really call myself knowledgeable in module systems yet, but I think I have used them enough to have a few remarks about them. (All remarks about convention should be taken to be indicative of Jane Street style. Note: they’ve open sourced a bit of their software, if you actually want to look at some of the stuff I’m talking about.)

Read more...

From data type definitions to code

What do these problems have in common: recursive equality/ordering checks, printing string representations, serializing/unserializing binary protocols, hashing, generating getters/setters? They are repetitive boilerplate pieces of code that have a strong dependence on the structure of the data they operate over. Since programmers love automating things away, various schools of thought have emerged on how to do this:

  1. Let your IDE generate this boilerplate for you. You right-click on the context menu, click “Generate hashCode()”, and your IDE does the necessary program analysis for you;
  2. Create a custom metadata format (usually XML), which you then run another program on which converts this description into code;
  3. Add sufficiently strong macro/higher-order capabilities to your language, so you can write programs which generate implementations in-program;
  4. Add sufficiently strong reflective capabilities to your language, so you can write a fully generic dynamic implementation for this functionality;
  5. Be a compiler and do static analysis over abstract syntax trees in order to figure out how to implement the relevant operations.

It hadn’t struck me how prevalent the fifth option was until I ran into wide scale use of one particular facet of the camlp4 system. While it describes itself as a “macro system,” in sexplib and bin-prot, the macros being used are not those of the C tradition (which would be good for implementing 3), rather, they are in the Lisp tradition, including access to the full syntax tree of OCaml and the ability to modify OCaml’s grammar. Unlike most Lisps, however, camlp4 has access to the abstract syntax tree of data type definitions (in untyped languages these are usually implicit), which it can use to transform into code.

Read more...

Variant types and GADTs

OCaml supports anonymous variant types, of the form type a = [`Foo of int | `Bar of bool], with the appropriate subtyping relations. Subtyping is, in general, kind of tricky, so I have been using these variant types fairly conservatively. (Even if a feature gives you too much rope, it can be manageable and useful if you use discipline.) Indeed, they are remarkably handy for one particular use-case for which I would have normally deployed GADTs. This is the “Combining multiple sum types into a single sum type” use-case.

Read more...

OCaml gotchas

I spent some time fleshing out my count min sketch implementation for OCaml (to be the subject of another blog post), and along the way, I noticed a few more quirks about the OCaml language (from a Haskell viewpoint).

  • Unlike Haskell’s Int, which is 32-bit/64-bit, the built-in OCaml int type is only 31-bit/63-bit. Bit twiddlers beware! (There is a nativeint type which gives full machine precision, but it less efficient than the int type).

    Read more...