A taste of Cabalized Backpack
Update. Want to know more about Backpack? Read the specification
So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. These examples are executable, but you'll have to build custom versions of GHC and Cabal to build them. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.
A simple package in today's Cabal
To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.
Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...
    module Data.ByteString(ByteString, empty, singleton, ...) where
      data ByteString = PS !(ForeignPtr Word8) !Int !Int
      empty :: ByteString
      empty = PS nullForeignPtr 0 0
      -- ...
...and an implementation of lazy ByteStrings:
    module Data.ByteString.Lazy(ByteString, empty, singleton, ...) where
      data ByteString = Empty | Chunk !S.ByteString ByteString
      empty :: ByteString
      empty = Empty
      -- ...
These modules are packaged up into a package which is specified using a Cabal file:
    name: bytestring
    version: 0.10.4.0
    library
      build-depends: base >= 4.2 && < 5, ghc-prim, deepseq
      exposed-modules: Data.ByteString, Data.ByteString.Lazy, ...
      other-modules: ...
We can then make a simple module and package which depends on the bytestring package:
    module Utils where
      import qualified Data.ByteString.Lazy as B
      blank :: IO ()
      blank = B.putStr B.empty

    name: utilities
    version: 0.1
    library
      build-depends: base, bytestring >= 0.10
      exposed-modules: Utils
It's worth noting a few things about this completely standard module setup:
- It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)
- Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.
- I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual module. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.
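To make the last point concrete, here is a minimal sketch of the same code switched over to strict ByteStrings. Only the import line changes; the body still goes through the alias B. The module header is left off so the snippet can be loaded on its own, and it assumes the bytestring package is installed.

```haskell
-- Same code as Utils above; only the import line has been edited.
import qualified Data.ByteString as B  -- was: Data.ByteString.Lazy

-- Unchanged: every reference goes through the alias B.
blank :: IO ()
blank = B.putStr B.empty
```

This is exactly the manual edit that the rest of the post tries to avoid: the choice of implementation is still hard-wired into the module text.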
Generalizing Utils with a signature
To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here is one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:
    signature Data.ByteString where
      import Data.Word
      data ByteString
      instance Eq ByteString
      empty :: ByteString
      singleton :: Word8 -> ByteString
      putStr :: ByteString -> IO ()
The format of a signature is essentially the same as that of an hs-boot file: we have normal Haskell declarations, but omitting the actual implementations of values.
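To see what the signature leaves out, it can help to look at a module that would satisfy it. The following is a hypothetical toy implementation, not the real bytestring code: the list representation and the constructor name ListBS are assumptions made purely for illustration. Every declaration in the signature has a counterpart here, this time with a body.

```haskell
-- Hypothetical toy module that could fill the Data.ByteString hole.
-- The list-of-bytes representation is an assumption for illustration only.
import Prelude hiding (putStr)
import qualified Prelude as P
import Data.Char (chr)
import Data.Word (Word8)

-- The signature only says "data ByteString"; here we pick a representation.
data ByteString = ListBS [Word8]

-- Counterpart of "instance Eq ByteString" in the signature.
instance Eq ByteString where
  ListBS xs == ListBS ys = xs == ys

empty :: ByteString
empty = ListBS []

singleton :: Word8 -> ByteString
singleton w = ListBS [w]

-- Writes each byte as the character with the same code point.
putStr :: ByteString -> IO ()
putStr (ListBS ws) = P.putStr (map (chr . fromIntegral) ws)
```

A client compiled against the signature cannot tell this module apart from any other implementation, since it can only use the names and types the signature declares.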
The utilities package now needs a new field to record signatures:
    name: utilities
    library
      build-depends: base
      exposed-modules: Utils
      signatures: Data.ByteString
Notice that there have been two changes: (1) we've removed the direct dependency on the bytestring package, and (2) we have a new field signatures which simply lists the names of the signature files (also known as holes) that we need filled in.
How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which are exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:
    name: strict-utilities
    library
      build-depends: utilities, bytestring
      reexported-modules: Utils as Utils.Strict
That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:
- utilities, which requires a module named Data.ByteString, and
- bytestring, which supplies a module named Data.ByteString.
Cabal automatically figures out how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).
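The name-matching rule can be sketched as a small toy model in Haskell. This is not how Cabal is implemented; the Package record and the fillHoles function are hypothetical names invented for this sketch, which only assumes the containers package. Each package lists the module names it provides and the holes it requires, and a hole is filled by the first dependency that provides a module of the same name.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type ModuleName = String
type PackageName = String

-- A toy package: the module names it provides and the holes it requires.
data Package = Package
  { pkgName  :: PackageName
  , provides :: S.Set ModuleName
  , requires :: S.Set ModuleName
  }

-- For every hole of the client, look for a dependency that provides a
-- module with exactly the same name. Nothing means the hole stays unfilled.
fillHoles :: Package -> [Package] -> M.Map ModuleName (Maybe PackageName)
fillHoles client deps = M.fromSet fill (requires client)
  where
    fill hole = case [pkgName d | d <- deps, hole `S.member` provides d] of
      (p : _) -> Just p
      []      -> Nothing

utilities, bytestring :: Package
utilities = Package "utilities" (S.fromList ["Utils"]) (S.fromList ["Data.ByteString"])
bytestring = Package "bytestring"
  (S.fromList ["Data.ByteString", "Data.ByteString.Lazy"]) S.empty
```

In this model, fillHoles utilities [bytestring] maps the hole Data.ByteString to the bytestring package, which mirrors what the strict-utilities package asks for.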
Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.
Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:
    name: lazy-utilities
    library
      build-depends: utilities, bytestring
      backpack-includes: bytestring (Data.ByteString.Lazy as Data.ByteString)
      reexported-modules: Utils as Utils.Lazy
The new backpack-includes field says that only the Data.ByteString.Lazy module should be brought into scope, under the name Data.ByteString. This is sufficient to mix-in link utilities with the lazy implementation of ByteString.
An interesting duality is that you can do the renaming the other way:
    name: lazy-utilities
    library
      build-depends: utilities (Utils, Data.ByteString as Data.ByteString.Lazy), bytestring
Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters is that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.
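The equivalence of the two renamings can be checked in a toy model. The sketch below is an illustration under stated assumptions, not Cabal's algorithm: Scope, include and link are hypothetical names, and the strings used to identify holes and modules are made up for readability. A scope maps each visible module name to the thing it stands for, thinning and renaming rewrites the visible names, and linking pairs up a hole and an implementation that are visible under the same name.

```haskell
import qualified Data.Map as M

type ModuleName = String

-- Visible module name -> the underlying hole or implementation it denotes.
type Scope = M.Map ModuleName String

-- Thinning and renaming: keep only the listed modules, under their new names.
include :: [(ModuleName, ModuleName)] -> Scope -> Scope
include renames scope =
  M.fromList [(to, ent) | (from, to) <- renames, Just ent <- [M.lookup from scope]]

-- Linking: a hole and an implementation meet when they share a visible name.
link :: Scope -> Scope -> [(String, String)]
link holes impls = M.elems (M.intersectionWith (,) holes impls)

bytestringScope, utilitiesHoles :: Scope
bytestringScope = M.fromList
  [ ("Data.ByteString", "bytestring:Data.ByteString")
  , ("Data.ByteString.Lazy", "bytestring:Data.ByteString.Lazy")
  ]
utilitiesHoles = M.fromList [("Data.ByteString", "utilities:<Data.ByteString>")]

-- Rename the implementation so that it lands on the hole's name.
renameImplementation :: [(String, String)]
renameImplementation =
  link utilitiesHoles
       (include [("Data.ByteString.Lazy", "Data.ByteString")] bytestringScope)

-- Rename the hole so that it lands on the implementation's name.
renameHole :: [(String, String)]
renameHole =
  link (include [("Data.ByteString", "Data.ByteString.Lazy")] utilitiesHoles)
       bytestringScope
```

Both renameImplementation and renameHole pair the utilities hole with bytestring's lazy module; the visible name under which they meet differs, but the resulting instantiation does not.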
There are a few things to note about signature usage:
- If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.
- A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.
  Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in base.
- What happens if, independently of my package strict-utilities, someone else also instantiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.
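The applicativity rule can be phrased as an equality check. The sketch below is a toy model and not GHC's actual representation; the types Module and Instantiated and the function instantiate are hypothetical names, and it assumes the containers package. An instantiated package is identified by the package name together with the mapping from each hole to the module that fills it, so two instantiations are the same exactly when those mappings agree.

```haskell
import qualified Data.Map as M

type ModuleName = String

-- A concrete implementation module, identified by package and module name.
data Module = Module { modPackage :: String, modName :: ModuleName }
  deriving (Eq, Show)

-- An instantiated package: which package, and how each hole was filled.
data Instantiated = Instantiated
  { instPackage :: String
  , instHoles   :: M.Map ModuleName Module
  } deriving (Eq, Show)

instantiate :: String -> [(ModuleName, Module)] -> Instantiated
instantiate pkg = Instantiated pkg . M.fromList

strictBS, lazyBS :: Module
strictBS = Module "bytestring" "Data.ByteString"
lazyBS   = Module "bytestring" "Data.ByteString.Lazy"

-- Two independent clients filling the hole with the same module...
mine, theirs :: Instantiated
mine   = instantiate "utilities" [("Data.ByteString", strictBS)]
theirs = instantiate "utilities" [("Data.ByteString", strictBS)]

-- ...and a third client filling it with a different one.
lazyOne :: Instantiated
lazyOne = instantiate "utilities" [("Data.ByteString", lazyBS)]
```

Here mine == theirs holds while mine == lazyOne does not, which is the sense in which the first two clients share one instantiation and the third gets its own.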
Mini-quiz: What does this package do?
    name: quiz-utilities
    library
      build-depends: utilities (Utils, Data.ByteString as B), bytestring (Data.ByteString.Lazy as B)
Sharing signatures
It's all very nice to be able to explicitly write a signature for Data.ByteString in my package, but this could get old if I have to do this for every single package I depend on. It would be much nicer if I could just put all my signatures in a package and include that when I want to share it. I want all of the Hackage mechanisms to apply to my signatures as well as my normal packages (e.g. versioning). Well, you can!
The author of bytestring can write a bytestring-sig package which contains only signatures:
    name: bytestring-sig
    version: 1.0
    library
      build-depends: base
      signatures: Data.ByteString
Now, utilities can include this package to indicate its dependence on the signature:
    name: utilities
    library
      build-depends: base, bytestring-sig == 1.0
      exposed-modules: Utils
Unlike normal dependencies, signature dependencies should be exact: after all, while you might want an upgraded implementation, you don't want the signature to change on you!
We can summarize all of the fields as follows:
- exposed-modules says that there is a public module defined in this package
- other-modules says that there is a private module defined in this package
- signatures says that there is a public signature defined in this package (there are no private signatures; they are always public, because a signature always must be implemented)
- reexported-modules says that there is a public module or signature defined in a dependency.
In this list, public means that it is available to clients. Notice that the first three fields list all of the source code in this package. Here is a simple example of a client:
    name: utilities-extras
    library
      build-depends: utilities
      exposed-modules: Utils.Extra
Summary
We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of a set of orthogonal features which interact in a good way:
- Module signatures: the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,
- Module reexports: the ability to take locally available modules and reexport them under a different name, and
- Module thinning and renaming: the ability to selectively make available modules from a dependency.
To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times.