So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. None of these examples are executable, because only some of this system is in GHC HEAD--the rest are on branches awaiting code review or complete vaporware. However, we've got a pretty good idea how the overall design and user experience should go, and so the purpose of this blog post is to communicate that idea. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.
A simple package in today's Cabal
To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.
Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...
module Data.ByteString(ByteString, empty, singleton, ...) where data ByteString = PS !(ForeignPtr Word8) !Int !Int empty :: ByteString empty = PS nullForeignPtr 0 0 ...
...and an implementation of lazy ByteStrings:
module Data.ByteString.Lazy(ByteString, empty, singleton, ...) where data ByteString = Empty | Chunk !S.ByteString ByteString empty :: ByteString empty = Empty ...
These modules are packaged up into a package which is specified using a Cabal file (for now, we'll ignore the ability to define libraries/executables in the same Cabal file and assume everything is in a library):
name: bytestring version: 0.10.4.0 build-depends: base >= 4.2 && < 5, ghc-prim, deepseq exposed-modules: Data.ByteString, Data.ByteString.Lazy, ... other-modules: ...
We can then make a simple module and package which depends on the bytestring package:
module Utils where import Data.ByteString.Lazy as B blank :: IO () blank = B.putStr B.empty
name: utilities version: 0.1 build-depends: base, bytestring >= 0.10 exposed-modules: Utils
It's worth noting a few things about this completely standard module setup:
- It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)
- Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.
- I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual model. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.
Generalizing Utils with a signature
To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:
module Data.ByteString where import Data.Word data ByteString instance Eq ByteString empty :: ByteString singleton :: Word8 -> ByteString putStr :: ByteString -> IO ()
The format of a signature is essentially the same of that of an hs-boot file: we have normal Haskell declarations, but omitting the actual implementations of values.
The utilities package now needs a new field to record signatures:
name: utilities indefinite: True build-depends: base exposed-modules: Utils required-signatures: Data.ByteString
Notice that there have been three changes: (1) We've removed the direct dependency on the bytestring package, (2) we've added a new field indefinite, which indicates that this indefinite package has signatures and cannot be compiled until those signatures are filled in with implementations (this field is strictly redundant, but is useful for documentation purposes, as we will see later), and (3) we have a new field required-signatures which simply lists the names of the signature files (also known as holes) that we need filled in.
How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which is exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:
name: strict-utilities build-depends: utilities, bytestring reexported-modules: Utils as Utils.Strict
That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:
- utilities, which requires a module named Data.ByteString, and
- bytestring, which supplies a module named Data.ByteString.
Cabal automatically figures out that how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).
Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.
Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:
name: lazy-utilities build-depends: utilities, bytestring (Data.ByteString.Lazy as Data.ByteString) reexported-modules: Utils as Utils.Lazy
The utilities dependency is business as usual, but bytestring has a little parenthesized expression next to it. This expression is the thinning and renaming applied to the package import: it controls what modules are brought into the scope of the current package from a dependency, possibly renaming them to different names. When I write build-depends: bytestring (Data.ByteString.Lazy as Data.ByteString), I am saying "I depend on the bytestring package, but please only make the Data.ByteString.Lazy module available under the name Data.ByteString when considering module imports, and ignore all the other exposed modules." In strict-utilities, you could have also written bytestring (Data.ByteString), because this is the only module that utilities uses from bytestring.
An interesting duality is that you can do the renaming the other way:
name: lazy-utilities build-depends: utilities (Utils, Data.ByteString as Data.ByteString.Lazy), bytestring
Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters it that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.
There are a few things to note about signature usage:
If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.
A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.
Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in prelude.
What happens if, independently of my packages strict-utilities, someone else also instantiatiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.
Mini-quiz: What does this package do?
name: quiz-utilities build-depends: utilities (Utils, Data.ByteString as B), bytestring (Data.ByteString.Lazy as B)
We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of set of orthogonal features which interact in a good way:
- Module signatures (mostly implemented but needs lots of testing): the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,
- Module reexports (fully implemented and in HEAD): the ability to take locally available modules and reexport them under a different name, and
- Module thinning and renaming (fully implemented and in code review): the ability to selectively make available modules from a dependency.
To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times. Also, note that I've said nothing about how to type-check against just a signature, without having any implementation in mind. As of right now, this functionality is vaporware; in a future blog post, I also plan on saying why this is so challenging.