Generalizing APIs : ezyang’s blog

Generalizing APIs

Edit. ddarius pointed out to me that the type families examples were backwards, so I’ve flipped them to be the same as the functional dependencies.

Type functions can be used to do all sorts of neat type-level computation, but perhaps the most basic use is to allow the construction of generic APIs, instead of just relying on the fact that a module exports “mostly the same functions”. How much type trickery you need depends on properties of your API—perhaps most importantly, on the properties of your data types.

Suppose I have a single function on a single data type:

defaultInt :: Int

and I would like to generalize it. I can do so easily by creating a type class:

class Default a where
  def :: a

Abstraction on a single type usually requires nothing more than vanilla type classes.

Suppose I have a function on several data types:

data IntSet
insert :: IntSet -> Int -> IntSet
lookup :: IntSet -> Int -> Bool

We’d like to abstract over IntSet and Int. Since all of our functions mention both types, all we need to do is write a multiparameter type class:

class Set c e where
  insert :: c -> e -> c
  lookup :: c -> e -> Bool

instance Set IntSet Int where ...

If we’re unlucky, some of the functions will not use all of the data types:

empty :: IntSet

In which case, when we attempt to use the function, GHC will tell us it can’t figure out what instance to use:

No instance for (Set IntMap e)
  arising from a use of `empty'

One thing to do is to introduce a functional dependency between IntSet and Int. A dependency means something is depending on something else, so which type depends on what? We don’t have much choice here: since we’d like to support the function empty, which doesn’t mention Int anywhere in its signature, the dependency will have to go from IntSet to Int, that is, given a set (IntSet), I can tell you what it contains (an Int).:

class Set c e | c -> e where
  empty :: c
  insert :: c -> e -> c
  lookup :: c -> e -> Bool

Notice that this is still fundamentally a multiparameter type class, we’ve just given GHC a little hint on how to pick the right instance. We can also introduce a fundep in the other direction, if we need to allow a plain e. For pedagogical purposes, let’s assume that our boss really wants a “null” element, which is always a member of a Set and when inserted doesn’t do anything:

class Set c e | c -> e, e -> c where
  empty :: c
  null :: e
  insert :: c -> e -> c
  lookup :: c -> e -> Bool

Also notice that whenever we add a functional dependency, we preclude ourselves from offering an alternative instance. The following is illegal with the last typeclass for Set:

instance Set IntSet Int where ...
instance Set IntSet Int32 where ...
instance Set BetterIntSet Int where ...

This will report a “Functional dependencies conflict.”

Functional dependencies are somewhat maligned because they interact poorly with some other type features. An equivalent feature that was recently added to GHC is associated types (also known as type families or data families.)

Instead of telling GHC how automatically infer one type from the other (via the dependency), we create an explicit type family (also known as a type function) which provides the mapping:

class Set c where
  data Elem c :: *
  empty :: c
  null :: Elem c
  insert :: c -> Elem c -> c
  lookup :: c -> Elem c -> Bool

Notice that our typeclass is no longer multiparameter: it’s a little like as if we introduced a functional dependency from c -> e. But then, how does it know what the type of null should be? Easy: it makes you tell it:

instance Set IntSet where
  data Elem IntSet = IntContainer Int
  empty = emptyIntSet
  null = IntContainer 0

Notice on the right hand side of data is not a type: it’s a data constructor and then a type. The data constructor will let GHC know what instance of Elem to use.

In the original version of this article, I had defined the type class in the opposite direction:

class Key e where
  data Set e :: *
  empty :: Set e
  null :: e
  insert :: Set e -> e -> Set e
  lookup :: Set e -> e -> Bool

Our type function goes the other direction, and we can vary the implementation of the container based on what type is being used, which may not be one that we own. This is one primary use case of data families, but it’s not directly related to the question of generalizing APIs, so we leave it for now.

IntContainer looks a lot like a newtype, and in fact can be made one:

instance Set IntSet where
  newtype Elem IntSet = IntContainer Int

If you find wrapping and unwrapping newtypes annoying, in some circumstances you can just use a type synonym:

class Set c where
  type Elem c :: *

instance Set IntSet where
  type Elem IntSet = Int

However, this rules out some functions you might like to write, for example, automatically specializing your generic functions:

x :: Int
x = null

GHC will error:

Couldn't match expected type `Elem e'
       against inferred type `[Int]'
  NB: `Container' is a type function, and may not be injective

Since I could have also written:

instance Set BetterIntSet where
  type Elem BetterIntSet = Int

GHC doesn’t know which instance of Set to use for null: IntSet or BetterIntSet? You will need for this information to be transmitted to the compiler in another way, and if this happens completely under the hood, you’re a bit out of luck. This is a distinct difference from functional dependencies, which conflict if you have a non-injective relation.

Another method, if you have the luxury of defining your data type, is to define the data type inside the instance:

instance Set RecordMap where
  data Elem RecordMap = Record { field1 :: Int, field2 :: Bool }

However, notice that the type of the new Record is not Record; it’s Elem RecordMap. You might find a type synonym useful:

type Record = Elem RecordMap

There is not too much difference from the newtype method, except that we avoided adding an extra layer of wrapping and unwrapping.

In many cases, we would like to stipulate that a data type in our API has some type class:

instance Ord Int where ...

One low tech way to enforce this is add it to all of our function’s type signatures:

class Set c where
  data Elem c :: *
  empty :: c
  null :: Ord (Elem c) => Elem c
  insert :: Ord (Elem c) => c -> Elem c -> c
  lookup :: Ord (Elem c) => c -> Elem c -> Bool

But an even better way is to just add a class constraint on Set with flexible contexts:

class Ord (Elem c) => Set c where
  data Elem c :: *
  empty :: c
  null :: Elem c
  insert :: c -> Elem c -> c
  lookup :: c -> Elem c -> Bool

We can make functions and data types generic. Can we also make type classes generic?

class ToBloomFilter a where
  toBloomFilter :: a -> BloomFilter

Suppose that we decided that we want to allow multiple implementations of BloomFilter, but we would still like to give a unified API for converting things into whatever bloom filter you want.

Not directly, but we can fake it: just make a catch all generic type class and parametrize it on the parameters of the real type class:

class BloomFilter c where
  data Elem c :: *

class BloomFilter c => ToBloomFilter c a where
  toBloomFilter :: a -> c

Step back for a moment and compare the type signatures that functional dependencies and type families produce:

insertFunDeps :: Set c e => c -> e -> c
insertTypeFamilies :: Set c => c -> Elem c -> c

emptyFunDeps :: Set c e => c
emptyTypeFamilies :: Set c => c

So type families hide implementation details from the type signatures (you only use the associated types you need, as opposed to Set c e => c where the e is required but not used for anything—this is more obvious if you have twenty associated data types). However, they can be a bit more wordy when you need to introduce newtype wrappers for your associated data (Elem). Functional dependencies are great for automatically inferring other types without having to repeat yourself.

(Thanks Edward Kmett for pointing this out.)

What to do from here? We’ve only scratched the surface of type level programming, but for the purpose of generalizing APIs, this is essentially all you need to know! Find an API you’ve written that is duplicated across several modules, each of which provide different implementations. Figure out what functions and data types are the primitives. If you have many data types, apply the tricks described here to figure out how much type machinery you need. The go forth, and make thy API generic!

illissius says:

August 13, 2010 at 10:48 am

“How do we assert that Container a is a Monoid? We have to add it to all of our function signatures, unfortunately”

Actually, you can use a constraint on the class:

class Monoid (Container e) => Set e where
data Container e :: *
…

Yes, you can refer to Container like that even though it looks like it would be out of scope; and yes, GHC supports this. It’s equality constraints (Container e ~ Something) in that position which are as of yet unimplemented.

(I’m also not sure whether to call this “class constraints”, “superclass constraints”, or what, but that’s just terminology.)

Edward Z. Yang says:

August 13, 2010 at 10:57 am

Oh man, that’s awesome! I’ll have to update my post.

Anonymous says:

August 13, 2010 at 11:02 am

This works when I test it locally (with FlexibleContexts, and FlexibleInstances if you want to give an instance):

class Monoid (Container e) => Set e where
data Container e :: *

Does this lead to problems?

August 13, 2010 at 11:27 am

Hmm, I see one possible problem: if you want to use a higher rank class constraint (for example, if you generalized a rank-2 monad), that syntax is not expressive enough. :-(

JCAB says:

August 13, 2010 at 4:28 pm

—
class Elem c => GenericMonoid c where
mappend :: Elem c -> Elem c -> Elem c
—

Wouldn’t that be “Set c =>”?

Maybe I just don’t understand what you mean in that section of the article…

August 13, 2010 at 4:36 pm

Hey JCAB, you’re right. I rewrote that section, hopefully it makes more sense now.

Towards generic APIs for the open world « Developers.BlogNotions - Thoughts from Industry Experts says:

October 25, 2010 at 2:17 pm

[…] this sense subtype polymorphism is closed. This post is inspired in part by the excellent article Generalizing APIs by Edward Z. Yang. For this post I will use Scala, my current language of choice for most of the […]