Idiomatic algebraic data types in Python with dataclasses and Union
October 14, 2020
Greetings from 2024! An official pattern matching PEP has been accepted (https://peps.python.org/pep-0636/) and is available in Python 3.10. Class patterns are tested using isinstance, with no inheritance structure necessary, making the pattern described in this post 100% forward compatible with real pattern matching.
One of the features I miss most in non-Haskell programming languages is algebraic data types (ADTs). ADTs fulfill a similar role to objects in other languages, but with more restrictions: objects are an open universe, where clients can implement new subclasses that were not known at definition time; ADTs are a closed universe, where the definition of an ADT specifies precisely all the cases that are possible. We often think of restrictions as a bad thing, but in the case of ADTs, the restriction of being a closed universe makes programs easier to understand (a fixed set of cases to understand, as opposed to a potentially infinite set of cases) and allows for new modes of expression (pattern matching). ADTs make it really easy to accurately model your data structures; they encourage you to go for precise types that make illegal states unrepresentable. Still, it is generally not a good idea to try to manually reimplement your favorite Haskell language feature in every other programming language you use, and so for years I’ve suffered in Python under the impression that ADTs were a no-go.
Recently, however, I have noticed that a number of new features in Python 3 have made it possible to use objects in the same style as ADTs, in idiomatic Python with virtually no boilerplate. The key features:
- A structural static type checking system with mypy; in particular, the ability to declare Union types, which let you represent values that could be one of a fixed set of other types, and the ability to refine the type of a variable by performing an isinstance check on it.
- The dataclasses library, which allows you to conveniently define (possibly immutable) structures of data without having to write boilerplate for the constructor.
The key idea: define each constructor as a dataclass, put the constructors together into an ADT using a Union type, and use isinstance tests to do pattern matching on the result. The result is just as good as an ADT (or better, perhaps; their structural nature bears more similarity to OCaml’s polymorphic variants).
Here’s how it works. Let’s suppose that you want to define an algebraic data type with two results:
data Result
  = OK Int
  | Failure String

showResult :: Result -> String
showResult (OK result) = show result
showResult (Failure msg) = "Failure: " ++ msg
First, we define each constructor as a dataclass:
from dataclasses import dataclass

@dataclass(frozen=True)
class OK:
    result: int

@dataclass(frozen=True)
class Failure:
    msg: str
Using the automatically generated constructors from dataclasses, we can construct values of these dataclasses using OK(2) or Failure("something wrong"). Next, we define a type synonym for the union of these two classes:
from typing import Union

Result = Union[OK, Failure]
Finally, we can do pattern matching on Result by doing isinstance tests:
from typing import NoReturn

def assert_never(x: NoReturn) -> NoReturn:
    raise AssertionError("Unhandled type: {}".format(type(x).__name__))

def showResult(r: Result) -> str:
    if isinstance(r, OK):
        return str(r.result)
    elif isinstance(r, Failure):
        return "Failure: " + r.msg
    else:
        assert_never(r)
assert_never is a well-known trick for doing exhaustiveness checking in mypy. If we haven’t covered all cases with enough isinstance checks, mypy will complain that assert_never was given a type like UnhandledCtor when it expected NoReturn (which is the uninhabited type in Python).
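To make the exhaustiveness check concrete, here is a minimal sketch of what happens when a constructor is added but not handled. Pending and show_result_incomplete are hypothetical names introduced only for this illustration; the exact wording of the mypy error is not reproduced here.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class OK:
    result: int


@dataclass(frozen=True)
class Failure:
    msg: str


# Hypothetical third constructor, added only for this illustration.
@dataclass(frozen=True)
class Pending:
    pass


Result = Union[OK, Failure, Pending]


def assert_never(x: NoReturn) -> NoReturn:
    raise AssertionError("Unhandled type: {}".format(type(x).__name__))


def show_result_incomplete(r: Result) -> str:
    if isinstance(r, OK):
        return str(r.result)
    elif isinstance(r, Failure):
        return "Failure: " + r.msg
    else:
        # Pending is not handled above, so r is narrowed to Pending here,
        # not to NoReturn. A type checker should flag this call; at runtime
        # it raises AssertionError if a Pending value ever arrives.
        assert_never(r)
```

Adding an `elif isinstance(r, Pending):` branch should make the type error go away, which is the point: the checker tells you every place that needs updating when the ADT grows.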
That’s all there is to it. As a bonus, this style of writing unions is compatible with the structural pattern matching PEP, if it actually gets accepted. I’ve been using this pattern to good effect in our recent rewrite of PyTorch’s code generator. If you have the opportunity to work in a statically typed Python codebase, give this style of code a try!
An alternative to explicit pattern matching is to use functools.singledispatch, e.g.:
from functools import singledispatch

@singledispatch
def showResult(r: NoReturn) -> NoReturn:
    raise AssertionError("Unhandled type: {}".format(type(r).__name__))

@showResult.register
def show_ok(r: OK) -> str:
    return str(r.result)

@showResult.register
def show_failure(r: Failure) -> str:
    return "Failure: " + r.msg
Except that you want Result to be polymorphic. If you define OK for int, what will you do for all other types? Maybe some kind of magic with a TypeVar is possible? Moreover you cannot do isinstance(OK(5), Result), try it.
Now about this sentence: “The result is just as good as an ADT (or better, perhaps; their structural nature bears more similarity to OCaml’s polymorphic variants).” You need to study.
Yes, but in a fully statically typed program, you shouldn’t need to do so, since you should statically know from context that something is a Result, without having to refine it manually with an isinstance test. (So for example, don’t do something like Union[Result, Result2]; you want a tagged union for this case.)
One thing that mypy-style type checking can’t do for you is type-driven metaprogramming, à la type classes, which is honestly pretty useful. But there are other ways (e.g., object-oriented programming) to get what you want in a language like Python.
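On the TypeVar question raised above, one possible approach is to make the OK constructor generic and define Result as a generic type alias. This is a sketch under that assumption, not something taken from the post; parse_int is a hypothetical helper used only to show the alias in a signature.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class OK(Generic[T]):
    result: T


@dataclass(frozen=True)
class Failure:
    msg: str


# Generic type alias: Result[int] stands for Union[OK[int], Failure].
Result = Union[OK[T], Failure]


def show_result(r: Result[T]) -> str:
    if isinstance(r, OK):
        return str(r.result)
    else:
        return "Failure: " + r.msg


def parse_int(s: str) -> Result[int]:
    # Hypothetical helper: wraps int() so failure is a value, not an exception.
    try:
        return OK(int(s))
    except ValueError:
        return Failure("not an integer: " + s)
```

The isinstance tests are unchanged, because they check the class (OK) rather than the parameterized type (OK[int]).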
instanceof is the so-called visitor pattern.
> Moreover you cannot do isinstance(OK(5), Result), try it.
In Python 3.10, you will be able to write the Union using a new syntax with ‘|’ which will also be supported by isinstance:
Result = OK | Failure
isinstance(OK(3), Result)  # => True
See PEP 604 for more on this.
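On versions before 3.10, a runtime check against the union is still possible by unpacking its members with typing.get_args (available since Python 3.8) and passing the resulting tuple to isinstance. A minimal sketch; RESULT_TYPES is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Union, get_args


@dataclass(frozen=True)
class OK:
    result: int


@dataclass(frozen=True)
class Failure:
    msg: str


Result = Union[OK, Failure]

# get_args returns the members of the Union as a tuple of classes,
# which is a form isinstance accepts on any Python 3 version.
RESULT_TYPES = get_args(Result)

is_result = isinstance(OK(3), RESULT_TYPES)
is_not_result = isinstance("OK(3)", RESULT_TYPES)
```

This only helps at runtime; a static checker will not necessarily narrow a type from a check written this way, so it is best kept for boundaries where untyped data enters the program.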
Very interesting, assert_never is what I was looking for!
What about serialization, let’s say in JSON? There is no “tag” to include, and including the class name does not seem clean (as the class name may be an implementation detail).
thanks
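One way to approach the serialization question is to choose explicit wire tags that are independent of the class names and write the encoder with the same isinstance pattern. This is only a sketch: the "tag" key, the tag strings, and the to_json and from_json names are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class OK:
    result: int


@dataclass(frozen=True)
class Failure:
    msg: str


Result = Union[OK, Failure]


def assert_never(x: NoReturn) -> NoReturn:
    raise AssertionError("Unhandled type: {}".format(type(x).__name__))


def to_json(r: Result) -> str:
    # The tag strings are part of the wire format, so the classes can be
    # renamed without changing what is written out.
    if isinstance(r, OK):
        return json.dumps({"tag": "ok", "result": r.result})
    elif isinstance(r, Failure):
        return json.dumps({"tag": "failure", "msg": r.msg})
    else:
        assert_never(r)


def from_json(s: str) -> Result:
    d = json.loads(s)
    if d["tag"] == "ok":
        return OK(d["result"])
    elif d["tag"] == "failure":
        return Failure(d["msg"])
    raise ValueError("Unknown tag: {!r}".format(d["tag"]))
```

Because the encoder ends in assert_never, adding a constructor to Result without giving it a tag should show up as a type error rather than as silently missing output.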