Python : ezyang's blog

Python

Idiomatic algebraic data types in Python with dataclasses and Union October 14, 2020

Greetings from 2024! An official pattern matching PEP has been accepted (https://peps.python.org/pep-0636/) and is available as of Python 3.10. Class patterns are tested using isinstance, with no inheritance structure necessary, making the pattern described in this post 100% forward compatible with real pattern matching.

One of the features I miss most in non-Haskell programming languages is algebraic data types (ADT). ADTs fulfill a similar role to objects in other languages, but with more restrictions: objects are an open universe, where clients can implement new subclasses that were not known at definition time; ADTs are a closed universe, where the definition of an ADT specifies precisely all the cases that are possible. We often think of restrictions as a bad thing, but in the case of ADTs, the restriction of being a closed universe makes programs easier to understand (a fixed set of cases to understand, as opposed to a potentially infinite set of cases) and allows for new modes of expression (pattern matching). ADTs make it really easy to accurately model your data structures; they encourage you to go for precise types that make illegal states unrepresentable. Still, it is generally not a good idea to try to manually reimplement your favorite Haskell language feature in every other programming language you use, and so for years I’ve suffered in Python under the impression that ADTs were a no go.
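As a rough illustration of what the title promises, a closed-universe type can be sketched with one dataclass per case and a Union that lists the cases. Ok, Err, Result, and describe are hypothetical names for this sketch, not code from the post.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: int


@dataclass(frozen=True)
class Err:
    message: str


# The Union spells out every case; nothing else counts as a Result.
Result = Union[Ok, Err]


def describe(result: Result) -> str:
    # isinstance tests stand in for pattern matching on the constructor.
    if isinstance(result, Ok):
        return f"ok: {result.value}"
    elif isinstance(result, Err):
        return f"error: {result.message}"
    # Unreachable as long as callers respect the annotation.
    raise AssertionError(f"unexpected case: {result!r}")
```

A static checker such as mypy narrows the type inside each isinstance branch, so result.value is only accepted where result is known to be an Ok.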

Rapidly prototyping scripts in Haskell October 18, 2010

I’ve been having some vicious fun over the weekend hacking up a little tool called MMR Hammer in Haskell. I won’t bore you with the vagaries of multimaster replication with Fedora Directory Server; instead, I want to talk about rapidly prototyping scripts in Haskell—programs that are characterized by a low amount of computation and a high amount of IO. Using this script as a case study, I’ll describe how I approached the problem, what was easy to do and what took a little more coaxing. In particular, my main arguments are:

Keyword arguments in Haskell September 13, 2010

Keyword arguments are generally considered a good thing by language designers: positional arguments are prone to errors of transposition, and it’s absolutely no fun trying to guess what the 37 that is the third argument of a function actually means. Python is one language that makes extensive use of keyword arguments; they have the following properties:

Functions are permitted to be a mix of positional and keyword arguments (a nod to the compactness of positional arguments),
Keywords are local to any given function; you can reuse a named function argument for another function,
In Python 3.0, you can force certain arguments to only be specifiable with a keyword.
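The three properties above can be seen in a small sketch. The functions connect and ping and their parameters are made up for illustration.

```python
def connect(host, port=5432, *, timeout=10.0):
    # host and port may be passed positionally or by keyword;
    # timeout is keyword-only because it follows the bare `*`.
    return (host, port, timeout)


def ping(host, timeout=1.0):
    # Reusing the names `host` and `timeout` is fine: keywords are
    # local to each function.
    return (host, timeout)


# Mixing positional and keyword arguments in one call.
mixed = connect("db.example", 5433, timeout=2.5)

# Passing the keyword-only argument positionally is rejected.
try:
    connect("db.example", 5433, 2.5)
except TypeError as exc:
    rejected = str(exc)
```

Here mixed is ("db.example", 5433, 2.5), and rejected holds the TypeError message complaining about too many positional arguments.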

Does Haskell have keyword arguments? In many ways, they’re much less necessary due to the static type system: if you accidentally transpose an Int and a Bool, your compiler will let you know about it. The type signature guides you!

Bug boogie: Git and symlinks June 2, 2010

Git is very careful about your files: unless you tell it to be explicitly destructive, it will refuse to write over files that it doesn’t know about, instead giving an error like:

Untracked working tree file 'foobar' would be overwritten by merge.

In my work with Wizard, I frequently need to perform merges on working copies that have been, well, “less than well maintained”, e.g. someone untarred a new version of the source tree on the old directory and forgot to add the newly added files. When Wizard goes in and tries to automatically upgrade them to the new version the proper way, this results in all sorts of untracked working tree file complaints, and then we have to go and manually check on the untracked files and remove them once they’re fine.

Mutation sleuthing in Python March 19, 2010

Python is a language that gives you a lot of rope: in particular, any encapsulation scheme is only weakly enforced and can be worked around by a sufficiently savvy hacker. I fall into the “my compiler should stop me from doing stupid things” camp, but I’ll certainly say, dynamic capabilities sure are convenient. But here’s the rub: the language must show you where you have done something stupid.

In this case, we’d like to see when you have improperly gone and mutated some internal state. You might scoff and say, “well, I know when I change my state”, but this is certainly not the case when you’re debugging an interaction between two third party libraries that you did not write. Specifically, I should be able to point at a variable (it might be a local variable, a global variable, or a class/instance attribute) and say to Python, “tell me when this variable changes.” When the variable changes, Python should tell me who changed the variable (via a backtrace) and what the variable changed to. I should be able to say, “tell me when this variable changed to this value.”
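For the instance-attribute case only, one possible approach (not necessarily the one the full post takes) is to override __setattr__ and record a backtrace on every assignment. Watched, Account, third_party_code, and the changes list are all hypothetical names for this sketch.

```python
import traceback

changes = []  # (attribute, old value, new value, formatted stack)


class Watched:
    """Mixin that records who rebinds the attributes named in _watched."""

    _watched = frozenset()

    def __setattr__(self, name, value):
        if name in self._watched:
            old = getattr(self, name, None)
            # Drop the last frame (this method) so the trace ends at the caller.
            stack = "".join(traceback.format_stack()[:-1])
            changes.append((name, old, value, stack))
        super().__setattr__(name, value)


class Account(Watched):
    _watched = frozenset({"balance"})

    def __init__(self):
        self.balance = 0
        self.owner = "nobody"  # not watched, so not recorded


def third_party_code(account):
    # Stands in for library code that mutates state behind your back.
    account.balance = 100
```

After third_party_code(Account()), the last entry in changes names balance, the old value 0, the new value 100, and a stack that ends in third_party_code. This only catches rebinding of instance attributes: in-place mutation such as appending to a list stored in the attribute, and changes to local or global variables, go unnoticed.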

Writing generator friendly code March 1, 2010

I’ve come a long ways from complaining to the html5lib list that the Python version gratuitously used generators, making it hard to port to PHP. Having now drunk the laziness kool-aid in Haskell, I enjoy trying to make my code fit the generator idiom. While Python generators have notable downsides compared to infinite lazy lists (for example, forking them for multiple use is nontrivial), they’re pretty nice.
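To see why forking takes some care, here is a small sketch using itertools.tee from the standard library. The squares generator is a made-up example.

```python
import itertools


def squares():
    # Infinite generator: 0, 1, 4, 9, 16, 25, ...
    n = 0
    while True:
        yield n * n
        n += 1


# A generator can be consumed only once, so "forking" it needs tee(),
# which buffers items that one branch has seen and the other has not.
evens_src, odds_src = itertools.tee(squares())
evens = (x for x in evens_src if x % 2 == 0)
odds = (x for x in odds_src if x % 2 == 1)

first_evens = list(itertools.islice(evens, 3))  # [0, 4, 16]
first_odds = list(itertools.islice(odds, 3))    # [1, 9, 25]
```

The cost is the buffer: if one branch runs far ahead of the other, tee has to hold every item in between, and the original generator must not be touched once it has been handed to tee.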

Unfortunately, the majority of code I see that expects to see lists isn’t robust enough to accept generators too, and it breaks my heart when I have to say list(generator). I’ll forgive you if you’re expecting O(1) accesses of arbitrary indexes in your internal code, but all too often I see code that only needs sequential access, only to botch it all up by calling len(). Duck typing won’t save you there.
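A small sketch of the difference, using two hypothetical functions: both compute an average and both need only sequential access, but only the first accepts a generator.

```python
def mean(values):
    # Works for lists and generators alike: one sequential pass, no len().
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        raise ValueError("mean() of an empty iterable")
    return total / count


def mean_needs_list(values):
    # Breaks on generators: they have no len().
    return sum(values) / len(values)
```

mean(x * x for x in range(4)) returns 3.5, while mean_needs_list given the same generator raises a TypeError at the len() call, and only after sum() has already exhausted the generator.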