Years ago, Nadav Rotem related this story to me about why basic block procedures in Swift are not as good as they seem. Nelson Elhage reminded me about it on Twitter, so I thought it should be put into the public record.
Basic block procedures make certain optimizations more difficult. Consider this program:
block j3 (%y1, %y2) { ... }
block j1 () { jump j3(%x1, %x2) }
block j2 () { jump j3(%x3, %x4) }
Is this program easier or more difficult to optimize than the traditional SSA with phi-nodes formulation?
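For comparison, here is a sketch of how the same program reads in the traditional phi-node formulation (my own rendering, mimicking the block notation above):
block j3 () { %y1 = phi [%x1, j1], [%x3, j2]
              %y2 = phi [%x2, j1], [%x4, j2]
              ... }
block j1 () { jump j3 }
block j2 () { jump j3 }
The arguments have moved from the jump sites into phi-nodes at the head of j3, which record which value to use for each predecessor.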
Read more...
Greetings from 2024! An official pattern matching PEP has been accepted (https://peps.python.org/pep-0636/) and is available in Python 3.10. Class patterns are tested using isinstance, with no inheritance structure necessary, making the pattern described in this post 100% forward compatible with real pattern matching.
One of the features I miss most in non-Haskell programming languages is algebraic data types (ADTs). ADTs fulfill a similar role to objects in other languages, but with more restrictions: objects are an open universe, where clients can implement new subclasses that were not known at definition time; ADTs are a closed universe, where the definition of an ADT specifies precisely all the cases that are possible. We often think of restrictions as a bad thing, but in the case of ADTs, the restriction of being a closed universe makes programs easier to understand (a fixed set of cases to understand, as opposed to a potentially infinite set of cases) and allows for new modes of expression (pattern matching). ADTs make it really easy to accurately model your data structures; they encourage you to go for precise types that make illegal states unrepresentable. Still, it is generally not a good idea to try to manually reimplement your favorite Haskell language feature in every other programming language you use, and so for years I’ve suffered in Python under the impression that ADTs were a no-go.
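To make this concrete, here is a minimal sketch of my own (Python 3.10+, toy names): a closed universe of shapes, consumed with pattern matching.

from dataclasses import dataclass

# A closed universe of cases, in the style of a Haskell ADT:
#   data Shape = Circle Double | Rect Double Double
@dataclass
class Circle:
    radius: float

@dataclass
class Rect:
    width: float
    height: float

def area(shape):
    # Class patterns match via isinstance; no common base class needed.
    match shape:
        case Circle(radius=r):
            return 3.14159 * r * r
        case Rect(width=w, height=h):
            return w * h

print(area(Rect(2.0, 3.0)))  # 6.0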
Read more...

If this is your first time reading about PyTorch internals, you might want to check out my PyTorch internals post first. In this post, I want to talk about one particular part of PyTorch’s internals: the dispatcher. At first glance, the dispatcher is just a glorified if statement: based on some information about the tensor inputs, it decides what piece of code should be called. So why should we care about the dispatcher?
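To fix intuition, here is a toy Python sketch of that “glorified if statement” view (my own example; the real dispatcher lives in C++ and is far more involved):

# Hypothetical kernels for a single operator, keyed by dispatch key.
def add_cpu(x, y):
    print("running the CPU kernel")
    return [a + b for a, b in zip(x, y)]

def add_cuda(x, y):
    print("running the (pretend) CUDA kernel")
    return [a + b for a, b in zip(x, y)]

KERNELS = {"cpu": add_cpu, "cuda": add_cuda}

def add(x, y, device):
    # The "glorified if statement": compute a dispatch key from the
    # inputs, then jump to the kernel registered for that key.
    return KERNELS[device](x, y)

add([1, 2], [3, 4], device="cpu")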
Read more...
For the longest time, I thought implicit parameters and dynamic scoping were basically the same thing, since they both can be used to solve similar problems (e.g., the so-called “configuration problem”, where you need to plumb some configuration deep into a nested body of function definitions without explicitly threading it through every one of them). But implicit parameters have a reputation of being something you shouldn’t use (use reflection instead), whereas dynamic scoping via the reader monad is a useful and well understood construct (except for the bit where you have to monadify everything). Why the difference?
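As a rough illustration of the configuration problem, here is a sketch of my own, using Python’s contextvars as a stand-in for dynamic scoping (not the reader monad or Haskell’s implicit parameters, just an analogue):

import contextvars

# The configuration we want available deep inside the call tree.
verbosity = contextvars.ContextVar("verbosity", default=0)

def leaf():
    # Reads the "dynamically scoped" value without it having been
    # passed explicitly through middle() and top().
    if verbosity.get() > 0:
        print("leaf: doing work")

def middle():
    leaf()

def top():
    verbosity.set(1)  # set once, visible to everything downstream
    middle()

top()  # prints "leaf: doing work"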
Read more...
I’ve recently been working on a revamp of how we specify tensor shape formulas in PyTorch. As part of this process, I classified every single operator in PyTorch by its shaping behavior; yes, that’s all 1364 of them (this includes each variant of an operator; e.g., inplace and out= keyword variants). During the process, I tried to come up with categories to help classify what operators did. One of the surprises from the process was discovering that shaping behaviors I previously thought were uncommon actually showed up a bit more often than one might have expected.
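To give a flavor of what a shape formula is, here is a hypothetical example of mine (not PyTorch’s actual specification format) for a 2-D matrix multiply:

def matmul_shape(a, b):
    # Shape formula for matmul: (n, k) x (k, m) -> (n, m)
    n, k1 = a
    k2, m = b
    assert k1 == k2, "inner dimensions must agree"
    return (n, m)

print(matmul_shape((2, 3), (3, 4)))  # (2, 4)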
Read more...
vmap is an interface popularized by JAX which offers you a vectorizing map. Semantically, a vmap is exactly equivalent to a map in Haskell; the key difference is that operations run under a vmap are vectorized. If you map a convolution and a matrix multiply, you will have one big loop which repeatedly calls convolution and matrix multiply for each entry in your batch. If you vmap a convolution and matrix multiply, you’ll call the batched versions of convolution and matrix multiply once. On most modern deep learning frameworks, unless you have a fuser, calling the batched implementations of these operations will be much faster.
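Here is a sketch of that semantic equivalence in plain NumPy (my own example, not JAX’s API): a loop over unbatched matrix-vector multiplies versus a single batched call.

import numpy as np

A = np.random.randn(4, 3)
xs = np.random.randn(8, 3)  # a batch of 8 vectors

# "map": one big loop, calling the unbatched kernel per example.
looped = np.stack([A @ x for x in xs])

# "vmap": one call to a batched implementation; here the entire batch
# is handled by a single matrix multiply.
batched = xs @ A.T

assert np.allclose(looped, batched)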
Read more...
This post is a long form essay version of a talk about PyTorch internals, that I gave at the PyTorch NYC meetup on May 14, 2019.

Hi everyone! Today I want to talk about the internals of PyTorch.

This talk is for those of you who have used PyTorch, and thought to yourself, “It would be great if I could contribute to PyTorch,” but were scared by PyTorch’s behemoth of a C++ codebase. I’m not going to lie: the PyTorch codebase can be a bit overwhelming at times. The purpose of this talk is to put a map in your hands: to tell you about the basic conceptual structure of a “tensor library that supports automatic differentiation”, and give you some tools and tricks for finding your way around the codebase. I’m going to assume that you’ve written some PyTorch before, but haven’t necessarily delved deeper into how a machine learning library is written.
Read more...
Some notes collected from a close read of Conal Elliott’s Compiling to Categories and The Simple Essence of Automatic Differentiation.
A colleague of mine was trying to define a “tree structure” of tensors, with the hope of thereby generalizing the concept to also work with tensors that have “ragged dimensions.” Let’s take a look:
Suppose we have a (2, 3) matrix:
tensor([[1, 2, 3],
        [4, 5, 6]])
One way to think about this is that we have a “tree” of some sort, where the root of the tree branches to two subnodes, and then each subnode branches to three nodes:
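One simple way to make that tree concrete (a toy rendering of mine, using nested Python lists):

# The same (2, 3) matrix as a tree: root -> 2 subnodes -> 3 leaves each.
tree = [
    [1, 2, 3],
    [4, 5, 6],
]

# The payoff is that the tree view also admits ragged dimensions,
# which a rectangular tensor cannot represent:
ragged = [
    [1, 2, 3],
    [4, 5],  # second subnode has only two leaves
]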
Read more...
Long-time readers of mine may be aware that I used a ThinkPad X61T for the past decade. After the hinge broke on my second instance of the machine, I decided it was finally time to get a new laptop. And I had my eye on one particular model, after Simon Peyton Jones showed me his new laptop at the last Haskell Implementors’ Workshop: the Microsoft Surface Book 2. It fits my primary requirement for a laptop: it’s a laptop that converts into tablet mode, with a digitizer pen. The pen is not Wacom-branded, but it has an eraser end and can magnetically attach to the laptop (no enclosure for the pen, but I think that for modern hardware that constraint is unsatisfiable). Furthermore, there is a Linux enthusiast community around the device, which made me feel it would be more likely I could get Linux to work. So a few weeks ago, I took the plunge and laid down three grand for my own copy. It has worked out well, but in the classic Linux style, not without a little bit of elbow grease.
Read more...
My name is Rahul Muttineni, CTO of TypeLead, working on building services around a language named Eta. To get started, I'll give an overview of how the project started, and where it is now.
It started as an HSOC project. It was called GHCVM; back then we had plans of making it run on both the JVM and the CLR... we don't think about the CLR anymore. I was mentored by Edward Kmett. We got a pretty good response on this, so Jo and I decided to take the risk and work on it full time.
Read more...