ezyang’s blog

the arc of software bends towards understanding

Semantic Import Versioning in the wild

The best and worst thing about semantic import versioning is that it makes BC-breaking changes hard.

In the past few days, Russ Cox has made a splash in a series of white papers describing Go and Versioning. In them, he coins a new term, Semantic Import Versioning, distilling it to the following principle:

If an old package and a new package have the same import path, the new package must be backwards compatible with the old package.

I am very happy Russ has come up with a good name for semantic import versioning, because this concept has been out there for quite a long time without a concise name or formulation of its design. In fact, I would even say that semantic import versioning is inevitable when you take on the premise that you will never break user code. It is so inevitable that semantic import versioning is already practiced in the wild in a variety of places. Here are a few examples:

  • REST APIs are often versioned with explicit version numbers in the request (e.g., in the URI) to let clients specify what version of the API they want. If a client wishes to upgrade to a new version of the API, they must rewrite their API requests to point at the new URL (see the sketch after this list). REST APIs are forced into semantic import versioning because the traditional mechanism for avoiding breakage, version bounds, is unavailable in this setting.
  • Stripe's REST API pins each of their customers to the version of their API at the time they subscribed; even if Stripe makes a BC-breaking change in the future, the API for a given customer never changes. In this case, the semantic import is still there, but it is implicit (associated with a customer account) rather than explicit (in the client code); consequently, Stripe is willing to break BC a lot more frequently than would otherwise be acceptable for a REST API. Stripe's blog post points out a very important aspect of maintaining libraries under semantic import versioning, which is that you need to put in the engineering effort to sustainably manage all of the semantic imports available to users.
  • Semantic import versioning is widely practiced in programming languages, in the form of language standards/epochs. In C++, the setting of -std=c++xx specifies a particular semantic version to be "imported". It would be unheard of for a compiler to unilaterally break backwards compatibility of -std=c++11 in a new revision of the compiler; similarly, a user must explicitly migrate to a new language standard to take advantage of any new features. Rust epochs have a similar tenor. The choice between Python 2 and Python 3 is another form of semantic import versioning.
  • Semantic imports don't have to just specify a number. Feature flags, such as {-# LANGUAGE #-} pragmas in GHC Haskell, let users opt into BC-breaking changes at their use-sites.
  • In the deep learning world, ONNX models declare a semantic import of a particular version of an operator set. Operator semantics can evolve in BC-compatible ways without bumping the version, but to take a BC-breaking change, you must update the import statement.
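
To make the REST case from the first bullet concrete, here is a minimal sketch in Go of a client constructing versioned request URLs; the host, resource names, and paths are hypothetical, but they show how the major version becomes part of the request itself:

    // The major version is part of the request URI, so upgrading to a
    // BC-breaking revision of the API means rewriting the path the client
    // calls; existing clients keep the old path and keep working.
    package main

    import "fmt"

    const baseURL = "https://api.example.com" // hypothetical API host

    func chargeURL(version int, id string) string {
        // /v1/... and /v2/... name different specifications; the server must
        // keep serving /v1/ for existing clients even after /v2/ ships.
        return fmt.Sprintf("%s/v%d/charges/%s", baseURL, version, id)
    }

    func main() {
        fmt.Println(chargeURL(1, "ch_123")) // old clients keep this path
        fmt.Println(chargeURL(2, "ch_123")) // upgrading is an explicit rewrite
    }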

One insight I draw from these examples is that what we call an "import version" is really a specification for some series of implementations. To someone who has spent a lot of time thinking about module systems, this is really a step in the right direction: program against interfaces, not implementations.

Another thing we can observe from these examples is the real-world consequences of semantic import versioning. One particular effect stands out: semantic import versioning is challenging for maintainers, because it pressures them to maintain multiple major release branches simultaneously (after all, who wants to use pkg/v2 only to have it become unmaintained the moment pkg/v3 comes out?). In the traditional release-branch model, where one creates a release branch for each major version, only the most well-staffed software development teams can afford to maintain multiple active release branches (backporting patches is a lot of work!). The friction involved in managing multiple implementations means that less well-staffed projects will feel strong pressure to never break backwards compatibility.

This may not sound like such a bad thing to the "don't break my stuff" grumps in the audience, but a lot of bugs and security problems have stemmed from being literally unable to outlaw harmful and dangerous APIs with BC-breaking changes. The danger of tilting the calculus further towards preserving backwards compatibility is further entrenchment of bad "first try" APIs. So while I do not deny that part of the genius of Russ's framing is to describe semantic versioning as part of the package path, it also sets up a bad expectation for the feasibility of BC-breaking changes, when what we should be doing is improving the state of tooling so that making a BC-breaking change is "no big deal." To me, the most promising way to reduce the friction of a BC-breaking change is to organize your software development so that a single codebase, under a single build, implements multiple specifications (v1, v2 and v3). As we saw from the examples, compilers can manage this (GCC supports multiple C++ versions), but traditional programming languages make it hard for libraries to do the same thing.
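
As a rough sketch of what this could look like for a library, here is some hypothetical Go in which the v1 and v2 "specifications" are interfaces, a single concrete type is the only real implementation (targeting v2), and the v1 surface is a thin adapter that forwards to it:

    package main

    import "fmt"

    // V1 is the old specification: Get panics when the key is missing.
    type V1 interface {
        Get(key string) string
    }

    // V2 is the new specification: the BC-breaking change made the
    // missing-key case explicit instead of panicking.
    type V2 interface {
        Get(key string) (string, bool)
    }

    // store is the single real implementation; it targets the v2 spec.
    type store struct{ m map[string]string }

    func (s *store) Get(key string) (string, bool) {
        v, ok := s.m[key]
        return v, ok
    }

    // v1Adapter serves the v1 spec by forwarding to the v2 implementation,
    // so both major versions can ship from one codebase under one build.
    type v1Adapter struct{ v2 V2 }

    func (a v1Adapter) Get(key string) string {
        v, ok := a.v2.Get(key)
        if !ok {
            panic("missing key: " + key)
        }
        return v
    }

    func main() {
        s := &store{m: map[string]string{"lang": "go"}}
        var newClient V2 = s
        var oldClient V1 = v1Adapter{v2: s}
        v, _ := newClient.Get("lang")
        fmt.Println(v, oldClient.Get("lang"))
    }

The adapter itself is cheap to write; the friction comes from today's package tooling, which still forces the v1 and v2 surfaces out as separately published import paths.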

I don't know exactly how to solve this problem, but I do have a few ideas:

  1. Treat specifications as data. This means you can write code that operates over a specification and, for example, automatically generate the boilerplate necessary to forward from one implementation to another (see the sketch after this list).
  2. Don't ask programmers to manually write diffs. I would never ask you to make a source code change by writing a diff by hand, just because this is the best representation for a VCS to store. Instead, you would just make the edit, and expect the system to figure it out. BC-breaking changes to APIs should follow the same principle; it is much simpler and easier to understand if you just make the change, rather than write a description of the change.
  3. Package level modularity. In a traditional package management system, I can't release a single bundle of source code which presents multiple "package interfaces". Even in vgo, if I have a shared codebase implementing v1 and v2, I still have to make two releases to publish a new version of the code. This is backwards; there is no reason a single unit of code cannot provide multiple interfaces, and package tools should make this possible.
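
For the first idea, here is a hypothetical sketch in Go of "specifications as data" at its most primitive: the v1-to-v2 renames live in a plain data structure, and a tiny generator emits the v1 forwarding shims instead of a human writing them. The package paths, function names, and uniform signature are all assumptions made for illustration:

    package main

    import (
        "fmt"
        "sort"
    )

    // renames is the "specification as data": each hypothetical v1 function
    // name mapped to its v2 replacement. A real tool would also record
    // signatures, deprecations, semantic notes, and so on.
    var renames = map[string]string{
        "Marshal":   "Encode",
        "Unmarshal": "Decode",
    }

    func main() {
        // Emit a v1 package whose functions simply forward to v2. For the
        // sake of the sketch, every function shares the same signature.
        fmt.Println("// Code generated from the v1->v2 spec. DO NOT EDIT.")
        fmt.Println("package pkg")
        fmt.Println()
        fmt.Println(`import v2 "example.com/pkg/v2"`)

        old := make([]string, 0, len(renames))
        for name := range renames {
            old = append(old, name)
        }
        sort.Strings(old)

        for _, name := range old {
            fmt.Printf("\nfunc %s(data []byte) ([]byte, error) { return v2.%s(data) }\n",
                name, renames[name])
        }
    }

The particular generator doesn't matter; the point is that once the change is represented as data, tooling can keep the old interface alive mechanically while humans work only on the new one.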

These ideas are maybe a bit too radical to expect Go to adopt, but perhaps the next generation of programming languages will explore this design space further.

9 Responses to “Semantic Import Versioning in the wild”

  1. […] Semantic Import Versioning in the wild 4 by panic | 0 comments on Hacker News. […]

  2. […] Semantic Import Versioning in the wild 8 by panic | 0 comments on Hacker News. […]

  3. Maxim Ivanov says:

    One of the most widely used examples of “semantic import versioning” you forgot to mention is shared libraries. /lib64/libz.so.1, libncurses.so.5, libc.so.6, …

  4. Hi Maxim, shared library versioning is a subtle one, which shares some properties but not others (making it a bad analogy for semantic import versioning, but an interesting existing system to compare against). The similarities: a compiled binary will be dynamically linked against a particular SOVERSION of a shared library (just like a semantic import), and each SOVERSION of a shared library is expected to always stay ABI compatible (no BC-breaking changes).

    The differences: although shared library versioning exhibits semantic import versioning at the binary level, it does NOT exhibit semantic imports at the source code level, since you never actually pin against a particular SOVERSION. Additionally, libz.so.1 and libz.so.2 will generally have conflicting symbol names, so you can’t load them simultaneously or implement libz.so.1 in terms of libz.so.2.

  5. Daniel says:

    What you may be missing in the vgo command is that v1 can reference v2, because they are named differently. That, along with aliases, provides a powerful way to have a single unit of code that is represented in two ways.

    See https://research.swtch.com/vgo-import under “Automatic API Updates”.

    Also, your point about not asking developers to choose the semver, just as you don't ask developers to write diffs, is valid. This is exactly what rsc proposes the “(v)go release” command do in https://research.swtch.com/vgo-cmd under “Preparing New Versions (go release)”.

    So I think you’re spot on in your analysis, and in agreement with rsc in his analysis and what is planned.

  6. Yeah, I pushed publish on this post before Russ’s latest post :)

    I am aware that in vgo v1 can be implemented in terms of v2. But I was under the impression that you still have to essentially do a rewrite in order to make such a thing exist. That seems like too much for most package maintainers to want to do, except in extreme cases.

  7. Anonymous says:

    It sounds very similar to the ideas Rich Hickey described in one of his talks: https://www.youtube.com/watch?v=oyLBGkS5ICk

    It is very true that we should expect (and demand!) this problem to be solved in the next generation languages. I hope to see more discussion across different communities about it.

  8. David Collier-Brown says:

    This also avoids a whole collection of NP-complete problems Russ described in https://research.swtch.com/version-sat, which actually date back to Multics (plus Solaris, plus glibc): https://leaflessca.wordpress.com/2017/02/12/dll-hell-and-avoiding-an-np-complete-problem/

  9. Justin Bailey says:

    What a great post. I love the phrase ‘further entrenchment of bad “first try” APIs’. Exactly the problem with maintaining backwards compat (along with the cost in terms of effort!).

    I’ve always been extremely leery of the “feature flag” idea (where new features, etc., are hidden by conditionals in the code). It always seemed like a way to produce a giant spaghetti mess of code – I’d much rather deploy code that either has the feature or does not, and not have to determine at run-time which logic executes. Your second point (“don’t make me write a diff”) really sums that up.

    Great thoughts and really appreciated.
