ezyang's blog

the arc of software bends towards understanding

Posts

Why you should maintain a personal LLM coding benchmark

Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding and that you will get both short and long term benefits from making your own personal benchmark exist.


I started thinking about benchmarks for coding in part with my frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know “what’s the best model” or “what’s the best coding IDE”? One might imagine that the way to answer this question would be to test the models on a variety of problems from real world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model’s performance against a few well known coding benchmarks. Problem solved?

Read more...

New Years resolutions for PyTorch in 2025

In my previous two posts “`Ways to use torch.compile <http://blog.ezyang.com/2024/11/ways-to-use-torch-compile/>`_” and “`Ways to use torch.export <http://blog.ezyang.com/2024/12/ways-to-use-torch-export/>`_”, I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some… just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!

Read more...

Ways to use torch.export

Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile’s API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.

Read more...

Ways to use torch.compile

On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn’t. By the way, some of these gaps are either served by export, or by missing features we are actively working on, those will be some other posts!

Read more...

Tensor programming for databases, with first class dimensions

Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are kind of similar to tables in database, and this results in the logical question which is could you setup a Tensor-like API to do queries on databases that would be normally done with SQL? We have two challenges:

  • Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
  • JOINs in SQL can be thought of as performing outer joins, which is not a very common operation in tensor computation.

However, we have a secret weapon: first class dimensions were primarily designed to as a new frontend syntax that made it easy to express einsum, batching and tensor indexing expressions. They might be good for SQL too.

Read more...

What's different this time? LLM edition

One of the things that I learned in grad school is that even if you’ve picked an important and unsolved problem, you need some reason to believe it is solvable–especially if people have tried to solve it before! In other words, “What’s different this time?” This is perhaps a dreary way of shooting down otherwise promising research directions, but you can flip it around: when the world changes, you can ask, “What can I do now that I couldn’t do before?”

Read more...

Interactive scraping with Jupyter and Puppeteer

One of the annoying things about scraping websites is bouncing back and forth between the browser where you are using Dev Tools to work out what selectors you should be using to scrape out data, and your actual scraping script, which is usually some batch program that may have to take a few steps before the step you are debugging. A batch script is fine once your scraper is up and running, but while developing, it’s really handy to pause the scraping process at some page and fiddle around with the DOM to see what to do.

Read more...

PyTorch Developer Podcast

I’m launching a new podcast, the PyTorch Developer Podcast. The idea is to be a place for the PyTorch dev team to do bite sized (10-20 min) topics about all sorts of internal development topics in PyTorch. For now, it’s just me monologuing for fifteen minutes about whatever topic I decide. The plan is to release an episode daily, five days a week, until I run out of things to say (probably not for a while, I have SO MANY THINGS TO SAY). I don’t edit the podcasts and do minimal planning, so they’re a bit easier to do than blog posts. Check it out! There’s two episodes out already, one about how we do Python bindings for our C++ objects and another about history and constraints of the dispatcher. If there are any topics you’d like me to cover, give a shout.

Read more...

Rage bug reporting

At Facebook, we have an internal convention for tooling called “rage”. When something goes wrong and you want to report a bug, the tool developer will typically ask you to give them a rage. For a command line tool, this can be done by running a rage subcommand, which will ask about which previous CLI invocation you’d like to report, and then giving you a bundle of logs to send to the developer.

Read more...

The PyTorch open source process

PyTorch is a fairly large and active open source project, and sometimes we have people come to us and ask if there are any lessons from how we run PyTorch that they could apply to their own projects. This post is an attempt to describe some of the processes as of 2021 that help PyTorch operate effectively as an open source project. I won’t claim that everything we do necessarily the best way to go about doing things, but at the very least, everything I describe here is working in practice.

Read more...