ezyang’s blog

the arc of software bends towards understanding

Why you should maintain a personal LLM coding benchmark

Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding and that you will get both […]

  • April 4, 2025