Debugging compilers with optimization fuel : ezyang’s blog

Debugging compilers with optimization fuel

Today I would like to describe how I pin down compiler bugs, specifically, bugs tickled by optimizations, using a neat feature that Hoopl has called optimization fuel. Unfortunately, this isn’t a particularly Googleable term, so hopefully this post will get some Google juice too. Optimization fuel was originally introduced by David Whalley in 1994 in a paper Automatic isolation of compiler errors. The basic idea is that all optimizations performed by the compiler can be limited (e.g. by limiting the fuel), so when we suspect the optimizer is misbehaving, we binary search to find the maximum amount of fuel we can give the compiler before it introduces the bug. We can then inspect the offending optimization and fix the bug. Optimization fuel is a feature of the new code generator, and is only available if you pass -fuse-new-codegen to GHC.

The bug

The bug shows up when I attempt to build GHC itself with the new code generator. Building GHC is a great way to ferret out bugs, since it has so much code in it, it manages to cover a lot of cases:

"inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 7.1 for i386-unknown-linux):
       RegAllocLinear.makeRegMovementGraph

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

We quickly grep the codebase to find the relevant error, which is in compiler/nativeGen/RegAlloc/Linear/JoinToTargets.hs:

-- | Construct a graph of register\/spill movements.
--
--  Cyclic components seem to occur only very rarely.
--
--  We cut some corners by not handling memory-to-memory moves.
--  This shouldn't happen because every temporary gets its own stack slot.
--
makeRegMovementGraph :: RegMap Loc -> RegMap Loc -> [(Unique, Loc, [Loc])]
makeRegMovementGraph adjusted_assig dest_assig
 = let
        mkNodes src vreg
         = expandNode vreg src
         $ lookupWithDefaultUFM_Directly
                dest_assig
                (panic "RegAllocLinear.makeRegMovementGraph")
                vreg

   in       [ node  | (vreg, src) <- ufmToList adjusted_assig
                , node <- mkNodes src vreg ]

But the source code doesn’t particularly suggest what the problem might be. Time to start using optimization fuel!

Binary search

We can modify the amount of optimization fuel GHC has, for running optimizations, by changing the value of -dopt-fuel. The first thing we do if see the bug is present with zero optimization fuel:

$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=0

Great, it worked! We pick some large number to start our binary search at (and pass -fforce-recomp, so GHC actually compiles the program.)

$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=1000 -fforce-recomp
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 7.1 for i386-unknown-linux):
       RegAllocLinear.makeRegMovementGraph

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I then binary search (test 500, if that fails test 750, etc), until I find the point at which adding one to the fuel causes the failure.

$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=709 -fforce-recomp
$ "inplace/bin/ghc-stage1" (...) -o compiler/stage2/build/FastString.o -dopt-fuel=710 -fforce-recomp
ghc-stage1: panic! (the 'impossible' happened)
  (GHC version 7.1 for i386-unknown-linux):
       RegAllocLinear.makeRegMovementGraph

Viewing the culprit

How do we convince GHC to tell us what optimization it did with the 710st bit of fuel? My favorite method is to dump out the optimized C-- from both runs, and then do a diff. We can dump the C-- to a file using -ddump-opt-cmm -ddump-to-file, and then diff reveals:

@@ -10059,7 +10059,6 @@
         }
     c45T:
         _s3es::I32 = I32[Sp + 4];
-        _s3eu::I32 = I32[Sp + 0];
         // deleted: if (0) goto c460;
         // outOfLine should follow:
         _s3er::I32 = 0;
@@ -10093,1354 +10092,3 @@
         jump (I32[Sp + 0]) ();
 }

The optimization is deleting an assignment. Is this valid? Here is the full code, with some annotations:

FastString.$whashStr_entry()
        { [const 131081;, const 0;, const 15;]
        }
    c45T:
        _s3es::I32 = I32[Sp + 4];
        _s3eu::I32 = I32[Sp + 0]; // deleted assignment
        _s3er::I32 = 0;
        _s3ex::I32 = 0;
        goto c463;
    c460:
        R1 = FastString.$whashStr_closure;
        jump (I32[BaseReg - 4]) ();
    c463:
        if (I32[GHC.Types.Bool_closure_tbl + ((_s3er::I32 == _s3es::I32) << 2)] & 3 >= 2) goto c46d;
        // uh oh, assignment used here
        _s3IC::I32 = %MO_S_Rem_W32(%MO_UU_Conv_W8_W32(I8[_s3eu::I32 + (_s3er::I32 << 0)]) + _s3ex::I32 * 128,
                                   4091);
        _s3er::I32 = _s3er::I32 + 1;
        _s3ex::I32 = _s3IC::I32;
        goto c463;
    c46d:
        R1 = _s3ex::I32;
        Sp = Sp + 8;
        jump (I32[Sp + 0]) ();
}

Seems not: the variable is used in MO_S_Rem_W32: that’s no good. We conclude that the bug is in an optimization pass, and it is not the case that the register allocator failed to handle a case that our optimization is now tickling.

Fixing the bug

With this information, we can also extract the program fragment that caused this bug:

hashStr  :: Ptr Word8 -> Int -> Int
hashStr (Ptr a#) (I# len#) = loop 0# 0#
   where
    loop h n | n GHC.Exts.==# len# = I# h
             | otherwise  = loop h2 (n GHC.Exts.+# 1#)
          where !c = ord# (indexCharOffAddr# a# n)
                !h2 = (c GHC.Exts.+# (h GHC.Exts.*# 128#)) `remInt#` 4091#

We can also see how our pipeline is processing the program, and observe precisely where in the process the bad optimization was made:

==================== Post Proc Points Added ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      _s3eu::I32 = I32[(old + 12)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;

==================== Post spills and reloads ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      _s3eu::I32 = I32[(old + 12)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;

==================== Post rewrite assignments ====================
{offset
  c43r:
      _s3es::I32 = I32[(old + 8)];
      if (Sp - <highSp> < SpLim) goto c43y; else goto c43u;

Since this is a spurious instance of code removal, we look for all mentions of emptyGraph in the rewrite assignments optimization pass:

usageRewrite :: BwdRewrite FuelUniqSM (WithRegUsage CmmNode) UsageMap
usageRewrite = mkBRewrite3 first middle last
    where first  _ _ = return Nothing
          middle :: Monad m => WithRegUsage CmmNode O O -> UsageMap -> m (Maybe (Graph (WithRegUsage CmmNode) O O))
          middle (Plain (CmmAssign (CmmLocal l) e)) f
                     = return . Just
                     $ case lookupUFM f l of
                            Nothing    -> emptyGraph
                            Just usage -> mkMiddle (AssignLocal l e usage)
          middle _ _ = return Nothing
          last   _ _ = return Nothing

This looks like it should be an unobjectionable case of dead assignment elimination coupled with liveness analysis, but for some reason, the backwards facts are not being propagated properly. In fact, the problem is that I attempted to optimize the Hoopl dataflow function, and got it wrong. (Fixpoint analysis is tricky!) After reverting my changes, the unsound optimization goes away. Phew.

John Regehr says:

June 27, 2011 at 12:12 pm

This is a nice post Edward, thanks.

Do you have a sense for how much of the time a fuel-based compiler debugging session goes awry because the search finds an optimization that merely enables the buggy one, instead of actually being the buggy one?

I did an experiment where I did a binary search on GCC versions in order to find the first one that introduced a given bug. This turned out to be not such a great idea because very often, the result of the binary search contained an incidental change that exposed a previously introduced bug.

nominolo says:

June 27, 2011 at 6:22 pm

Your statement “Optimization fuel is a feature of the new code generator” is only sort of correct. I’m pretty sure the Core-to-Core passes do use optimization fuel (or at least used to).

It is also worth mentioning that using optimisation fuel can introduce bugs itself, e.g., when it inhibits essential optimizations from being applied.

Edward Z. Yang says:

June 27, 2011 at 7:49 pm

John: I’ve done serious fuel-based debugging several times and have never had it lead me astray, but the sample size is rather small. As nominolo points out, if essential “optimizations” are skipped, fuel will cause the program to stop working (this was the case for the new code generator before I made essential optimizations fuel invariant.)

Even if a particular optimization only unmasks an earlier introduced bug, it can still be useful. In particular, optimizations in one block usually affect only that block, so I can massively narrow down the search space.

nominolo: Oh really? At the very least, the C– passes are the only ones that refer to it as “fuel.”

June 28, 2011 at 9:58 am

OK, I may be mistaken. I thought I had seen it in the source code before the new code gen was merged (some time in 2008, I believe). Looking through the history, though, it looks like I was wrong.

Terry A. Davis says:

June 28, 2011 at 8:25 pm

Echo(ON); echoes characters going into Lex().

LTrace(ON); prints tolkens going out of Lex().

PTrace(ON); prints intermediate code going out of parser.

OTrace(ON); prints intermediate code after the 10 optimization passes… Thewy are not optional.

CTrace(ON); unassembles output code from compiler.

OTracePassMask(); lets you control which optimization pass tresults are displayed.

#exe {SysDbg(ON);} lets you turn on and off the debug flag inside code.

IsSysDbgt() lets you read the debug flag in the compiler. When adding a feature, you can make it only present when SysDbg is ON.

I wrote a 64-bit assembler, compiler and unassembler. If you say a binary search, that was the unassembler. The instructions are defined in /LT/Compiler/OpCodes.TXZ

Norman Ramsey says:

July 18, 2011 at 1:55 pm

Edward,

In the case of a panic, which GHC itself can detect is a bug, it is awfully nice to have GHC do the binary search itself. We did this in Quick C– and it was a great boon. (It was a little more useful for us because our compiler driver and regression-test driver were the same driver, so we could isolate any bug-inducing fault, not only panic-inducing faults.)

July 18, 2011 at 7:02 pm

Yeah. I think it would not be too difficult to add this feature to our test drivers, but I’m not really sure how I would put it into GHC itself (our options passing machinery is not exactly modular, though you could probably make it work?)