The most important property of a compiler, and one that you rely on every time you compile your code, is “semantics preservation.” It's a fancy way of saying that the compiler preserves the meaning of your program: the compiled program (in low-level bytecode or machine code) that comes out of the compiler should do exactly what the input program does.
A similar property holds for the various optimizations a compiler may perform during compilation. A compiler is free to reorder assignments, change loop structure, and so on, so long as the overall behavior of the program isn’t changed (and ideally the program ends up running faster).
Compiler bugs are especially pernicious because they silently break this important property. The program that comes out of the compiler may be almost right, but its behavior has been changed in subtle ways. We've documented the many ways that compilers in the web3 ecosystem have gone wrong. The most recent bug we uncovered was in an optimization pass: in attempting to save some bit-manipulation operations, the LLVM optimizer (not so subtly) changed the input program.
If we want to lower the impact of optimizer bugs, one approach is simply to disable all optimizations. However, as the web3 ecosystem grows more complex and mature, this approach becomes less and less workable: metered execution in the form of gas provides a real financial incentive for efficient code, and the EVM's code size limits are becoming a constraint for many developers. We clearly want the benefits of optimizations while lowering the impact of the inevitable bugs that may creep in.
While formally verified compilers do exist, no such compiler exists for the web3 ecosystem yet, and building and verifying such a compiler would be a multi-year effort. However, we can still use formal verification to gain assurance that optimizations are not changing our programs.
Equivalence Checking
At the outset, we said that a compiler must preserve the meaning of the original program. The same is true when considering optimizations: the program produced by the compiler with optimizations turned on should behave exactly the same as it would with optimizations turned off.
If we have a tool that compares two programs and indicates whether they are equivalent we can then validate compiler optimizations. If the programs before and after optimizations have the same behavior then, by definition, the optimizations didn’t change program behavior.
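To make the idea concrete, here is a minimal Python sketch of this kind of differential check. The functions and inputs here are invented for illustration; a real equivalence checker, including ours, reasons symbolically over all inputs rather than sampling them.

```python
# Illustrative differential check: two versions of a function that are
# supposed to be equivalent, validated over sampled inputs. (A symbolic
# equivalence checker covers ALL inputs, not a sample.)
def unoptimized(x: int) -> int:
    return x * 2 + x * 2      # naive: two multiplications and an add

def optimized(x: int) -> int:
    return x << 2             # strength-reduced: a single shift (x * 4)

def check_equivalence(f, g, inputs) -> bool:
    """True iff f and g agree on every sampled input."""
    return all(f(x) == g(x) for x in inputs)

# The two versions agree on every sampled input:
assert check_equivalence(unoptimized, optimized, range(-1000, 1000))
```

If the optimizer had introduced a bug, `check_equivalence` would return False for some input, which is exactly the signal we want.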
We’ve been working on just such an equivalence checker. Given two programs, our checker uses the existing Certora Prover infrastructure to prove that the programs are equivalent on all inputs, meaning that the programs produce the same results while having the same observable side effects.
This equivalence checker has many potential applications, but the one we're especially excited about is compiler optimization validation. While our solution is still very much in development and has not yet advanced past the prototype stage, we’ve already used it to uncover an optimization bug in the Vyper compiler.
The Bug
Before proceeding we want to emphasize the following points:
- The bug we identified is in the experimental Venom optimization pipeline.
- The code that triggered this issue was not compiled with this optimization setting when it was deployed on-chain.
- The bug has since been fixed by the Vyper developers.
Let’s first look at the code in question, which is taken from the Vyper compiler’s test cases and is derived from the Curve Finance codebase:
While we have elided many of the low-level mathematical details, this function computes an approximate solution to an equation using Newton's method. On each iteration of the loop, the approximation y is refined using a derivative, and then the absolute change in the approximation is computed (diff). If the change in the approximation is small enough, the function returns the approximation y.
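To make the loop shape concrete, here is a hypothetical Python sketch with the same structure. It is not the actual Vyper test case: it computes an integer square root via Newton's method as a stand-in for the elided Curve math, and the names and thresholds are illustrative.

```python
# Hypothetical sketch -- NOT the actual Vyper code. The loop shape
# mirrors the prose: refine y, compute the absolute change (diff), and
# return once the change drops below a convergence threshold.
def newton_y(n: int, y: int, convergence_limit: int = 1) -> int:
    """Integer square root of n via Newton's method, standing in for
    the elided Curve math."""
    for _ in range(255):                     # bounded loop, as in Vyper
        y_prev = y
        y = (y + n // y) // 2                # one Newton refinement step
        diff = y - y_prev if y > y_prev else y_prev - y
        if diff < max(convergence_limit, y // 10**14):
            return y                         # converged
    raise ValueError("did not converge")
```

The branching used to compute diff (subtract one way if y > y_prev, the other way otherwise) is the part of the code whose compiled forms we examine next.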
When we applied the equivalence checker to this function, it flagged an input where the non-optimized version of this function would traverse the loop body and then jump back to the top of the loop, whereas the optimized version would instead return y on that same input. The details of how the equivalence checker found this input are beyond the scope of this writeup. To give a rough breakdown: the equivalence checker compiles the code with and without optimizations, instruments the resulting bytecode to record whether the loop exits, jumps back, or reverts, invokes each version of the code with the same inputs, and finally asserts that the two loops performed the same action (back jumped, exited, etc.).
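Sketched in Python, and heavily simplified, that harness pattern looks something like the following. The run_a/run_b callables are stand-ins for the two instrumented compiled versions; the real instrumentation operates on bytecode inside the Prover, and the check is symbolic rather than enumerated.

```python
# Simplified differential harness: run two compiled versions of a loop
# on the same input and assert the instrumented loops took the same
# action. "run_a"/"run_b" stand in for the instrumented bytecode.
LOOP_ACTIONS = ("exit", "backjump", "revert")

def check_same_loop_action(run_a, run_b, inputs):
    for x in inputs:
        action_a, action_b = run_a(x), run_b(x)
        assert action_a in LOOP_ACTIONS and action_b in LOOP_ACTIONS
        assert action_a == action_b, (
            f"divergence on input {x}: {action_a} vs {action_b}")
```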
To reiterate, this reasoning is done within the existing Certora Prover and is thus entirely symbolic: when considering the function and loop behavior, we are reasoning about all possible inputs to the function newton_y and all possible behaviors of the loop.
After the equivalence checker flagged the difference in behaviors, we manually inspected the optimized and non-optimized versions of this code. A major change performed by the Venom optimization pipeline is moving local variables from locations in EVM memory to the EVM stack. To illustrate this change, let’s first look at the unoptimized version of the if y > y_prev check:
R942 = tacM[0x1a0]
R943 = tacM[0x120]
B944 = R943 > R942
if B944 goto 13008_1019_0_0_0_0 else goto 12978_1019_0_0_0_0
This code is presented in the TAC intermediate representation used by the Certora Prover. The variables that begin with R and B represent values on the stack. tacM[0x1a0] represents a read from EVM memory at offset 0x1a0, and similarly for 0x120. This code then compares these two values, jumping to one of two locations in the code depending on the result. The location 0x1a0 in memory corresponds to the local variable y_prev, and 0x120 holds y. Let’s look now at the code we jump to if the condition is true, i.e., y > y_prev:
R952 = tacM[0x120]
R953 = tacM[0x1a0]
R954 = R952 - R953
tacM[0x300:0x300+32] = R954
Here we can see that y_prev (aka the value at 0x1a0) is subtracted from y (the value at 0x120), and the result is stored into the slot 0x300, which you may have correctly guessed corresponds to the variable diff. The else branch (when y_prev >= y) is similar.
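To restate the memory-based version in executable form, here is a toy Python model of those TAC instructions. The concrete values chosen for y and y_prev are invented for illustration:

```python
# Toy model of the unoptimized TAC: locals live at fixed memory offsets.
# Example values for y and y_prev are invented for illustration.
mem = {0x120: 150, 0x1A0: 100, 0x300: 0}    # y, y_prev, diff

if mem[0x120] > mem[0x1A0]:                  # B944 = y > y_prev
    mem[0x300] = mem[0x120] - mem[0x1A0]     # diff = y - y_prev
else:
    mem[0x300] = mem[0x1A0] - mem[0x120]     # diff = y_prev - y

assert mem[0x300] == 50                      # |150 - 100|
```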
Let’s now look at the decompilation of the same code in the optimized version, where local variables have been moved from locations in memory to stack variables. Here is the same conditional check that is supposed to correspond to y > y_prev.
B879 = R874 > R874
if B879 goto 11178_1009_0_0_0_0 else goto 11032_1009_0_0_0_0
There are two things to notice here. First, we no longer see any memory accesses, as the local variables have indeed been moved to the stack. Second, we are comparing the same stack variable to itself! The condition computed here is necessarily false, so let’s look at the else branch of the conditional to get a clearer idea of what’s happening:
R893 = R874 - R874
R893 is meant to hold the value of diff, which in this branch should be y_prev - y, but we are instead subtracting the same value from itself. In other words, diff is always 0. But if diff is always 0, then the check diff < max(convergence_limit, y // 10**14) is always true, and thus the loop always returns during its first iteration. It is exactly this divergent behavior, one version of the loop returning and the other continuing, that the equivalence checker detected.
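A small Python model makes the divergence concrete. The two functions below are illustrative stand-ins for the two compiled versions of the convergence check, not the actual bytecode:

```python
def loop_check_correct(y: int, y_prev: int, limit: int) -> str:
    """Convergence check as compiled WITHOUT the buggy pipeline."""
    diff = y - y_prev if y > y_prev else y_prev - y
    return "exit" if diff < limit else "backjump"

def loop_check_buggy(y: int, y_prev: int, limit: int) -> str:
    """Same check after the bad stack allocation: y and y_prev share a
    stack slot, so diff is always y - y == 0 and the loop exits."""
    diff = y - y                 # conflated stack slot
    return "exit" if diff < limit else "backjump"

# A divergent input of the kind the equivalence checker flagged:
assert loop_check_correct(150, 100, 10) == "backjump"
assert loop_check_buggy(150, 100, 10) == "exit"
```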
The underlying bug in the Venom optimizer relates to how local variables are mapped to stack locations: here, y and y_prev were conflated to live in the same stack location, giving the incorrect result. When we reached out to the Vyper developers with our findings, they confirmed the bug, which has since been fixed in Vyper 0.4.2.
Conclusion
Although this optimization bug has since been fixed, the finding demonstrates our equivalence checker's utility for validating compiler optimizations. As we mentioned previously, the checker is still very much under active development, but we’ll be sharing updates as it grows more robust and ready for primetime.
In the meantime, we don’t want to discourage anyone from using compiler optimizations. But as with every part of the software development stack, it's worth taking nothing for granted: test rigorously to validate that your compiler hasn’t introduced subtle changes to your code.