August 14, 2023

Decompiling Vyper Programs for Formal Verification

The Vyper programming language provides a clean memory and control abstraction. In comparison to Solidity, it is considered highly attractive for DeFi programming. Unfortunately, Vyper programs still contain critical, logical errors. A Vyper compiler error caused losses of $70M just last month.

We describe the Certora Verification toolkit for preventing billion-dollar logical mistakes in Vyper programs that could have prevented last month’s exploit. Through the use of the toolkit, the user expresses high level desired properties of the Vyper smart contract. The computer then analyzes the executable version of the contract to find input behaviors that violate the properties or even mathematically prove their absence. The main idea is to automatically recover low-level memory structure from the bytecode and then encode the precise semantics of the decompiled code into a mathematical constraint system. We then harness existing solvers to construct a solution to mathematical constraints indicating a bug in the code or proof that the constraints are infeasible. This means that the program is guaranteed to satisfy the requirements.

We show the application of the tool to identify errors and prove realistic code.

The Certora Verification Pipeline Revisited

The Certora Verification Pipeline has already been discussed in our whitepaper. For this blog, we provide a summary for Vyper users. The image below shows the phases involved in the Certora Prover.

  1. The source program is compiled into an EVM (Ethereum Virtual Machine) executable using the Vyper compiler.
  2. The user describes desired properties in a CVL (Certora Verification Language) specification.
  3. The EVM is decompiled into an intermediate language called TAC (Three Address Code).
  4. The TAC program is compiled into an SMT2 logical formula.
  5. Either an SMT solver finds bugs in the code, or the rules are guaranteed to hold.

Challenges of Reasoning About Vyper Bytecode

Although both produce EVM bytecode, the outputs of the Vyper and Solidity compilers can be very different. In particular, the Vyper compiler will extensively copy values between different cells in memory. Ultimately, when the Certora Prover translates an EVM bytecode program into logical formulae, memory is modeled using a technique called “array theory”, which is notoriously expensive to include. In this case, the extensive use of memory by Vyper poses significant scalability challenges to formal verification of Vyper smart contracts.

To address these scalability concerns, Certora’s R&D team extended the existing, proprietary memory analysis for EVM bytecode to handle the use cases generated by the Vyper compiler. This analysis can recover some high-level details about the layout of memory in EVM bytecode programs, and prove invariants which are used to optimize our application of array theory and increase the scalability of our automated verification.

A real-world example

We want to verify the validity of a sort function written in Vyper. This code is taken from the curve-crypto-contract repository of CurveFi.

First, we add an external function to access this code for verification. This is necessary because the Certora Prover interacts with smart contracts by simulating arbitrary messages sent from an unknown sender. Adding this function is essential, as internal functions are not part of the public ABI of a contract and they are not directly accessible to our prover.

The following CVL rule checks that the sort function returns the elements in the descending order

However this is only one of many properties we care about. We also want to know that the sort function returns the same elements that are given to the function. To avoid reasoning about permutations, the following rule uses a free variable and states that the two polynomials are equal, which implies that a0,a1,a2 and b0,b1,b2 are permutations of each other.

Note

The last equality uses full unbounded integer arithmetic as all numbers are cast to mathint.

Status and Current Limitations

Current Vyper language support is at a pre-alpha level. Most of the language’s features are supported. Many non-trivial properties of programs can be proven, including

  • Properties of containing all basic language syntax
  • Properties of complex data structures in both memory and storage
  • Properties that require reasoning about interactions of multiple contracts

Current work in progress includes and is not limited to (and in no particular order):

  • Currently, only Vyper 0.3.7 and 0.3.9 are supported
  • First-class support for dynamic arrays in storage
  • Certain advanced features of CVL (in particular, ghost state and hooks) are not yet supported
  • A variety of pending internal analytic improvements are in flight
  • Efforts continue around performance optimization of our Vyper analysis

Related Tools

Formal verification is a mature field of computer science with excellent techniques and open-source software. The Viper Verification Structure also enables formal verification of Vyper contracts. The Certora Prover enables more scalable verification by

  1. Static program analysis of bytecode, reducing the inherent complexity of constraint solving.
  2. Making modular specification optional, therefore reducing the amount of manual labor required to verify code formally.
  3. We deploy, divide and conquer multiple SMT solvers to scale formal verification to larger programs.

Also, the Certora Prover protects against compiler bugs.

Acknowledgments

We wish to thank Benjamin Greenwald who worked to add support for Vyper to the Certora prover.

Get every blog post delivered

Certora Logo
logologo
Terms of UsePrivacy Policy