The Holy Grail - Proving Against the Unknown for DeFi Protocols
Author:
Andrew FerraiouloEditor:
Uri KirsteinCertora recently raised the bar for DeFi security by formally verifying a key property of the Euler V2 Vault implementation. In this post we talk about how formal verification can offer strong assurance about DeFi protocols, find rare bugs unknown to developers, and even involve specifications that are simpler to understand than exploits. As an example of this, we discuss in detail our recent proof of Euler’s “Holy Grail” invariant: that accounts on the Euler V2 protocol stay healthy as long as prices don’t change. We show how this invariant could have ruled out the prior hack on the Euler V1 system and how it rules out the unknown for the security-first Euler V2. Our full specifications are available here.
One challenge of securing smart contracts is that security is fundamentally asymmetric: attackers need only to find one unknown bug, whereas system defenders need to rule out all possible attack vectors. This challenge relates well to a concept from cognitive science called the Johari Window. In social science it’s a way to help people better understand their relationships with others and how others see them – there is information that is/isn’t known about us by both ourselves and others. We can draw an analogy to this in security: there are behaviours of our software that are known to us (or not), and there are behaviors of our software that are known to others like white-hat hackers, and finally there is the riskiest category – the ones unknown to all!
One of the most commonly done security practices in DeFi is to have white-hat hackers audit the protocol’s code and attempt to uncover vulnerabilities. This is valuable because it helps uncover the hidden area of facade: the problems known to the auditors but not the system designers. However smart contracts and their interactions are complex, and there is the unknown – bugs too subtle for auditors to uncover from manual review or even extensive testing.
However, formal verification lets us address the unknown in a principled and actionable way. Formal verification is the act of proving or disproving the correctness of a system with respect to a certain property, using formal methods of mathematics. By contrast to manual audits or testing techniques, formal verification gives us strong assurance that a property always holds for all possible states, inputs, and interactions. In addition to this assurance, when a proof fails we also get a concrete counterexample from the Certora Prover, which may be a rarely triggered vulnerability – the unknown unknowns that even whitehat hackers may not find.
As shown in the figure below, formal verification can be applied anywhere in the design cycle. For example, a system design can be verified using a high-level verification tool such as TLA+, or the implementation can be verified against specifications using a prover such as Certora's. Verification of the implementation generates a verification report which complements the work of human auditors. During development, contracts can be re-verified as a part of continuous integration on each change, and relatedly upgradable contracts can be protected with CI prior to and after deployment.
Image: formal verification and the DeFi system design process
However, the assurance offered by formal verification is only as strong as the specification – to truly rule out the unknown, we need to develop comprehensive specifications that describe key system invariants rather than ones that simply reiterate implementation details.
Before we dig into the details of the key security properties of lending protocols, let's recap how these protocols work. The lending protocol is itself an account on some underlying asset (ERC20) contract. Lenders deposit some amount of this asset into the protocol (transferring this money to the protocol) in exchange for shares in the protocol. By agreeing to lend this money, their shares accumulate interest. Borrowers deposit some amount of collateral (a different ERC20 asset) in exchange for some amount of the underlying asset which they are borrowing. The borrowers accumulate debt interest, and they must eventually repay what they borrow and this debt. The collateral is used to incentivize repayment, and generally borrowers must overcollateralize and deposit a higher value of collateral than the value of what they have borrowed.
Two key security properties of lending protocols are solvency and account health. Solvency is when protocol participants can all get their money back out; in other words the protocol has enough assets to cover all lenders. Account health ensures that all borrower accounts are overcollateralized. A key metric for account health is its loan-to-value (LTV), which is the ratio of how much has been borrowed to how much collateral was deposited. An account is healthy (and overcollateralized) when this value is less than 1. Some protocols or governors may be more strict and define an even lower LTV for an account to be considered healthy.
The “Holy Grail” property of the Euler vault relates to account health: no action within a lending protocol should make a user become unhealthy. That is, a borrower cannot make themselves unhealthy (for example by taking on debt more than their collateral), nor should some other actor be able to make a borrower unhealthy. There are two known behaviors which are exceptions to this, an account can become unhealthy:
However, an account that is unhealthy cannot act to worsen its health (e.g. by borrowing more).
What is the relationship between these two properties? Solvency pertains to the overall assets of the protocol whereas the holy grail rule pertains to the position of individual accounts. A secure lending protocol should enforce both of these properties, and indeed the implementation of one may be related to the other. However, in general one does not imply the other. We can see this by considering contrived systems:
Because of this, we proved both solvency and the holy grail rule for the Euler V2 vault.
It is worth noting that both of these properties deal with the output behaviors of the system rather than the implementation details, and there are multiple system designs that can achieve both. A key practice of secure specification design is to distance the specification from the implementation details – broader, more abstract properties dealing in behaviors have a better chance of catching unknown bugs than ones that restate the implementation details.
Our proof strategy for the holy grail rule is based on induction on the transitions in the protocol’s state transition system. As the base case, we prove that all accounts are healthy initially. We then show that if an arbitrary account is healthy in an arbitrary configuration then after one transition it remains healthy. These transitions include the public functions of the protocol.
In a bit more detail, what we prove for an individual transition is: neither the borrower, nor any other actor, can turn the borrower’s position from healthy to unhealthy. Aside from the two exceptions noted above: 1) price changes, 2) interest accrual.
Let’s see how we can specify this in Certora’s verification language, CVL.
This code segment is simplified a bit from our full specifications which are available here, but it illustrates the key concepts. First we declare variables that represent an arbitrary EVM state, arbitrary function arguments, and an arbitrary account. We then save whether or not this account was healthy initially, execute any vault function, then check the account health again. We then prove that if the account was healthy initially, it must be healthy afterwards.
Because the Euler V2 system design and implementation were heavily focused on security, we were able to prove the Holy Grail rule without necessitating any changes to the implementation.
Why was this property called the Holy Grail? It’s closely related to the vulnerability that allowed a sophisticated and complex attack on the Euler V1 system, and could have prevented it. The attack is described more fully in this Coinbase article. Briefly, the attack relied on:
Fundamentally the underlying vulnerability was due to an incorrect health check on a feature that was added late in the development cycle.
Here were the steps to the attack:
Step 1: The attacker initiates a flash loan for $210M from AAVE
Step 2: The attacker deposits the proceeds from the flash loan as collateral to an Euler Vault.
Step 3: The attacker requests shares to be minted leveraging their collateral
Step 4: The attacker calls donateToReserves giving their collateral to the Vault’s reserves
Step 5: The attacker uses a liquidator contract they control to liquidate their user account. The liquidation is profitable to the liquidator contract (which they control)
Step 6: The liquidator contract withdraws the proceeds from liquidation
Step 7: The attacker pays off the flash loan using the proceeds from their own liquidation at a profit
We can see that the vulnerable donateToReserves function does not enforce the holy grail rule because it can end in a state where the donor is unhealthy (and the function does not revert).
The Euler V2 design however does enjoy the Holy Grail property. Let’s see how its system design enforces this.
Step 1: Users don’t directly call the vault. They execute a call or batch on the EVC which requests the EVC to perform one or more vault operations on their behalf. Inside the call/batch implementation, the EVC sets an onBehalfOfAccount address to track the caller on behalf of which the EVC is operating.
Step 2: The EVC invokes the requested vault operation(s) on behalf of the user account. It does so by making an EVM-level call operation invoking this method on the Vault.
Step 3: The Vault enqueues health checks for any accounts that have balances that could change as a result of the operation. These checks are enqueued to be performed later because of EVK’s batch feature – multiple vault operations can be done in a sequence called a batch. The batch is allowed to temporarily make accounts unhealthy as long as by the time the whole batch is done, the needed health checks succeed.
Step 4: The vault executes the requested operation(s) (e.g. deposit, withdraw, redeem, etc.) which may affect the balances of one or more accounts
Step 5: The control flow returns to the EVC’s call/batch/permit method (remember: we entered the vault via a low-level call op from EVC)
Step 6: The EVC executes status checks for all the accounts that have had a status check enqueued by the vault. (In more detail: this is done as a part of a call to restoreExecutionContext) By deferring health checks until the very end of the call the risk of doing the health check too early is also mitigated – the health check happens after all the effects of the operation have happened and the call will revert if any accounts have become unhealthy.
Here’s why this call graph enforces our property. Imagine the user issues a call that borrows beyond their collateral which would cause their LTV to exceed 1. The vault will enqueue a status check for the borrowing account. When this enqueued status check is executed (in step 6), the check will fail reverting the whole call.
You might be wondering: why don’t we just check that the health check was called for each function? There are a few reasons for this. Firstly, this is overly conservative as not all vault operations can move accounts towards unhealthy, and we want to avoid extra checks which cost gas. Second, we do not want to rely on human audit to specify which accounts might be affected by each vault call. By placing more trust in a broader specification, we have a greater chance in going beyond what humans alone can uncover. As a result, our property would catch some other serious bugs as well:
Next, let’s go into a bit more detail on our verification setup and some assumptions we make.
Verification tools like the Certora prover which are based on Satisfiability Modulo Theories (SMT) are highly automated, but can run into timeouts where the tool will either take too long to provide a solution or never be able to do so. This is because under-the-hood, they are based on an NP-hard problem! One of the key ways to address this is to do modular verification where the proof goal is split into multiple sub-problems. Often this is done by proving a smaller intermediate theorem and then using or assuming this in another.
There are two key ways we do modular verification for the Euler Vault Kit. The first is to leverage Euler’s design which is already modular – the top level contract is split into modules for different parts of the functionality. We verify each of these modules separately. The second is that we verified Euler’s Ethereum Vault Connector (EVC) and then leveraged these assumptions to prove the holy grail rule about the vault.
The verification setup is illustrated below. We actually verify a slightly different contract from each module under verification – instead we construct a contract that inherits from: the module of interest, the risk manager, and an abstract verification harness. The risk manager is needed for each module because it implements the relevant health check functionality. The abstract verification harness is useful to help us group together and reuse definitions, harness functions, and assumptions.
Our specification also relies on a few assumptions and some properties we proved about other system components. First, we rely on some assumptions about the Ethereum Vault Connector also developed by Euler and which we verified separately:
We assume the protocol governor is fully trusted and will choose reasonable price oracles, underlying assets and unit of account assets. We also assume the following properties of LTV values which are enforced in Governance.setLTV, and which we also proved hold separately.
We also assume all oracle quotes (the value of a single token) are less than 2**230 to avoid overflows; this is highly unlikely to be exceeded in practice.
We were able to give strong assurance that Euler V2 keeps its accounts healthy through formal verification. Formal verification can rule out unknown vulnerabilities, but verification is only as good as the specifications. Writing good specifications takes time and effort; it’s best to start this early in the design process. Specifications that describe overall system behaviors are more likely to catch unknown bugs than ones that restate implementation details. Existing formal verification tools such as the Certora Prover can also verify key properties such as account health and solvency with a high degree of automation.
[1]: Credit: Doug Hoyte, Co-Founder Euler