Why testing is not enough for million-dollar code

Nurit Dor, Shelly Grossman, Lior Oppenheim, Mooly Sagiv, and James Wilcox

January 20, 2020

All testing techniques boil down to checking whether code behaves as expected on a chosen set of inputs. Testing plays a central role in software quality assurance across the industry, and indeed, many bugs are detected through testing. However, for safety-critical, mission-critical, or high-value code, testing alone is insufficient, since bugs may happen in rare scenarios or on maliciously crafted inputs. Bugs that require many unlikely events to occur in succession are difficult to find by testing: there are just too many potential behaviors to test. In critical domains, rare bugs cannot be ignored, since they can lead to lost life, property, or money. In these domains, formal verification can provide complete coverage of the input space, giving guarantees beyond what is possible from testing alone.

One such domain is Ethereum smart contracts, which are often only a few lines of code, but have many behaviors and execute in an adversarial environment. Writing smart contracts involves low-level event-based programming, and high execution costs on the blockchain require programmers to use sophisticated algorithms and data structures in their contracts for performance.

Smart Contracts with Rare Bugs

Here’s an example of an interesting bug in a smart contract. (This example is based on a bug found in an early version of the Maker MCD system, but has been simplified to make this blog post clearer.) The contract in question implements a reverse auction where bidders offer to take decreasing prize amounts for a fixed payment. The bidder who has offered to take the lowest prize value is the winner. The auction terminates after a fixed amount of time, or if no one submits a new winning bid for one hour. Upon termination, the system mints an amount of tokens equal to the winning bid’s prize value, and transfers it to the winner. The full code appears below.

Let’s take a deeper look and demonstrate how a malicious user can exploit this contract to mint an unbounded supply of tokens without taking any risk.

Suppose that the system starts with supply equal to 99 and no auctions. A malicious user can legitimately trigger the system to create a new auction. The system calls the new function of the Auction contract, defined below, with the arguments:


Auction #1 is started with an initial prize of 2**256-1, a fixed payment of 5, and an end_time in one day. Now, the system waits for bid operations to reduce the prize. Since the auction starts with an unrealistic initial prize of 2**256-1, any bid specifying a prize below that can be given as argument to bid and sets the caller as the new winner.

The attacker can now call:

Auction.bid(1, 2**256-100)

Since the prize 2**256-100 is the smallest number seen so far, the attacker pays the payment and becomes the current, tentative winner of the auction. The auction’s current bid expiry is one hour. If in one hour someone else provides a smaller bid then the attacker will be paid back thus losing nothing and may try again. If in one hour no one provides a smaller bid then at the earliest possible time, 1 hours and 1 second, the attacker calls:


This operation fulfills the auction and mints 2**256-100 tokens, equal to the latest prize, and transfers them to the attacker. In addition, the supply of tokens increases from 99 to 2**256-1 (MAX_UINT), causing the system to not be able to mint more tokens.

One way to describe this bug is that the contract assumes that the first bid will reduce the price to a "reasonable" amount. This is a severe bug, since it can lead to unbounded minting. This bug is also challenging to catch by testing without knowing it in advance, since many parameters have to align in subtle ways in order to trigger it. Indeed, the bug was missed by testing in practice and was uncovered by Certora using formal methods. (The bug was independently found by manual auditing. See Maker's blog post for a description of the bugs found by different techniques.)

contract Auction {
  function new(uint id, uint payment) authorized {
     require(auction[id].end_time == 0); //check id in not occupied
     auction[id] = Auction(2**256-1,payment,this,0,now+1 day);
                // arguments: prize, payment, winner, bid_expiry, end_time
  function bid(uint id, uint b) {
     require(b<auction[id].prize); // prize can only decrease
     // new winner pays by repaying last winner

     // update new winner with new prize
     auction[id].prize = b;
     auction[id].winner = msg.sender;
     auction[id].bid_expiry = now + 1 hour;
  function close(uint id) {
     require(auction[id].bid_expiry != 0
         && (auction[id].bid_expiry < now || auction[id].end_time < now));
     Token.mint(auction[id].winner, auction[id].prize);
     delete auction[id];

contract Token {
  function mint(address who, uint amount) authorized {
     balances[who] = safeAdd(balances[who],amount);
     supply = safeAdd(supply,amount);

Beyond tests, towards formal reasoning

Formal specifications and exhaustive testing offer a far more reliable way to uncover bugs like this long before they are released and create the risk of catastrophic outcomes. The first step in identifying such rare bugs is specifying the most highly desired properties of the code. One such desired property of most smart contracts is The Bounded Supply Property: "Nobody should be able to mint an unbounded amount of tokens". Tools like the Certora Prover and Verisol can automatically uncover rare bugs by comparing the desired behavior and the existing behavior for all possible scenarios. This intuitive property can be formalized in the Specify input language of the Certora Prover as follows:

rule bounded_supply() {
    uint256 _supply = gemTotalSupply(); // gem supply before

    // the next 3 lines invoke an arbitrary public function on an arbitrary input
    method f;
    calldataarg arg;

    uint256 supply_ = gemTotalSupply(); // gem supply after

    assert _supply != supply_ => supply_ < 2**256-1,
           "Cannot increase to MAX_UINT256";

The rule bounded_supply says that no public entrypoint to the contract increases the total supply to MAX_UINT. The rule works by first fetching the current total supply, then invoking an arbitrary public function of the contract on an arbitrary input, and then checking that the new total supply has not been increased to MAX_UINT. (More precisely, it checks that if the supply has changed, then it has not been set to MAX_UINT, which handles the case where the initial total supply was already MAX_UINT.)

When checking the contract with this bounded_supply property, the Certora Prover produces a description of a violation of the specification within a few seconds.

Output of Certora Prover for checking this rule on several methods from the auction contract. The deal() function is marked in red to indicate it is a violation. The details of the violation are also shown as an assignment of values to variables in the rule, but this part is not intended to be understood deeply by readers.

The red row in the table says that the public deal() function can violate the assertion in the specification that checks that the supply does not become MAX_UINT, and provides concrete test inputs to demonstrate the violation.


As Dijkstra famously said, "Program testing can be used to show the presence of bugs, but never to show their absence!" This quote is often used as a motivation to formally verify that the code does not have bugs using mathematical theorems. Formal reasoning is effective at detecting rare bugs which are hard to be detected by humans. This requires: (1) clear definitions of the code intends to do and (2) techniques for comparing program behaviors to desired behaviors. If you are concerned that the stakes are just too high to rely entirely on testing and manual code audits, then please contact Certora at info@certora.com to find out how we can help.