Certora Bug Disclosure

Non-deterministic Solidity Transactions

Thomas Bernardi, Nurit Dor, Anastasia Fedotov, Shelly Grossman, Alexander Nutz, Lior Oppenheim, Or Pistiner, Mooly Sagiv, John Toman, James Wilcox

November 10, 2020

John Toman of the Certora development team discovered a previously unknown code generation bug in the Solidity compiler. This bug allows malicious bytes/string storage fields to deceive Solidity into including unrelated information from memory into arrays generated via abi.encodePacked and similar methods.

Background

Bytes and Strings in Storage

As described in our previous post bytes arrays and strings use a specialized representation to save storage data. In particular, if the array/string is less than 31 elements (i.e. bytes) long, then Solidity will store the length of the array in the same storage slot as the element data itself. The least-significant byte stores the length of the array times two (for reasons explained below), while the remaining 31 bytes hold the array data.

This "packed" representation is in contrast to how arrays are normally laid out in storage: the length of the array is stored in one slot, while the array elements are stored in contiguous slots beginning at a location computed via a hash function. Note that for bytes arrays of length greater than 31, this representation is also used, albeit with a slight modification. Instead of storing the length of the bytes array directly, the Solidity compiler will store the length times two plus one. This addition ensures code can determine disambiguate which layout is used (i.e., "packed" vs "standard"): if the least significant bit is not set, then the packed representation must be used. However, if the least-significant bit is set, then the array data lies elsewhere in storage.

Understanding the Bump Allocator

As described in our post on memory corruption, Ethereum memory is a single, contiguous array of bytes. The Solidity compiler gives structure to this code by using a "free pointer" which is monotonically incremented throughout the lifetime of the contract. In particular, when allocating a new memory object, the compiler generates code which reads the current value of the free pointer, increments it by a value large enough to hold the allocated object, and then updates the free pointer. The old value of the free pointer (i.e., before incrementing) is then used as a "pointer" to a segment of Ethereum memory reserved exclusively for the allocated memory object. In other words, after allocation the region of memory reserved by incrementing the free pointer may not be used for any other purpose than storing the allocated object.

However, the region of memory allocated by Solidity is not necessarily unused. Solidity often uses all memory after the free pointer as a "scratch" area. This scratch area is used to construct the inputs for inter-contract calls, log messages, etc. Unlike a managed language like Java, "allocated" memory is not necessarily zero, and can in fact initially hold totally arbitrary data. However, the Solidity compiler usually generates code to initialize freshly allocated data segment to contain default values.

An Exception: abi methods

As mentioned above, the bump allocator will pre-compute the size of an in-memory object and increment the free pointer by that amount. However, unlike (regular) arrays or structs, the length of an array created with methods like abi.encodePacked are not pre-computed. Instead, the array is constructed "in-place". For calls like abi.encodePacked (including encodeWithSelector, encodeWithSignature, etc.) the compiler generates (roughly) the following pseudo-code:

var ptr = freePointer + 32; // reserve 32 bytes for the length field
for(x in argumentsToAbiFunction) {
   copy(src=x, dst=ptr)
   ptr += sizeOf(x)
}
val lengthInBytes = (ptr - freePointer) - 32; // don't count the length field!
memory[freePointer] = lengthInBytes;
freePointer = ptr;

Here copy is a pseudo-operation which copies the representation of x (one of the arguments to encodePacked) to the region of memory beginning at ptr. Note that the length of the generated array is computed by simply subtracting the current value of the free pointer from ptr which points immediately after the last byte copied into the generated array. After the length is computed, the free pointer is finally updated. Crucially, the memory of the generated array is not initialized by Solidity, it assumes that the copy operation will have overwritten any data left over from previous usage as scratch memory.

The Bug: From Storage to Memory via abi

When Solidity determines that an argument to an abi function is a bytes array or string in storage it will test the least significant bit of the storage slot of the bytes array. Under pragma experimental ABIEncoderV2, if the bit is not set (indicating the bytes array is packed) then the entire slot is divided by two, and then masked with 0x7f, i.e., 127. Effectively, this operation extracts the length stored in the least-significant byte the slot, capping the length at 127 (not 31). The upper 31 bytes of the storage slot are then copied into memory at ptr, and finally ptr is incremented by the value yielded from masking with 127, i.e., sizeOf(x) for a packed bytes array is (slot >> 1) & 0x7f). If the storage field has been maliciously crafted to include a value greater than 62 in its least significant byte, then uninitialized memory will included in the array generated by the call to abi: the value of ptr will be "over incremented". For example, if the least significant byte has a value of 254, then ptr will be incremented by 127 bytes. However, as mentioned above, only 31 bytes of data are copied from storage, leaving the remaining 96 bytes uninitialized and potentially containing whatever values were previously written to scratch memory.

Impact and Resolution

The Solidity compiler team has acknowledged the validity of this bug and (as of writing) is working on a resolution, and planning to release a fixed version soon. Triggering this bug requires a contract to maliciously set its own storage to include an incorrectly encoded least significant byte. Although we are unaware of any contracts that currently exploit this bug, a malicious contracts can introduce non-determinism into its own execution or into trusted library code via delegatecalls. Note that using well-known upgrading techniques, a malicious actor can first corrupt the storage of a contract and then "upgrade" it to a apparently trusthworthy or benign contract that still exhibits this behavior.