John Toman, VP of R&D at Certora, discovered a previously unknown code generation bug in the Solidity compiler. This bug allows maliciously crafted byte buffers to subvert the ABI deserialization routines and access unexpected memory and/or introduce non-determinism.
In the EVM, data is exchanged between contracts with unstructured buffers of bytes. To support structured types such as structs and arrays, the Ethereum community has developed the ABI specification, which defines a serialization format used to encode structured types into byte buffers. The receiver can deserialize the buffer to reconstruct the original typed data. Crucially, the serialized buffer does not contain any information about what type of data is encoded in the buffer. Instead, the receiver must know a priori what type of data is expected to be encoded in the buffer. If a buffer does not conform to its expected encoding type, deserialization should revert.
The ABI specification makes a distinction between dynamic and static types.
A dynamic type is any type that (transitively) contains a dynamically sized array;
static types encompass all other types. Dynamic and static types are encoded differently.
Values of static types are written directly into the current position in the buffer.
In contrast, dynamic types are encoded by writing an offset into the current position in the buffer.
The actual data for the dynamic type can be found later in the buffer, as indicated by the offset.
The offset is not an absolute index within the encoded buffer.
Instead, it is relative to the beginning of the encoding of the "parent" object.
For example, consider encoding the struct
s type defined below beginning at buffer index
a is written directly at index
b is encoded by writing an offset
o to the next index (
i + 32).
o is relative to
i is the beginning index of the encoding of the parent type,
The actual data for the array
b is written at byte buffer index
i + o.
See the figure below.
The Solidity compiler does not trust that a serialized buffer is correctly formed.
At every step, the deserialization process should check that the encoding is well formed,
and should revert if it discovers otherwise.
For example, if the length of an encoded array of
uint is greater
than the number of bytes remaining in buffer being decoded, the deserialization process reverts.
These checks also apply to offsets generated by dynamic types:
if an offset would result in an out-of-bounds byte buffer index, deserialization should revert.
This check is implemented using pointer arithmetic and comparison operations.
At the beginning the deserialization process,
the code computes the "end pointer" of the encoded buffer, which is one past the last byte of the buffer.
Then, to check whether a dynamic type offset is valid,
the code first checks that the offset is less than
2**64 (to prevent overflow).
Then the code computes a pointer to the location specified by the offset
and checks if this pointer is less than the end pointer.
If not, the code reverts.
Finally, the expected size of the encoded data is computed,
and the code checks that there are enough bytes remaining in the buffer for this to make sense.
Again, if the expected size goes past the end of the buffer, the code reverts.
There are two related code generation bugs.
First, the check that offsets are less than
2**64 is omitted within a dynamic type.
For example, when encoding the type
uint, the checks are omitted for the inner
In principle, maliciously constructed buffers could include large offsets
designed to cause pointer arithmetic overflow,
which would allow "decoding" arbitrary regions of memory.
The second error is that the expected size checks are performed incorrectly. When decoding a statically sized array of dynamic types, instead of checking the correct size of the statically sized array, the deserialization code only verifies there are at least 32 bytes remaining in the buffer. A maliciously constructed buffer could cause the deserialization code to read bytes from memory past the end of the buffer. A related bug occurs when decoding an array of dynamic types: the decoding routine only checks that there are enough bytes in the buffer for the length field of the encoded array. A malicious buffer can cause the decoder to incorrectly think that arbitrary bytes from memory past the end of the buffer are offsets. Both of these bugs are exacerbated by the first error above, since overflow checks on these fake offsets are omitted.
The bug's reproduction for Solidity Compiler versions 0.8.3 and earlier is available at this link: https://github.com/Certora/MemoryIsolationPOC.
This bug affects any code that decodes serialized data in EVM memory. In particular, it can undermine code that relies on actors to perform totally independent computation: a malicious actor can craft malformed buffers that deceive the deserialization routine into sharing results stored elsewhere in the transaction's memory. The Solidity team has acknowledged this bug, and is currently developing a fix.