John Toman of the Certora development team discovered a previously unknown bug in the Solidity code generation which causes the Solidity compiler to write garbage into persistent storage on some writes. Although we have not confirmed any security implications of this bug, it can lead to storage being incorrectly set to non-zero, significantly increasing contract execution costs. In addition, in certain circumstances this same bug may also cause freshly allocated bytes array elements to be non-zero values. The Solidity compiler team confirmed the bug and plan to release the fixed version on Monday, October 19th 2020.
As mentioned in our previous post, Solidity uses a bump allocator, placing newly allocated objects (structs, arrays, etc.) into sequential blocks of memory. Unlike languages with a dedicated heap (e.g., Java or Python), and like languages with unsafe memory (e.g., C, C++), data from one object may be accidentally read or written through a different object's pointer (this was the source of the overflow bug disclosed previously).
To ensure memory integrity the Solidity compiler stores the length of an array in a 32-byte word before the memory segment containing the array elements. For example, a uint256 array of length 3 actually occupies 4 contiguous 32-byte words, the first word stores the length (in this case three), followed by 3 words for each element.
A zero length array occupies single word consisting of the length field, which holds just holds zero; in this case there is no element data following the length field; the subsequent word in memory will contain nondeterministic data (zero, garbage data left in memory from a hash, or another allocated object).
Very much like in memory, arrays in storage mostly follow the same pattern as in memory: elements are laid out in contiguous slots in storage, with the length stored in a separate slot. However, there is a special case for bytes or string of length 31 or lower: in this case the Solidity compiler will pack the element data of the array in the upper (i.e., most-significant) bytes of the length storage slot, leaving the least-significant byte to hold the length of the array. Effectively, the Solidity compiler attempts to save space (and therefore gas) by using only a single storage slot for short byte arrays. For longer arrays (length 32 or greater), the length and the data are stored separately.
Given that the choice of storage layout for bytes arrays in memory is variable, it is not surprising that the code to copy from memory to storage is rather complex. The Solidity compiler generates multiple conditionals over the length of the array to select the correct copying algorithm.
In particular, the Solidity compiler generates a check to see if the length of the array is less than 32. If not, the generated code falls back on the unpacked representation. Unfortunately, it is only after Solidity checks if the length is greater than 32 does it check if the length is zero, which at that point is an impossibility. If, however, the length is 31 or less, the generated code will enter a branch that uses the packed representation. Recall that memory on the EVM is read in 32-byte chunks; thus a single read at the beginning of the array element segment is sufficient to read the entirety of the array contents. Accordingly, the Solidity compiler generates code that increments the array pointer by 32-bytes (skipping the length field), and then generates an unconditional read from that position. The data read from this read is then packed together with the length and stored into memory.
The bug occurs in the case where the array being copied is of length 0. The Solidity compiler never checks whether there is data for it to read after the length field; as mentioned above the check for length zero *only* occurs when the length is already established to be 32 or greater. In other words, the Solidity compiler (mistakenly) assumes there must be at least one byte of data. However, as mentioned above, if the array length is zero, the bytes immediately following the length field are totally arbitrary. Thus, the read and subsequent store of the "array data" in fact stores some meaningless 31-bytes followed by the length in the least significant byte (i.e., 0).
As a result, contracts that empty out bytes arrays or store empty strings will not necessarily receive the gas savings of setting a storage slot to zero, and may in fact be charged large amounts of gas for setting a slot to a non-zero value.
In addition to the gas implications described above, the corrupted array contents can be observed if the storage array is extended using the no argument version of push. Recall that the no argument version pushes the "default" value onto the array, which for bytes arrays is a 0. The code generated by the solidity compiler assumes that data stored in the upper 31 bytes of the bytes slot is zero and therefore simply increments the length portion of the packed representation. However, due to the corruption described above, the newly "added" element may in fact be non-zero, which can further corrupt future computations.
We have released an executable proof of concept that demonstrates this bug.
The above bug illustrates the importance of Certora's bytecode level verification; we verify what you actually run, which may be surprisingly different from what you intend.
We thank the Solidity compiler team for helpful discussions about this issue, and for quickly releasing a fix. We would also like to thank Christian Reitwiessner for bringing the implications for push to our attention.