EVM semantics - writeup (#450)

* More on semantics

* fixing typos

* Gas purchase rule

* Small fix
This commit is contained in:
ledgerwatch 2020-04-14 13:49:38 +01:00 committed by GitHub
parent 73b897b2b5
commit a228495d03
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 127 additions and 0 deletions

View File

@ -215,6 +215,10 @@ func (st *StateTransition) TransitionDb() (ret []byte, usedGas uint64, failed bo
vmerr error
)
if contractCreation {
// The reason why we don't increment nonce here is that we need the original
// nonce to calculate the address of the contract that is being created
// It does get incremented inside the `Create` call, after the computation
// of the contract's address, but before the execution of the code.
ret, _, st.gas, vmerr = evm.Create(sender, st.data, st.gas, st.value)
} else {
// Increment the nonce for the next transaction

119
docs/evm_semantics.md Normal file
View File

@ -0,0 +1,119 @@
Corresponding code is in the folder `semantics`.
## EVM without opcodes (Ether transfers only)
We start with looking at a very restricted version of EVM, having no opcodes. That means only Ether transfers are possible.
Even that seemingly simple case already has a relatively complex semantics.
First, we would like to define what kind semantics we are looking for. Ethereum is a state machine, which has a global state, and
transactions that trigger some state transition. There are also some state transitions that happen at the end of each block. These are
related to miner and ommer rewards. The transition from one state to another is deterministic. Given the initial state, some extra
environment (recent header hashes, current block number, timestamp of the current block), and the transaction object, there is only
one possible valid end state of the transition. For this reason, the state transitions are often described as a state
transition function:
`STATE_end = STATE_TR_FUNC(STATE_init, env, tx)`
But this is not the only way to describe state transitions.
And it is not always the most convinient way. In our approach, we will instead describe state transitions as a set of boolean expressions.
They are boolean in the sense that they can be evaluated to either `true` or `false`, but they can rely on variety of types.
These expressions will only tell us whether certain combination of the initial state, environment, transaction object, and the end state,
are all part of a valid state transition. However, they do not have a goal of helping us to determine (compute) the end state from
the initial state, environment, and the transaction object. So they look more like this:
`EXPRESSION(STATE_init, env, tx, STATE_end) == true?`
Moreover, this representation allows for some non determinsm, which means that there could be some extra "oracle" input that helps the
evaluation:
`EXPRESSION(STATE_init, env, tx, STATE_end, ORACLE_input) == true?`
From our experience with EVM and Ethereum, we know that each transaction can normally be decomposed into series of intermediate transitions.
These intermediate transitions could be various internal accounting steps of Ethereum (like purchasing gas for Ether, or gas refunds), or
steps of the EVM interpreter. In other words, the decomposion could look omething like this:
`EXPRESSION_1(STATE_init, env, tx, STATE_1, ORACLE_input1) == E1`
`EXPRESSION_2(STATE_1, env, tx, STATE_2, ORACLE_input2) == E2`
`EXPRESSION_3(STATE_2, env, tx, STATE_3, ORACLE_input3) == E3`
`...`
`EXPRESSION_N(STATE_N-1, env, tx, STATE_end, ORACLE_inputN) == EN`
How do these expressions re-compose back into the `EXPRESSION`?. First of all, we need to add another condition:
`E1 && E2 && E3 && ... && EN == true?`
Obviously, oracle input of the `EXPRESSION` could simply be a collection of all individual oracle inputs.
Next observation we make is that the intermediate transitions are usually quite "local", which means that if we compare
the initial state and the end state of each of those transitions, the difference between can be described succintly. This
can be understood as the length of the description having a pre-determined upper bound. Since there is no upper bound on
the size of the state, we need a way of specifying part of the state the state transition modifies. At this point, we
choose to represent the state as an array of accounts. Here, we use "array" in the context of theory of arrays. In this
theory, there are two main functions defined for an array: `select` and `store`. Expression `select(a, i)` evaluates to the
`i`-th element of the array `a`. Expression `store(a, i, v)` evaluates to a new array which is equal to the array `a`
except for the value in the index `i`, where `select(store(a, i, v), i) == v`. If the state is an array of accounts,
what is an account? We define it as a structure consisting of 5 fields: balance, nonce, code length, code, and storage.
Balance, nonce, and code length are integers. Code is an array of integers (one integer per byte of bytecode).
Storage is also an array of integers. Since the length of the storage is not observable from within the EVM, we do not need
to represent it.
Transaction is a structure. The field which we will be concerned with first is the sender. Although we are used to think of
sender as being an address, which is 20-byte string, for the purpose of semantic description, it will be an integer,
usually quite large, in the range `[0..2^160)`. These integers are used as indices in the state array. Therefore,
if `tx.sender` is the variable used for transaction sender, and `state` is the variable used for the state,
sender's balance is `select(state, tx.sender).balance`, and nonce is `select(state, tx.sender).nonce`. In Ethereum,
a transaction is considered to be valid to be applied to a state (and hence included into a block), if two conditions
are satisfied:
1. `select(state, tx.sender).nonce = tx.nonce`
2. `select(state, tx.sender).balance >= tx.gasLimit * tx.gasPrice`
If these two conditions are satisfied, and we assume that the transaction made it into a block, the sender must
"purchase" the gas. It is important to do it here, because once gas is purchased, it can only be given back after
the transaction's full execution, either in the case of no error, or by running into a `REVERT` opcode in the top-most
execution context. Therefore, at this point, the state needs to be modified to subtract the cost of the
purchased gas. It is important, because during the execution, the balance of the sender's account can be observed by
a `BALANCE` opcode, and situations where subtraction was made and not made can be distinguished, and lead to
different outcomes. In order to describe such state transition, we will use the notation of substitution rules
(these can always be transformed into logical expressions). Our rules will consist of three parts: pattern,
substitution, and guard.
In the pattern, we describe the initial state of the transition. Pattern matching is a powerful resource, and it
comes with these aspects:
1. It allows matching a part of a sequence while ignoring the rest of that sequence.
2. It can match types, letting us avoid the type-reflection functions in the semantics.
3. It can bind variables, to be used in the other two parts of the rule - substitution and guard.
We hope that the destructuring aspect of pattern matching will not be required for our purposes.
In the substitution, we describe how the part matched by the pattern in the initial state will look like
in the end state. Substitution can use any variables bound by the pattern.
The guard is a boolean expression. It contains at least one variable bound by the pattern, and specifies the
condition under which the entire rule is applicable. If the guard evaluates to `false`, and, consequently,
the rule is not applicable, there can be another rule that is applicable. Since EVM is deterministic by design,
we expect that in a correct semantics, at most one rule is applicable in any given configuration.
Our gas purchasing transition can be expressed by the following rule:
```
state, tx ==> store(state, tx.sender, account{
balance: prevbalance - cost,
nonce: prevaccount.nonce,
codelength: prevaccount.codelength,
code: prevaccount.code,
storage: prevaccount.storage
}), tx.gasLimit, tx
WHERE prevaccount = select(state, tx.sender)
prevbalance = prevaccount.balance
cost = tx.gasLimit * tx.gasPrice
WHEN prevaccount.nonce = tx.nonce AND prevbalance >= cost
```
We have introduced `WHERE` section to define some variable for convinience, and placed the guard into the `WHEN` section.
Pattern and substitution are on the first line, separated by the arrow `==>`. One needs to notice that the substution
contains three-element sequence whereas the pattern contains only two elements. If the gas purchase is successful,
the "gas" object gets introduced. It will be useful for the subsequent rules, because now we can make all of them
simply require the presence of that "gas" object, instead of repeating the guards of the gas purchasing rule.

View File

@ -14,6 +14,8 @@ Z3_sort contract_code_sort;
Z3_constructor account_constructor;
// Account type
Z3_sort account_sort;
// Ethereum state type (array sort of accounts)
Z3_sort state_sort;
// Maximum number of terms in the sequence - constant for now to avoid dynamic memory allocation
#define MAX_TERMS 1024*1024
@ -82,6 +84,8 @@ void init() {
Z3_constructor account_constructors[1];
account_constructors[0] = account_constructor;
account_sort = Z3_mk_datatype(ctx, Z3_mk_string_symbol(ctx, "account"), 1, account_constructors);
// state is the sort of accounts
state_sort = Z3_mk_array_sort(ctx, int_sort, account_sort);
}
// Initialises the sequence with given state root and transaction data