mirror of
https://gitlab.com/pulsechaincom/erigon-pulse.git
synced 2024-12-22 03:30:37 +00:00
save snippet to add Geth to dev net
This commit is contained in:
parent
752f76847d
commit
96ea5438d9
@ -7,8 +7,9 @@ On a high level, Ethereum state is a collection of accounts. An account can be e
|
||||
(also known as "Externally Owned Account", or EOA), or a smart contract.
|
||||
|
||||
### Content of an account
|
||||
Type `Account` [core/types/accounts/account.go](../../core/types/accounts/account.go) lists the main components
|
||||
of an account's content (not identifier):
|
||||
|
||||
Type `Account` [core/types/accounts/account.go](../../core/types/accounts/account.go) lists the main components of an
|
||||
account's content (not identifier):
|
||||
|
||||
1. Nonce
|
||||
2. Balance
|
||||
@ -16,81 +17,96 @@ of an account's content (not identifier):
|
||||
4. Code hash
|
||||
|
||||
#### Nonce
|
||||
|
||||
Number of the type `uint64`.
|
||||
|
||||
For non-contract accounts, nonce is important in two contexts. Firstly, all transactions signed by an account, have to
|
||||
appear in the ledger in the order of strictly increasing nonces (without gaps). This check is performed by member function `preCheck` of
|
||||
type `StateTransition` [core/state_transition.go](../../core/state_transition.go)
|
||||
Secondly, if a transaction signed by an account is sent to no
|
||||
particular address (with intention of creating a contract), and it ends up creating a new smart contract account, the address of such newly
|
||||
created smart contract account is calculated based on the current nonce of the "creator" account.
|
||||
For smart contract accounts, nonce is important when `CREATE` opcode is executed on behalf of such account. This computation is
|
||||
performed in the member function `Create` of the type `EVM` [core/vm/evm.go](../../core/vm/evm.go). Note the difference between the
|
||||
member function `Create2`, where the address of the newly created contract is independent of the nonce. For contract accounts,
|
||||
appear in the ledger in the order of strictly increasing nonces (without gaps). This check is performed by member
|
||||
function `preCheck` of type `StateTransition` [core/state_transition.go](../../core/state_transition.go)
|
||||
Secondly, if a transaction signed by an account is sent to no particular address (with intention of creating a contract)
|
||||
, and it ends up creating a new smart contract account, the address of such newly created smart contract account is
|
||||
calculated based on the current nonce of the "creator" account. For smart contract accounts, nonce is important
|
||||
when `CREATE` opcode is executed on behalf of such account. This computation is performed in the member
|
||||
function `Create` of the type `EVM` [core/vm/evm.go](../../core/vm/evm.go). Note the difference between the member
|
||||
function `Create2`, where the address of the newly created contract is independent of the nonce. For contract accounts,
|
||||
the nonce is important in the context of creating new contracts via `CREATE` opcode.
|
||||
|
||||
#### Balance
|
||||
Number of the type `*big.Int`. Since balance is denominated in wei, and there 10^18 wei in each Ether, the maximum balance that needs to be
|
||||
representable is 110'000'000 (roughly amount of mainnet Ether in existence, but can be more for testnets and private networks),
|
||||
multiplied by 10^18, which exceeds the capacity of `uint64`.
|
||||
|
||||
Number of the type `*big.Int`. Since balance is denominated in wei, and there 10^18 wei in each Ether, the maximum
|
||||
balance that needs to be representable is 110'000'000 (roughly amount of mainnet Ether in existence, but can be more for
|
||||
testnets and private networks), multiplied by 10^18, which exceeds the capacity of `uint64`.
|
||||
|
||||
#### Root
|
||||
|
||||
Binary 32-byte (256-bit) string.
|
||||
|
||||
By root here one means the Merkle root of the smart contract storage, organised into a tree. Non-contract accounts cannot have storage,
|
||||
therefore root makes sense only for smart contract accounts. For non-contract accounts, the root field is assumed to be equal to the
|
||||
Merkle root of an empty tree, which is hard-coded in the variable `EmptyRoot` in
|
||||
[turbo/trie/trie.go](../../turbo/trie/trie.go). For contract accounts, the root is computed using member function `Hash` of
|
||||
type `Trie` [turbo/trie/trie.go](../../turbo/trie/trie.go), once the storage of the contract has been organised into the tree by calling member functions
|
||||
By root here one means the Merkle root of the smart contract storage, organised into a tree. Non-contract accounts
|
||||
cannot have storage, therefore root makes sense only for smart contract accounts. For non-contract accounts, the root
|
||||
field is assumed to be equal to the Merkle root of an empty tree, which is hard-coded in the variable `EmptyRoot` in
|
||||
[turbo/trie/trie.go](../../turbo/trie/trie.go). For contract accounts, the root is computed using member function `Hash`
|
||||
of type `Trie` [turbo/trie/trie.go](../../turbo/trie/trie.go), once the storage of the contract has been organised into
|
||||
the tree by calling member functions
|
||||
`Update` and `Delete` on the same type.
|
||||
|
||||
#### Code hash
|
||||
|
||||
Binary 32-byte (256-bit) string.
|
||||
|
||||
Hash of the bytecode (deployed code) of a smart contract. The computation of the code hash is performed in the `SetCode` member function
|
||||
of the type `IntraBlockState` [code/state/intra_block_state.go](../../core/state/intra_block_state.go). Since a non-contract account has no bytecode, code hash only
|
||||
makes sense for smart contract accounts. For non-contract accounts, the code hash is assumed to be equal to the hash of `nil`, which is
|
||||
hard-coded in the variable `emptyCode` in [code/state/intra_block_state.go](../../core/state/intra_block_state.go)
|
||||
Hash of the bytecode (deployed code) of a smart contract. The computation of the code hash is performed in the `SetCode`
|
||||
member function of the type `IntraBlockState` [code/state/intra_block_state.go](../../core/state/intra_block_state.go).
|
||||
Since a non-contract account has no bytecode, code hash only makes sense for smart contract accounts. For non-contract
|
||||
accounts, the code hash is assumed to be equal to the hash of `nil`, which is hard-coded in the variable `emptyCode`
|
||||
in [code/state/intra_block_state.go](../../core/state/intra_block_state.go)
|
||||
|
||||
### Address - identifier of an account
|
||||
Accounts are identified by their addresses. Address is a 20-byte binary string, which is derived differently for smart contract and
|
||||
non-contract accounts.
|
||||
|
||||
For non-contract accounts, the address is derived from the public key, by hashing it and taking lowest 20 bytes of the 32-byte hash value,
|
||||
as shown in the function `PubkeyToAddress` in the file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
Accounts are identified by their addresses. Address is a 20-byte binary string, which is derived differently for smart
|
||||
contract and non-contract accounts.
|
||||
|
||||
For smart contract accounts created by a transaction without destination, or by `CREATE` opcode, the address is derived from the
|
||||
address and the nonce of the creator, as shown in the function `CreateAddress` in the file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
For non-contract accounts, the address is derived from the public key, by hashing it and taking lowest 20 bytes of the
|
||||
32-byte hash value, as shown in the function `PubkeyToAddress` in the file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
|
||||
For smart contract accounts created by `CREATE2` opcode, the address is derived from the creator's address, salt (256-bit argument
|
||||
supplied to the `CREATE2` invocation), and the code hash of the initialisation code (code that is executed to output the actual,
|
||||
deployed code of the new contract), as shown in the function `CreateAddress2` in the file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
For smart contract accounts created by a transaction without destination, or by `CREATE` opcode, the address is derived
|
||||
from the address and the nonce of the creator, as shown in the function `CreateAddress` in the
|
||||
file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
|
||||
In many places in the code, sets of accounts are represented by mappings from account addresses to the objects representing
|
||||
the accounts themselves, for example, field `stateObjects` in the type `IntraBlockState` [core/state/intra_block_state.go](../../core/state/intra_block_state.go).
|
||||
Member functions of the type `IntraBlockState` that are for querying and modifying one of the components of an accounts, are all accepting
|
||||
address as their first argument, see functions `GetBalance`, `GetNonce`, `GetCode`, `GetCodeSize`, `GetCodeHash`, `GetState` (this
|
||||
one queries an item in the contract storage), `GetCommittedState`, `AddBalance`, `SubBalance`, `SetBalance`, `SetNonce`,
|
||||
`SetCode`, `SetState` (this one modifies an item in the contract storage) [core/state/intra_block_state.go](../../core/state/intra_block_state.go).
|
||||
For smart contract accounts created by `CREATE2` opcode, the address is derived from the creator's address, salt (
|
||||
256-bit argument supplied to the `CREATE2` invocation), and the code hash of the initialisation code (code that is
|
||||
executed to output the actual, deployed code of the new contract), as shown in the function `CreateAddress2` in the
|
||||
file [crypto/crypto.go](../../crypto/crypto.go)
|
||||
|
||||
In many places in the code, sets of accounts are represented by mappings from account addresses to the objects
|
||||
representing the accounts themselves, for example, field `stateObjects` in the
|
||||
type `IntraBlockState` [core/state/intra_block_state.go](../../core/state/intra_block_state.go). Member functions of the
|
||||
type `IntraBlockState` that are for querying and modifying one of the components of an accounts, are all accepting
|
||||
address as their first argument, see functions `GetBalance`, `GetNonce`, `GetCode`, `GetCodeSize`, `GetCodeHash`
|
||||
, `GetState` (this one queries an item in the contract storage), `GetCommittedState`, `AddBalance`, `SubBalance`
|
||||
, `SetBalance`, `SetNonce`,
|
||||
`SetCode`, `SetState` (this one modifies an item in the contract
|
||||
storage) [core/state/intra_block_state.go](../../core/state/intra_block_state.go).
|
||||
|
||||
Organising Ethereum State into a Merkle Tree
|
||||
--------------------------------------------
|
||||
Ethereum network produces checkpoints of the Ethereum State after every block. These checkpoints come in a form of 32-byte binary
|
||||
string, which is the root hash of the Merkle tree constructed out of the accounts in the state. This root hash is often
|
||||
referred to as "State root". It is part of block header, and is contained in the field `Root` of the type
|
||||
Ethereum network produces checkpoints of the Ethereum State after every block. These checkpoints come in a form of
|
||||
32-byte binary string, which is the root hash of the Merkle tree constructed out of the accounts in the state. This root
|
||||
hash is often referred to as "State root". It is part of block header, and is contained in the field `Root` of the type
|
||||
`Header` [core/types/block.go](../../core/types/block.go)
|
||||
|
||||
Prior to Byzantium release, the state root was also part of every transaction receipt, and was contained in the field `PostState`
|
||||
Prior to Byzantium release, the state root was also part of every transaction receipt, and was contained in the
|
||||
field `PostState`
|
||||
of the type `Receipt` [core/types/receipt.go](../../core/types/receipt.go).
|
||||
|
||||
To keep the Merkle Patricia trie of Ethereum state balanced, all the keys (either addresses of Ethereum accounts and contracts,
|
||||
or storage positions within contract storage) are converted into their respective hashes using `Keccak256` hash function.
|
||||
`PlainStateBucket` stores state with keys before hashing, `CurrentStateBucket` store same data but keys are hashed.
|
||||
To keep the Merkle Patricia trie of Ethereum state balanced, all the keys (either addresses of Ethereum accounts and
|
||||
contracts, or storage positions within contract storage) are converted into their respective hashes using `Keccak256`
|
||||
hash function.
|
||||
`PlainStateBucket` stores state with keys before hashing, `CurrentStateBucket` store same data but keys are hashed.
|
||||
|
||||
### Hexary radix "Patricia" tree
|
||||
|
||||
Ethereum uses hexary (radix == 16) radix tree to guide the algorithm of computing the state root. For the purposes of
|
||||
illustrations, we will use trees with radix 4 (because radix 16 requires many more items for "interesting" features
|
||||
to appear). We start from a set of randomly looking keys, 2 bytes (or 8 quaternary digits) each.
|
||||
illustrations, we will use trees with radix 4 (because radix 16 requires many more items for "interesting" features to
|
||||
appear). We start from a set of randomly looking keys, 2 bytes (or 8 quaternary digits) each.
|
||||
|
||||
![prefix_groups_1](prefix_groups_1.png)
|
||||
To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_1`
|
||||
@ -100,33 +116,35 @@ Next, we sort them in lexicographic order.
|
||||
![prefix_groups_2](prefix_groups_2.png)
|
||||
To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_2`
|
||||
|
||||
Next, we introduce the notion of a prefix group. Collection of adjacent keys form a prefix group if these keys share
|
||||
the same prefix, and no other keys share this prefix. Here are the prefix groups for our example:
|
||||
Next, we introduce the notion of a prefix group. Collection of adjacent keys form a prefix group if these keys share the
|
||||
same prefix, and no other keys share this prefix. Here are the prefix groups for our example:
|
||||
|
||||
![prefix_groups_3](prefix_groups_3.png)
|
||||
To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_3`
|
||||
|
||||
The entire collection of keys form one implicit prefix group, with the empty prefix.
|
||||
|
||||
Merkle Patricia tree hashing rules first remove redundant parts of each key within groups, making key-value
|
||||
pairs so-called "leaf nodes". To produce the hash of a leaf node, one applies the hash function to the two-piece RLP (Recursive Length Prefix).
|
||||
The first piece is the representation of the non-redundant part of the key. And the second piece is the
|
||||
representation of the leaf value corresponding to the key, as shown in the member function `hashChildren` of the
|
||||
type `hasher` [turbo/trie/hasher.go](../../turbo/trie/hasher.go), under the `*shortNode` case.
|
||||
Merkle Patricia tree hashing rules first remove redundant parts of each key within groups, making key-value pairs
|
||||
so-called "leaf nodes". To produce the hash of a leaf node, one applies the hash function to the two-piece RLP (
|
||||
Recursive Length Prefix). The first piece is the representation of the non-redundant part of the key. And the second
|
||||
piece is the representation of the leaf value corresponding to the key, as shown in the member function `hashChildren`
|
||||
of the type `hasher` [turbo/trie/hasher.go](../../turbo/trie/hasher.go), under the `*shortNode` case.
|
||||
|
||||
Hashes of the elements within a prefix group are combined into so-called "branch nodes". They correspond to the
|
||||
types `duoNode` (for prefix groups with exactly two elements) and `fullNode` in the file [turbo/trie/node.go](../../turbo/trie/node.go).
|
||||
To produce the hash of a branch node, one represents it as an array of 17 elements (17-th element is for the attached leaf,
|
||||
if exists).
|
||||
The positions in the array that do not have corresponding elements in the prefix group are filled with empty strings. This is
|
||||
shown in the member function `hashChildren` of the type `hasher` [turbo/trie/hasher.go](../../turbo/trie/hasher.go), under the `*duoNode` and
|
||||
types `duoNode` (for prefix groups with exactly two elements) and `fullNode` in the
|
||||
file [turbo/trie/node.go](../../turbo/trie/node.go). To produce the hash of a branch node, one represents it as an array
|
||||
of 17 elements (17-th element is for the attached leaf, if exists). The positions in the array that do not have
|
||||
corresponding elements in the prefix group are filled with empty strings. This is shown in the member
|
||||
function `hashChildren` of the type `hasher` [turbo/trie/hasher.go](../../turbo/trie/hasher.go), under the `*duoNode`
|
||||
and
|
||||
`*fullNode` cases.
|
||||
|
||||
Sometimes, nested prefix groups have longer prefixes than 1-digit extension of their encompassing prefix group, as it is the case
|
||||
in the group of items `12, 13` or in the group of items `29, 30, 31`. Such cases give rise to so-called "extension nodes".
|
||||
However, the value in an extension node is always the representation of a prefix group, rather than a leaf. To produce the hash of an extension node,
|
||||
one applies the hash function to the two-piece RLP. The first piece is the representation of the non-redundant part of the key.
|
||||
The second part is the hash of the branch node representing the prefix group. This shown in the member function `hashChildren` of the
|
||||
Sometimes, nested prefix groups have longer prefixes than 1-digit extension of their encompassing prefix group, as it is
|
||||
the case in the group of items `12, 13` or in the group of items `29, 30, 31`. Such cases give rise to so-called "
|
||||
extension nodes". However, the value in an extension node is always the representation of a prefix group, rather than a
|
||||
leaf. To produce the hash of an extension node, one applies the hash function to the two-piece RLP. The first piece is
|
||||
the representation of the non-redundant part of the key. The second part is the hash of the branch node representing the
|
||||
prefix group. This shown in the member function `hashChildren` of the
|
||||
type `hasher` [turbo/trie/hasher.go](../../turbo/trie/hasher.go), under the `*shortNode` case.
|
||||
|
||||
This is the illustration of resulting leaf nodes, branch nodes, and extension nodes for our example:
|
||||
@ -137,17 +155,17 @@ To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_4`
|
||||
### Separation of keys and the structure
|
||||
|
||||
Our goal here will be to construct an algorithm that can produce the hash of the Merkle Patricia Tree of a sorted
|
||||
sequence of key-value pair, in one simple pass (i.e. without look-aheads and buffering, but with a stack).
|
||||
Another goal (perhaps more important)
|
||||
sequence of key-value pair, in one simple pass (i.e. without look-aheads and buffering, but with a stack). Another
|
||||
goal (perhaps more important)
|
||||
is to be able to split the sequence of key-value pairs into arbitrary chunks of consecutive keys, and reconstruct the
|
||||
root hash from hashes of the individual chunks (note that a chunk might need to have more than one hash).
|
||||
|
||||
Let's say that we would like to split the ordered sequence of 32 key-value pairs into 4 chunks, 8 pairs in each. We would then
|
||||
like to compute the hashes (there might be more than one hash per chunk) of each chunk separately. After that, we would like
|
||||
to combine the hashes of the chunks into the root hash.
|
||||
Let's say that we would like to split the ordered sequence of 32 key-value pairs into 4 chunks, 8 pairs in each. We
|
||||
would then like to compute the hashes (there might be more than one hash per chunk) of each chunk separately. After
|
||||
that, we would like to combine the hashes of the chunks into the root hash.
|
||||
|
||||
Our approach would be to generate some additional information, which we will call "structural information", for each chunk,
|
||||
as well as for the composition of chunks. This structural information can be a sequence of these "opcodes":
|
||||
Our approach would be to generate some additional information, which we will call "structural information", for each
|
||||
chunk, as well as for the composition of chunks. This structural information can be a sequence of these "opcodes":
|
||||
|
||||
1. `LEAF length-of-key`
|
||||
2. `LEAFHASH length-of-key`
|
||||
@ -157,21 +175,22 @@ as well as for the composition of chunks. This structural information can be a s
|
||||
6. `BRANCHHASH set-of-digits`
|
||||
7. `HASH number-of-hashes`
|
||||
|
||||
The description of semantics will require the introduction of two stacks, which always have the same length. One
|
||||
of the stacks (we call it "hash stack") contains hashes produced by opcodes. Another stack (we call it "node stack")
|
||||
contains leaf nodes, branch nodes, or extension nodes of the trie being built.
|
||||
In some cases, where the presence of a node is not required, the corresponding entry in the node stack is empty, or `nil`.
|
||||
As well as the stack, the description requires the introduction of two input sequences (or "tapes"). The first tape contains
|
||||
key-value pairs, each pair can be viewed as two opaque binary strings of arbitrary length, usually with the requirement
|
||||
that the whole sequence is sorted by the lexicographic order of the keys, and all the keys are distinct.
|
||||
The second tape contains hashes, each 32 bytes long.
|
||||
The description of semantics will require the introduction of two stacks, which always have the same length. One of the
|
||||
stacks (we call it "hash stack") contains hashes produced by opcodes. Another stack (we call it "node stack")
|
||||
contains leaf nodes, branch nodes, or extension nodes of the trie being built. In some cases, where the presence of a
|
||||
node is not required, the corresponding entry in the node stack is empty, or `nil`. As well as the stack, the
|
||||
description requires the introduction of two input sequences (or "tapes"). The first tape contains key-value pairs, each
|
||||
pair can be viewed as two opaque binary strings of arbitrary length, usually with the requirement that the whole
|
||||
sequence is sorted by the lexicographic order of the keys, and all the keys are distinct. The second tape contains
|
||||
hashes, each 32 bytes long.
|
||||
|
||||
N.B. Though there two stacks, we can sometimes just say "the stack", since they are always of the same size and are operated upon in unison.
|
||||
For example, when we say that XXX pops something from the stack, we mean that XXX pops 1 item from the hash stack and 1 item from the node stack,
|
||||
but then only one of those two items may be used later and the other may be discarded by XXX.
|
||||
N.B. Though there two stacks, we can sometimes just say "the stack", since they are always of the same size and are
|
||||
operated upon in unison. For example, when we say that XXX pops something from the stack, we mean that XXX pops 1 item
|
||||
from the hash stack and 1 item from the node stack, but then only one of those two items may be used later and the other
|
||||
may be discarded by XXX.
|
||||
|
||||
`LEAF` opcode consumes the next key-value pair from the first tape, creates a new leaf node and pushes it onto the node stack. It also
|
||||
pushes the hash of that node onto the hash stack. The operand
|
||||
`LEAF` opcode consumes the next key-value pair from the first tape, creates a new leaf node and pushes it onto the node
|
||||
stack. It also pushes the hash of that node onto the hash stack. The operand
|
||||
`length-of-key` specifies how many digits of the key become part of the leaf node. For example, for the leaf `11`
|
||||
in our example, it will be 6 digits, and for the leaf `12`, it will be 4 digits. Special case of `length-of-key`
|
||||
being zero, pushes the value onto the stack and discards the key.
|
||||
@ -179,33 +198,31 @@ being zero, pushes the value onto the stack and discards the key.
|
||||
`LEAFHASH` has almost the same semantics as `LEAF`, with the difference that it does not need to produce the leaf node,
|
||||
but only its hash (which can be more efficient in terms of allocations). It places `nil` onto the node stack.
|
||||
|
||||
`EXTENSION` opcode has a key as its operand. This key is a sequence of digits, which, in our example, can only be
|
||||
of length 1, but generally, it can be longer. The action of this opcode is to pop one item from the stack, create
|
||||
an extension node with the key provided in the operand, and the value being the item popped from the stack, and
|
||||
push this extension node onto the node stack (and push its hash onto the hash stack).
|
||||
`EXTENSION` opcode has a key as its operand. This key is a sequence of digits, which, in our example, can only be of
|
||||
length 1, but generally, it can be longer. The action of this opcode is to pop one item from the stack, create an
|
||||
extension node with the key provided in the operand, and the value being the item popped from the stack, and push this
|
||||
extension node onto the node stack (and push its hash onto the hash stack).
|
||||
|
||||
`EXTENSIONHASH` has almost the same semantics as `EXTENSION`, with the difference that it does not need to produce the
|
||||
extension node, but only its hash (which can be more efficient in terms of allocations). It places `nil` onto the node
|
||||
stack.
|
||||
|
||||
`BRANCH` opcode has a set of hex digits as its operand. This set can be encoded as a bitset, for example. The action of
|
||||
this opcode is to pop the same
|
||||
number of items from the stack as the number of digits in the operand's set, create a branch node, and push it
|
||||
onto the node stack (and push its hash onto the hash stack). Sets of digits can be seen as the horizontal
|
||||
rectangles on the picture `prefix_groups_4`.
|
||||
The correspondence between digits in the operand's set and the items popped from the stack is as follows.
|
||||
The top of the stack (the item being popped off first)
|
||||
this opcode is to pop the same number of items from the stack as the number of digits in the operand's set, create a
|
||||
branch node, and push it onto the node stack (and push its hash onto the hash stack). Sets of digits can be seen as the
|
||||
horizontal rectangles on the picture `prefix_groups_4`. The correspondence between digits in the operand's set and the
|
||||
items popped from the stack is as follows. The top of the stack (the item being popped off first)
|
||||
corresponds to the highest digit, and the item being popped off last corresponds to the lowest digit in the set.
|
||||
|
||||
`BRANCHHASH` opcode is similar to the `BRANCH` with the difference is that instead of constructing the
|
||||
branch node, it only creates its 32-byte hash. It places hash of the node onto the hash stack, and `nil` onto
|
||||
the node stack.
|
||||
`BRANCHHASH` opcode is similar to the `BRANCH` with the difference is that instead of constructing the branch node, it
|
||||
only creates its 32-byte hash. It places hash of the node onto the hash stack, and `nil` onto the node stack.
|
||||
|
||||
`HASH` opcode takes specified number of hashes from the input sequence (tape) of hashes, and places them on
|
||||
the hash stack. It also places the same number of `nil` entries onto the node stack. The first item consumed ends up
|
||||
the deepest on the stack, the last item consumed ends up on the top of the stack.
|
||||
`HASH` opcode takes specified number of hashes from the input sequence (tape) of hashes, and places them on the hash
|
||||
stack. It also places the same number of `nil` entries onto the node stack. The first item consumed ends up the deepest
|
||||
on the stack, the last item consumed ends up on the top of the stack.
|
||||
|
||||
This is the structural information for the chunk containing leaves from `0` to `7` (inclusive):
|
||||
|
||||
```
|
||||
LEAF 5
|
||||
LEAF 5
|
||||
@ -219,8 +236,9 @@ LEAF 6
|
||||
BRANCH 0123
|
||||
LEAF 5
|
||||
```
|
||||
After executing these opcodes against the chunk, we will have 2 items on the stack, first representing the branch
|
||||
node (or its hash) for the prefix group of leaves `0` to `6`, and the second representing one leaf node for the leaf
|
||||
|
||||
After executing these opcodes against the chunk, we will have 2 items on the stack, first representing the branch node (
|
||||
or its hash) for the prefix group of leaves `0` to `6`, and the second representing one leaf node for the leaf
|
||||
`7`. It can be observed that if we did not see what the next key after the leaf `7` is, we would not know the operand
|
||||
for the last `LEAF` opcode. If the next key started with the prefix `101` instead of `103`, the last opcode could have
|
||||
been `LEAF 4` (because leaves `7` and `8` would have formed a prefix group).
|
||||
@ -255,20 +273,21 @@ HASH 1
|
||||
BRANCH 0123
|
||||
```
|
||||
|
||||
These opcodes are implemented by the type `HashBuilder` (implements the interface `structInfoReceiver`) in [turbo/trie/hashbuilder.go](../../turbo/trie/hashbuilder.go)
|
||||
These opcodes are implemented by the type `HashBuilder` (implements the interface `structInfoReceiver`)
|
||||
in [turbo/trie/hashbuilder.go](../../turbo/trie/hashbuilder.go)
|
||||
|
||||
### Multiproofs
|
||||
|
||||
Encoding structural information separately from the sequences of key-value pairs and hashes allows
|
||||
describing so-called "multiproofs". Suppose that we know the root hash of the sequence of key-value pairs for our
|
||||
example, but we do not know any of the pairs themselves. And we ask someone to reveal keys and value for
|
||||
the leaves `3`, `8`, `22` and `23`, and enough information to prove to us that the revealed keys and values
|
||||
indeed belong to the sequence. Here is the picture that gives the idea of which hashes need to be provided
|
||||
together with the selected key-value pairs.
|
||||
Encoding structural information separately from the sequences of key-value pairs and hashes allows describing
|
||||
so-called "multiproofs". Suppose that we know the root hash of the sequence of key-value pairs for our example, but we
|
||||
do not know any of the pairs themselves. And we ask someone to reveal keys and value for the leaves `3`, `8`, `22`
|
||||
and `23`, and enough information to prove to us that the revealed keys and values indeed belong to the sequence. Here is
|
||||
the picture that gives the idea of which hashes need to be provided together with the selected key-value pairs.
|
||||
![prefix_groups_8](prefix_groups_8.png)
|
||||
To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_8`
|
||||
|
||||
And here is the corresponding structural information:
|
||||
|
||||
```
|
||||
HASH 2
|
||||
LEAF 5
|
||||
@ -302,117 +321,121 @@ In order to devise an algorithm for generating the structural information, we re
|
||||
![prefix_groups_3](prefix_groups_3.png)
|
||||
To regenerate this picture, run `go run cmd/pics/pics.go -pic prefix_groups_3`
|
||||
|
||||
It can then be readily observed that the first item in any prefix group has this property that its common prefix
|
||||
with the item immediately to the right (or empty string if the item is the very last) is longer than its common
|
||||
prefix with the item immediately to the left (or empty string if the item is the very first).
|
||||
Analogously, the last item in any prefix group has the property that its common prefix with the item
|
||||
immediately to the left is longer than its common prefix with the item immediately to the right.
|
||||
It can then be readily observed that the first item in any prefix group has this property that its common prefix with
|
||||
the item immediately to the right (or empty string if the item is the very last) is longer than its common prefix with
|
||||
the item immediately to the left (or empty string if the item is the very first). Analogously, the last item in any
|
||||
prefix group has the property that its common prefix with the item immediately to the left is longer than its common
|
||||
prefix with the item immediately to the right.
|
||||
|
||||
The algorithm proceeds in steps, one step for each key-value pair, in the lexicographic order of the keys. At each step,
|
||||
it observes two keys (sequences of digits) - current, and succeeding. Because the algorithm emits
|
||||
opcodes that manipulate the stack (technically, two stacks, but because they are always of the same lengths, we can
|
||||
just say "stack"), it keeps track of what is currently on the stack. Each prefix group which is currently being
|
||||
it observes two keys (sequences of digits) - current, and succeeding. Because the algorithm emits opcodes that
|
||||
manipulate the stack (technically, two stacks, but because they are always of the same lengths, we can just say "stack")
|
||||
, it keeps track of what is currently on the stack. Each prefix group which is currently being
|
||||
"assembled" by the algorithm, has some number of items on the stack. This is being tracked by an item in the `groups`
|
||||
slice. The index of the item in the slice is equal to the length of the prefix of the prefix group. And the `uint16`
|
||||
value of the item is the bitmask, with one bit per digit (and also per item on the stack). Whenever the algorithm
|
||||
emits an opcode that would push something on the stack, one of the items in the `groups` slice gains one extra bit
|
||||
to its bitmask. When the algorithm emits an opcode that would pop one or more things from the stack, the corresponding
|
||||
item gets removed from the `groups` slice. The slice `groups` is started off empty and it is shared between steps.
|
||||
value of the item is the bitmask, with one bit per digit (and also per item on the stack). Whenever the algorithm emits
|
||||
an opcode that would push something on the stack, one of the items in the `groups` slice gains one extra bit to its
|
||||
bitmask. When the algorithm emits an opcode that would pop one or more things from the stack, the corresponding item
|
||||
gets removed from the `groups` slice. The slice `groups` is started off empty and it is shared between steps.
|
||||
Algorithm's step can also be invoked recursively from another step, with current and preceding keys specified by the
|
||||
caller.
|
||||
|
||||
A step starts with computing the prefix of the smallest prefix group that the current key belongs to. It is either
|
||||
the common prefix of current key and the preceding key or the common prefix of current key and the succeeding key,
|
||||
whichever is longer (if they are the same length, then they are also equal, so no ambiguity there).
|
||||
If the common prefix with the succeeding key is longer, then the new prefix group is being created.
|
||||
If necessary, `groups` slice is expanded (by adding 0 items) so that it has the same length as the common prefix.
|
||||
A step starts with computing the prefix of the smallest prefix group that the current key belongs to. It is either the
|
||||
common prefix of current key and the preceding key or the common prefix of current key and the succeeding key, whichever
|
||||
is longer (if they are the same length, then they are also equal, so no ambiguity there). If the common prefix with the
|
||||
succeeding key is longer, then the new prefix group is being created. If necessary, `groups` slice is expanded (by
|
||||
adding 0 items) so that it has the same length as the common prefix.
|
||||
|
||||
The digit of the current key immediately following the max common prefix is called "extra digit".
|
||||
The sequence of digits of the current key following that extra digit is the remainder (which could be empty).
|
||||
The item in the `groups` slice corresponding to the common prefix (basically `groups[len(common prefix)]`) is modified
|
||||
to include an extra bit corresponding to the "extra digit".
|
||||
If this step of the algorithm was invoked on a key-value pair (non-recursively), then a `LEAF` (or `LEAFHASH`)
|
||||
opcode is emitted, with the operand being the length of the remainder (zero if the remainder is empty).
|
||||
If the step of the algorithm was invoked recursively, and the remainder is not empty, an `EXTENSION` (or `EXTENSIONHASH`) opcode
|
||||
is emitted instead, with the operand being the remainder.
|
||||
For example, for leaf `12`, the lengths of the common prefix with neighbours are 1 and 3. Therefore, this key will emit the opcode
|
||||
`LEAF 4`, where 4 = 8 (original length) - 3 (max common prefix length) - 1 (one digit goes to the branch node for the prefix group).
|
||||
The digit of the current key immediately following the max common prefix is called "extra digit". The sequence of digits
|
||||
of the current key following that extra digit is the remainder (which could be empty). The item in the `groups` slice
|
||||
corresponding to the common prefix (basically `groups[len(common prefix)]`) is modified to include an extra bit
|
||||
corresponding to the "extra digit". If this step of the algorithm was invoked on a key-value pair (non-recursively),
|
||||
then a `LEAF` (or `LEAFHASH`)
|
||||
opcode is emitted, with the operand being the length of the remainder (zero if the remainder is empty). If the step of
|
||||
the algorithm was invoked recursively, and the remainder is not empty, an `EXTENSION` (or `EXTENSIONHASH`) opcode is
|
||||
emitted instead, with the operand being the remainder. For example, for leaf `12`, the lengths of the common prefix with
|
||||
neighbours are 1 and 3. Therefore, this key will emit the opcode
|
||||
`LEAF 4`, where 4 = 8 (original length) - 3 (max common prefix length) - 1 (one digit goes to the branch node for the
|
||||
prefix group).
|
||||
|
||||
The following, optional, part of the step only happens if the common prefix of the current key and the preceding key is longer or equal than
|
||||
the common prefix of the current key and the succeeding key, in other words, if at least one prefix group needs to be "closed".
|
||||
Closing a prefix group means first emitting opcode `BRANCH` or `BRANCHHASH`. The value for the operand is taken from the
|
||||
item in the `groups` slice, which corresponds to the length of the prefix for this group. Once value is taken, `groups` slice is
|
||||
trimmed to remove the used item. Secondly, closing a prefix groups means invoking the step of the algorithm recursively
|
||||
(unless the group that was closed was the one with the empty prefix, which encompasses all the keys). For that recursive invocation,
|
||||
the prefix of the closed group is used as the current key, and the succeeding key simply passed on. Preceding key is found as the
|
||||
prefix of the current key of the length equal of the highest index of non-zero item in the `groups` (in other words,
|
||||
the longest prefix of a prefix group which would have something on the stack). During the recursive invocation, the slice `groups`
|
||||
The following, optional, part of the step only happens if the common prefix of the current key and the preceding key is
|
||||
longer or equal than the common prefix of the current key and the succeeding key, in other words, if at least one prefix
|
||||
group needs to be "closed". Closing a prefix group means first emitting opcode `BRANCH` or `BRANCHHASH`. The value for
|
||||
the operand is taken from the item in the `groups` slice, which corresponds to the length of the prefix for this group.
|
||||
Once value is taken, `groups` slice is trimmed to remove the used item. Secondly, closing a prefix groups means invoking
|
||||
the step of the algorithm recursively
|
||||
(unless the group that was closed was the one with the empty prefix, which encompasses all the keys). For that recursive
|
||||
invocation, the prefix of the closed group is used as the current key, and the succeeding key simply passed on.
|
||||
Preceding key is found as the prefix of the current key of the length equal of the highest index of non-zero item in
|
||||
the `groups` (in other words, the longest prefix of a prefix group which would have something on the stack). During the
|
||||
recursive invocation, the slice `groups`
|
||||
is trimmed to match the length of the preceding key that was found.
|
||||
|
||||
We will walk through the steps of the algorithm for the leaf `30`, and then for the leaf `31`.
|
||||
For `30`, the key is `33113123`. Its max common prefix with neighbours is `3311`. The common prefix with the preceding key
|
||||
is longer than with the succeeding key, therefore the prefix group `3311` is being closed. The digit immediately
|
||||
following this prefix is `3`. Since this is a non-recursive invocation, and the remainder `123` is 3 digits long,
|
||||
opcode `LEAF 3` is emitted.
|
||||
The optional part of the step happens, and he opcode `BRANCH 23` (closing the prefix group) is emitted. Slice `groups` contained
|
||||
the bit for `2` in the item `groups[4]` already, and another bit for `3` has been added, therefore we have `23` as the operand.
|
||||
Slice `groups` gets trimmed to contain only 4 items. After that, the step gets invoked recursively
|
||||
with current key being `3311`, and preceding key identified as `3` (there were no prefix group with prefix `33` or `331` yet,
|
||||
this can be figured out by checking the `groups` slice, where the highest index with non-zero item is 1).
|
||||
We will walk through the steps of the algorithm for the leaf `30`, and then for the leaf `31`. For `30`, the key
|
||||
is `33113123`. Its max common prefix with neighbours is `3311`. The common prefix with the preceding key is longer than
|
||||
with the succeeding key, therefore the prefix group `3311` is being closed. The digit immediately following this prefix
|
||||
is `3`. Since this is a non-recursive invocation, and the remainder `123` is 3 digits long, opcode `LEAF 3` is emitted.
|
||||
The optional part of the step happens, and he opcode `BRANCH 23` (closing the prefix group) is emitted. Slice `groups`
|
||||
contained the bit for `2` in the item `groups[4]` already, and another bit for `3` has been added, therefore we
|
||||
have `23` as the operand. Slice `groups` gets trimmed to contain only 4 items. After that, the step gets invoked
|
||||
recursively with current key being `3311`, and preceding key identified as `3` (there were no prefix group with
|
||||
prefix `33` or `331` yet, this can be figured out by checking the `groups` slice, where the highest index with non-zero
|
||||
item is 1).
|
||||
|
||||
In the recursive invocation of the step,
|
||||
max common prefix is `331`. The common prefix with the succeeding key is longer than with the
|
||||
preceding key, therefore a new prefix group `331` is created. Slice `groups` gets extended to 4 items, and the fourth item
|
||||
In the recursive invocation of the step, max common prefix is `331`. The common prefix with the succeeding key is longer
|
||||
than with the preceding key, therefore a new prefix group `331` is created. Slice `groups` gets extended to 4 items, and
|
||||
the fourth item
|
||||
(`group[3]`) gets item containing one bit for digit `1`. No more recursion.
|
||||
|
||||
For leaf `31` (key `33132002`), max common prefix is `331`. The common prefix with the preceding key is longer than
|
||||
with the succeeding key, therefore the prefix group `331` is being closed. This is the group that was created during the
|
||||
step for leaf `30` described above. The digit immediately following this prefix is `3`. Corresponding bit is added to
|
||||
the item `groups[3]`. Since this is a non-recursive invocation, opcode `LEAF 4` is emitted (4 is the length of the remainder `2002`).
|
||||
The optional part of the step happens, opcode `BRANCH 13` is emitted, and slice `group` is trimmed to 3 items to remove the item `groups[3]`.
|
||||
The step gets invoked recursively
|
||||
with current key being `331`, and preceding key identified as `3` (there were no prefix group with prefix `33`).
|
||||
For leaf `31` (key `33132002`), max common prefix is `331`. The common prefix with the preceding key is longer than with
|
||||
the succeeding key, therefore the prefix group `331` is being closed. This is the group that was created during the step
|
||||
for leaf `30` described above. The digit immediately following this prefix is `3`. Corresponding bit is added to the
|
||||
item `groups[3]`. Since this is a non-recursive invocation, opcode `LEAF 4` is emitted (4 is the length of the
|
||||
remainder `2002`). The optional part of the step happens, opcode `BRANCH 13` is emitted, and slice `group` is trimmed to
|
||||
3 items to remove the item `groups[3]`. The step gets invoked recursively with current key being `331`, and preceding
|
||||
key identified as `3` (there were no prefix group with prefix `33`).
|
||||
|
||||
In the recursive step, max common prefix is `3`. The common prefix with the preceding key is longer than with the
|
||||
succeeding key, therefore the prefix group `3` is being closed. The digit immediately following this prefix is `3`.
|
||||
The remainder `1` is non-empty, and since this is a recursive invocation, opcode
|
||||
`EXTENSION 1` is emitted.
|
||||
The optional part of the step happens, opcode `BRANCH 023` is emitted for the prefix group `3` being closed. Slice `groups` is
|
||||
trimmed to just 1 item. The step gets invoked recursively again,
|
||||
with current key being `3`, and preceding key empty.
|
||||
succeeding key, therefore the prefix group `3` is being closed. The digit immediately following this prefix is `3`. The
|
||||
remainder `1` is non-empty, and since this is a recursive invocation, opcode
|
||||
`EXTENSION 1` is emitted. The optional part of the step happens, opcode `BRANCH 023` is emitted for the prefix group `3`
|
||||
being closed. Slice `groups` is trimmed to just 1 item. The step gets invoked recursively again, with current key
|
||||
being `3`, and preceding key empty.
|
||||
|
||||
In the deeper recursive step, max common prefix is empty. Since the common prefix with the preceding key equals to
|
||||
the common prefix with the succeeding key (they are both empty). The optional part of the step happens, opcode `BRANCH 0123`
|
||||
In the deeper recursive step, max common prefix is empty. Since the common prefix with the preceding key equals to the
|
||||
common prefix with the succeeding key (they are both empty). The optional part of the step happens, opcode `BRANCH 0123`
|
||||
is emitted, and `groups` is trimmed to become empty. No recursive invocation follows.
|
||||
|
||||
The step of this algorithm is implemented by the function `GenStructStep` in [turbo/trie/gen_struct_step.go](../../turbo/trie/gen_struct_step.go).
|
||||
The step of this algorithm is implemented by the function `GenStructStep`
|
||||
in [turbo/trie/gen_struct_step.go](../../turbo/trie/gen_struct_step.go).
|
||||
|
||||
### Converting sequence of keys and value into a multiproof
|
||||
|
||||
One of the biggest difference between Erigon and go-ethereum is in the way the Ethereum state is persisted in
|
||||
the database. In go-ethereum, the model for persistence is Merkle Patricia tree. In Erigon, the model for
|
||||
persistence is sequence of key-value pairs, where keys are either derived from account addresses, or from
|
||||
storage indices. In this model, computing Merkle Patricia tree from part of data is a very commonly used operation.
|
||||
This operation is called "Resolution" because it normally arises from a need to look up (resolve) some keys and corresponding
|
||||
values, and later update them, thus requiring recomputation of the Merkle Patricia tree root.
|
||||
One of the biggest difference between Erigon and go-ethereum is in the way the Ethereum state is persisted in the
|
||||
database. In go-ethereum, the model for persistence is Merkle Patricia tree. In Erigon, the model for persistence is
|
||||
sequence of key-value pairs, where keys are either derived from account addresses, or from storage indices. In this
|
||||
model, computing Merkle Patricia tree from part of data is a very commonly used operation. This operation is called "
|
||||
Resolution" because it normally arises from a need to look up (resolve) some keys and corresponding values, and later
|
||||
update them, thus requiring recomputation of the Merkle Patricia tree root.
|
||||
|
||||
We can use the concept of Multiproofs to define the resolution operation. If we have a set of key-value pairs, and we need
|
||||
to "resolve" them, we effectively need to produce a multiproof for the given set of key-value pairs.
|
||||
To produce such multiproof, we can use the algorithm for generating the structural information from the sequence of keys.
|
||||
However, within the algorithm, choices need to be made between emitting `BRANCHHASH` and `BRANCH` opcodes (or, similarly,
|
||||
between `LEAF` and `LEAFHASH`, and between `EXTENSION` and `EXTENSIONHASH`). Such choices
|
||||
are conceptually simple to make - if max common prefix is also a prefix of any of the keys we are trying to resolve,
|
||||
`BRANCH` should be emitted, otherwise, `BRANCHHASH` should be emitted.
|
||||
However, in order to make these choices efficiently, the set of keys being resolved will be converted into a sorted
|
||||
list. Then, at each point when the algorithm processes a key, it maintains references to two consecutive keys from
|
||||
that sorted list - one "LTE" (Less Than or Equal to the currently processed key), and another "GT" (Greater Than the
|
||||
currently processed key). If max common prefix is also prefix of either LTE or GT, then `BRANCH` opcode is emitted,
|
||||
otherwise, `BRANCHHASH` opcode is emitted. This is implemented by the type `ResolveSet` in [turbo/trie/resolve_set.go](../../turbo/trie/resolve_set.go)
|
||||
We can use the concept of Multiproofs to define the resolution operation. If we have a set of key-value pairs, and we
|
||||
need to "resolve" them, we effectively need to produce a multiproof for the given set of key-value pairs. To produce
|
||||
such multiproof, we can use the algorithm for generating the structural information from the sequence of keys. However,
|
||||
within the algorithm, choices need to be made between emitting `BRANCHHASH` and `BRANCH` opcodes (or, similarly,
|
||||
between `LEAF` and `LEAFHASH`, and between `EXTENSION` and `EXTENSIONHASH`). Such choices are conceptually simple to
|
||||
make - if max common prefix is also a prefix of any of the keys we are trying to resolve,
|
||||
`BRANCH` should be emitted, otherwise, `BRANCHHASH` should be emitted. However, in order to make these choices
|
||||
efficiently, the set of keys being resolved will be converted into a sorted list. Then, at each point when the algorithm
|
||||
processes a key, it maintains references to two consecutive keys from that sorted list - one "LTE" (Less Than or Equal
|
||||
to the currently processed key), and another "GT" (Greater Than the currently processed key). If max common prefix is
|
||||
also prefix of either LTE or GT, then `BRANCH` opcode is emitted, otherwise, `BRANCHHASH` opcode is emitted. This is
|
||||
implemented by the type `ResolveSet` in [turbo/trie/resolve_set.go](../../turbo/trie/resolve_set.go)
|
||||
|
||||
### Extension of the structure to support contracts with contract storage
|
||||
When it is required to construct tries containing accounts as well as contract storage, and contract code, the
|
||||
set of opcodes making up the structural information need to be extended by four more. Apart from that, a new input
|
||||
sequence (tape) is added, containing the bytecodes of contracts.
|
||||
|
||||
When it is required to construct tries containing accounts as well as contract storage, and contract code, the set of
|
||||
opcodes making up the structural information need to be extended by four more. Apart from that, a new input sequence (
|
||||
tape) is added, containing the bytecodes of contracts.
|
||||
|
||||
8. `CODE`
|
||||
9. `CODEHASH`
|
||||
@ -420,61 +443,65 @@ sequence (tape) is added, containing the bytecodes of contracts.
|
||||
11. `ACCOUNTLEAFHASH length field-set`
|
||||
12. `EMPTYROOT`
|
||||
|
||||
`CODE` opcode consumes the next item in the bytecode sequence, creates a code node and pushes it onto the node stack.
|
||||
It also pushes the hash of the byte code onto the hash stack.
|
||||
`CODE` opcode consumes the next item in the bytecode sequence, creates a code node and pushes it onto the node stack. It
|
||||
also pushes the hash of the byte code onto the hash stack.
|
||||
|
||||
`CODEHASH` opcode consumes the next hash from the hash sequence, pushes it onto the hash stack, and pushes `nil` into the
|
||||
node stack.
|
||||
`CODEHASH` opcode consumes the next hash from the hash sequence, pushes it onto the hash stack, and pushes `nil` into
|
||||
the node stack.
|
||||
|
||||
`ACCOUNTLEAF` opcode is similar to `LEAF`. It consumes the next item from the key tape. The rest of the semantics
|
||||
depends on the value of the `field-set`. Field set can be respresented by a bitmask. In that case, bit 0 would
|
||||
correspond to field 0, bit 1 (number 2) - to field 1, bit 2 (number 4) - to field 2. Currently, field 0 means
|
||||
account nonce, field 1 means account balance, field 2 means contract storage, field 3 means contract code.
|
||||
correspond to field 0, bit 1 (number 2) - to field 1, bit 2 (number 4) - to field 2. Currently, field 0 means account
|
||||
nonce, field 1 means account balance, field 2 means contract storage, field 3 means contract code.
|
||||
|
||||
* If field 0 is present in the `field-set`, the opcode consumes one item from the nonce tape (tape 0), otherwise
|
||||
it assumes default nonce (zero). This becomes the nonce of the newly created account/contract node.
|
||||
* If field 1 is present in the `field-set`, the opcode consumes one item from the balance tape (tape 1), otherwise
|
||||
it assumes default balance (zero). This becomes the balance of the newly created account/contract node.
|
||||
* If field 2 is present in the `field-set`, the opcode pops a node from the node stack and a hash from the hash stack.
|
||||
This node or hash (in this order of preference) becomes the storage of the newly created contract node.
|
||||
Storage root can be empty (that would introduced by `EMPTYROOT` opcode).
|
||||
* If field 3 is present in the `field-set`, the opcode pops a code node from the node stack and a hash from the hash stack.
|
||||
This node or hash (in the order of preference) becomes the code or code hash of the newly created contract node.
|
||||
* If field 0 is present in the `field-set`, the opcode consumes one item from the nonce tape (tape 0), otherwise it
|
||||
assumes default nonce (zero). This becomes the nonce of the newly created account/contract node.
|
||||
* If field 1 is present in the `field-set`, the opcode consumes one item from the balance tape (tape 1), otherwise it
|
||||
assumes default balance (zero). This becomes the balance of the newly created account/contract node.
|
||||
* If field 2 is present in the `field-set`, the opcode pops a node from the node stack and a hash from the hash stack.
|
||||
This node or hash (in this order of preference) becomes the storage of the newly created contract node. Storage root
|
||||
can be empty (that would introduced by `EMPTYROOT` opcode).
|
||||
* If field 3 is present in the `field-set`, the opcode pops a code node from the node stack and a hash from the hash
|
||||
stack. This node or hash (in the order of preference) becomes the code or code hash of the newly created contract
|
||||
node.
|
||||
|
||||
Out of all the information collected through the tapes and the stacks (as directed by the `field-set`), an account leaf node
|
||||
is constructed and pushed onto the node stack. Its hash is pushed onto the hash stack.
|
||||
Field set is introduced to make the specification of what is an account extensible in a backwards compatible way.
|
||||
If a new field is added to the account in the future, it can be introduced without a need to re-encode the pre-existing
|
||||
structures.
|
||||
Out of all the information collected through the tapes and the stacks (as directed by the `field-set`), an account leaf
|
||||
node is constructed and pushed onto the node stack. Its hash is pushed onto the hash stack. Field set is introduced to
|
||||
make the specification of what is an account extensible in a backwards compatible way. If a new field is added to the
|
||||
account in the future, it can be introduced without a need to re-encode the pre-existing structures.
|
||||
|
||||
`ACCOUNTLEAFHASH` opcode's difference from `ACCOUNTLEAF` is that it does not push the leaf node onto the node stack,
|
||||
pushing `nil` instead. The hash of would-be account leaf node is pushed onto the hash stack.
|
||||
|
||||
`EMPTYROOT` is a way of placing a special value signifying an empty node onto the node stack. It also pushes the
|
||||
corresponding hash onto the hash stack. This opcode is introduced because there is no way of achieving its semantics
|
||||
by means of other opcodes.
|
||||
corresponding hash onto the hash stack. This opcode is introduced because there is no way of achieving its semantics by
|
||||
means of other opcodes.
|
||||
|
||||
### Merkle trie root calculation
|
||||
|
||||
**Theoretically:** "Merkle trie root calculation" starts from state, build from state keys - trie,
|
||||
on each level of trie calculates intermediate hash of underlying data.
|
||||
**Theoretically:** "Merkle trie root calculation" starts from state, build from state keys - trie, on each level of trie
|
||||
calculates intermediate hash of underlying data.
|
||||
|
||||
**Practically:** It can be implemented as "Preorder trie traversal" (Preorder - visit Root, visit Left, visit Right).
|
||||
**Practically:** It can be implemented as "Preorder trie traversal" (Preorder - visit Root, visit Left, visit Right).
|
||||
But, let's make couple observations to make traversal over huge state efficient.
|
||||
|
||||
**Observation 1:** `CurrentStateBucket` already stores state keys in sorted way.
|
||||
Iteration over this bucket will retrieve keys in same order as "Preorder trie traversal".
|
||||
**Observation 1:** `CurrentStateBucket` already stores state keys in sorted way. Iteration over this bucket will
|
||||
retrieve keys in same order as "Preorder trie traversal".
|
||||
|
||||
**Observation 2:** each Eth block - changes not big part of state - it means most of Merkle trie intermediate hashes will not change.
|
||||
It means we effectively can cache them. `IntermediateTrieHashBucket` stores "Intermediate hashes of all Merkle trie levels".
|
||||
It also sorted and Iteration over `IntermediateTrieHashBucket` will retrieve keys in same order as "Preorder trie traversal".
|
||||
**Observation 2:** each Eth block - changes not big part of state - it means most of Merkle trie intermediate hashes
|
||||
will not change. It means we effectively can cache them. `IntermediateTrieHashBucket` stores "Intermediate hashes of all
|
||||
Merkle trie levels". It also sorted and Iteration over `IntermediateTrieHashBucket` will retrieve keys in same order
|
||||
as "Preorder trie traversal".
|
||||
|
||||
**Implementation:** by opening 1 Cursor on state and 1 more Cursor on intermediate hashes bucket - we will receive data in
|
||||
order of "Preorder trie traversal". Cursors will only do "sequential reads" and "jumps forward" - been hardware-friendly.
|
||||
1 stack keeps all accumulated hashes, when sub-trie traverse ends - all hashes pulled from stack -> hashed -> new hash puts on stack - it's hash of visited sub-trie (it emulates recursive nature of "Preorder trie traversal" algo).
|
||||
**Implementation:** by opening 1 Cursor on state and 1 more Cursor on intermediate hashes bucket - we will receive data
|
||||
in order of "Preorder trie traversal". Cursors will only do "sequential reads" and "jumps forward" - been
|
||||
hardware-friendly. 1 stack keeps all accumulated hashes, when sub-trie traverse ends - all hashes pulled from stack ->
|
||||
hashed -> new hash puts on stack - it's hash of visited sub-trie (it emulates recursive nature of "Preorder trie
|
||||
traversal" algo).
|
||||
|
||||
Imagine that account with key 0000....00 (64 zeroes, 32 bytes of zeroes) changed. Here is an example sequence which can
|
||||
be seen by running 2 Cursors:
|
||||
|
||||
Imagine that account with key 0000....00 (64 zeroes, 32 bytes of zeroes) changed.
|
||||
Here is an example sequence which can be seen by running 2 Cursors:
|
||||
```
|
||||
00 // key which came from cache, can't use it - because account with this prefix changed
|
||||
0000 // key which came from cache, can't use it - because account with this prefix changed
|
||||
@ -491,34 +518,45 @@ Here is an example sequence which can be seen by running 2 Cursors:
|
||||
ff // key which came from cache, it is "next sub-trie" (1 byte shorter key), use it, jump to "next sub-trie"
|
||||
nil // db returned nil - means no more keys there, done
|
||||
```
|
||||
On practice Trie is no full - it means after account key `{30 zero bytes}0000` may come `{5 zero bytes}01` and amount of iterations will not be big.
|
||||
|
||||
On practice Trie is no full - it means after account key `{30 zero bytes}0000` may come `{5 zero bytes}01` and amount of
|
||||
iterations will not be big.
|
||||
|
||||
### Attack - by delete account with huge state
|
||||
|
||||
It's possible to create Account with very big storage (increase storage size during many blocks).
|
||||
Then delete this account (SELFDESTRUCT).
|
||||
Naive storage deletion may take several minutes - depends on Disk speed - means every Eth client
|
||||
will not process any incoming block that time. To protect against this attack:
|
||||
PlainState, HashedState and IntermediateTrieHash buckets have "incarnations". Account entity has field "Incarnation" -
|
||||
just a digit which increasing each SELFDESTRUCT or CREATE2 opcodes. Storage key formed by:
|
||||
`{account_key}{incarnation}{storage_hash}`. And [turbo/trie/trie_root.go](../../turbo/trie/trie_root.go) has logic - every time
|
||||
when Account visited - we save it to `accAddrHashWithInc` variable and skip any Storage or IntermediateTrieHashes with another incarnation.
|
||||
It's possible to create Account with very big storage (increase storage size during many blocks). Then delete this
|
||||
account (SELFDESTRUCT). Naive storage deletion may take several minutes - depends on Disk speed - means every Eth client
|
||||
will not process any incoming block that time. To protect against this attack:
|
||||
PlainState, HashedState and IntermediateTrieHash buckets have "incarnations". Account entity has field "Incarnation" -
|
||||
just a digit which increasing each SELFDESTRUCT or CREATE2 opcodes. Storage key formed by:
|
||||
`{account_key}{incarnation}{storage_hash}`. And [turbo/trie/trie_root.go](../../turbo/trie/trie_root.go) has logic -
|
||||
every time when Account visited - we save it to `accAddrHashWithInc` variable and skip any Storage or
|
||||
IntermediateTrieHashes with another incarnation.
|
||||
|
||||
Transaction processing
|
||||
----------------------
|
||||
|
||||
The main function of Ethereum software is to continuously process blocks and generate some information as the result of this processing.
|
||||
The diagram below shows schematically the main types of information being processed and generated.
|
||||
The main function of Ethereum software is to continuously process blocks and generate some information as the result of
|
||||
this processing. The diagram below shows schematically the main types of information being processed and generated.
|
||||
![processing](processing_blocks.png)
|
||||
|
||||
For an ordinary (non-mining) node,
|
||||
block headers and block bodies are coming from the outside, via the peer-to-peer network. While processing those, the node maintains
|
||||
the view of the current state of the Ethereum (bright yellow box), as well as generates the timestamped history of the changes in the state
|
||||
(this is optional). History of the state is shown as a series of dull yellow boxes. Some transactions also produce log messages, and those
|
||||
are included into transaction receipts. Receipts are also optionally persisted and are shown as green stacks of sheets. Erigon's
|
||||
default mode of operation does not persist the receipts, but recalculates them on demand. It looks up the state at the point just before
|
||||
the transaction in question (for which we would like a receipt), re-executes transaction, and re-generates the receipt. This can only
|
||||
work if the history of the state is available for the period of time including the transaction.
|
||||
For an ordinary (non-mining) node, block headers and block bodies are coming from the outside, via the peer-to-peer
|
||||
network. While processing those, the node maintains the view of the current state of the Ethereum (bright yellow box),
|
||||
as well as generates the timestamped history of the changes in the state
|
||||
(this is optional). History of the state is shown as a series of dull yellow boxes. Some transactions also produce log
|
||||
messages, and those are included into transaction receipts. Receipts are also optionally persisted and are shown as
|
||||
green stacks of sheets. Erigon's default mode of operation does not persist the receipts, but recalculates them on
|
||||
demand. It looks up the state at the point just before the transaction in question (for which we would like a receipt),
|
||||
re-executes transaction, and re-generates the receipt. This can only work if the history of the state is available for
|
||||
the period of time including the transaction.
|
||||
|
||||
[See more about blocks and transactions processing here](../../eth/stagedsync/README.md)
|
||||
|
||||
|
||||
Dev Net with Geth nodes
|
||||
-----------------------
|
||||
|
||||
- create somewhere file `erigon.dev` with content `26e86e45f6fc45ec6e2ecd128cec80fa1d1505e5507dcd2ae58c3130a7a97b48`
|
||||
- run: `geth account import ./erigon.dev`
|
||||
- add to geth flag `--miner.etherbase=67b1d87101671b127f5f8714789c7192f7ad340e`
|
||||
- follow https://geth.ethereum.org/docs/interface/private-network
|
||||
|
Loading…
Reference in New Issue
Block a user