Formal semantics: more instructions (#366)

This commit is contained in:
Igor Mandrigin 2020-02-17 19:56:30 +03:00 committed by GitHub
parent 21ce7b4c0d
commit ec559db1c8
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -2,75 +2,179 @@
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
## The Stack
## Data Types
Witnesses are executed on a stack machine that builds tries/calculates hashes.
### Basic
The stack consists of stack items.
Each item is a pair: `(node, hash)`, where `node` is an object representing the Merkle trie node, and `hash` is 32 byte node hash value. The exact implementation details of `node`s and `hash`es are not the part of the spec.
`nil` - an empty value.
Each witness is a queue of instructions.
`Any` - any data type. MUST NOT be `nil`.
In every execution cycle a single instruction gets dequeued and a matching substitution rule gets applied to the stack.
`Int` - an integer value. We treat the domain of integers as infinite,
the overflow behaviour or mapping to the actual data types is undefined
in this spec and should be up to implementation.
In the end, when there are no more instructions left, there MUST be only one item left on the stack.
`Byte` - a single byte.
`Hash` - 32 byte value, representing a result of Keccak256 hashing.
### Composite
`()` - an empty array of arbitrary type.
`(Type...)` - an array of a type `Type`. MUST NOT be empty.
`{field:Type}` - an object with a field `field` of type `Type`.
- full notation: `type T = {field:Type}`
- inline `type TBase = T1{field:Type}|T2{field2:Type2}`
### Type Definitions
The type definitions are a bit similar to [Haskell](https://en.wikibooks.org/wiki/Haskell/Type_declarations).
The key differences are how the arrays and type fields are defined.
### Nodes
There are multiple types of nodes that can be present on the stack.
```
type Node = HashNode{raw_hash:Hash}
| ValueNode{raw_value:(Byte...)}
| AccountNode{nonce:Int balance:Int storage:nil|Node code:nil|CodeNode|HashNode}
| LeafNode{key:(Byte...) value:ValueNode|AccountNode}
| ExtensionNode{key:(Byte...) child:Node}
| BranchNode{child0:nil|Node child1:nil|Node child3:nil|Node ... child15:nil|Node}
| CodeNode{code:(Byte... )}
```
### Witness
Witness MUST have at least 1 element.
```
type WitnessHeader = {version:Int}
type Instruction = {code:Int parameter:Any...}
type Witness = (Node|Instruction...)
```
## Execution Enviroment
The witness execution environment MUST contain the following 2 elements:
- **WitnessHeader** -- a header containing the version of the witness. The `version` MUST be 1.
- **Witness** -- a witness to be executed;
- **Substitution Rules** -- a list of all possible substitution rules.
## Execution process
Initially, the Witness MUST BE an array of `Instruction`s.
Then, as substitution rules are applied to the witness, some elements of the
array are replaces with `Node`s.
The execution continues until there are no substitution rules left to execute.
Here is how the execution code might look like in Go for building a single trie.
```go
witness := GetInitialWitness()
rules := GetSubstitutionRules()
numberOfRulesApplied := 1 // initial state
for numberOfRulesApplied > 0 {
witness, numberOfRulesApplied := ApplyRules(witness, rules)
}
if len(witness) == 1 {
trie.root = witness[0]
} else {
panic("witness execution failed")
}
```
And here is an example of the execution process (we will use the set of rules
form the **Substitution Rules** section of this document):
* **Step 1**. Witness: `(HASH h1 HASH h2 BRANCH 0b101 HASH h3 BRANCH 0b11)`
* **Step 2**. Apply `HASH` substitution rules.
Witness: `(HashNode{h1} HashNode{h2} BRANCH 0b101 HashNode{h3} BRANCH 0b11)`
* **Step 3**. Apply `BRANCH` substitution rules (only once, because `BRANCH 0b11`
doesn't pass its `GUARD` statements just yet).
Witness: `(BranchNode{0: HashNode{h1} 2:HashNode{h2}} HashNode{h3} BRANCH 0b11)`
* **Step 4**. Apply `BRANCH` substitution rules again.
Witness: `(BranchNode{0: BranchNode{0: HashNode{h1} 2:HashNode{h2}} 1:HashNode{h3}})`
* **Step 5**. No more rules are applicable, the witness contains only one
element, the execution ends successfully.
## End Criteria
The execution ends when there are no substitution rules applicable for this
witness.
### Building a single trie from the witness
If we are building a single trie from the witness, then the only SUCCESS
executon is when the following are true:
- The execution state MUST match the End Criteria
- There MUST be only one item left in the witness
- This item MUST be of the type `Node`
In that case, this last item will be the root of the built trie.
Every other end state is considered a FAILURE.
### Building a Forest
We also can build a forest of tries with this approach, by adding a new
Instruction `NEW_TRIE` and adjusting the success criteria a bit:
- The execution state MUST match the End Criteria;
- The items that are left in the witness MUST follow this pattern: `(Node
NEW_TRIE ... Node)`
- Each `Node` element will be a root of a trie.
Every other end state is considered a FAILURE.
## Instructions & Parameters
A single instruction consists of substitution rules and parameters.
Each instruction MAY have one or more parameters.
The parameters values MUST be encoded in the witness.
That makes it different from the helper function parameters that MAY come from the stack or MAY come from the witness.
- `hashNode`
- `branchNode`
- `accountNode`
- `valueNode`
- `extensionNode`
## Substitution rules
Here is an example substitution rule. The current instruction is named `INSTR1`.
A substitution rule consists of 3 parts:
```
STACK(node1, hash1) STACK(node2, hash2) INSTR1(params) |=>
STACK(helper1(node1, node2), helper2(hash1, hash2))
```
`[GUARD] PATTERN |=> RESULT`
This substitution rule replaces 2 stack items `(node1, hash1)` and `(node2, hash2)`
with a single stack item `(helper1(node1, node2), helper2(hash1, hash2))`.
- to the left of the `|=>` sign:
Where `helper1` and `helper2` are pure functions defined in pseudocode (see the example below).
- optional `GUARD` statements;
```
helper1 (value1, value2) {
return value1 + value2
}
- the pattern to match against;
helper2 (hash1, hash2) {
return KECCAK(hash1 + hash2)
}
```
- result, to the right of the `|=>` sign.
The specification for a substitution rule
```
[GUARD <CONDITION> ...]
[ STACK(<node-var-name>, <hash-var-name>), ... ] <INSTRUCTION>[(<params>)] |=>
STACK(node-value, hash-value), ...
```
There MUST be one and only one substitution rule applicable to the execution state. If an instruction has multiple substitution rules, the applicability is defined by the `GUARD` statements.
The substitution rule MAY have one or more GUARD statements.
The substitution rule MAY have one or more STACK statements before the instruction.
The substitution rule MUST have exactly one instruction.
The substitution rule MAY have parameters for the instruction.
The substitution rule MUST have at least one STACK statement after the arrow.
So, the minimal substitution rule is for the `HASH` instruction that pushes one hash to the stack:
```
HASH(hashValue) |=> STACK(hashNode(hashValue), hashValue)
```
## GUARDs
### `GUARD`s
Each substitution rule can have zero, one or multiple `GUARD` statements.
Each `GUARD` statement looks like this:
@ -79,23 +183,110 @@ Each `GUARD` statement looks like this:
GUARD <CONDITION>
```
That means that for the substitution rule to be applicable the `<CONDITION>` in the guard statement must be true.
For a substitution rule to be applicable, the `<CONDITION>` in its `GUARD` statement MUST be true.
If a substitution rule has multiple guard statements, all of the conditions specified there should be satisfied.
If a substitution rule has multiple `GUARD` statements, all of them MUST BE satisfied.
## `TRAP` statements
If there are no `GUARD` statements, the substitution rule's applicability is
only defined by the PATTERN.
If no substitution rules are applicable for an instruction, then the execution MUST stop (`TRAP` instruction) and the partial results MUST NOT be used.
Example:
```
GUARD NBITSET(mask) == 2
|---- GUARD STATEMENT ---|
Implementation of the `TRAP` instruction is undefined, but it should stop the execution flow. (e.g. in Golang it might be a function returning an error or a panic).
Node(n0) Node(n1) BRANCH(mask) |=>
BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)}
```
## Instructions & Parameters
For the example rule to be applicable both facts MUST be true:
Each instruction MAY have one or more parameters.
The parameters values MUST be situated in the witness.
The parameter values MUST NOT be taken from the stack.
1. `mask` contains only 2 bits set to 1 (the rest are set to 0);
2. to the left of `BRANCH` instruction there is at least 2 `Node`s.
Fact (1) comes from the `GUARD` statement.
### PATTERN
`[NodeType(boundVar1)... NodeType(boundVarN)] Instruction[(param1... paramN)]`
A pattern MUST contain a single instruction.
A pattern MAY contain one or more `Node`s to the left of the instruction to
match.
An instruction MAY have one or more parameters.
Pattern matching is happening by the types. `Node` type means any node is
matched, some specific node type will require a specific match.
Pattern can have bound variable names for both matched nodes and instruction
parameters (if present).
Match:
```
HASH h0 HashNode{h1} HashNode{h2} BRANCH 0b11
|------------------- MATCH -----------|
HASH h0 BranchNode{0: HashNode{h1} 1: HashNode{h2}}
|----------- SUBSTITUTED -------------------|
```
No match (not enough nodes to the left of the instruction):
```
HASH h0 HASH h1 HashNode{h2} BRANCH 0b11
```
### Result
`NodeType(HELPER_FUNCTION(arguments))`
The result is a single `Node` statement that replaces the pattern in the
witness if it matches and the guards are passed.
The result MAY contain helper functions or might have in-line computation.
The result MUST have a specific node type. No generic `Node` is allowed.
Helper functions or inline computations might use bound variables from the
pattern.
Example
```
Node(n0) Node(n1) BRANCH(mask) |=>
BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)}
^ ^-- ^--- BOUND NODES
|---- BOUND INSTRUCTION PARAM
|------ HELPER CALL ------------|
|----------------- RESULT ------------------|
```
### Bringing it all together
So the full syntax is this:
```
[GUARD <CONDITION> ...] [ NodeType(bound_variable1)... ] INSTRUCTION[(param1 ...)] |=>
Node(<HELPER_FUNCTION_OR_COMPUTATION>)
```
`NodeType` is one of the types of nodes to match. Can also be `Node` to match
any non-nil node.
Substitution rules MUST be non-ambiguous. Even though, there can be multiple
substitution rules applicable to the whole witness at the same time, there MUST
be only one rule that is applicable to a certain position in the witness.
So, the minimal substitution rule is the one for the `HASH` instruction that pushes one hash to the stack:
```
HASH(hashValue) |=> HashNode{hashValue}
```
That makes it different from the helper function parameters that MAY come from the stack or MAY come from the witness.
## Helper functions
@ -106,82 +297,126 @@ Helper functions MUST have at least one argument.
Helper functions MAY have variadic parameters: `HELPER_EXAMPLE(arg1, arg2, list...)`.
Helper functions MAY contain recursion.
## Data types
INTEGER - we treat the domain of integers as infinite, the overflow behaviour or mapping to the actual data types is undefined in this spec and should be dependent on implementation.
## Execution flow
Let's look at the example.
Our example witness would be `HASH h1; HASH h2; BRANCH 0b11`.
Initial state of the stack is ` <empty> `;
---
**1. Executing `HASH h1`**: push a `hashNode` to the stack.
Witness: `HASH h2; BRANCH 0b11`
Stack: `(hashNode(h1), h1)`
---
**2. Executing `HASH h2`**: push one more `hashNode` to the stack.
Witness `BRANCH 0b11`
Stack: `(hashNode(h2), h2); (hashNode(h1), h1)`
---
**3. Executing `BRANCH 0b11`**: replace 2 items on the stack with a single `fullNode`.
Witness: ` <empty> `
Stack: `(fullNode{0: hashNode(h2), 1: hashNode(h1)}, KECCAK(h2+h1))`
---
So our stack has exactly a single item and there are no more instructions in the witness, the execution is completed.
## Modes
There are two modes of execution for this stack machine:
(1) **normal execution** -- the mode that constructs a trie;
(2) **hash only execution** -- the mode that calculates the root hashes of a trie without constructing the trie itself;
In the mode (2), a stack item MUST only contain the node hash.
Stack item, mode (1): `(node, hash)`.
Stack item, mode (2): `(hash)`.
The exact implementation details are undefined in this spec.
## Instructions
### `LEAF key raw_value`
**Substitution rules**
Replaces the instruction with a `ValueNode` wrapped with a `LeafNode`.
```
LEAF(key, raw_value) |=> LeafNode{key, ValueNode(raw_value)}
```
### `EXTENSION key`
Wraps a node to the left of the instruction with an `ExtensionNode`.
**Substitution rules**
```
Node(node) EXTENSION(key) |=> ExtensionNode{key, node}
```
### `HASH raw_hash`
Replaces the instruction with a `HashNode`.
**Substitution rules**
```
HASH(hash_value) |=> HashNode{hash_value}
```
### `CODE raw_code`
Replaces the instruction with a `CodeNode`.
```
CODE(raw_code) |=> CodeNode{raw_code}
```
### `ACCOUNT_LEAF key nonce balance has_code has_storage`
Replaces the instruction and, optionally, up to 2 nodes to the left of the
instructon with a single `AccountNode` wrapped with a `LeafNode`.
**Substitution rules**
```
GUARD has_code == true
GUARD has_storage == true
CodeNode(code) Node(storage_hash_node) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, storage_root, code}}
---
GUARD has_code == true
GUARD has_storage == true
HashNode(code) Node(storage_hash_node) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, storage_root, code}}
---
GUARD has_code == false
GUARD has_storage == true
Node(storage_root) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, storage_root, nil}}
---
GUARD has_code == true
GUARD has_storage == false
CodeNode(code) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, nil, nil, code}}
---
GUARD has_code == true
GUARD has_storage == false
HashNode(code) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, nil, nil, code}}
---
GUARD has_code == false
GUARD has_storage == false
ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, nil, nil, nil}}
```
### `NEW_TRIE`
No substitution rules. This instruction is used as a "divider" when building
a forest of tries.
### `BRANCH mask`
This instruction pops `NBITSET(mask)` items from both node stack and hash stack (up to 16 for each one). Then it pushes a new branch node on the node stack that has children according to the stack; it also pushes a new hash to the hash stack.
Replaces `NBITSET(mask)` `Node`s to the left of the instruction with a single
`BranchNode` with these nodes as children according to `mask`.
**Substitution rules**
```
GUARD NBITSET(mask) == 2
GUARD NBITSET(mask) == 2
Node(n0) Node(n1) BRANCH(mask) |=>
BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)}
STACK(n0, h0) STACK(n1, h1) BRANCH(mask) |=>
STACK(branchNode(MAKE_VALUES_ARRAY(mask, n0, n1)), keccak(CONCAT(MAKE_VALUES_ARRAY(mask, h0, n1))))
---
GUARD NBITSET(mask) == 3
STACK(n0, h0) STACK(n1, h1) STACK(n2, h2) BRANCH(mask) |=>
STACK(branchNode(MAKE_VALUES_ARRAY(mask, n0, n1, n2)), keccak(CONCAT(MAKE_VALUES_ARRAY(mask, h0, n1, n2))))
Node(n0) Node(n1) Node(n2) BRANCH(mask) |=>
BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1, n2)}
---
@ -191,8 +426,8 @@ STACK(branchNode(MAKE_VALUES_ARRAY(mask, n0, n1, n2)), keccak(CONCAT(MAKE_VALUES
GUARD NBITSET(mask) == 16
STACK(n0, h0) STACK(n1, h1) ... STACK(n15, h15) BRANCH(mask) |=>
STACK(branchNode(MAKE_VALUES_ARRAY(mask, n0, n1, ..., n15)), keccak(CONCAT(MAKE_VALUES_ARRAY(mask, h0, n1, ..., n15))))
Node(n0) Node(n1) ... Node(n15) BRANCH(mask) |=>
BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1, ..., n15)}
```
## Helper functions
@ -221,6 +456,12 @@ MAKE_VALUES_ARRAY(mask, idx, values...) {
}
```
### `RLP(value)`
returns the RLP encoding of a value
### `NBITSET(number)`
returns number of bits set in the binary representation of `number`.
@ -245,4 +486,114 @@ returns the first value in the specified array
### `REST(array)`
returns the array w/o the first item
returns the array w/o the first item
### `KECCAK(bytes)`
returns a keccak-256 hash of `bytes`
## Serialization
The format for serialization of everything except hashes (that we know the
length of) is [CBOR](https://cbor.io). It is RFC-specified and concise.
For hashes we use the optimization of knowing the lengths, so we just read 32
bytes
### Block Witness Format
Each block witness consists of a header followed by a list of instructions.
There is no length of witness specified anywhere, the code expects to just reach `EOF`.
Serialized Witness: `(HEADER, OP1, OP2, ..., OPn-1, OPn, EOF)`
#### Encoding
##### CBOR
The parts of the key that are encoded with CBOR are marked by the `CBOR` function.
##### Keys
Keys are also using custom encryption to make them more compact.
The nibbles of a key are encoded in a following way `[FLAGS NIBBLE1+NIBBLE2 NIBBLE3+NIBBLE4 NIBBLE5... ]`
*FLAGS*
* bit 0 -- 1 if the number of nibbles were odd
* bit 1 -- 1 if the nibbles end with 0x10 (the terminator byte)
This is shown later as `ENCODE_KEY` function.
#### Header
format: `version:byte`
encoded as `[ version ]`
the current version is 1.
#### Instructions
Each instruction starts with an opcode (`uint`).
Then it might contain some data.
##### `LEAF`
format: `LEAF key:[]byte value:[]byte`
encoded as `[ 0x00 CBOR(ENCODE_KEY(key))... CBOR(value)... ]`
##### `EXTENSION`
format: `EXTENSION key:[]byte`
encoded as `[ 0x01 CBOR(ENCODE_KEY(key))... ]`
##### `BRANCH`
format: `BRANCH mask:uint32`
*mask* defines which children are present
(e.g. `0000000000001011` means that children 0, 1 and 3 are present and the other ones are not)
encoded as `[ 0x02 CBOR(mask)...]`
##### `HASH`
format: `HASH hash:[32]byte`
encoded as `[ 0x03 hash_byte_1 ... hash_byte_32 ]`
##### `CODE`
format: `CODE code:[]byte`
encoded as `[ 0x04 CBOR(code)... ]`
##### `ACCOUNT_LEAF`
format: `ACCOUNT_LEAF key:[]byte flags [nonce:uint64] [balance:[]byte]`
encoded as `[ 0x05 CBOR(ENCODE_KEY(key))... flags /CBOR(nonce).../ /CBOR(balance).../ ]`
*flags* is a bitset encoded in a single bit (see [`witness_operators_test.go`](../../trie/witness_operators_test.go) to see flags in action).
* bit 0 defines if **code** is present; if set to 1, then `has_code=true`;
* bit 1 defines if **storage** is present; if set to 1, then `has_storage=true`;
* bit 2 defines if **nonce** is not 0; if set to 0, *nonce* field is not encoded;
* bit 3 defines if **balance** is not 0; if set to 0, *balance* field is not encoded;
##### `NEW_TRIE`
format: `NEW_TRIE`
encoded as `[ 0xBB ]`