erigon-pulse/common/changeset
ledgerwatch 5ea590c18e
State cache switching writes to reads during commit (#1368)
* State cache init

* More code

* Fix lint

* More tests

* More tests

* More tests

* Fix test

* Transformations

* remove writeQueue, before fixing the tests

* Fix tests

* Add more tests, incarnation to the code items

* Fix lint

* Fix lint

* Remove shards prototype, add incarnation to the state reader code

* Clean up and replace cache in call_traces stage

* fix flaky test

* Save changes

* Readers to use addrHash, writes - addresses

* Fix lint

* Fix lint

* More accurate tracking of size

* Optimise for smaller write batches

* Attempt to integrate state cache into Execution stage

* cacheSize to default flags

* Print correct cache sizes and batch sizes

* cacheSize in the integration

* Fix tests

* Fix lint

* Remove print

* Fix exec stage

* Fix test

* Refresh sequence on write

* No double increment

* heap.Remove

* Try to fix alignment

* Refactoring, adding hashItems

* More changes

* Fix compile errors

* Fix lint

* Wrapping cached reader

* Wrap writer into cached writer

* Turn state cache off by default

* Fix plain state writer

* Fix for code/storage mixup

* Fix tests

* Fix clique test

* Better fix for the tests

* Add test and fix some more

* Fix compile error

* More functions

* Fixes

* Fix for the tests

* separate DeletedFlag and AbsentFlag

* Minor fixes

* Test refactoring

* More changes

* Fix some tests

* More test fixes

* More test fixes

* Fix lint

* Move blockchain_test to be able to use stagedsync

* More fixes

* Fixes and cleanup

* Fix tests in turbo/stages

* Fix lint

* Fix lint

* Intermediate

* Fix tests

* Intermediate

* More fixes

* Compilation fixes

* More fixes

* Fix compile errors

* More test fixes

* More fixes

* More test fixes

* Fix compile error

* Fixes

* Fix

* Fix

* More fixes

* Fixes

* More fixes and cleanup

* Further fix

* Check gas used and bloom with header

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2020-12-08 09:44:29 +00:00
account_changeset_test.go ChangeSets dupsort (#1342) 2020-11-16 12:08:28 +00:00
account_changeset_utils.go Store transactions individually (#1358) 2020-11-22 21:25:26 +00:00
account_changeset.go ChangeSets dupsort (#1342) 2020-11-16 12:08:28 +00:00
changeset.go ChangeSets dupsort (#1342) 2020-11-16 12:08:28 +00:00
readme.md Storage encoding docs (#511) 2020-05-04 06:55:37 +01:00
storage_changeset_test.go State cache switching writes to reads during commit (#1368) 2020-12-08 09:44:29 +00:00
storage_changeset_utils.go ChangeSets dupsort (#1342) 2020-11-16 12:08:28 +00:00
storage_changeset.go ChangeSets dupsort (#1342) 2020-11-16 12:08:28 +00:00

# Changesets encoding

Storage changeset encoding

The storage changeset encoding consists of several blocks: Address hashes, Incarnations, Keys, Values lengths, and Values. A storage changeset is serialized in this manner in order to facilitate binary search.

Address hashes

A storage changeset contains many duplicate address hashes when one contract has multiple changes. To avoid this, only unique address hashes are stored. The first 4 bytes contain the number of unique contract address hashes in the changeset. Each address hash is then stored together with the cumulative number of keys, counted from the first element up to and including that address hash.

For example: for addrHash1Inc1Key1, addrHash1Inc1Key2, addrHash2Inc1Key1, addrHash2Inc1Key3 it stores 2,addrHash1,2,addrHash2,4
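The grouping described above can be sketched in Go. This is a minimal illustration, not the repository's implementation: the `storageKey` type and the `encodeAddrHashes` function are hypothetical names, the input is assumed to be already sorted by address hash, and big-endian byte order is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// storageKey is a hypothetical, simplified stand-in for a changeset key:
// a 32-byte address hash plus a 32-byte storage key hash.
type storageKey struct {
	AddrHash [32]byte
	KeyHash  [32]byte
}

// encodeAddrHashes builds the "Address hashes" block for keys that are
// already sorted by AddrHash: a uint32 count of unique address hashes,
// then each unique hash followed by the cumulative number of keys seen
// up to and including that hash. Big-endian order is an assumption.
func encodeAddrHashes(keys []storageKey) []byte {
	type entry struct {
		hash  [32]byte
		total uint32
	}
	var entries []entry
	for i, k := range keys {
		if i == 0 || k.AddrHash != keys[i-1].AddrHash {
			entries = append(entries, entry{hash: k.AddrHash})
		}
		// Overwritten on every key, so it ends at the last key of the group.
		entries[len(entries)-1].total = uint32(i + 1)
	}
	out := make([]byte, 4, 4+len(entries)*36)
	binary.BigEndian.PutUint32(out, uint32(len(entries)))
	var n [4]byte
	for _, e := range entries {
		out = append(out, e.hash[:]...)
		binary.BigEndian.PutUint32(n[:], e.total)
		out = append(out, n[:]...)
	}
	return out
}

func main() {
	a, b := [32]byte{1}, [32]byte{2}
	enc := encodeAddrHashes([]storageKey{
		{AddrHash: a, KeyHash: [32]byte{1}},
		{AddrHash: a, KeyHash: [32]byte{2}},
		{AddrHash: b, KeyHash: [32]byte{1}},
		{AddrHash: b, KeyHash: [32]byte{3}},
	})
	// Low bytes of: unique count, cumulative count after a, after b.
	fmt.Println(enc[3], enc[39], enc[75]) // prints "2 2 4"
}
```

The three printed numbers correspond to the 2, 2 and 4 in the example above. Storing cumulative counts rather than per-address counts means the range of keys belonging to an address hash can be read off directly from two neighbouring entries.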

Incarnations

Currently, only a few incarnations in the state are non-default (!=1). For that reason, an incarnation is stored only if it is not equal to fffffffe (inverted 1). The first part is 4 bytes containing the number of non-default incarnations. It is followed by an array of entries, each consisting of the id of an address hash (4 bytes) plus the incarnation (8 bytes).

For example: for addrHash1fffffffe..., addrHash1fffffffd... it stores 1,1,fffffffd

Values lengths

The default value length is 32 (common.Hash), but once leading zeros are removed the average size becomes ~7. Because the length of a value can be anywhere from 0 to 32, this section is needed to find a value by key quickly. It is a contiguous array of accumulating value indexes: len(val0), len(val0)+len(val1), ..., len(val0)+len(val1)+...+len(val_{N-1}).

To reduce its cost there are three numbers: numOfUint8, numOfUint16, numOfUint32. They answer the question of how many value lengths fit into uint8, uint16, and uint32 respectively. The number of values can be huge if one of the contracts was suicided (self-destructed) during block execution. In that case there could be thousands of empty values, which can be stored as uint8 (depending on the accumulated length at that point).

For example, for values "ffa","","faa" it stores 3,0,0,3,3,6
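A rough Go sketch of this block follows. It assumes that each accumulated length is written in the narrowest of the three widths that can hold it, that the byte order is big-endian, and that the total length fits in uint32. The function name `encodeValueLengths` is hypothetical, not taken from the repository.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeValueLengths writes numOfUint8, numOfUint16 and numOfUint32 (each a
// uint32) followed by the accumulated value lengths. Accumulated sums never
// decrease, so all uint8 entries come first, then uint16, then uint32.
func encodeValueLengths(values [][]byte) []byte {
	var n8, n16, n32 uint32
	var body []byte
	var sum uint32 // assumption: the total length fits in uint32
	for _, v := range values {
		sum += uint32(len(v))
		switch {
		case sum <= math.MaxUint8:
			n8++
			body = append(body, byte(sum))
		case sum <= math.MaxUint16:
			n16++
			body = append(body, byte(sum>>8), byte(sum))
		default:
			n32++
			body = append(body, byte(sum>>24), byte(sum>>16), byte(sum>>8), byte(sum))
		}
	}
	out := make([]byte, 12, 12+len(body))
	binary.BigEndian.PutUint32(out[0:], n8)
	binary.BigEndian.PutUint32(out[4:], n16)
	binary.BigEndian.PutUint32(out[8:], n32)
	return append(out, body...)
}

func main() {
	enc := encodeValueLengths([][]byte{[]byte("ffa"), nil, []byte("faa")})
	fmt.Println(enc[3], enc[7], enc[11]) // prints "3 0 0"
	fmt.Println(enc[12:])                // prints "[3 3 6]"
}
```

With mostly short or empty values, nearly every index fits in one byte, which is where the saving over a fixed 4-byte index per value comes from.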

Values

Contiguous array of values.

Finally

| Value | Type | Comment |
| --- | --- | --- |
| numOfUniqueElements | uint32 | |
| Address hashes | [numOfUniqueElements]{[32]byte+[4]byte} | [numOfUniqueElements](common.Hash + uint32) |
| numOfNotDefaultIncarnations | uint32 | mostly 0 |
| Incarnations | [numOfNotDefaultIncarnations]{[4]byte + [8]byte} | []{idOfAddrHash(uint32) + incarnation(uint64)} |
| Keys | [][32]byte | []common.Hash |
| numOfUint8 | uint32 | |
| numOfUint16 | uint32 | |
| numOfUint32 | uint32 | |
| Values lengths in uint8 | [numOfUint8]uint8 | |
| Values lengths in uint16 | [numOfUint16]uint16 | |
| Values lengths in uint32 | [numOfUint32]uint32 | |
| Values | [][]byte | |

Account changeset encoding

AccountChangeSet is serialized in the following manner in order to facilitate binary search. The account changeset encoding contains several blocks: Keys, Values lengths, Values. The key is the address hash of the account. The value is the CBOR-encoded account without storage root and code hash.

Keys

The number of keys N (uint32, 4 bytes), followed by a contiguous array of keys (N*32 bytes).

Values lengths

Contiguous array of accumulating value indexes: len(val0), len(val0)+len(val1), ..., len(val0)+len(val1)+...+len(val_{N-1}) (4*N bytes since the lengths are treated as uint32).

Values

Contiguous array of values.

Finally

| Value | Type | Comment |
| --- | --- | --- |
| num of keys | uint32 | |
| address hashes | [num of keys][32]byte | [num of keys]common.Hash |
| values lengths | [num of keys]uint32 | |
| values | [num of keys][]byte | |
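To illustrate why this layout facilitates binary search, here is a minimal Go sketch of encoding and lookup. It is not the repository's code: `encodeAccounts` and `findAccount` are hypothetical names, the keys are assumed to be sorted in ascending order, the input is assumed to be well-formed, and big-endian byte order is an assumption.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

const keyLen = 32

// encodeAccounts serializes sorted keys and their values as:
// N | keys | accumulated value lengths | values.
func encodeAccounts(keys [][keyLen]byte, values [][]byte) []byte {
	n := len(keys)
	out := make([]byte, 4+n*keyLen+4*n)
	binary.BigEndian.PutUint32(out, uint32(n))
	total := uint32(0)
	for i := range keys {
		copy(out[4+i*keyLen:], keys[i][:])
		total += uint32(len(values[i]))
		binary.BigEndian.PutUint32(out[4+n*keyLen+4*i:], total)
	}
	for _, v := range values {
		out = append(out, v...)
	}
	return out
}

// findAccount binary-searches the fixed-width key block, then slices the
// value out using the accumulated lengths, without decoding other entries.
func findAccount(enc []byte, key [keyLen]byte) ([]byte, bool) {
	n := int(binary.BigEndian.Uint32(enc))
	keyAt := func(i int) []byte { return enc[4+i*keyLen : 4+(i+1)*keyLen] }
	i := sort.Search(n, func(i int) bool { return bytes.Compare(keyAt(i), key[:]) >= 0 })
	if i == n || !bytes.Equal(keyAt(i), key[:]) {
		return nil, false
	}
	lens := enc[4+n*keyLen:]
	vals := lens[4*n:]
	start := uint32(0)
	if i > 0 {
		start = binary.BigEndian.Uint32(lens[4*(i-1):])
	}
	end := binary.BigEndian.Uint32(lens[4*i:])
	return vals[start:end], true
}

func main() {
	keys := [][keyLen]byte{{1}, {2}, {3}}
	values := [][]byte{[]byte("a"), nil, []byte("ccc")}
	enc := encodeAccounts(keys, values)
	v, ok := findAccount(enc, [keyLen]byte{3})
	fmt.Printf("%q %v\n", v, ok) // prints "ccc" true
}
```

Because every key occupies exactly 32 bytes, the i-th key sits at a computable offset, and the accumulated lengths give the start and end of the i-th value with at most two reads.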