mirror of
https://gitlab.com/pulsechaincom/erigon-pulse.git
synced 2024-12-22 03:30:37 +00:00
Change set docs (#2062)
parent
ee13ad17fa
commit
a4ff299afb
@ -18,8 +18,8 @@ var DBSchemaVersionMDBX = types.VersionReply{Major: 2, Minor: 0, Patch: 0}
|
||||
// "Plain State" - state where keys aren't hashed. "CurrentState" - same, but keys are hashed. "PlainState" is used for block execution. "CurrentState" is used mostly for Merkle root calculation.
|
||||
// "incarnation" - uint64 number - how many times the given account was SelfDestruct'ed.
|
||||
|
||||
/*PlainStateBucket
|
||||
Logical layout:
|
||||
/*
|
||||
PlainStateBucket logical layout:
|
||||
Contains Accounts:
|
||||
key - address (unhashed)
|
||||
value - account encoded for storage
|
||||
@ -45,21 +45,37 @@ Physical layout:
|
||||
const PlainStateBucket = "PLAIN-CST2"
|
||||
const PlainStateBucketOld1 = "PLAIN-CST"
|
||||
|
||||
//PlainContractCodeBucket -
|
||||
//key - address+incarnation
|
||||
//value - code hash
|
||||
var PlainContractCodeBucket = "PLAIN-contractCode"
|
||||
|
||||
/*
|
||||
AccountChangeSetBucket and StorageChangeSetBucket store PlainStateBucket changes in logical format:
|
||||
key - blockNum_u64 + key_in_plain_state
|
||||
value - value_in_plain_state_before_blockNum_changes
|
||||
|
||||
Example: If block N changed account A from value X to Y. Then:
|
||||
AccountChangeSetBucket has record: bigEndian(N) + A -> X
|
||||
PlainStateBucket has record: A -> Y
|
||||
|
||||
See also: docs/programmers_guide/db_walkthrough.MD#table-history-of-accounts
|
||||
|
||||
As you can see, if block N changes many accounts, all records share the repetitive prefix `bigEndian(N)`.
MDBX can store such prefixes only once, using the DupSort feature (see `docs/programmers_guide/dupsort.md`).
Both buckets are DupSort-ed and have the following physical format:
|
||||
AccountChangeSetBucket:
|
||||
key - blockNum_u64
|
||||
value - address + account(encoded)
|
||||
|
||||
StorageChangeSetBucket:
|
||||
key - blockNum_u64 + address + incarnation_u64
|
||||
value - plain_storage_key + value
|
||||
*/
|
||||
var AccountChangeSetBucket = "PLAIN-ACS"
|
||||
var StorageChangeSetBucket = "PLAIN-SCS"
|
||||
|
||||
const (
|
||||
//PlainContractCodeBucket -
|
||||
//key - address+incarnation
|
||||
//value - code hash
|
||||
PlainContractCodeBucket = "PLAIN-contractCode"
|
||||
|
||||
// AccountChangeSetBucket keeps changesets of accounts ("plain state")
|
||||
// key - encoded timestamp(block number)
|
||||
// value - encoded ChangeSet{k - address v - account(encoded).
|
||||
AccountChangeSetBucket = "PLAIN-ACS"
|
||||
|
||||
// StorageChangeSetBucket keeps changesets of storage ("plain state")
|
||||
// key - encoded timestamp(block number)
|
||||
// value - encoded ChangeSet{k - plainCompositeKey(for storage) v - originalValue(common.Hash)}.
|
||||
StorageChangeSetBucket = "PLAIN-SCS"
|
||||
|
||||
//HashedAccountsBucket
|
||||
// key - address hash
|
||||
@ -72,8 +88,8 @@ const (
|
||||
CurrentStateBucketOld2 = "CST2"
|
||||
)
|
||||
|
||||
/*AccountsHistoryBucket and StorageHistoryBucket
|
||||
History index designed to serve next 2 type of requests:
|
||||
/*
|
||||
AccountsHistoryBucket and StorageHistoryBucket - indices designed to serve the following 2 types of requests:
1. what is the smallest block number >= X where account A changed
2. get the last shard of A - to append new block numbers there
|
||||
|
||||
@ -95,6 +111,8 @@ If `db.Seek(A+bigEndian(X))` returns non-last shard -
|
||||
If `db.Seek(A+bigEndian(X))` returns last shard -
|
||||
then we go to PlainState: db.Get(PlainState, A)
|
||||
|
||||
see also: docs/programmers_guide/db_walkthrough.MD#table-change-sets
|
||||
|
||||
AccountsHistoryBucket:
|
||||
key - address + shard_id_u64
|
||||
value - roaring bitmap - list of blocks where it changed
|
||||
|
159
docs/programmers_guide/dupsort.md
Normal file
@ -0,0 +1,159 @@
|
||||
DupSort feature explanation
===========================

If a KV database has no concept of "Buckets/Tables/Collections", then all keys must carry a prefix. For example, to store
block bodies and headers you need to use the `b` and `h` prefixes:

```
b1->encoded_block1
b2->encoded_block2
b3->encoded_block3
...
h1->encoded_header1
h2->encoded_header2
h3->encoded_header3
...
```

Of course, 1 byte of overhead per key is not very big. But if the DB provides the concept of named
"Buckets/Tables/Collections", then you can create 2 tables `b` and `h` and store the keys there without prefixes. Physically,
table names are stored only once (not once per key).

Going 1 step further and introducing the concept of named "Sub-Buckets/Sub-Tables/Sub-Collections" allows
longer prefixes to be physically stored only once.

Let's look at ChangeSets. If block N changed account A from value X to Y:
`ChangeSet -> bigEndian(N) -> A -> X`

- `ChangeSet` - name of Table
- `bigEndian(N)` - name of Sub-Table
- `A` - key inside Sub-Table
- `X` - value inside Sub-Table
|
||||
|
||||
MDBX supports
-------------

MDBX supports "tables" (it calls them DBI) and "sub-tables" (DupSort DBI).

```
#MDBX_DUPSORT
Duplicate keys may be used in the database. (Or, from another perspective,
keys may have multiple data items, stored in sorted order.) By default
keys must be unique and may have only a single data item.
```

MDBX stores keys in a tree (B+Tree), and keys of sub-tables in a sub-tree (which is linked to the tree of the table).

Finding the value of 1 key can still be done by a single method:

```
subTableName, keyInSubTable, value := db.Get(tableName, subTableName, keyInSubTable)
```

Common pattern to iterate over a whole 'normal' table (without sub-tables), in pseudocode:

```
cursor := transaction.OpenCursor(tableName)
for k, v := cursor.Seek(key); k != nil; k, v = cursor.Next() {
    // logic works with 'k' and 'v' variables
}
```

Iterating over a table with sub-tables:

```
cursor := transaction.OpenCursor(tableName)
for k, v := cursor.SeekDup(subTableName, keyInSubTable); k != nil; k, v = cursor.Next() {
    // logic works with 'subTableName', 'k' and 'v' variables
}
```

Straightforward enough. No performance penalty (only a gain from the smaller database size).
|
||||
|
||||
MDBX in-depth
-------------

Max key size: 2022 bytes (the same applies to keys of a sub-table).

Let's look at ChangeSets. If block N changed account A from value X to Y:
`ChangeSet -> bigEndian(N) -> A -> X`

- `ChangeSet` - name of Table
- `bigEndian(N)` - name of Sub-Table
- `A` - key inside Sub-Table
- `X` - value inside Sub-Table

A sub-table (DupSort DBI) has no separate "value": the bytes of key and value are joined and stored together as the 'key':

```
------------------------------------------------------------------------------------------
       table      |  sub-table-name  |  keyAndValueJoinedTogether (no 'value' column)
------------------------------------------------------------------------------------------
'ChangeSets'      |
                  | {1}              | {A}+{X}
                  |                  | {A2}+{X2}
                  | {2}              | {A3}+{X3}
                  |                  | {A4}+{X4}
                  | ...              | ...
```

It's a bit unexpected, but doesn't change much. All operations still work:

```
subTableName, keyAndValueJoinedTogether := cursor.Get(subTableName, keyInSubTable)
{N}, {A}+{X} := cursor.Seek({N}, {A})
```

You need to separate 'A' and 'X' manually. But it unlocks a bunch of new features!
You can iterate over all changes in block N in sorted order. You can read just 1 exact change, even if the block changed
many megabytes of state.
|
||||
|
||||
And the format of StorageChangeSetBucket:
Loc - location hash (key of storage)

```
------------------------------------------------------------------------------------------
       table      |  sub-table-name  |  keyAndValueJoinedTogether (no 'value' column)
------------------------------------------------------------------------------------------
'StorageChanges'  |
                  | {1}+{A}+{inc1}   | {Loc1}+{X}
                  |                  | {Loc2}+{X2}
                  |                  | {Loc3}+{X3}
                  | {2}+{A}+{inc1}   | {Loc4}+{X4}
                  |                  | {Loc5}+{X5}
                  |                  | {Loc6}+{X6}
                  |                  | ...
```
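
The storage change set layout (`blockNum_u64 + address + incarnation_u64` as the sub-table name, `plain_storage_key + value` as the item) might be composed like this. It is a sketch under assumptions: the helper names are hypothetical, addresses are taken to be 20 bytes and locations 32 bytes, and the incarnation is encoded big-endian here only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	addrLen = 20 // assumed address length
	locLen  = 32 // assumed storage location (key) length
)

// storageChangeKey builds blockNum_u64 + address + incarnation_u64.
// Big-endian incarnation is an assumption of this sketch.
func storageChangeKey(blockNum uint64, address [addrLen]byte, incarnation uint64) []byte {
	key := make([]byte, 8+addrLen+8)
	binary.BigEndian.PutUint64(key[:8], blockNum)
	copy(key[8:], address[:])
	binary.BigEndian.PutUint64(key[8+addrLen:], incarnation)
	return key
}

// storageChangeValue builds plain_storage_key + value.
func storageChangeValue(loc [locLen]byte, prev []byte) []byte {
	return append(append(make([]byte, 0, locLen+len(prev)), loc[:]...), prev...)
}

// splitStorageChangeValue separates the location from the previous value.
func splitStorageChangeValue(v []byte) (loc, prev []byte) {
	return v[:locLen], v[locLen:]
}

func main() {
	var addr [addrLen]byte
	var loc [locLen]byte
	addr[0], loc[31] = 0xaa, 0x01

	k := storageChangeKey(1, addr, 1)
	v := storageChangeValue(loc, []byte{0x2a})
	l, prev := splitStorageChangeValue(v)
	fmt.Println(len(k), len(l), prev) // prints "36 32 [42]"
}
```

Because the location has a fixed length, the split needs no separator between location and value.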
|
||||
|
||||
Because the "keyAndValueJoinedTogether" column is stored as a key, it has the same size limit as a key (2022 bytes, see "Max key size").
|
||||
|
||||
MDBX, can you do better?
------------------------

By default, MDBX stores a small piece of metadata (the size of the data) for each key. Indices by nature store very many keys.

If all keys in a sub-table (DupSort DBI) have the same size, MDBX can store much less metadata.
(Remember that "keys in a sub-table" means "keyAndValueJoinedTogether": this is what must have the same size.) MDBX calls this
feature DupFixed (the flag can be added to the table configuration).

```
#MDBX_DUPFIXED
This flag may only be used in combination with #MDBX_DUPSORT. This option
tells the library that the data items for this database are all the same
size, which allows further optimizations in storage and retrieval. When
all data items are the same size, the #MDBX_GET_MULTIPLE, #MDBX_NEXT_MULTIPLE
and #MDBX_PREV_MULTIPLE cursor operations may be used to retrieve multiple
items at once.
```

It means that in 1 db call you can Get/Put up to 4Kb of sub-table keys.
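
Why equal-sized items need less metadata can be illustrated without MDBX at all. In the sketch below, `splitFixed` is a hypothetical helper and the 4096-byte buffer is an assumed stand-in for the "4Kb" batch mentioned above: a contiguous buffer of fixed-size items is cut apart purely by arithmetic, with no per-item length stored.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitFixed cuts a contiguous buffer into items of itemSize bytes.
// No per-item length is needed because every item has the same size.
func splitFixed(buf []byte, itemSize int) ([][]byte, error) {
	if itemSize <= 0 || len(buf)%itemSize != 0 {
		return nil, fmt.Errorf("buffer of %d bytes is not a multiple of item size %d", len(buf), itemSize)
	}
	items := make([][]byte, 0, len(buf)/itemSize)
	for off := 0; off < len(buf); off += itemSize {
		items = append(items, buf[off:off+itemSize])
	}
	return items, nil
}

func main() {
	const batchSize = 4096 // assumed batch size
	buf := make([]byte, batchSize)
	for i := 0; i < batchSize/8; i++ {
		binary.BigEndian.PutUint64(buf[i*8:], uint64(i))
	}
	items, err := splitFixed(buf, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(items), binary.BigEndian.Uint64(items[511])) // prints "512 511"
}
```

With 8-byte items, one such batch carries 512 of them in a single call.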
|
||||
|
||||
[mdbx docs](https://github.com/erthink/libmdbx/blob/master/mdbx.h)

Erigon
---------

The goal of this article is to show tricky concepts through examples. Further
reading is [here](./db_walkthrough.MD#table-history-of-accounts).

Erigon supports multiple typed cursors, see [AbstractKV.md](./../../ethdb/AbstractKV.md)
|
||||
|
||||
|
||||
|
@ -1,207 +0,0 @@
|
||||
Indices implementation in Erigon
|
||||
====================================
|
||||
|
||||
Indices (inverted indices) - allow search data by multiple filters.
|
||||
Here is an example: "In which blocks account X was updated? (account can be created/updated/deleted)"
|
||||
2 types of data "accounts value" and "accounts history" need to store in 1 key-value database.
|
||||
To avoid keys collision between data types - used `account` and `history` prefixes.
|
||||
To encode `created/updated/deleted` operations - used `C`, `U`, `D` markers.
|
||||
|
||||
```
|
||||
// Picture 1
|
||||
----------------------------------------------------
|
||||
key | value
|
||||
----------------------------------------------------
|
||||
'account'{account1_address} | {account1_value}
|
||||
'account'{account2_address} | {account2_value}
|
||||
... | ...
|
||||
'account'{accountN_address} | {accountN_value}
|
||||
'history'{account1_address}'C' | {block_number1}
|
||||
'history'{account1_address}'U' | {block_number2}
|
||||
'history'{account1_address}'U' | {block_number3}
|
||||
'history'{account1_address}'D' | {block_number4}
|
||||
'history'{account2_address}'C' | {block_number5}
|
||||
'history'{account2_address}'U' | {block_number6}
|
||||
... | ...
|
||||
'history'{accountN_address}'U' | {block_numberM}
|
||||
```
|
||||
|
||||
**Observation 1**: `account` and `history` prefixes repeated over and over again - wasting disk space.
|
||||
Complete solutions is: database supports "named buckets" - independent sub-databases - between buckets collisions are impossible.
|
||||
|
||||
```
|
||||
// Picture 2
|
||||
--------------------------------------------------------------
|
||||
bucket | key | value
|
||||
--------------------------------------------------------------
|
||||
'account' |
|
||||
| {account1_address} | {account1_value}
|
||||
| {account2_address} | {account2_value}
|
||||
| ... | ...
|
||||
| {accountN_address} | {accountN_value}
|
||||
'history' |
|
||||
| {account1_address}'C' | {block_number1}
|
||||
| {account1_address}'U' | {block_number2}
|
||||
| {account1_address}'U' | {block_number3}
|
||||
| {account1_address}'D' | {block_number4}
|
||||
| {account2_address}'C' | {block_number5}
|
||||
| {account2_address}'U' | {block_number6}
|
||||
| ... | ...
|
||||
| {accountN_address}'U' | {block_numberM}
|
||||
```
|
||||
Most of key-value databases (LevelDB, BadgerDB) do not provide such feature, but some do (BoltDB, LMDB)
|
||||
|
||||
**Observation 2**: Bucket 'history' again has much repeated prefixes: `{account1_address}` prefix will repeat every time account1 changed
|
||||
This is same problem as in "Observation 1" - can we use same solution for the same problem?
|
||||
Database supports "named sub-buckets" - independent sub-sub-databases - between sub-buckets collisions are impossible.
|
||||
|
||||
```
|
||||
// Picture 3
|
||||
---------------------------------------------------------------------------
|
||||
bucket | sub-bucket-name | key | value
|
||||
---------------------------------------------------------------------------
|
||||
'account' |
|
||||
| {account1_address} | | {account1_value}
|
||||
| {account2_address} | | {account2_value}
|
||||
| ... | | ...
|
||||
| {accountN_address} | | {accountN_value}
|
||||
'history' |
|
||||
| {account1_address} |
|
||||
| | 'C' | {block_number1}
|
||||
| | 'U' | {block_number2}
|
||||
| | 'U' | {block_number3}
|
||||
| | 'D' | {block_number4}
|
||||
| {account2_address} |
|
||||
| | 'C' | {block_number5}
|
||||
| | 'U' | {block_number6}
|
||||
| | ... | ...
|
||||
| {accountn_address} |
|
||||
| | 'U' | {block_numberM}
|
||||
```
|
||||
|
||||
Keys don't have repetitive data anymore (markers 'C','U','D' can be part of sub-bucket name if need).
|
||||
|
||||
All this tricks must keep data accessible: search/iterate/insert operations must be easy.
|
||||
|
||||
LMDB supports
|
||||
-------------
|
||||
|
||||
LMDB supports "buckets" (it uses name DBI) and supports "sub-buckets" (DupSort DBI).
|
||||
```
|
||||
#MDB_DUPSORT
|
||||
Duplicate keys may be used in the database. (Or, from another perspective,
|
||||
keys may have multiple data items, stored in sorted order.) By default
|
||||
keys must be unique and may have only a single data item.
|
||||
```
|
||||
|
||||
LMDB stores keys in Tree(B+Tree), and keys of sub-buckets in sub-Tree (which is linked to Tree of bucket).
|
||||
|
||||
Find value of 1 key, still can be done by single method:
|
||||
```
|
||||
subBucketName, keyInSubBucket, value := cursor.Get(subBucketName, keyInSubBucket)
|
||||
```
|
||||
|
||||
Common pattern to iterate over whole 'normal' bucket (without sub-buckets) in a pseudocode:
|
||||
```
|
||||
cursor := transaction.OpenCursor(bucketName)
|
||||
for k, v := cursor.Seek(key); k != nil; k, v = cursor.Next() {
|
||||
// logic works with 'k' and 'v' variables
|
||||
}
|
||||
```
|
||||
|
||||
Iterate over bucket with sub-buckets:
|
||||
```
|
||||
cursor := transaction.OpenCursor(bucketName)
|
||||
for k, _ := cursor.SeekDup(subBucketName, keyInSubBucket); k != nil; k, _ = cursor.Next() {
|
||||
// logic works with 'k1', 'k' and 'v' variables
|
||||
}
|
||||
```
|
||||
|
||||
Straightforward enough. No performance penalty (only a gain from the smaller database size).
|
||||
|
||||
LMDB in-depth
|
||||
-------------
|
||||
|
||||
Max key size: 551byte (same for key of sub-bucket)
|
||||
|
||||
Please take a look on 'Picture 3' again - it illustrates the high-level idea, but LMDB stores it different way.
|
||||
'Picture 4' shows - sub-bucket (DupSort DBI) has no "value", it does join bytes of key and value and store it as 'key':
|
||||
|
||||
```
|
||||
// Picture 4
|
||||
--------------------------------------------------------------------------------------
|
||||
bucket | sub-bucket-name | keyAndValueJoinedTogether (no 'value' column)
|
||||
--------------------------------------------------------------------------------------
|
||||
'account' |
|
||||
| {account1_address} | {account1_value}
|
||||
| {account2_address} | {account2_value}
|
||||
| ... | ...
|
||||
| {accountN_address} | {accountN_value}
|
||||
'history' |
|
||||
| {account1_address} |
|
||||
| | 'C'{block_number1}
|
||||
| | 'U'{block_number2}
|
||||
| | 'U'{block_number3}
|
||||
| | 'D'{block_number4}
|
||||
| {account2_address} |
|
||||
| | 'C'{block_number5}
|
||||
| | 'U'{block_number6}
|
||||
| | ...
|
||||
| {accountn_address} |
|
||||
| | 'U'{block_numberM}
|
||||
```
|
||||
|
||||
It's a bit unexpected, but doesn't change much. All operations are still work:
|
||||
```
|
||||
subBucketName, keyAndValueJoinedTogether := cursor.Get(subBucketName, keyInSubBucket)
|
||||
```
|
||||
|
||||
You may need manually separate 'key' and 'value'. But, it unleash bunch of new features!
|
||||
|
||||
Because column "keyAndValueJoinedTogether" is sorted and stored as key in same Tree (as normal keys).
|
||||
"value" can be used as part your query. In 1 db command we can answer more complex question:
|
||||
"Dear DB, give me the block number where account X was updated and which is greater than or equal to N".
|
||||
```
|
||||
{account1_address}, 'U'{block_number2} := cursor.Seek({account1_address}, 'U'{block_number1})
|
||||
// notice that in the parameter we used 'block_number1'
|
||||
// but DB had no 'U' records for this block and this account
|
||||
// then db returned value which is greater than what we requested
|
||||
// it returned 'block_number2'
|
||||
```
|
||||
|
||||
Because column "keyAndValueJoinedTogether" is stored as key - it has same size limit: 551byte
|
||||
|
||||
LMDB, can you do better?
|
||||
------------------------
|
||||
|
||||
By default, for each key LMDB does store small metadata (size of data).
|
||||
Indices by nature - store much-much keys.
|
||||
|
||||
If all keys in sub-bucket (DupSort DBI) have same size - LMDB can store much less metadata.
|
||||
(Remember! that "keys in sub-bucket" it's "keyAndValueJoinedTogether" - this thing must have same size).
|
||||
LMDB called this feature DupFixed (can add this flag to bucket configuration).
|
||||
|
||||
```
|
||||
#MDB_DUPFIXED
|
||||
This flag may only be used in combination with #MDB_DUPSORT. This option
|
||||
tells the library that the data items for this database are all the same
|
||||
size, which allows further optimizations in storage and retrieval. When
|
||||
all data items are the same size, the #MDB_GET_MULTIPLE, #MDB_NEXT_MULTIPLE
|
||||
and #MDB_PREV_MULTIPLE cursor operations may be used to retrieve multiple
|
||||
items at once.
|
||||
```
|
||||
|
||||
It means in 1 db call you can Get/Put up to 4Kb of sub-bucket keys.
|
||||
|
||||
[lmdb docs](https://github.com/ledgerwatch/lmdb-go/blob/master/lmdb/lmdb.h)
|
||||
|
||||
Erigon
|
||||
---------
|
||||
|
||||
This article target is to show tricky concepts on simple examples.
|
||||
Real way how Erigon stores accounts value and accounts history is a bit different and described [here](./db_walkthrough.MD#bucket-history-of-accounts)
|
||||
|
||||
Erigon supports multiple typed cursors, see [AbstractKV.md](./../../ethdb/AbstractKV.md)
|
||||
|
||||
|
||||
|
@ -100,7 +100,7 @@ if err != nil {
|
||||
- No internal copies/allocations. It means: 1. the app must copy keys/values before putting them into the database. 2. Data read from the db is valid only during the current transaction - copy it if you plan to use it after transaction Commit/Rollback.
|
||||
- Methods .Bucket() and .Cursor(), can’t return nil, can't return error.
|
||||
- Bucket and Cursor - are interfaces - means different classes can satisfy it: for example `LmdbCursor` and `LmdbDupSortCursor` classes satisfy it.
|
||||
If you are not familiar with the "DupSort" concept, please read [indices.md](./../docs/programmers_guide/indices.md) first.
|
||||
If you are not familiar with the "DupSort" concept, please read [dupsort.md](./../docs/programmers_guide/dupsort.md) first.
|
||||
|
||||
|
||||
- If Cursor returns err!=nil then key SHOULD be != nil (can be []byte{} for example).
|
||||
|