Change set docs (#2062)

2024-12-22 03:30:37 +00:00 · 2021-05-31 15:29:46 +07:00 · 2021-05-31 15:29:46 +07:00 · a4ff299afb
commit a4ff299afb
parent ee13ad17fa
4 changed files with 196 additions and 226 deletions
--- a/common/dbutils/bucket.go
+++ b/common/dbutils/bucket.go
@@ -18,8 +18,8 @@ var DBSchemaVersionMDBX = types.VersionReply{Major: 2, Minor: 0, Patch: 0}
 // "Plain State" - state where keys aren't hashed. "CurrentState" - same, but keys are hashed. "PlainState" is used for block execution. "CurrentState" is used mostly for Merkle root calculation.
 // "incarnation" - uint64 number - how many times a given account was SelfDestruct'ed.

-/*PlainStateBucket
-Logical layout:
+/*
+PlainStateBucket logical layout:
 	Contains Accounts:
 	  key - address (unhashed)
 	  value - account encoded for storage
@@ -45,21 +45,37 @@ Physical layout:
 const PlainStateBucket = "PLAIN-CST2"
 const PlainStateBucketOld1 = "PLAIN-CST"

+//PlainContractCodeBucket -
+//key - address+incarnation
+//value - code hash
+var PlainContractCodeBucket = "PLAIN-contractCode"
+
+/*
+AccountChangeSetBucket and StorageChangeSetBucket store PlainStateBucket changes in logical format:
+	key - blockNum_u64 + key_in_plain_state
+	value - value_in_plain_state_before_blockNum_changes
+
+Example: if block N changed account A from value X to Y, then:
+	AccountChangeSetBucket has record: bigEndian(N) + A -> X
+	PlainStateBucket has record: A -> Y
+
+See also: docs/programmers_guide/db_walkthrough.MD#table-history-of-accounts
+
+As you can see, if block N changes many accounts, all their records share the repeated prefix `bigEndian(N)`.
+MDBX can store such a prefix only once by using its DupSort feature (see `docs/programmers_guide/dupsort.md`).
+Both buckets are DupSort-ed and have the following physical format:
+AccountChangeSetBucket:
+	key - blockNum_u64
+	value - address + account(encoded)
+
+StorageChangeSetBucket:
+	key - blockNum_u64 + address + incarnation_u64
+	value - plain_storage_key + value
+*/
+var AccountChangeSetBucket = "PLAIN-ACS"
+var StorageChangeSetBucket = "PLAIN-SCS"
+
 const (
-	//PlainContractCodeBucket -
-	//key - address+incarnation
-	//value - code hash
-	PlainContractCodeBucket = "PLAIN-contractCode"
-
-	// AccountChangeSetBucket keeps changesets of accounts ("plain state")
-	// key - encoded timestamp(block number)
-	// value - encoded ChangeSet{k - address v - account(encoded).
-	AccountChangeSetBucket = "PLAIN-ACS"
-
-	// StorageChangeSetBucket keeps changesets of storage ("plain state")
-	// key - encoded timestamp(block number)
-	// value - encoded ChangeSet{k - plainCompositeKey(for storage) v - originalValue(common.Hash)}.
-	StorageChangeSetBucket = "PLAIN-SCS"

 	//HashedAccountsBucket
 	// key - address hash
@@ -72,8 +88,8 @@ const (
 	CurrentStateBucketOld2 = "CST2"
 )

-/*AccountsHistoryBucket and StorageHistoryBucket
-History index designed to serve next 2 type of requests:
+/*
+AccountsHistoryBucket and StorageHistoryBucket - indices designed to serve the following 2 types of requests:
 1. what is smallest block number >= X where account A changed
 2. get last shard of A - to append there new block numbers

@@ -95,6 +111,8 @@ If `db.Seek(A+bigEndian(X))` returns non-last shard -
 If `db.Seek(A+bigEndian(X))` returns last shard -
 		then we go to PlainState: db.Get(PlainState, A)

+see also: docs/programmers_guide/db_walkthrough.MD#table-change-sets
+
 AccountsHistoryBucket:
 	key - address + shard_id_u64
 	value - roaring bitmap  - list of block where it changed
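The logical and physical change set formats described in the `bucket.go` comment above can be sketched in Go as follows. This is only an illustration under stated assumptions: the helper names `logicalRecord`, `physicalRecord` and `splitPhysical` and the constant `addrLen` are hypothetical and are not part of this commit, and the account value `X` is treated as opaque bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const addrLen = 20 // assumed address length in bytes (hypothetical constant)

// logicalRecord builds the logical form: key = bigEndian(N) + A, value = X.
func logicalRecord(blockNum uint64, addr [addrLen]byte, prev []byte) (k, v []byte) {
	k = make([]byte, 8+addrLen)
	binary.BigEndian.PutUint64(k, blockNum)
	copy(k[8:], addr[:])
	return k, prev
}

// physicalRecord builds the DupSort form: key = bigEndian(N), value = A + X.
func physicalRecord(blockNum uint64, addr [addrLen]byte, prev []byte) (k, v []byte) {
	k = make([]byte, 8)
	binary.BigEndian.PutUint64(k, blockNum)
	v = make([]byte, 0, addrLen+len(prev))
	v = append(v, addr[:]...)
	v = append(v, prev...)
	return k, v
}

// splitPhysical recovers (N, A, X) from a physical record.
func splitPhysical(k, v []byte) (blockNum uint64, addr [addrLen]byte, prev []byte, err error) {
	if len(k) != 8 || len(v) < addrLen {
		return 0, addr, nil, fmt.Errorf("malformed record: key %d bytes, value %d bytes", len(k), len(v))
	}
	copy(addr[:], v[:addrLen])
	return binary.BigEndian.Uint64(k), addr, v[addrLen:], nil
}

func main() {
	addr := [addrLen]byte{0xAA}
	k, v := physicalRecord(5, addr, []byte("X"))
	n, a, x, err := splitPhysical(k, v)
	fmt.Println(n, a == addr, string(x), err)
}
```

Both forms carry the same information. The physical form only moves the address from the key into the value, so the 8-byte block number is the part that the DupSort feature can store once per block.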
--- a/docs/programmers_guide/dupsort.md
+++ b/docs/programmers_guide/dupsort.md
@@ -0,0 +1,159 @@
+DupSort feature explanation
+===========================
+
+If a KV database has no concept of "Buckets/Tables/Collections", then all keys must carry a "Prefix". For example, to store
+block bodies and headers you need to use the `b` and `h` prefixes:
+
+```
+b1->encoded_block1
+b2->encoded_block2
+b3->encoded_block3
+...
+h1->encoded_header1
+h2->encoded_header2
+h3->encoded_header3
+...
+```
+
+Of course, an overhead of 1 byte per key is not very big. But if the DB provides the concept of named
+"Buckets/Tables/Collections", you can create 2 tables, `b` and `h`, and store keys there without prefixes. Physically, table
+names are stored only once (not once per key).
+
+Now go 1 step further and introduce the concept of named "Sub-Buckets/Sub-Tables/Sub-Collections". This allows
+longer prefixes to be physically stored only once.
+
+Let's look at ChangeSets. If block N changed account A from value X to Y:  
+`ChangeSet -> bigEndian(N) -> A -> X`
+
+- `ChangeSet` - name of Table
+- `bigEndian(N)` - name of Sub-Table
+- `A` - key inside Sub-Table
+- `X` - value inside Sub-Table
+
+MDBX supports
+-------------
+
+MDBX supports "tables" (which it calls DBI) and "sub-tables" (DupSort DBI).
+
+```
+#MDBX_DUPSORT
+    Duplicate keys may be used in the database. (Or, from another perspective,
+    keys may have multiple data items, stored in sorted order.) By default
+    keys must be unique and may have only a single data item.
+``` 
+
+MDBX stores keys in a tree (B+Tree), and keys of sub-tables in a sub-tree (which is linked to the tree of the table).
+
+Finding the value of 1 key can still be done with a single method:
+
+```
+subTableName, keyInSubTable, value := db.Get(tableName, subTableName, keyInSubTable)
+```
+
+A common pattern to iterate over a whole 'normal' table (without sub-tables), in pseudocode:
+
+```
+cursor := transaction.OpenCursor(tableName)
+for k, v := cursor.Seek(key); k != nil; k, v = cursor.Next() {
+    // logic works with 'k' and 'v' variables
+} 
+```
+
+Iterating over a table with sub-tables:
+
+```
+cursor := transaction.OpenCursor(tableName)
+for k, v := cursor.SeekDup(subTableName, keyInSubTable); k != nil; k, v = cursor.Next() {
+    // logic works with 'k' and 'v' variables
+} 
+```
+
+Straightforward enough. No performance penalty (only the benefit of a smaller database size).
+
+MDBX in-depth
+-------------
+
+Max key size: 2022 bytes (same for the key of a sub-table).  
+Let's look at ChangeSets. If block N changed account A from value X to Y:  
+`ChangeSet -> bigEndian(N) -> A -> X`
+
+- `ChangeSet` - name of Table
+- `bigEndian(N)` - name of Sub-Table
+- `A` - key inside Sub-Table
+- `X` - value inside Sub-Table
+
+```
+------------------------------------------------------------------------------------------
+    table        | sub-table-name    |      keyAndValueJoinedTogether (no 'value' column)
+------------------------------------------------------------------------------------------
+  'ChangeSets'   | 
+                 | {1}                | {A}+{X}   
+                 |                    | {A2}+{X2}
+                 | {2}                | {A3}+{X3}   
+                 |                    | {A4}+{X4}
+                 | ...                | ...               
+```
+
+It's a bit unexpected, but it doesn't change much. All operations still work:
+
+```
+subTableName, keyAndValueJoinedTogether := cursor.Get(subTableName, keyInSubTable)
+{N}, {A}+{X} := cursor.Seek({N}, {A})
+```
+
+You need to separate 'A' and 'X' manually. But it unlocks a bunch of new features!
+You can iterate over all changes in block N in sorted order. You can read just 1 exact change, even if the block changed many megabytes
+of state.
+
+And here is the format of StorageChangeSetBucket,
+where Loc is the location hash (key of storage):
+
+```
+------------------------------------------------------------------------------------------
+    table        | sub-table-name    |      keyAndValueJoinedTogether (no 'value' column)
+------------------------------------------------------------------------------------------
+'StorageChanges' | 
+                 | {1}+{A}+{inc1}     | {Loc1}+{X}
+                 |                    | {Loc2}+{X2}
+                 |                    | {Loc3}+{X3}
+                 | {2}+{A}+{inc1}     | {Loc4}+{X4}
+                 |                    | {Loc5}+{X5}
+                 |                    | {Loc6}+{X6}
+                 |                    | ...             
+```
+
+Because the column "keyAndValueJoinedTogether" is stored as a key, it has the same size limit: 2022 bytes.
+
+MDBX, can you do better?
+------------------------
+
+By default, MDBX stores a small piece of metadata (the size of the data) for each key. Indices, by nature, store a great many keys.
+
+If all keys in a sub-table (DupSort DBI) have the same size, MDBX can store much less metadata.  
+(Remember that "keys in a sub-table" means "keyAndValueJoinedTogether": this joined value must have the same size.) MDBX calls this
+feature DupFixed (you can add this flag to the table configuration).
+
+```
+#MDBX_DUPFIXED
+	 This flag may only be used in combination with #MDBX_DUPSORT. This option
+	 tells the library that the data items for this database are all the same
+	 size, which allows further optimizations in storage and retrieval. When
+	 all data items are the same size, the #MDBX_GET_MULTIPLE, #MDBX_NEXT_MULTIPLE
+	 and #MDBX_PREV_MULTIPLE cursor operations may be used to retrieve multiple
+	 items at once.
+```
+
+This means that in 1 db call you can Get/Put up to 4Kb of sub-table keys.
+
+[mdbx docs](https://github.com/erthink/libmdbx/blob/master/mdbx.h)
+
+Erigon
+---------
+
+The goal of this article is to show tricky concepts through examples. Further
+reading is [here](./db_walkthrough.MD#table-history-of-accounts).
+
+Erigon supports multiple typed cursors, see [AbstractKV.md](./../../ethdb/AbstractKV.md)
+
+
+
--- a/docs/programmers_guide/indices.md
+++ b/docs/programmers_guide/indices.md
@@ -1,207 +0,0 @@
-Indices implementation in Erigon
-====================================
-
-Indices (inverted indices) - allow search data by multiple filters. 
-Here is an example: "In which blocks account X was updated? (account can be created/updated/deleted)"
-2 types of data "accounts value" and "accounts history" need to store in 1 key-value database.
-To avoid keys collision between data types - used `account` and `history` prefixes.
-To encode `created/updated/deleted` operations - used `C`, `U`, `D` markers. 
-
-```
-// Picture 1  
----------------------------------------------------
-       key                      |       value
----------------------------------------------------
-'account'{account1_address}     | {account1_value}
-'account'{account2_address}     | {account2_value}
-...                             | ...
-'account'{accountN_address}     | {accountN_value}
-'history'{account1_address}'C'  | {block_number1}
-'history'{account1_address}'U'  | {block_number2}
-'history'{account1_address}'U'  | {block_number3}
-'history'{account1_address}'D'  | {block_number4}
-'history'{account2_address}'C'  | {block_number5}
-'history'{account2_address}'U'  | {block_number6}
-...                             | ...
-'history'{accountN_address}'U'  | {block_numberM}
-```
-
-**Observation 1**: `account` and `history` prefixes repeated over and over again - wasting disk space.
-Complete solutions is: database supports "named buckets" - independent sub-databases - between buckets collisions are impossible.
-
-```
-// Picture 2
--------------------------------------------------------------
-   bucket    |        key                |       value
--------------------------------------------------------------
-  'account'  |
-             | {account1_address}        | {account1_value}
-             | {account2_address}        | {account2_value}
-             | ...                       | ...
-             | {accountN_address}        | {accountN_value}
-  'history'  | 
-             | {account1_address}'C'     | {block_number1}
-             | {account1_address}'U'     | {block_number2}
-             | {account1_address}'U'     | {block_number3}
-             | {account1_address}'D'     | {block_number4}
-             | {account2_address}'C'     | {block_number5}
-             | {account2_address}'U'     | {block_number6}
-             | ...                       | ...
-             | {accountN_address}'U'     | {block_numberM}
-```
-Most of key-value databases (LevelDB, BadgerDB) do not provide such feature, but some do (BoltDB, LMDB)
-
-**Observation 2**: Bucket 'history' again has much repeated prefixes: `{account1_address}` prefix will repeat every time account1 changed
-This is same problem as in "Observation 1" - can we use same solution for the same problem?
-Database supports "named sub-buckets" - independent sub-sub-databases - between sub-buckets collisions are impossible.
-
-```
-// Picture 3
---------------------------------------------------------------------------
-    bucket   |   sub-bucket-name  |        key          |       value
---------------------------------------------------------------------------
-  'account'  |
-             | {account1_address} |                     | {account1_value}
-             | {account2_address} |                     | {account2_value}
-             | ...                |                     | ...
-             | {accountN_address} |                     | {accountN_value}
-  'history'  | 
-             | {account1_address} |
-             |                    | 'C'                 | {block_number1}
-             |                    | 'U'                 | {block_number2}
-             |                    | 'U'                 | {block_number3}
-             |                    | 'D'                 | {block_number4}
-             | {account2_address} |
-             |                    | 'C'                 | {block_number5}
-             |                    | 'U'                 | {block_number6}
-             |                    | ...                 | ...
-             | {accountn_address} |                
-             |                    | 'U'                 | {block_numberM}
-```
-
-Keys don't have repetitive data anymore (markers 'C','U','D' can be part of sub-bucket name if need).
-
-All this tricks must keep data accessible: search/iterate/insert operations must be easy.    
-
-LMDB supports 
-------------
- 
-LMDB supports "buckets" (it uses name DBI) and supports "sub-buckets" (DupSort DBI).
-```
-#MDB_DUPSORT
-    Duplicate keys may be used in the database. (Or, from another perspective,
-    keys may have multiple data items, stored in sorted order.) By default
-    keys must be unique and may have only a single data item.
-``` 
-
-LMDB stores keys in Tree(B+Tree), and keys of sub-buckets in sub-Tree (which is linked to Tree of bucket).
-
-Find value of 1 key, still can be done by single method:  
-```
-subBucketName, keyInSubBucket, value := cursor.Get(subBucketName, keyInSubBucket)
-```
-
-Common pattern to iterate over whole 'normal' bucket (without sub-buckets) in a pseudocode:
-```
-cursor := transaction.OpenCursor(bucketName)
-for k, v := cursor.Seek(key); k != nil; k, v = cursor.Next() {
-    // logic works with 'k' and 'v' variables
-} 
-```
-
-Iterate over bucket with sub-buckets: 
-```
-cursor := transaction.OpenCursor(bucketName)
-for k, _ := cursor.SeekDup(subBucketName, keyInSubBucket); k != nil; k, _ = cursor.Next() {
-    // logic works with 'k1', 'k' and 'v' variables
-} 
-```
-
-Enough strait forward. No performance penalty (only profit from smaller database size).
-
-LMDB in-depth
-------------
- 
-Max key size: 551byte (same for key of sub-bucket)
-
-Please take a look on 'Picture 3' again - it illustrates the high-level idea, but LMDB stores it different way. 
-'Picture 4' shows - sub-bucket (DupSort DBI) has no "value", it does join bytes of key and value and store it as 'key': 
-
-```
-// Picture 4
--------------------------------------------------------------------------------------
-    bucket   | sub-bucket-name    |      keyAndValueJoinedTogether (no 'value' column)
--------------------------------------------------------------------------------------
-  'account'  |
-             | {account1_address} | {account1_value}   
-             | {account2_address} | {account2_value}
-             | ...                | ...               
-             | {accountN_address} | {accountN_value}
-  'history'  | 
-             | {account1_address} |
-             |                    | 'C'{block_number1}
-             |                    | 'U'{block_number2}
-             |                    | 'U'{block_number3}
-             |                    | 'D'{block_number4}
-             | {account2_address} |
-             |                    | 'C'{block_number5}
-             |                    | 'U'{block_number6}
-             |                    | ...
-             | {accountn_address} |                
-             |                    | 'U'{block_numberM}
-```
-
-It's a bit unexpected, but doesn't change much. All operations are still work:
-```
-subBucketName, keyAndValueJoinedTogether := cursor.Get(subBucketName, keyInSubBucket)
-```
-
-You may need manually separate 'key' and 'value'. But, it unleash bunch of new features!
-
-Because column "keyAndValueJoinedTogether" is sorted and stored as key in same Tree (as normal keys). 
-"value" can be used as part your query. In 1 db command we can answer more complex question:
-"Dear DB, Give me block number where account X was update and which is greater or equal than N".
-```
-{account1_address}, 'U'{block_number2} := cursor.Seek({account1_address}, 'U'{block_number1})
-// notice that in parameter we used 'block_numger1' 
-// but DB had no 'U' records for this block and this account
-// then db returned value which is greater than what we requested 
-// it returned 'block_number2' 
-```
-
-Because column "keyAndValueJoinedTogether" is stored as key - it has same size limit: 551byte 
-
-LMDB, can you do better?
------------------------
-
-By default, for each key LMDB does store small metadata (size of data). 
-Indices by nature - store much-much keys.
-
-If all keys in sub-bucket (DupSort DBI) have same size - LMDB can store much less metadata.  
-(Remember! that "keys in sub-bucket" it's "keyAndValueJoinedTogether" - this thing must have same size).
-LMDB called this feature DupFixed (can add this flag to bucket configuration).
-
-```
-#MDB_DUPFIXED
-	 This flag may only be used in combination with #MDB_DUPSORT. This option
-	 tells the library that the data items for this database are all the same
-	 size, which allows further optimizations in storage and retrieval. When
-	 all data items are the same size, the #MDB_GET_MULTIPLE, #MDB_NEXT_MULTIPLE
-	 and #MDB_PREV_MULTIPLE cursor operations may be used to retrieve multiple
-	 items at once.
-```
-
-It means in 1 db call you can Get/Put up to 4Kb of sub-bucket keys. 
-
-[lmdb docs](https://github.com/ledgerwatch/lmdb-go/blob/master/lmdb/lmdb.h)
-
-Erigon
---------
-
-This article target is to show tricky concepts on simple examples. 
-Real way how Erigon stores accounts value and accounts history is a bit different and described [here](./db_walkthrough.MD#bucket-history-of-accounts)    
-
-Erigon supports multiple typed cursors, see [AbstractKV.md](./../../ethdb/AbstractKV.md)
-
-
-
--- a/ethdb/Readme.md
+++ b/ethdb/Readme.md
@@ -100,7 +100,7 @@ if err != nil {
 - No internal copies/allocations. It means: 1. app must copy keys/values before put to database. 2. Data after read from db - valid only during current transaction - copy it if plan use data after transaction Commit/Rollback.
 - Methods .Bucket() and .Cursor(), can’t return nil, can't return error.
 - Bucket and Cursor - are interfaces - means different classes can satisfy it: for example `LmdbCursor` and `LmdbDupSortCursor` classes satisfy it. 
-  If your are not familiar with "DupSort" concept, please read [indices.md](./../docs/programmers_guide/indices.md) first.
+  If you are not familiar with the "DupSort" concept, please read [dupsort.md](./../docs/programmers_guide/dupsort.md) first.


 - If Cursor returns err!=nil then key SHOULD be != nil (can be []byte{} for example).