erigon-pulse/erigon-lib/kv
milen 34c0fe29ad
metrics: swap remaining VictoriaMetrics usages with erigon-lib/metrics (#8762)
# Background

Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.

We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.

This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.

# Summary of changes

- Adds missing `NewCounter`, `NewSummary`, `NewHistogram`,
`GetOrCreateHistogram` functions to `erigon-lib/metrics` similar to the
interface VictoriaMetrics lib provides
- Minor tidy up for consistency inside `erigon-lib/metrics/set.go`
around return types (panic vs err consistency for funcs inside the
file), error messages, comments
- Replace all remaining usages of `github.com/VictoriaMetrics/metrics`
with `github.com/ledgerwatch/erigon-lib/metrics` - seamless (only import
changes) since interfaces match
2023-11-20 12:23:23 +00:00
..
bitmapdb Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
dbutils Sentinel refactor (#8296) 2023-10-22 01:17:18 +02:00
iter Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
kvcache metrics: swap remaining VictoriaMetrics usages with erigon-lib/metrics (#8762) 2023-11-20 12:23:23 +00:00
kvcfg Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
mdbx Add RPC daemon using Silkworm (#8486) 2023-10-18 06:37:16 +07:00
membatch sys deps up (#8695) 2023-11-11 15:04:18 +03:00
membatchwithdb Medbx: add label to error messages, UpdateForkChoice: add ctx to erorrs, MemDb: increase db-limit from 512Mb to 512Gb (#8434) 2023-10-11 12:53:34 +07:00
memdb move memdb to own package - to reduce cycle deps (#8428) 2023-10-11 08:48:36 +07:00
order Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
rawdbv3 Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
remotedb Add RPC daemon using Silkworm (#8486) 2023-10-18 06:37:16 +07:00
remotedbserver mdbx: less compile warnings (#8300) 2023-09-28 08:56:24 +07:00
temporal/historyv2 Spell Check: Fix typos (#8480) 2023-10-17 11:21:27 +07:00
helpers.go Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
kv_interface.go metrics: swap remaining VictoriaMetrics usages with erigon-lib/metrics (#8762) 2023-11-20 12:23:23 +00:00
Readme.md Add 'erigon-lib/' from commit '93d9c9d9fe4bd8a49f7a98a6bce0f0da7094c7d3' 2023-09-20 14:50:25 +02:00
tables.go Added data model for public keys validator (#8774) 2023-11-18 19:03:56 +01:00

Ethdb package hold's bouquet of objects to access DB

Words "KV" and "DB" have special meaning here:

  • KV - key-value-style API to access data: let developer manage transactions, stateful cursors.
  • DB - object-oriented-style API to access data: Get/Put/Delete/WalkOverTable/MultiPut, managing transactions internally.

So, DB abstraction fits 95% times and leads to more maintainable code - because it looks stateless.

About "key-value-style": Modern key-value databases don't provide Get/Put/Delete methods, because it's very hard-drive-unfriendly - it pushes developers do random-disk-access which is order of magnitude slower than sequential read. To enforce sequential-reads - introduced stateful cursors/iterators - they intentionally look as file-api: open_cursor/seek/write_data_from_current_position/move_to_end/step_back/step_forward/delete_key_on_current_position/append.

Class diagram:

// This is not call graph, just show classes from low-level to high-level. 
// And show which classes satisfy which interfaces.

+-----------------------------------+   +-----------------------------------+ 
|  github.com/erigonteh/mdbx-go    |   | google.golang.org/grpc.ClientConn |                    
|  (app-agnostic MDBX go bindings)  |   | (app-agnostic RPC and streaming)  |
+-----------------------------------+   +-----------------------------------+
                  |                                      |
                  |                                      |
                  v                                      v
+-----------------------------------+   +-----------------------------------+
|       ethdb/kv_mdbx.go            |   |       ethdb/kv_remote.go          |                
|  (tg-specific MDBX implementaion) |   |   (tg-specific remote DB access)  |              
+-----------------------------------+   +-----------------------------------+
                  |                                      |
                  |                                      |
                  v                                      v    
+----------------------------------------------------------------------------------------------+
|                                       eth/kv_interface.go                                   |  
|         (Common KV interface. DB-friendly, disk-friendly, cpu-cache-friendly.                |
|           Same app code can work with local or remote database.                              |
|           Allows experiment with another database implementations.                           |
|          Supports context.Context for cancelation. Any operation can return error)           |
+----------------------------------------------------------------------------------------------+

Then:
turbo/snapshotsync/block_reader.go.go
erigon-lib/state/aggregator_v3.go

Then:
kv_temporal.go

ethdb.AbstractKV design:

  • InMemory, ReadOnly: NewMDBX().Flags(mdbx.ReadOnly).InMem().Open()

  • MultipleDatabases, Customization: NewMDBX().Path(path).WithBucketsConfig(config).Open()

  • 1 Transaction object can be used only within 1 goroutine.

  • Only 1 write transaction can be active at a time (other will wait).

  • Unlimited read transactions can be active concurrently (not blocked by write transaction).

  • Methods db.Update, db.View - can be used to open and close short transaction.

  • Methods Begin/Commit/Rollback - for long transaction.

  • it's safe to call .Rollback() after .Commit(), multiple rollbacks are also safe. Common transaction patter:

tx, err := db.Begin(true, ethdb.RW)
if err != nil {
    return err
}
defer tx.Rollback() // important to avoid transactions leak at panic or early return

// ... code which uses database in transaction
 
err := tx.Commit()
if err != nil {
    return err
}
  • No internal copies/allocations. It means: 1. app must copy keys/values before put to database. 2. Data after read from db - valid only during current transaction - copy it if plan use data after transaction Commit/Rollback.

  • Methods .Bucket() and .Cursor(), cant return nil, can't return error.

  • Bucket and Cursor - are interfaces - means different classes can satisfy it: for example MdbxCursor and MdbxDupSortCursor classes satisfy it. If your are not familiar with "DupSort" concept, please read dupsort.md

  • If Cursor returns err!=nil then key SHOULD be != nil (can be []byte{} for example). Then traversal code look as:

for k, v, err := c.First(); k != nil; k, v, err = c.Next() {
if err != nil {
return err
}
// logic
}
  • Move cursor: cursor.Seek(key)

ethdb.Database design:

  • Allows pass multiple implementations
  • Allows traversal tables by db.Walk

ethdb.TxDb design:

  • holds inside 1 long-running transaction and 1 cursor per table
  • method Begin DOESN'T create new TxDb object, it means this object can be passed into other objects by pointer, and high-level app code can start/commit transactions when it needs without re-creating all objects which holds TxDb pointer.
  • This is reason why txDb.CommitAndBegin() method works: inside it creating new transaction object, pinter to TxDb stays valid.

How to dump/load table

Install all database tools: make db-tools

./build/bin/mdbx_dump -a <datadir>/erigon/chaindata | lz4 > dump.lz4
lz4 -d < dump.lz4 | ./build/bin/mdbx_load -an <datadir>/erigon/chaindata

How to get table checksum

./build/bin/mdbx_dump -s table_name <datadir>/erigon/chaindata | tail -n +4 | sha256sum # tail here is for excluding header 

Header example:
VERSION=3
geometry=l268435456,c268435456,u25769803776,s268435456,g268435456
mapsize=756375552
maxreaders=120
format=bytevalue
database=TBL0001
type=btree
db_pagesize=4096
duplicates=1
dupsort=1
HEADER=END