This is a non-functional change that consolidates the various packages
under metrics into the top-level package now that the dead code is
removed.
It is a precursor to the removal of Victoria metrics after which all
erigon metrics code will be contained in this single package.
This is an update of:
https://github.com/ledgerwatch/erigon/pull/7846
which uses a local fork of Victoria metrics to include the changes that
https://github.com/anshalshukla added to the original fork we were
using.
It also includes code to address the duplicate metrics issue identified
here:
https://github.com/ledgerwatch/erigon/issues/8053
It has one more associated fix, which is to correctly add a metadata
label to counters; these were previously labelled as gauges.
e.g.
```
# TYPE p2p_peers counter
p2p_peers 0
```
rather than
```
# TYPE p2p_peers gauge
p2p_peers 0
```
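The difference above is in the `# TYPE` metadata line of the Prometheus text exposition format. As a minimal sketch of how such a line could be emitted (`metricKind` and `formatMetric` are hypothetical names used for illustration only, not the API of the erigon metrics package):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// metricKind is a hypothetical enum for this sketch; it is not erigon's API.
type metricKind string

const (
	kindCounter metricKind = "counter"
	kindGauge   metricKind = "gauge"
)

// formatMetric renders one sample in the Prometheus text exposition format,
// preceded by its "# TYPE" metadata line.
func formatMetric(name string, kind metricKind, value uint64) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "# TYPE %s %s\n", name, kind)
	fmt.Fprintf(&sb, "%s %d\n", name, value)
	return sb.String()
}

func main() {
	// A counter must be announced as a counter, not as a gauge.
	fmt.Fprint(os.Stdout, formatMetric("p2p_peers", kindCounter, 0))
}
```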
---------
Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
I have added:
```go
{
	ID:          stages.BorHeimdall,
	Description: "Download Bor-specific data from Heimdall",
	Forward: func(firstCycle bool, badBlockUnwind bool, s *StageState, u Unwinder, tx kv.RwTx, logger log.Logger) error {
		if badBlockUnwind {
			return nil
		}
		return BorHeimdallForward(s, u, ctx, tx, borHeimdallCfg, true, logger)
	},
	Unwind: func(firstCycle bool, u *UnwindState, s *StageState, tx kv.RwTx, logger log.Logger) error {
		return BorHeimdallUnwind(u, ctx, s, tx, borHeimdallCfg)
	},
	Prune: func(firstCycle bool, p *PruneState, tx kv.RwTx, logger log.Logger) error {
		return BorHeimdallPrune(p, ctx, tx, borHeimdallCfg)
	},
},
```
to MiningStages as well as Default, since otherwise bor events are not added
when the block producer creates new blocks.
There are a couple of questions I have around this implementation:
* Is this the right place to add this?
* As the state is also executed when the default stage is processed, there
is some duplicate processing for the block producing node.
* There is a duplicated call to heimdall which could be removed if the
stages share state, but it's not clear if we want to do this.
* I don't think the mining stage needs to prune, as this will be
replicated in the default iteration.
This can be tested using the devnet with the following arguments:
```
--chain bor-devnet --bor.localheimdall --scenarios state-sync
```
This will generate sync events via an ethereum devnet which are
transmitted to the bor chain and executed at the end of the snapshot
delay, which results in events generated from the bor chain. This tests
the whole sync, block generation, and event lifecycle. As it needs to wait
for sprints to end after a sufficient delay, it is quite slow to run.
The current logic is flawed, because it drops all peers that are less
synced.
It is valid to return empty responses by the eth spec.
Proper logic should penalize peers from within the context of the sync
process, where enough "reputation" data is collected about a peer.
In order to be able to connect to erigon 2.48 peers that have
--sentry.drop-useless-peers enabled,
this adds a check to not reply with an empty headers list.
If we reply with an empty list, we're going to be considered useless and
kicked.
Once enough erigon nodes in the network are updated past this commit,
this check should be removed,
because it is totally acceptable to return an empty list by the eth
spec.
Currently PropagateNewBlockHashes and BroadcastNewBlock
select a subset of all sentries by taking `Sqrt(len(sentries))`,
and then for each sentry SendMessageToRandomPeers
selects a subset of its peers by taking `Sqrt(len(peerInfos))`.
This behaviour limits the broadcast scope when there are a lot of peers,
e.g. 100 becomes 10,
but it is not great with very few peers, or if the message is very
important to broadcast to everyone, which is the case for bor
validator/proposer nodes.
* send to all sentries in both BroadcastNewBlock and PropagateNewBlockHashes
* remove peerCountConstrained sqrt logic in SendMessageToRandomPeers
* add maxPeers provider func as a parameter to MultiClient
* default it to 10 for eth and 0 (unlimited) for bor validators
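The change in fan-out can be sketched as follows. This is an illustration under assumptions, not the MultiClient code: `sqrtPeerCount` and `cappedPeerCount` are hypothetical helpers, and only the sqrt rule, the provider func, and the 10 / 0 (unlimited) defaults come from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// sqrtPeerCount mirrors the old behaviour described above:
// send to roughly sqrt(total) peers.
func sqrtPeerCount(total int) int {
	return int(math.Sqrt(float64(total)))
}

// cappedPeerCount sketches the new behaviour: send to every peer, up to the
// limit returned by the maxPeers provider. A limit of 0 means unlimited.
func cappedPeerCount(total int, maxPeers func() int) int {
	limit := maxPeers()
	if limit <= 0 || total < limit {
		return total
	}
	return limit
}

func main() {
	ethMax := func() int { return 10 }         // default for eth
	borValidatorMax := func() int { return 0 } // unlimited for bor validators

	for _, total := range []int{3, 100} {
		fmt.Printf("peers=%d sqrt=%d eth=%d bor-validator=%d\n",
			total,
			sqrtPeerCount(total),
			cappedPeerCount(total, ethMax),
			cappedPeerCount(total, borValidatorMax))
	}
}
```

With only 3 peers the sqrt rule reaches a single peer, whereas the capped rule reaches all of them.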
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
[txpool](https://github.com/ledgerwatch/erigon-lib/blob/main/txpool/pool.go)
expects an `OnNewBlock` update only after the DB transaction is
committed.
This fixes, for example, a nonce gap mis-detection in Hive test
"engine-cancun/Blob Transactions On Block 1, Cancun Genesis".
Otterscan API search methods allow the user to specify the page size.
This PR adds an internal max (default == 25 results) to cap the page
size, regardless of what the user asks for.
It also adds a `--ots.search.max.pagesize` CLI arg to override this max
(in both the erigon and rpcdaemon binaries).
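The capping rule amounts to a simple clamp. A minimal sketch, assuming a hypothetical `capPageSize` helper (the real Otterscan code may structure this differently):

```go
package main

import "fmt"

// defaultMaxPageSize is the default cap stated in the description above.
const defaultMaxPageSize = 25

// capPageSize is a hypothetical helper: it returns the requested page size,
// clamped to maxPageSize.
func capPageSize(requested, maxPageSize uint64) uint64 {
	if requested > maxPageSize {
		return maxPageSize
	}
	return requested
}

func main() {
	fmt.Println(capPageSize(10, defaultMaxPageSize))   // within the cap
	fmt.Println(capPageSize(1000, defaultMaxPageSize)) // clamped to the cap
}
```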
Miraculously, hive tests pass first try. YIPPIE.
Also for the future, I added `--experimental.modular` which enables a
secondary engine API for consensus separation.
Now block building is the responsibility of the execution module.
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction of 'Services' for the network config
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will run a single validator
with no heimdall service as before
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
This request implements the insertion of Bor ephemeral transactions into
snapshot indexes.
It does this by taking the block hash from the header index and passing
it to the transaction indexer to add an additional index entry per block
into the transaction hash -> block index.
The passed entries are currently contained in an in memory array which
is (32 * number of blocks / sprint size) bytes.
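To get a feel for the size of that array, the formula can be evaluated directly. The block count and sprint size below are illustrative numbers only, and `indexBufferBytes` is a hypothetical helper:

```go
package main

import "fmt"

// hashLen is the size in bytes of one block hash entry.
const hashLen = 32

// indexBufferBytes evaluates the formula from the description above:
// 32 * number of blocks / sprint size.
func indexBufferBytes(blocks, sprintSize uint64) uint64 {
	return hashLen * blocks / sprintSize
}

func main() {
	// Illustrative numbers only: 500,000 blocks with a sprint size of 64.
	fmt.Println(indexBufferBytes(500_000, 64), "bytes")
}
```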
In addition to the functional code there is also an update to the
`dump_test.go` so that it runs `DumpBlocks` to exercise the indexing
code. To facilitate this the `InsertChain` method in `mock_sentry` has
been modified so that it can process >128 blocks.
The code in this request also includes additional bor/consensus code
with the following functions:
`CalculateSprint`
`CalculateSprintCount`
The first function is a modification of the code in erigon-lib so that
the sprints are numerically rather than lexically ordered. This code
should be migrated to erigon-lib and should have its sprint set
calculated once from its underlying map rather than this process being
repeated every calculation.
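The ordering problem and the suggested "calculate once" approach can be sketched as follows. This assumes the sprint configuration is a map from a starting block number (as a decimal string) to a sprint size; `sprintRange`, `sortSprints` and `sprintAt` are hypothetical names, not the functions added in this request.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sprintRange is one entry of the sprint config: the sprint size that
// applies from block `start` onwards.
type sprintRange struct {
	start, size uint64
}

// sortSprints parses the string keys once and orders the ranges
// numerically. Sorting the raw keys lexically would put "100" before "9".
func sortSprints(sprints map[string]uint64) ([]sprintRange, error) {
	ranges := make([]sprintRange, 0, len(sprints))
	for k, size := range sprints {
		start, err := strconv.ParseUint(k, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid sprint start %q: %w", k, err)
		}
		ranges = append(ranges, sprintRange{start: start, size: size})
	}
	sort.Slice(ranges, func(i, j int) bool { return ranges[i].start < ranges[j].start })
	return ranges, nil
}

// sprintAt returns the sprint size in effect at blockNum, or 0 if no
// range covers it. ranges must be sorted by start.
func sprintAt(ranges []sprintRange, blockNum uint64) uint64 {
	var size uint64
	for _, r := range ranges {
		if blockNum < r.start {
			break
		}
		size = r.size
	}
	return size
}

func main() {
	// Illustrative config: lexical key order would be "0", "100", "9".
	ranges, err := sortSprints(map[string]uint64{"0": 64, "9": 32, "100": 16})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sprintAt(ranges, 5), sprintAt(ranges, 50), sprintAt(ranges, 100))
}
```

Sorting once and reusing the slice avoids re-parsing and re-sorting the map keys on every lookup.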
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
Co-authored-by: ledgerwatch <akhounov@gmail.com>
Co-authored-by: Enrique Jose Avila Asapche <eavilaasapche@gmail.com>
Co-authored-by: Giulio <giulio.rebuffo@gmail.com>
This PR separates ENGINE from Ethbackend. It makes it so that:
1) EthBackend is not a god class
2) We can abstract away the engine API so that we can make it CL-like and
enable Consensus-Execution driven design
3) The objective is Json-RPC -> Engine Consensus Module -> Execution module.
The fixes here address a couple of issues related to devnet start-up:
1. macOS threading and syscall error returns were causing multi-node
start to both not wait and fail
2. On Windows, creating DBs with the default 2 TB mapsize causes the OS
to reserve about 4GB of committed memory per DB. This may not be used,
but it is reserved by the OS, so a default bor node reserves around 10GB
of storage. Starting many nodes causes the OS page file to become
exhausted.
To fix this the consensus DBs now use the node's OpenDatabase function
rather than their own, which means that the consensus DBs take notice
of the config.MdbxDBSizeLimit.
This fix leaves one 4GB committed memory allocation in the TX pool, which
needs its own MapSize setting.
---------
Co-authored-by: Alex Sharp <akhounov@gmail.com>
Added support tunnel to the devnet cmd. In order to get this to run I
made the following changes:
* Created a public function
* Added non root logging
I have also added commentary to the readme to explain the additional
command line arguments needed to integrate with diagnostics. In summary,
if you set the --diagnostics.url the devnet will wait for diagnostic
requests rather than exiting.
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- breaks dependency from staged_sync to package with block_reader
implementation
- breaks dependency from snap_sync to package with block_reader
implementation
- breaks dependency from mining to txpool implementation
This is an update to the devnet code which introduces the concept of
configurable scenarios. This replaces the previous hard coded execution
function.
The intention is that now both the network and the operations to run on
the network can be described in a data structure which is configurable
and composable.
The operating model is to create a network and then ask it to run
scenarios:
```go
network.Run(
	runCtx,
	scenarios.Scenario{
		Name: "all",
		Steps: []*scenarios.Step{
			&scenarios.Step{Text: "InitSubscriptions", Args: []any{[]requests.SubMethod{requests.Methods.ETHNewHeads}}},
			&scenarios.Step{Text: "PingErigonRpc"},
			&scenarios.Step{Text: "CheckTxPoolContent", Args: []any{0, 0, 0}},
			&scenarios.Step{Text: "SendTxWithDynamicFee", Args: []any{recipientAddress, services.DevAddress, sendValue}},
			&scenarios.Step{Text: "AwaitBlocks", Args: []any{2 * time.Second}},
		},
	})
```
The steps here refer to step handlers which can be defined as follows:
```go
func init() {
	scenarios.MustRegisterStepHandlers(
		scenarios.StepHandler(GetBalance),
	)
}

func GetBalance(ctx context.Context, addr string, blockNum requests.BlockNumber, checkBal uint64) {
	...
```
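To illustrate the idea of resolving a step's `Text` to a registered handler, here is a self-contained sketch using reflection. It is not the scenarios package: `registerStepHandlers`, `runStep`, `RecordBalance` and the choice of keying handlers by their unqualified function name are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
	"strings"
)

// handlers maps a step's Text to its handler function. Keying by the
// unqualified function name is an assumption of this sketch.
var handlers = map[string]reflect.Value{}

// registerStepHandlers is a hypothetical stand-in for
// scenarios.MustRegisterStepHandlers.
func registerStepHandlers(fns ...any) {
	for _, fn := range fns {
		v := reflect.ValueOf(fn)
		if v.Kind() != reflect.Func {
			panic("step handler must be a function")
		}
		full := runtime.FuncForPC(v.Pointer()).Name() // e.g. "main.RecordBalance"
		handlers[full[strings.LastIndex(full, ".")+1:]] = v
	}
}

// runStep calls the handler registered under text with the given args.
func runStep(text string, args ...any) error {
	h, ok := handlers[text]
	if !ok {
		return fmt.Errorf("no handler registered for step %q", text)
	}
	if h.Type().NumIn() != len(args) {
		return fmt.Errorf("step %q expects %d args, got %d", text, h.Type().NumIn(), len(args))
	}
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	h.Call(in) // panics if an argument type does not match the handler signature
	return nil
}

// balances and RecordBalance exist only to make the example runnable.
var balances = map[string]uint64{}

func RecordBalance(addr string, bal uint64) { balances[addr] = bal }

func main() {
	registerStepHandlers(RecordBalance)
	if err := runStep("RecordBalance", "0xabc", uint64(5)); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(balances["0xabc"])
}
```

Because steps are plain data (a name plus arguments), scenarios built this way can be composed and reordered without touching the handler code.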
This commit is an initial implementation of the scenario running - which
is working, but will need to be enhanced to make it more usable &
developable.
The current version of the code is working and has been tested with the
dev network, and bor withoutheimdall. There is a multi miner bor
heimdall configuration but this is yet to be tested.
Note that by default the scenario runner picks nodes at random on the
network to send transactions to. This causes the dev network to run very
slowly as it seems to take a long time to include transactions where the
nonce is incremented across nodes. It seems to take a long time for the
nonce to catch up in the transaction pool processing. This is yet to be
investigated.
Attempt to address the following issue:
> when I'm having a lot of websocket connections the node is freezing
and then it needs like 10 mins to sync. Then if I keep pushing requests
it falls out of sync all the time
This request adds an additional logging flag to change the name of the
logfiles produced by erigon to be prefixed by a name other than
'erigon'.
This allows multiple nodes to log to the same directory without
overwriting each other's files. It is required so that the devnet, when
running, can maintain all of its log files in a single consolidated
directory which survives devnet restarts.
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
- allow storing non-canonical blocks/senders
- optimize re-org: don't update/delete most of the data
- allow marking a chain as `Bad`: it will not be visible via eth_getBlockByHash,
but can be read if you have hash+num
- stage_senders: don't re-calc existing senders
- stage_tx_lookup: prune less blocks per iteration - because
random-deletes are expensive. pruning must not slow-down sync.
- prune data even if --snap.stop is set
- "prune as-much-as-possible at startup" is not a very good idea: at
initialCycle the machine can be cold and prune will cause big downtime, and
there is no reason to produce a large freelist in 1 tx. People may also
restart erigon because of some bug, and it will cause unexpected downtime
(usually Erigon starts up very fast). So, I just removed all
`initialSync`-related logic in pruning.
- fix lost metrics about disk write byte/sec
It's a step towards saving canonical and non-canonical bodies in the same
table (and txs also in one table of their own), to reduce write
amplification (cheaper re-orgs).
PR change: reading BaseTxNum from existing snapshots instead of DB
DB will store in field body.BaseTxNum - non-canonical TxnID
Snapshots will store only canonical TxNum in field body.BaseTxNum
## What's this PR about?
- Added states to be sent to diagnostics system for header downloader
monitor
- Added the code for sending the states through the tunnel
- Code added for updating the states in the header_algos.go file
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- always RLock all snapshots - to guarantee consistency
- introduce class View (analog of RoTx and MakeContext)
- move read methods to View object
- View object will be managed by temporal_tx
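The View pattern described in this list can be sketched as a read-locked handle over the snapshot set. The type and method names below are hypothetical and much simpler than the real snapshot code; the point is only that reads go through a View that holds an RLock until it is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshots is a simplified stand-in for the snapshot set.
type Snapshots struct {
	mu       sync.RWMutex
	segments []string
}

// Reopen replaces the visible segments; it takes the write lock, so it
// waits until every open View has been closed.
func (s *Snapshots) Reopen(segments []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.segments = segments
}

// View is a consistent read-only handle, analogous to a read transaction.
type View struct {
	s      *Snapshots
	closed bool
}

// View takes the read lock and returns a handle that must be closed.
func (s *Snapshots) View() *View {
	s.mu.RLock()
	return &View{s: s}
}

// Segments is a read method; it lives on the View, not on Snapshots.
func (v *View) Segments() []string { return v.s.segments }

// Close releases the read lock. Calling it more than once is a no-op.
func (v *View) Close() {
	if v.closed {
		return
	}
	v.closed = true
	v.s.mu.RUnlock()
}

func main() {
	s := &Snapshots{}
	s.Reopen([]string{"headers-0", "bodies-0"})

	v := s.View()
	defer v.Close()
	fmt.Println(v.Segments())
}
```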
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
## What's this PR about?
Minor fix in the args usage message of the support flag. The current message
says that the flag should be 'metrics.url' but in reality it should be
'metrics.urls'