Closes#8078
This change is primarily intended to support go 1.21, but as a
side-effect requires updating libp2p, which in turn triggers an update
of golang.org/x/exp which creates quite a bit of (simple) churn in the
slice sorting.
This change introduces a new `cmp.Compare` function which can be used to
return an integer satisfying the compare interface for slice sorting.
In order to continue to support mplex for libp2p, the change references
github.com/libp2p/go-libp2p-mplex instead. Please see the PR at
https://github.com/libp2p/go-libp2p/pull/2498 for the official usptream
comment that indicates official support for mplex being moved to this
location.
Co-authored-by: Jason Yellick <jason@enya.ai>
Add code to the headers state to break processing if a bor milestone
rewind is detected.
The rewind processing happens in the bor/heimdall stage - this change
just avoids unnecessary header loading
if a milestone fork is likely to be detected
---------
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
Code to support react based UI for diagnostics:
* pprof, prometheus and diagnistics rationalized to use a single router
(i.e. they can all run in the same port)
* support_cmd updated to support node routing (was only first node)
* Multi content support in router tunnel (application/octet-stream &
appliaction/json)
* Routing requests changed from using http forms to rest + query params
* REST query requests can now be made against erigon base port and
diagnostics with the same url format/params
---------
Co-authored-by: dvovk <vovk.dimon@gmail.com>
Co-authored-by: Mark Holt <mark@disributed.vision>
Some peer-review changes from the last related PR.
Addition of a flag for BlobSlots - for max allowed blobs per account in
txpool.
Use BlobFee from the block to validate txs in the pool.
See also https://github.com/ledgerwatch/erigon-lib/pull/1125
This PR adds the ability to pass `input` instead of `data` in
`eth_call`.
Go-ethereum has support for the `input`-attribute (instead of the
`data`) in `eth_call` for some time. The recent go-ethereum release
v1.13.0 changed the ethclient to do a `eth_call` with the `input`
attribute rather than "data". This makes the latest go-ethereum
ethclient incompatible with erigon.
Co-authored-by: lmittmann <lmittmann@users.noreply.github.com>
This is the second part of the bor milestone release it contains the
following changes:
* Initialize services
* This is a change from the initial pull request I have moved all of the
initialization to the bor engine. To facilitate this I have just passed
in the heimdall client interface, rather than the whole engine
* Stage processing
* This is also a change from the original PR - the code is contained in
the bor heimdall stage rather than in headers - the effect should be the
same, but this needs testing
---------
Co-authored-by: Mark Holt <mark@disributed.vision>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
For example, erigon on devnet8 marked a block as bad due to
"mdbx_cursor_open: cannot allocate memory":
```
[INFO] [09-12|04:57:36.041] [NewPayload] Handling new payload height=171035 hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
171034
[WARN] [09-12|04:57:36.069] Could not validate block err="[3/7 BlockHashes] table: Header, mdbx_cursor_open: cannot allocate memory, stack: [kv_mdbx.go:1057 kv_mdbx.
go:1069 kv_mdbx.go:1077 memory_mutation.go:473 memory_mutation.go:502 etl.go:123 etl.go:96 block_writer.go:40 stage_blockhashes.go:49 default_stages.go:457 sync.go:425 sync.go:258 s
tageloop.go:414 backend.go:476 fork_validator.go:250 fork_validator.go:156 ethereum_execution.go:151 execution_client.go:51 chain_reader.go:252 engine_server.go:741 engine_server.go
:235 engine_server.go:600 value.go:586 value.go:370 service.go:224 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_amd64.s:1598]"
[WARN] [09-12|04:57:36.069] ethereumExecutionModule.ValidateChain: chain is invalid hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
```
With this PR blocks are marked as bad only on genuine protocol errors.
This is a non functional change which consolidates the various packages
under metrics into the top level package now that the dead code is
removed.
It is a precursor to the removal of Victoria metrics after which all
erigon metrics code will be contained in this single package.
This is an update of:
https://github.com/ledgerwatch/erigon/pull/7846
which uses a local fork of victoria metrics to include the changes that
https://github.com/anshalshukla added to the original for we where
using.
It also includes code to address the duplicate metrics issue identified
here:
https://github.com/ledgerwatch/erigon/issues/8053
It has one more associated fix which is to correctly add a metadata
label to counters, these where previously labelled as gauges.
e.g.
```
# TYPE p2p_peers counter
p2p_peers 0
```
rather than
```
# TYPE p2p_peers gauge
p2p_peers 0
```
---------
Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
I have added:
```go
{
ID: stages.BorHeimdall,
Description: "Download Bor-specific data from Heimdall",
Forward: func(firstCycle bool, badBlockUnwind bool, s *StageState, u Unwinder, tx kv.RwTx, logger log.Logger) error {
if badBlockUnwind {
return nil
}
return BorHeimdallForward(s, u, ctx, tx, borHeimdallCfg, true, logger)
},
Unwind: func(firstCycle bool, u *UnwindState, s *StageState, tx kv.RwTx, logger log.Logger) error {
return BorHeimdallUnwind(u, ctx, s, tx, borHeimdallCfg)
},
Prune: func(firstCycle bool, p *PruneState, tx kv.RwTx, logger log.Logger) error {
return BorHeimdallPrune(p, ctx, tx, borHeimdallCfg)
},
},
```
To MiningStages as well as Default as otherwise bor events are not added
when the block producer creates new blocks.
There are a couple of questions I have around this implementation:
* Is this the right place to add this
* As the state is also executed when the default stage is processed ther
is some duplicate processing for the block producing node.
* There is a duplicated call to heimdall which could be removed if the
stages share state - but its not clear if we want to do this.
* I don't think the mining stage needs to prune as this will be
replicated in the default iteration
This can be tested using the devnet with the following arguments:
```
--chain bor-devnet --bor.localheimdall --scenarios state-sync
```
This will generate sync events via an ethereum devnet which are
transmitted to bor chain and will be executed at the end of the snapshot
delay, which results in events generated from the bor chain. This tests
the whole sync, block generation, event lifecycle. As it needs to wait
for sprints to end after a sufficient delay it is quite slow to run.
The current logic is flawed, because it drops all peers that are less
synced.
It is valid to return empty responses by the eth spec.
A proper logic should penalize from the context of the sync process,
where enough "reputation" data is collected about a peer.
In order to be able to connect to erigon 2.48 peers that have
--sentry.drop-useless-peers enabled,
this adds a check to not reply with an empty headers list.
If we reply with an empty list, we're going to be considered useless and
kicked.
Once enough of erigon nodes are updated in the network past this commit,
this check should be removed,
because it is totally acceptable to return an empty list by the eth
spec.
Currently PropagateNewBlockHashes and BroadcastNewBlock
selects a subset of all sentries by taking a `Sqrt(len(sentries))`,
and then for each sentry SendMessageToRandomPeers
selects a subset of its peers by taking `Sqrt(len(peerInfos))`.
This behaviour limits the broadcast scope with a lot of peers, e.g. 100
becomes 10,
but is not great with very few peers, or if the message is very
important
to broadcast to everyone, which is the case of bor validator/proposer
nodes.
* send to all sentries in both BroadcastNewBlock and PropagateNewBlockHashes
* remove peerCountConstrained sqrt logic in SendMessageToRandomPeers
* add maxPeers provider func as a parameter to MultiClient
* default it to 10 for eth and 0 (unlimited) for bor validators
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
[txpool](https://github.com/ledgerwatch/erigon-lib/blob/main/txpool/pool.go)
expects an `OnNewBlock` update only after the DB transaction is
committed.
This fixes, for example, a nonce gap mis-detection in Hive test
"engine-cancun/Blob Transactions On Block 1, Cancun Genesis".
Otterscan API search methods allow the user to inform the page size.
This PR adds an internal max (default == 25 results) to cap the page
size, regardless of what the user asks.
It also adds a `--ots.search.max.pagesize` CLI args to override this max
(either in erigon and rpcdaemon binaries).
Miracoulously, hive tests pass first try. YIPPIE.
Also for the future, I added `--experimental.modular` which enables a
secondary engine API for consensus separation.
Now block building is responsibility of the execution module.
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction on 'Services' for the network config
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will sun a single validator
with no heimdall service as before
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
This request implements the insertion of Bor ephemeral transactions into
snapshot indexes.
I does this by taking the block hash from the header index and passing
it to the transaction indexer to add an additional index entry per block
into the transaction hash -> block index.
The passed entries are currently contained in an in memory array which
is (32 * number of blocks / sprint size) bytes.
In addition to the functional code there is also an update to the
`dump_test.go` so that it runs `DumpBlocks` to exercise the indexing
code. To facilitate this the `InsertChain` method in `mock_sentry` has
been modified so that it can process >128 blocks.
The code in this request also includes additional bor/consensus code
with the following functions:
`CalculateSprint`
`CalculateSprintCount`
The first function is a modification of the code in erigon-lib so that
the sprints are numerically rather than lexically ordered. This code
should be migrated to erigon-lib and should have its sprint set
calculated once from its underlying map rather than this process being
repeated every calculation.
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
Co-authored-by: ledgerwatch <akhounov@gmail.com>
Co-authored-by: Enrique Jose Avila Asapche <eavilaasapche@gmail.com>
Co-authored-by: Giulio <giulio.rebuffo@gmail.com>
This PR separates ENGINE from Ethbackend. It makes it so:
1) EthBackend not a god class
2) We can abstract away engine API so that we can make it CL-like and
enable Consensus-Execution driven design
3) Objective is Json-RPC -> Engine Consensus Module -> Execution module.
The fixes here fix a couple of issues related to devnet start-up
1. macos threading and syscall error return where causing multi node
start to both not wait and fail
2. On windows creating DB's with the default 2 TB mapsize causes the os
to reserve about 4GB of committed memory per DB. This may not be used -
but is reserved by the OS - so a default bor node reserves around 10GB
of storage. Starting many nodes causes the OS page file to become
exhausted.
To fix this the consensus DB's now use the node's OpenDatabase function
rather than their own, which means that the consensus DB's take notice
of the config.MdbxDBSizeLimit.
This fix leaves one 4GB committed memory allocation in the TX pool which
needs its own MapSize setting.
---------
Co-authored-by: Alex Sharp <akhounov@gmail.com>
Added support tunnel to the devnet cmd. In order to get this to run I
made the following changes:
* Create a public function
* Added non root logging
I have also added commentary to the readme to explain the additional
command line arguments needed to integrate with diagnostics. In summary,
if you set the --diagnostics.url the devenet will wait for diagnostic
requests rather than exiting
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- breaks dependency from staged_sync to package with block_reader
implementation
- breaks dependency from snap_sync to package with block_reader
implementation
- breaks dependency from mining to txpool implementation
This is an update to the devnet code which introduces the concept of
configurable scenarios. This replaces the previous hard coded execution
function.
The intention is that now both the network and the operations to run on
the network can be described in a data structure which is configurable
and composable.
The operating model is to create a network and then ask it to run
scenarios:
```go
network.Run(
runCtx,
scenarios.Scenario{
Name: "all",
Steps: []*scenarios.Step{
&scenarios.Step{Text: "InitSubscriptions", Args: []any{[]requests.SubMethod{requests.Methods.ETHNewHeads}}},
&scenarios.Step{Text: "PingErigonRpc"},
&scenarios.Step{Text: "CheckTxPoolContent", Args: []any{0, 0, 0}},
&scenarios.Step{Text: "SendTxWithDynamicFee", Args: []any{recipientAddress, services.DevAddress, sendValue}},
&scenarios.Step{Text: "AwaitBlocks", Args: []any{2 * time.Second}},
},
})
```
The steps here refer to step handlers which can be defined as follows:
```go
func init() {
scenarios.MustRegisterStepHandlers(
scenarios.StepHandler(GetBalance),
)
}
func GetBalance(ctx context.Context, addr string, blockNum requests.BlockNumber, checkBal uint64) {
...
```
This commit is an initial implementation of the scenario running - which
is working, but will need to be enhanced to make it more usable &
developable.
The current version of the code is working and has been tested with the
dev network, and bor withoutheimdall. There is a multi miner bor
heimdall configuration but this is yet to be tested.
Note that by default the scenario runner picks nodes at random on the
network to send transactions to. this causes the dev network to run very
slowly as it seems to take a long time to include transactions where the
nonce is incremented across nodes. It seems to take a long time for the
nonce to catch up in the transaction pool processing. This is yet to be
investigated.
attempt to address next issue:
> when I'm having a lot of websocket connections the node is freezing
and then it needs like 10 mins to sync. Then if I keep pushing requests
it falls out of sync all the time
This request adds an additional logging flag to change the name of the
logfiles produced by erigon to be prefixed by a name other than
'erigon'.
This allows multiple nodes to log to the same directory without
overwriting each others files. It is requires so that the devnet when
running can maintain all of its log files in a single consolidated
directory which survives devnet restarts.
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
- allow store non-canonical blocks/senders
- optimize re-org: don't update/delete most of data
- allow mark chain as `Bad` - will be not visible by eth_getBlockByHash,
but can read if have hash+num
- stage_senders: don't re-calc existing senders
- stage_tx_lookup: prune less blocks per iteration - because
random-deletes are expensive. pruning must not slow-down sync.
- prune data even if --snap.stop is set
- "prune as-much-as-possible at startup" is not very good idea: at
initialCycle machine can be cold and prune will cause big downtime, no
reason to produce much freelist in 1 tx. People may also restart erigon
- because of some bug - and it will cause unexpected downtime (usually
Erigon startup very fast). So, I just remove all `initialSync`-related
logic in pruning.
- fix lost metrics about disk write byte/sec
it's step towards saving canonical and non-canonical bodies in same
table (and txs also in same own table). to reduce write amplification
(cheaper re-orgs)
PR change: reading BaseTxNum from existing snapshots instead of DB
DB will store in field body.BaseTxNum - non-canonical TxnID
Snapshots will store only canonical TxNum in field body.BaseTxNum
## What's this PR about?
- Added states to be sent to diagnostics system for header downloader
monitor
- Added the code for sending the states through the tunnel
- Code added for updating the states in the header_algos.go file
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- always RLock all snapshots - to guarantee consistency
- introduce class View (analog of RoTx and MakeContext)
- move read methods to View object
- View object will be managed by temporal_tx
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
## What's this PR is about?
Minor fix in args usage message of support flag. The current message
says that the flag should be 'metrics.url' but it reality it should be
'metrics.urls'
Blob transactions are SSZ encoded, so it had to be added to decoding.
There are 2 encoding forms: `network` and `minimal` (usual). Network
encoded blob transactions include "wrapper data" which are `kzgs`,
`blobs` and `proofs`, and decoded by `DecodeWrappedTransaction`. For
previous types of transactions the network encoding is no different.
Execution-payloads / blocks use the minimal encoding of transactions. In
the transaction-pool and local transaction-journal the network encoding
is used.
Concerns:
1. Possible performance reduction caused by these changes, not sure if
streams are better then slices. Go slices in this modifications are
read-only, so they should be referred to the same underlying array and
passed by a reference.
2. If `DecodeWrappedTransaction` and `DecodeTransaction` will create
confusion and should be merged into one function.
This is the beginning of the series of changes to make it possible to
run multiple instances of erigon inside a single process (as devnet tool
does), with the logging from these processes going to respective log
files correctly.
This is the first part where the initial infrastructure is being
established
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
this first major move separates the transient beacon state cache from
the underlying tree.
leaf updates are enforced in the setters, which should make programming
easier.
all exported methods of the raw.BeaconState should be safe to call
(without disrupting internal state)
changes many functions to consume *raw.BeaconState in perparation for
interface
beyond refactor it also:
adds a pool for the leaves of the validator ssz hash
adds a pool for the snappy writers
removed the parallel hash experiment (high memory use)