Commit Graph

1310 Commits

Author SHA1 Message Date
Alex Sharov
eb4685f3e2
mode to produce block snapshots (#9053)
`STAGES_ONLY_BLOCKS=true` may help to produce BlockSnaps by Erigon2 on
weak machines - it disabling all stages after StageSenders.
2023-12-22 05:43:30 +00:00
battlmonstr
55d37b938c
bor: spanID calculation refactoring (#9040) 2023-12-21 09:52:00 +01:00
Alex Sharov
379a5f8ea1
harness test: to use Mock for genesis write - for e35 compatibility (#9012)
`e35` doesn't write genesis state by special func anymore - and Mock
using `m.InsertBlocks` to process genesis block (as any other block).
2023-12-21 08:07:29 +07:00
Alex Sharov
1468317efd
erigon snapshots index: build bor indices (#9009) 2023-12-18 17:46:50 +07:00
milen
d23f306fc1
borheimdall: add tests for state sync events persistence (#8989)
- adds a test for persisting state sync events at the beginning of every
sprint
2023-12-15 15:07:39 +02:00
milen
1a6b83b82c
borheimdall: add test for span persistence (#8988)
1. Adds an eth/stagedsync/test package which provides a test Harness
object
2. Adds the first automated test to the bor-heimdall stage regarding
span persistence (more to come in subsequent PRs)
3. Fixes a bug in the bor-heimdall stage which was uncovered with the
test - we do not fetch span 0 when we sync straight from blockNum=0
without snapshots
4. Reorganises all mocks to be placed under ./mock sub-package within
their respective packages
2023-12-14 22:50:59 +02:00
Alex Sharov
d41d523050
Downloader: add ProhibitNewDownloads() (#8939)
"whitelisting" mechanism (list of files - stored in DB) - which
protecting us from downloading new files after upgrade/downgrade was
broken. And seems it became over-complicated with time.
I replacing it by 1 persistent flag inside downloader:
"prohibit_new_downloads.lock"
Erigon will turn downloader into this mode after
downloading/verification of first snapshots.


```
//Corner cases:
	// - Erigon generated file X with hash H1. User upgraded Erigon. New version has preverified file X with hash H2. Must ignore H2 (don't send to Downloader)
	// - Erigon "download once": means restart/upgrade/downgrade must not download files (and will be fast)
	// - After "download once" - Erigon will produce and seed new files
```

------
`downloader --seedbox` is never "prohibit new downloads"
2023-12-12 16:05:56 +07:00
Alex Sharov
7fb8f9db59
log blocks stat after downloading, before indexing (#8947) 2023-12-11 10:05:51 +00:00
Mark Holt
624e4d4201
Fix snap initialization from frozen blocks (#8908)
This fixes a bug on syncing from scratch if the start point is in a
frozen block.
2023-12-06 14:28:05 +00:00
Mark Holt
85ade6b49a
FIx outstanding know header==nil errors + reduce bor heimdall logging (#8878)
This PR has fixes for a number of instances in the bor heimdall stage
where nil headers are either ignored or inadvertently processed.

It also has a demotion of milestone related logging messages to debug
for missing blocks because the process is not at the head of the chain +
a general reduction in periodic logging to 30 secs rather than 20 to
reduce the log output on long runs.

In addition there is a refactor of persistValidatorSets to perform
validator set initiation in a seperate function. This is intended to
clarify the operation of persistValidatorSets - which is till performing
2 actions, persisting the snapshot and then using it to check the header
against synthesized validator set in the snapshot.
2023-12-01 17:52:50 +00:00
battlmonstr
bc0b727fc0
silkworm: use silkworm-go bindings (#8829) 2023-11-30 12:45:02 +01:00
milen
230b013096
metrics: separate usage of prometheus counter and gauge interfaces (#8793) 2023-11-24 16:15:12 +01:00
Pratik Patil
59909a7efe
Added TxDependency Metadata to ExtraData in Block Header in Bor for Block-STM (#8037)
This PR adds support to store the transaction dependency (generated by
the block producer) in the block header for bor. This transaction
dependency will then be used by the parallel processor
([Block-STM](https://github.com/ledgerwatch/erigon/pull/7812/)).

I have created another
[PR](https://github.com/ledgerwatch/erigon-lib/pull/1064) in the
erigon-lib repo which adds the `IsParallelUniverse()` function.
2023-11-24 10:26:33 +00:00
Alex Sharov
43b8cbbdeb
bor: don't hide ctx.Err() (#8792)
log `ctx.Err()` - it can be canceled by many reasons: timeout, etc...
2023-11-22 09:27:50 +07:00
milen
34c0fe29ad
metrics: swap remaining VictoriaMetrics usages with erigon-lib/metrics (#8762)
# Background

Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.

We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.

This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.

# Summary of changes

- Adds missing `NewCounter`, `NewSummary`, `NewHistogram`,
`GetOrCreateHistogram` functions to `erigon-lib/metrics` similar to the
interface VictoriaMetrics lib provides
- Minor tidy up for consistency inside `erigon-lib/metrics/set.go`
around return types (panic vs err consistency for funcs inside the
file), error messages, comments
- Replace all remaining usages of `github.com/VictoriaMetrics/metrics`
with `github.com/ledgerwatch/erigon-lib/metrics` - seamless (only import
changes) since interfaces match
2023-11-20 12:23:23 +00:00
Alex Sharov
daacd71116
bor: PersistValidatorSets nil header check (#8791) 2023-11-20 16:27:38 +07:00
Alex Sharov
d557679a0b
bor: check nil-header (#8779) 2023-11-20 12:27:42 +07:00
Giulio rebuffo
8d8368091c
Add full support to beacon snapshots (#8665)
This PR adds beacon blocks snapshots for the following chains:

* Mainnet snapshots
* Sepolia snapshots
2023-11-13 14:10:57 +01:00
Mark Holt
509a7af26a
Discovery zero refresh timer (#8661)
This fixes an issue where the mumbai testnet node struggle to find
peers. Before this fix in general test peer numbers are typically around
20 in total between eth66, eth67 and eth68. For new peers some can
struggle to find even a single peer after days of operation.

These are the numbers after 12 hours or running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.

The root cause of this issue is the following:

- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods

These issues case a significant number of discovery timeouts, some of
the queries will never receive a response.

This causes the discovery read loop to enter a channel deadlock - which
means that no responses are processed, nor timeouts fired. This causes
the discovery process in the node to stop. From then on it just
re-requests handshakes from a relatively small number of peers.

This check in fixes this situation with the following changes:

- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing.
- Allow the discovery process matcher to match on port if no id match
can be established on initial ping. This allows subsequent node
validation to proceed and if the node proves to be valid via the
remainder of the look-up and handshake process it us used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination continue to be ignored.
-
2023-11-07 08:48:58 +00:00
Giulio rebuffo
4b580dcc2f
update caplin snapshots hashes (#8663)
This PR also adds snippets to download caplin snapshots
2023-11-06 21:05:07 +01:00
ledgerwatch
1185587b20
Move validator set snapshot computation to bor_heimdall stage (#8646)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-11-06 08:24:33 +00:00
ledgerwatch
2064edc5e6
Add arguments (no-op) (#8653) 2023-11-04 17:44:34 +00:00
ledgerwatch
a77e33e7c4
Introduce extra functions for BorSpans (no-op) (#8648) 2023-11-04 10:59:07 +00:00
Alex Sharov
329d18ef6f
snapshots: reduce merge limit of blocks to 100K (#8614)
Reason: 
- produce and seed snapshots earlier on chain tip. reduce depnedency on
"good peers with history" at p2p-network.
Some networks have no much archive peers, also ConsensusLayer clients
are not-good(not-incentivised) at serving history.
- avoiding having too much files:
more files(shards) - means "more metadata", "more lookups for
non-indexed queries", "more dictionaries", "more bittorrent
connections", ...
less files - means small files will be removed after merge (no peers for
this files).


ToDo:
[x] Recent 500K - merge up to 100K 
[x] Older than 500K - merge up to 500K 
[x] Start seeding 100k files
[x] Stop seeding 100k files after merge (right before delete)

In next PR: 
[] Old version of Erigon must be able download recent hashes. To achieve
it - at first start erigon will download preverified hashes .toml from
s3 - if it's newer that what we have (build-in) - use it.
2023-11-01 23:22:35 +07:00
Giulio rebuffo
0e5af0a69c
Added beacon snapshots download (#8601) 2023-10-28 17:41:50 +02:00
Alex Sharov
0e1fa8dc3e
blocksReadAheadFunc: to calc engine.Author in background (#8499)
Bor consensus: this calc is heavy and has cache
2023-10-25 20:09:09 +07:00
Andrew Ashikhmin
38e91809f9
Revert "Move validator set snapshot computation to bor_heimdall stage… (#8580)
PR #8202 might cause Issue #8550, so reverting it until Alexey's return.

This reverts commit 2ce98f8337.
2023-10-25 14:02:31 +02:00
a
436493350e
Sentinel refactor (#8296)
1. changes sentinel to use an http-like interface

2. moves hexutil, crypto/blake2b, metrics packages to erigon-lib
2023-10-22 01:17:18 +02:00
ledgerwatch
2ce98f8337
Move validator set snapshot computation to bor_heimdall stage (#8202)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-10-20 18:31:00 +01:00
canepat
19c428dd7d
Use Erigon execution for memory mutations (#8503) 2023-10-17 22:52:52 +02:00
battlmonstr
757a91c44d
sync: fix a memory leak when header verification fails (#8431)
If HeaderDownload.VerifyHeader always returns false, the memory usage
grows at a fast pace
due to Link objects (containing headers) not deallocated even after the
link queue pruning.
2023-10-14 08:39:43 +07:00
Andrew Ashikhmin
b60642fa5a
Configure EIP-4844 parameters for Gnosis (#8464)
See https://github.com/gnosischain/specs/pull/20 &
https://github.com/gnosischain/specs/pull/24
2023-10-13 11:43:16 +02:00
Alex Sharov
6d9a4f4d94
rpcdaemon: must not create db - because doesn't know right parameters (#8445) 2023-10-12 14:11:46 +07:00
Mark Holt
3fb89cc5c4
Prune all only on unwindBlock (#8430)
TruncateCanonicalHash should only have all pruned in the case of a
badBlock or a fork unwind
2023-10-10 17:13:19 +01:00
Mark Holt
0d190ff9e9
Bor rpc config fix (#8413)
This is an additional fix for BorRo to add bor config in the constructor
- otherwise code which accesses chain config will panic.
2023-10-10 15:26:02 +01:00
Mark Holt
7deb69967f
Avoid marking blocks as bad at rewind (#8414)
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-10-10 19:47:51 +05:30
Anshal Shukla
076dc33232
move borfinality package out of eth (#8407)
- Move borfinality out of eth package
- Adds nil pointer check in bor_verifier
2023-10-09 19:13:31 +01:00
Giulio rebuffo
2a5c51dc57
ValidateChain not reliant on stageBodies (#8411) 2023-10-09 17:24:12 +02:00
Alex Sharov
62395c767e
move NewHashBatch to erigon-lib, remove oddb package (#8408) 2023-10-09 15:47:54 +07:00
Mark Holt
d4d5cb9561
Bor reduce logging verbosity (#8401)
Don't log every sync event action, but gather stats and print every 30
secs
2023-10-07 14:40:16 +01:00
Anshal Shukla
da4e1c1ba4
fix bor.milestone flag (#8384)
- Made milestones enabled by default, since it has now been tested on
devnets
- Small bug fix which fixes nil panic while minning
2023-10-05 17:55:32 +01:00
Anshal Shukla
c6f532d68e
Rewind fixes (#8380) 2023-10-05 10:12:47 +01:00
canepat
47690db676
Block execution using embedded Silkworm (#8353)
This introduces _experimental_ block execution run by embedded Silkworm
API library:

- new command-line option `silkworm.path` to enable the feature by
specifying the path to the Silkworm library
- the Silkworm API shared library is dynamically loaded on-demand
- currently requires to build Silkworm library on the target machine
- available only on Linux at the moment: macOS has issue with [stack
size](https://github.com/golang/go/issues/28024) and Windows would
require [TDM-GCC-64](https://jmeubank.github.io/tdm-gcc/), both need
dedicated effort for an assessment
2023-10-05 09:27:37 +07:00
Mark Holt
0bdca6c457
Metrics label fixes (#8339)
Fixes for label discrepancies in collector for summaries etc which have
a template which includes a quantile.

Initial native Prometheus client implementation of metrics - which is
currently turned off except for local testing and interface exports.
2023-10-02 17:19:02 +01:00
Jason Yellick
5654ba07c9
Upgrade libp2p (enables go 1.21 support) (#8288)
Closes #8078 

This change is primarily intended to support go 1.21, but as a
side-effect requires updating libp2p, which in turn triggers an update
of golang.org/x/exp which creates quite a bit of (simple) churn in the
slice sorting.

This change introduces a new `cmp.Compare` function which can be used to
return an integer satisfying the compare interface for slice sorting.

In order to continue to support mplex for libp2p, the change references
github.com/libp2p/go-libp2p-mplex instead. Please see the PR at
https://github.com/libp2p/go-libp2p/pull/2498 for the official usptream
comment that indicates official support for mplex being moved to this
location.

Co-authored-by: Jason Yellick <jason@enya.ai>
2023-09-29 22:11:13 +02:00
Mark Holt
f26c7b389e
Bor break loop onrewind (#8302)
Add code to the headers state to break processing if a bor milestone
rewind is detected.

The rewind processing happens in the bor/heimdall stage - this change
just avoids unnecessary header loading
if a milestone fork is likely to be detected

---------

Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-09-27 13:17:54 +01:00
Andrew Ashikhmin
8a616b7b93
Fix e3 integration tests (#8254)
Post fix to PR #8197. [This integration test
failure](https://github.com/ledgerwatch/erigon/actions/runs/6212734514/job/16863165117)
should be fixed now.
2023-09-21 14:53:03 +02:00
Mark Holt
3b45f53f3d
Milestone stage processing (#8187)
This is the second part of the bor milestone release it contains the
following changes:

* Initialize services
* This is a change from the initial pull request I have moved all of the
initialization to the bor engine. To facilitate this I have just passed
in the heimdall client interface, rather than the whole engine
* Stage processing 
* This is also a change from the original PR - the code is contained in
the bor heimdall stage rather than in headers - the effect should be the
same, but this needs testing

---------

Co-authored-by: Mark Holt <mark@disributed.vision>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-09-18 18:05:33 +01:00
Andrew Ashikhmin
17d6f86218
Don't mark blocks as bad on transient errors (#8197)
For example, erigon on devnet8 marked a block as bad due to
"mdbx_cursor_open: cannot allocate memory":
```
[INFO] [09-12|04:57:36.041] [NewPayload] Handling new payload        height=171035 hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
171034
[WARN] [09-12|04:57:36.069] Could not validate block                 err="[3/7 BlockHashes] table: Header, mdbx_cursor_open: cannot allocate memory, stack: [kv_mdbx.go:1057 kv_mdbx.
go:1069 kv_mdbx.go:1077 memory_mutation.go:473 memory_mutation.go:502 etl.go:123 etl.go:96 block_writer.go:40 stage_blockhashes.go:49 default_stages.go:457 sync.go:425 sync.go:258 s
tageloop.go:414 backend.go:476 fork_validator.go:250 fork_validator.go:156 ethereum_execution.go:151 execution_client.go:51 chain_reader.go:252 engine_server.go:741 engine_server.go
:235 engine_server.go:600 value.go:586 value.go:370 service.go:224 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_amd64.s:1598]"
[WARN] [09-12|04:57:36.069] ethereumExecutionModule.ValidateChain: chain is invalid hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
```
With this PR blocks are marked as bad only on genuine protocol errors.
2023-09-17 11:14:36 +02:00
alex.sharov
13f782cef7 TxLookup: don't fail if body not found 2023-09-12 11:02:28 +07:00