Commit Graph

700 Commits

Author SHA1 Message Date
Mark Holt
b05ffc909d
Fixes for Bor Block Production Synchronization (#9162)
This PR contains 3 fixes for interaction between the Bor mining loop and
the TX pool which where causing the regular creation of blocks with zero
transactions.

* Mining/Tx pool block synchronization
The synchronization of the tx pool between the sync loop and the mining
loop has been changed so that both are triggered by the same event and
synchronized via a sync.Cond rather than a polling loop with a hard
coded loop limit. This means that mining now waits for the pool to be
updated from the previous block before it starts the mining process.
* Txpool Startup consolidated into its MainLoop
Previously the tx pool start process was dynamically triggered at
various points in the code. This has all now been moved to the start of
the main loop. This is necessary to avoid a timing hole which can leave
the mining loop hanging waiting for a previously block broadcast which
it missed due to its delay start.
* Mining listens for block broadcast to avoid duplicate mining
operations
The mining loop for bor has a recommit timer in case blocks re not
produced on time. However in the case of sprint transitions where the
seal publication is delayed this can lead to duplicate block production.
This is suppressed by introducing a `waiting` state which is exited upon
the block being broadcast from the sealing operation.
2024-01-10 17:12:15 +00:00
battlmonstr
9c47cce62c
bor: move to polygon directory (#9174) 2024-01-09 19:20:42 +01:00
milen
e25b15b00e
remotedbserver: add support for bor snapshots (#9180) 2024-01-09 14:48:01 +00:00
ledgerwatch
459ccf8de4
[E3] Some fixes for the in-memory database when working with Caplin (… (#9164)
…testing on Sepolia) (#9151)

---------

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2024-01-09 08:26:26 +07:00
Mark Holt
19bc328a07
Added db loggers to all db callers and fixed flag settings (#9099)
Mdbx now takes a logger - but this has not been pushed to all callers -
meaning it had an invalid logger

This fixes the log propagation.

It also fixed a start-up issue for http.enabled and txpool.disable
created by a previous merge
2023-12-31 17:10:08 +07:00
Mark Holt
79ed8cad35
E2 snapshot uploading (#9056)
This change introduces additional processes to manage snapshot uploading
for E2 snapshots:

## erigon snapshots upload

The `snapshots uploader` command starts a version of erigon customized
for uploading snapshot files to
a remote location.  

It breaks the stage execution process after the senders stage and then
uses the snapshot stage to send
uploaded headers, bodies and (in the case of polygon) bor spans and
events to snapshot files. Because
this process avoids execution in run signifigantly faster than a
standard erigon configuration.

The uploader uses rclone to send seedable (100K or 500K blocks) to a
remote storage location specified
in the rclone config file.

The **uploader** is configured to minimize disk usage by doing the
following:

* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to
snapshots

in addition to this it has the following performance related features:

* maximizes the workers allocated to snapshot processing to improve
throughput
* Can be started from scratch by downloading the latest snapshots from
the remote location to seed processing

## snapshots command

Is a stand alone command for managing remote snapshots it has the
following sub commands

* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot
locations
* **torrent** - manage snapshot torrent files
2023-12-27 22:05:09 +00:00
Alex Sharov
1468317efd
erigon snapshots index: build bor indices (#9009) 2023-12-18 17:46:50 +07:00
Giulio rebuffo
0d2aecf829
Backfill only with flag (#8913)
Caplin snapshots only enabled with caplin.backfill
2023-12-06 14:22:13 +01:00
Anshal Shukla
8d1758ceea
Add support for amoy testnet (#8674)
Co-authored-by: Mark Holt <mark@distributed.vision>
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
2023-11-30 08:19:52 +07:00
Håvard Anda Estensen
4873502818
turbo: run tests in parallel (#8738)
Tests that don't affect each other should run in parallel
2023-11-16 16:29:31 +07:00
Manav Darji
97f00a1433
headerdownload: handle tie breaker for forkchoice in pow networks (#8616)
Based on https://github.com/maticnetwork/bor/pull/871 in bor, this PR
handles import of same difficulty chains (tie breaker conditions) based
on their height and hash.

This PR also modifies an existing test to check different types of
side-chain import and how the canonical is decided.
2023-11-07 19:24:59 +00:00
Mark Holt
509a7af26a
Discovery zero refresh timer (#8661)
This fixes an issue where the mumbai testnet node struggle to find
peers. Before this fix in general test peer numbers are typically around
20 in total between eth66, eth67 and eth68. For new peers some can
struggle to find even a single peer after days of operation.

These are the numbers after 12 hours or running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.

The root cause of this issue is the following:

- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods

These issues case a significant number of discovery timeouts, some of
the queries will never receive a response.

This causes the discovery read loop to enter a channel deadlock - which
means that no responses are processed, nor timeouts fired. This causes
the discovery process in the node to stop. From then on it just
re-requests handshakes from a relatively small number of peers.

This check in fixes this situation with the following changes:

- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing.
- Allow the discovery process matcher to match on port if no id match
can be established on initial ping. This allows subsequent node
validation to proceed and if the node proves to be valid via the
remainder of the look-up and handshake process it us used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination continue to be ignored.
-
2023-11-07 08:48:58 +00:00
Giulio rebuffo
4b580dcc2f
update caplin snapshots hashes (#8663)
This PR also adds snippets to download caplin snapshots
2023-11-06 21:05:07 +01:00
ledgerwatch
2064edc5e6
Add arguments (no-op) (#8653) 2023-11-04 17:44:34 +00:00
battlmonstr
d92898a508
p2p: silkworm sentry (#8527) 2023-11-02 08:35:13 +07:00
Alex Sharov
329d18ef6f
snapshots: reduce merge limit of blocks to 100K (#8614)
Reason: 
- produce and seed snapshots earlier on chain tip. reduce depnedency on
"good peers with history" at p2p-network.
Some networks have no much archive peers, also ConsensusLayer clients
are not-good(not-incentivised) at serving history.
- avoiding having too much files:
more files(shards) - means "more metadata", "more lookups for
non-indexed queries", "more dictionaries", "more bittorrent
connections", ...
less files - means small files will be removed after merge (no peers for
this files).


ToDo:
[x] Recent 500K - merge up to 100K 
[x] Older than 500K - merge up to 500K 
[x] Start seeding 100k files
[x] Stop seeding 100k files after merge (right before delete)

In next PR: 
[] Old version of Erigon must be able download recent hashes. To achieve
it - at first start erigon will download preverified hashes .toml from
s3 - if it's newer that what we have (build-in) - use it.
2023-11-01 23:22:35 +07:00
Alex Sharov
c23e5a1abf
downloader: preparations for reducing blocks merge limit (#8612) 2023-10-30 13:46:35 +07:00
Andrew Ashikhmin
38e91809f9
Revert "Move validator set snapshot computation to bor_heimdall stage… (#8580)
PR #8202 might cause Issue #8550, so reverting it until Alexey's return.

This reverts commit 2ce98f8337.
2023-10-25 14:02:31 +02:00
Andrew Ashikhmin
a226b6ca29
Fix wiring of AgraBlock into tx pool (#8555)
Fixes and simplifications to PR #8504
2023-10-23 11:03:46 +02:00
a
436493350e
Sentinel refactor (#8296)
1. changes sentinel to use an http-like interface

2. moves hexutil, crypto/blake2b, metrics packages to erigon-lib
2023-10-22 01:17:18 +02:00
ledgerwatch
2ce98f8337
Move validator set snapshot computation to bor_heimdall stage (#8202)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-10-20 18:31:00 +01:00
battlmonstr
757a91c44d
sync: fix a memory leak when header verification fails (#8431)
If HeaderDownload.VerifyHeader always returns false, the memory usage
grows at a fast pace
due to Link objects (containing headers) not deallocated even after the
link queue pruning.
2023-10-14 08:39:43 +07:00
Andrew Ashikhmin
b60642fa5a
Configure EIP-4844 parameters for Gnosis (#8464)
See https://github.com/gnosischain/specs/pull/20 &
https://github.com/gnosischain/specs/pull/24
2023-10-13 11:43:16 +02:00
Mark Holt
6f7186e0f4
Fix invalid pre-fetched header broadcast (#8442)
Fixes and issue with Polygon validators where locally mined blocks are
broadcast with invalid header hashes because the NewBlock message
constructor was removing the ReceiptHash which contributed to the header
hash.

The results in the bor header validation code not being able to
correctly identify the signer of the header - so header validation
fails.

This also likely fixes part of the bogon-block issue which was
identified by the polygon team.
2023-10-12 08:27:02 +01:00
Alex Sharov
6d9a4f4d94
rpcdaemon: must not create db - because doesn't know right parameters (#8445) 2023-10-12 14:11:46 +07:00
Alex Sharov
b8d8003618
move memdb to own package - to reduce cycle deps (#8428) 2023-10-11 08:48:36 +07:00
Mark Holt
7deb69967f
Avoid marking blocks as bad at rewind (#8414)
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-10-10 19:47:51 +05:30
Anshal Shukla
076dc33232
move borfinality package out of eth (#8407)
- Move borfinality out of eth package
- Adds nil pointer check in bor_verifier
2023-10-09 19:13:31 +01:00
Giulio rebuffo
2a5c51dc57
ValidateChain not reliant on stageBodies (#8411) 2023-10-09 17:24:12 +02:00
Giulio rebuffo
d90572b786
Hopefully an even faster version of mocked sentry (#8402) 2023-10-07 22:30:10 +02:00
Giulio rebuffo
0eda40a9be
Even faster mocked sentry (#8395) 2023-10-07 02:39:12 +02:00
Giulio rebuffo
1775c40f78
Even higher timeout with tests (#8394) 2023-10-06 23:43:29 +02:00
Giulio rebuffo
2294c8c66c
EthereumExecutionService in MockSentry (#8373)
Now we use the ethereum execution service directly:

* Changed sig of InsertChain
* Use of the service in case of PoS
2023-10-05 18:30:19 +02:00
canepat
47690db676
Block execution using embedded Silkworm (#8353)
This introduces _experimental_ block execution run by embedded Silkworm
API library:

- new command-line option `silkworm.path` to enable the feature by
specifying the path to the Silkworm library
- the Silkworm API shared library is dynamically loaded on-demand
- currently requires to build Silkworm library on the target machine
- available only on Linux at the moment: macOS has issue with [stack
size](https://github.com/golang/go/issues/28024) and Windows would
require [TDM-GCC-64](https://jmeubank.github.io/tdm-gcc/), both need
dedicated effort for an assessment
2023-10-05 09:27:37 +07:00
ledgerwatch
2521f1a696
Fix another case of header download hanging (non-POS) (#8356) 2023-10-03 17:34:03 +01:00
Somnath
6dd7d8fe6a
Fix some hive tests (#8331)
Changes:
Reorder some Engine API invalid checks to be closer to specs/hive tests
Move the Engine API direct method names to `interfaces`
2023-10-01 12:42:27 +02:00
Alex Sharov
e993ee984b
stage loop: allow nil in hook (#8318) 2023-09-29 09:03:19 +07:00
Mark Holt
f26c7b389e
Bor break loop onrewind (#8302)
Add code to the headers state to break processing if a bor milestone
rewind is detected.

The rewind processing happens in the bor/heimdall stage - this change
just avoids unnecessary header loading
if a milestone fork is likely to be detected

---------

Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-09-27 13:17:54 +01:00
Somnath Banerjee
f51d9b61a0
Txpool 4844 upgrades Part 2 (#8213)
Some peer-review changes from the last related PR. 
Addition of a flag for BlobSlots - for max allowed blobs per account in
txpool.
Use BlobFee from the block to validate txs in the pool.

See also https://github.com/ledgerwatch/erigon-lib/pull/1125
2023-09-20 17:29:30 +05:30
Mark Holt
3b45f53f3d
Milestone stage processing (#8187)
This is the second part of the bor milestone release it contains the
following changes:

* Initialize services
* This is a change from the initial pull request I have moved all of the
initialization to the bor engine. To facilitate this I have just passed
in the heimdall client interface, rather than the whole engine
* Stage processing 
* This is also a change from the original PR - the code is contained in
the bor heimdall stage rather than in headers - the effect should be the
same, but this needs testing

---------

Co-authored-by: Mark Holt <mark@disributed.vision>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-09-18 18:05:33 +01:00
Andrew Ashikhmin
17d6f86218
Don't mark blocks as bad on transient errors (#8197)
For example, erigon on devnet8 marked a block as bad due to
"mdbx_cursor_open: cannot allocate memory":
```
[INFO] [09-12|04:57:36.041] [NewPayload] Handling new payload        height=171035 hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
171034
[WARN] [09-12|04:57:36.069] Could not validate block                 err="[3/7 BlockHashes] table: Header, mdbx_cursor_open: cannot allocate memory, stack: [kv_mdbx.go:1057 kv_mdbx.
go:1069 kv_mdbx.go:1077 memory_mutation.go:473 memory_mutation.go:502 etl.go:123 etl.go:96 block_writer.go:40 stage_blockhashes.go:49 default_stages.go:457 sync.go:425 sync.go:258 s
tageloop.go:414 backend.go:476 fork_validator.go:250 fork_validator.go:156 ethereum_execution.go:151 execution_client.go:51 chain_reader.go:252 engine_server.go:741 engine_server.go
:235 engine_server.go:600 value.go:586 value.go:370 service.go:224 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_amd64.s:1598]"
[WARN] [09-12|04:57:36.069] ethereumExecutionModule.ValidateChain: chain is invalid hash=0x321dea00c4853ee354bebaf8aef3e63fbe06c4508271c0db4c92b0f087aedc3b
```
With this PR blocks are marked as bad only on genuine protocol errors.
2023-09-17 11:14:36 +02:00
Somnath Banerjee
a699f64761
Txpool upgrades for EIP-4844 Blob Transactions (#8004)
See https://github.com/ledgerwatch/erigon-lib/pull/1075
2023-09-11 09:38:58 +07:00
Giulio rebuffo
346b278a3b
Caplin: Improved logging (#8169) 2023-09-10 22:10:21 +02:00
Alex Sharov
d60940d7db
Avoid leaking more popped items (#8145) 2023-09-06 15:47:06 +07:00
Mark Holt
f2d0118a33
Bor snapshot block production (#8065)
I have added:

```go
{
	ID:          stages.BorHeimdall,
	Description: "Download Bor-specific data from Heimdall",
	Forward: func(firstCycle bool, badBlockUnwind bool, s *StageState, u Unwinder, tx kv.RwTx, logger log.Logger) error {
		if badBlockUnwind {
			return nil
		}
		return BorHeimdallForward(s, u, ctx, tx, borHeimdallCfg, true, logger)
	},
	Unwind: func(firstCycle bool, u *UnwindState, s *StageState, tx kv.RwTx, logger log.Logger) error {
		return BorHeimdallUnwind(u, ctx, s, tx, borHeimdallCfg)
	},
	Prune: func(firstCycle bool, p *PruneState, tx kv.RwTx, logger log.Logger) error {
		return BorHeimdallPrune(p, ctx, tx, borHeimdallCfg)
	},
},
```
To MiningStages as well as Default as otherwise bor events are not added
when the block producer creates new blocks.

There are a couple of questions I have around this implementation:

* Is this the right place to add this
* As the state is also executed when the default stage is processed ther
is some duplicate processing for the block producing node.
* There is a duplicated call to heimdall which could be removed if the
stages share state - but its not clear if we want to do this.
* I don't think the mining stage needs to prune as this will be
replicated in the default iteration

This can be tested using the devnet with the following arguments:

```
--chain bor-devnet --bor.localheimdall --scenarios state-sync
```

This will generate sync events via an ethereum devnet which are
transmitted to bor chain and will be executed at the end of the snapshot
delay, which results in events generated from the bor chain. This tests
the whole sync, block generation, event lifecycle. As it needs to wait
for sprints to end after a sufficient delay it is quite slow to run.
2023-08-30 08:06:09 +01:00
Alex Sharov
d1d348211f
add flag --force.partial.commit: to workaround problem "start from backup takes long time and can't save partial progress" (#8090) 2023-08-30 08:49:16 +07:00
battlmonstr
32ca0e5ab1
sync: revert flawed dropUselessPeers logic and alleviate its issues (#8062)
The current logic is flawed, because it drops all peers that are less
synced.
It is valid to return empty responses by the eth spec.
A proper logic should penalize from the context of the sync process,
where enough "reputation" data is collected about a peer.

In order to be able to connect to erigon 2.48 peers that have
--sentry.drop-useless-peers enabled,
this adds a check to not reply with an empty headers list.
If we reply with an empty list, we're going to be considered useless and
kicked.
Once enough of erigon nodes are updated in the network past this commit,
this check should be removed,
because it is totally acceptable to return an empty list by the eth
spec.
2023-08-25 11:42:54 +02:00
Andrew Ashikhmin
a6d9d26fe9
Fix opSelfdestruct6780 (#8066)
also upgrade execution-spec-tests to
[v1.0.2](https://github.com/ethereum/execution-spec-tests/releases/tag/v1.0.2)
2023-08-25 08:06:59 +02:00
battlmonstr
2e29ff33e1
bor: BroadcastNewBlock to all peers from validator nodes (#8030)
Currently PropagateNewBlockHashes and BroadcastNewBlock
selects a subset of all sentries by taking a `Sqrt(len(sentries))`,
and then for each sentry SendMessageToRandomPeers
selects a subset of its peers by taking `Sqrt(len(peerInfos))`.

This behaviour limits the broadcast scope with a lot of peers, e.g. 100
becomes 10,
but is not great with very few peers, or if the message is very
important
to broadcast to everyone, which is the case of bor validator/proposer
nodes.

* send to all sentries in both BroadcastNewBlock and PropagateNewBlockHashes
* remove peerCountConstrained sqrt logic in SendMessageToRandomPeers
* add maxPeers provider func as a parameter to MultiClient
* default it to 10 for eth and 0 (unlimited) for bor validators

---------

Co-authored-by: Mark Holt <mark@distributed.vision>
2023-08-23 14:28:39 +02:00
Andrew Ashikhmin
6bc0ca9e85
Correctly compute fork id when timestamp fork is activated in genesis (#8046)
See https://github.com/ethereum/go-ethereum/pull/27895
2023-08-21 15:35:13 +02:00