When the sync loop first runs it suppresses block sync events, both in
the initial loop and when more than 1000 blocks are being processed.
This fix removes the first check, because otherwise the first block
received by the process never gets sent to the tx pool, which means it
won't produce a new block for Polygon.
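An illustrative sketch of the changed guard, with hypothetical names rather than the actual Erigon code:

```go
package stages

// shouldNotifyTxPool reports whether a block sync event should be sent to
// the tx pool. Before this fix the event was also suppressed on the
// initial loop (roughly `if initialCycle || blocksInBatch > 1000`), which
// swallowed the very first block.
func shouldNotifyTxPool(blocksInBatch int) bool {
	return blocksInBatch <= 1000
}
```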
As well as this fix, I have also moved the gas initialization to the
txpool start method rather than prompting it with a 'synthetic block
event'.
As the txpool start method has access to the core and tx DBs, it can
find the current block and chain config internally, so instead of
needing to be externally activated it can just do this itself on
start-up. This has the advantage of making the txpool more
self-contained.
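A minimal sketch of that start-up path, using stand-in types (`CoreDB`, `Header`) that do not match Erigon's actual API:

```go
package txpool

import "context"

// Minimal stand-ins for the core DB access; hypothetical, not Erigon's API.
type Header struct{ Number, GasLimit uint64 }

type CoreDB interface {
	CurrentHeader(ctx context.Context) (*Header, error)
}

type TxPool struct{ blockGasLimit uint64 }

// Start reads the current block straight from the DB the pool already has
// access to, instead of waiting for an external "synthetic block event".
func (p *TxPool) Start(ctx context.Context, db CoreDB) error {
	head, err := db.CurrentHeader(ctx)
	if err != nil {
		return err
	}
	p.blockGasLimit = head.GasLimit // gas initialization now happens here
	return nil
}
```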
When discarding spamming transactions we need to remove the tx from the
subpools as well as the sender tx list. Otherwise the tx is ignored by
other operations and left dangling in the subpool.
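A sketch of the complete removal, again with stand-in types rather than Erigon's real ones:

```go
package txpool

// Minimal stand-ins for illustration; not Erigon's real types.
type metaTx struct{ sender uint64 }

type subPool map[*metaTx]struct{}

func (s subPool) Remove(mt *metaTx) { delete(s, mt) }

type TxPool struct {
	bySender                 map[uint64][]*metaTx // sender tx lists
	pending, baseFee, queued subPool
}

// discardSpam removes the tx from the sender tx list AND from whichever
// subpool holds it; before this fix the subpool entry was left behind,
// where every other operation ignored it.
func (p *TxPool) discardSpam(mt *metaTx) {
	list := p.bySender[mt.sender]
	for i, tx := range list {
		if tx == mt {
			p.bySender[mt.sender] = append(list[:i], list[i+1:]...)
			break
		}
	}
	p.pending.Remove(mt)
	p.baseFee.Remove(mt)
	p.queued.Remove(mt)
}
```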
As well as the fix here, this PR also contains several changes to TX
TRACING and other logging to make it easier to see what is going on with
pool processing.
This PR contains 3 fixes for the interaction between the Bor mining loop
and the TX pool which were causing the regular creation of blocks with
zero transactions.
* Mining/Tx pool block synchronization
The synchronization of the tx pool between the sync loop and the mining
loop has been changed so that both are triggered by the same event and
synchronized via a sync.Cond rather than a polling loop with a
hard-coded loop limit. This means that mining now waits for the pool to
be updated from the previous block before it starts the mining process.
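A minimal sketch of this sync.Cond pattern (names like `blockSync` are hypothetical, not Erigon's types):

```go
package mining

import "sync"

// blockSync coordinates the sync loop and the mining loop: mining waits
// until the pool has been updated to the expected block height.
type blockSync struct {
	mu              sync.Mutex
	cond            *sync.Cond
	lastSyncedBlock uint64
}

func newBlockSync() *blockSync {
	b := &blockSync{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// poolUpdated is called by the sync loop once the tx pool has processed
// the given block; it wakes any miner waiting on that height.
func (b *blockSync) poolUpdated(block uint64) {
	b.mu.Lock()
	b.lastSyncedBlock = block
	b.mu.Unlock()
	b.cond.Broadcast()
}

// waitForBlock blocks the mining loop until the pool has caught up with
// the parent block, replacing the previous bounded polling loop.
func (b *blockSync) waitForBlock(block uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for b.lastSyncedBlock < block {
		b.cond.Wait()
	}
}
```

Because both loops are driven by the same event, there is no longer a window where the miner polls, gives up after the hard-coded limit, and assembles a block from a stale (empty) pool.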
* Txpool Startup consolidated into its MainLoop
Previously the tx pool start process was dynamically triggered at
various points in the code. This has all now been moved to the start of
the main loop. This is necessary to avoid a timing hole which can leave
the mining loop hanging, waiting for a previous block broadcast which
it missed due to its delayed start.
* Mining listens for block broadcast to avoid duplicate mining
operations
The mining loop for Bor has a recommit timer in case blocks are not
produced on time. However, in the case of sprint transitions where the
seal publication is delayed, this can lead to duplicate block production.
This is suppressed by introducing a `waiting` state which is exited upon
the block being broadcast from the sealing operation.
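A trimmed-down sketch of the suppression (hypothetical names; the real loop handles more states):

```go
package mining

import "time"

// worker is a trimmed-down sketch of the mining loop.
type worker struct {
	waiting   bool          // set once a block is sealed, cleared on broadcast
	interval  time.Duration // recommit interval
	recommit  *time.Timer
	broadcast chan struct{} // signalled when the sealed block is broadcast
}

func (w *worker) loop() {
	for {
		select {
		case <-w.recommit.C:
			w.recommit.Reset(w.interval)
			// While in the waiting state (e.g. a delayed seal publication
			// at a sprint transition), skip the recommit so the same
			// height is not mined twice.
			if w.waiting {
				continue
			}
			w.commitWork()
		case <-w.broadcast:
			// The sealed block has been broadcast; mining may resume.
			w.waiting = false
		}
	}
}

// commitWork assembles and seals a new block, then enters the waiting
// state until the sealed block is broadcast.
func (w *worker) commitWork() { w.waiting = true }
```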
Currently the mining loop is broken for the Polygon chain. This PR fixes
this.
High level changes:
- Introduces a new Bor<->Heimdall stage specifically for the needs of
the mining flow
- Extracts out common logic from Bor<->Heimdall sync and mining stages
into shared functions
- Removes `mine` flag for the Bor<->Heimdall sync stage
- Extends the current `StartMining` function to prefetch span zero if
needed before the mining loop is started
- Fixes Bor to read span zero (instead of span 1) from Heimdall when the
span is not initially set in the local smart contract that the Spanner
uses
Test with devnet "state-sync" scenario:
![Screenshot 2024-01-05 at 17 41 23](https://github.com/ledgerwatch/erigon/assets/94537774/34ca903a-69b8-416a-900f-a32f2d4417fa)
A crash on startup happens on --chain=mumbai, because I've confused
chainConfig.Bor (on type chain.Config) and config.Bor (on type
ethconfig.Config) in the setup code.
The ethconfig.Config.Bor property contained bogus values, and was used
only to check its type in CreateConsensusEngine(). Its value was never
read (before PR #9117).
This change removes the property to avoid confusion and fix the crash.
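Sketched below are the two look-alike fields involved; only the relevant parts are shown, with abbreviated names:

```go
package config

// Only the Bor fields matter here; everything else is elided.
type BorConfig struct{ /* sprint sizes, periods, ... */ }

// chain.Config: the consensus-level chain configuration; its Bor field
// is the one the consensus engine actually needs.
type ChainConfig struct {
	Bor *BorConfig
}

// ethconfig.Config: the node's operational configuration; its Bor field
// held bogus values, was only ever type-checked in
// CreateConsensusEngine(), and is removed by this change.
type EthConfig struct {
	Bor *BorConfig
}
```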
Devnet's network.BorStateSyncDelay was implemented using
ethconfig.Config.Bor, but it wasn't taking any effect. It should be
fixed separately in a different way.
Mdbx now takes a logger, but this has not been pushed to all callers,
meaning some callers had an invalid logger.
This fixes the log propagation.
It also fixes a start-up issue for http.enabled and txpool.disable
created by a previous merge.
This change introduces additional processes to manage snapshot uploading
for E2 snapshots:
## erigon snapshots upload
The `snapshots uploader` command starts a version of erigon customized
for uploading snapshot files to a remote location.
It breaks the stage execution process after the senders stage and then
uses the snapshot stage to send uploaded headers, bodies and (in the
case of Polygon) bor spans and events to snapshot files. Because this
process avoids execution, it runs significantly faster than a standard
erigon configuration.
The uploader uses rclone to send seedable files (100K or 500K blocks) to
a remote storage location specified in the rclone config file.
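For example, a minimal rclone remote definition; the remote name, provider and region here are placeholders, assuming an S3-compatible target:

```
# ~/.config/rclone/rclone.conf
[snapshots]
type = s3
provider = AWS
env_auth = true
region = us-east-1
```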
The **uploader** is configured to minimize disk usage by doing the
following:
* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to
snapshots
In addition to this it has the following performance-related features:
* It maximizes the workers allocated to snapshot processing to improve
throughput
* It can be started from scratch by downloading the latest snapshots
from the remote location to seed processing
## snapshots command
This is a standalone command for managing remote snapshots. It has the
following sub-commands:
* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot
locations
* **torrent** - manage snapshot torrent files
* Chunked format -> blinded
* LZ4 -> ZSTD
* Implemented parent block root support for history download
* Rationale: allows GC collection to be optimized easily on state
reconstruction, and allows fast reading of attestations in the
historical states reader
"whitelisting" mechanism (list of files - stored in DB) - which
protecting us from downloading new files after upgrade/downgrade was
broken. And seems it became over-complicated with time.
I replacing it by 1 persistent flag inside downloader:
"prohibit_new_downloads.lock"
Erigon will turn downloader into this mode after
downloading/verification of first snapshots.
```
//Corner cases:
// - Erigon generated file X with hash H1. User upgraded Erigon. New version has preverified file X with hash H2. Must ignore H2 (don't send to Downloader)
// - Erigon "download once": means restart/upgrade/downgrade must not download files (and will be fast)
// - After "download once" - Erigon will produce and seed new files
```
------
`downloader --seedbox` never enters "prohibit new downloads" mode
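A sketch of how such a flag can be persisted; the helper names are hypothetical, only the file name comes from this change:

```go
package downloader

import (
	"os"
	"path/filepath"
)

const prohibitNewDownloadsFileName = "prohibit_new_downloads.lock"

// ProhibitNewDownloads persists the flag: after the first successful
// download/verification, the downloader creates this file and, on later
// starts, refuses to fetch files it doesn't already have.
func ProhibitNewDownloads(snapDir string) error {
	f, err := os.Create(filepath.Join(snapDir, prohibitNewDownloadsFileName))
	if err != nil {
		return err
	}
	return f.Close()
}

// NewDownloadsAreProhibited reports whether the lock file is present.
// A seedbox skips this check entirely.
func NewDownloadsAreProhibited(snapDir string) bool {
	_, err := os.Stat(filepath.Join(snapDir, prohibitNewDownloadsFileName))
	return err == nil
}
```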
What does this PR do:
* Optional backfilling and Caplin archive node
* Create antiquary for historical states
* Fixed chain gaps related to the head of the chain and the anchor of
the chain
* Added a basic reader object to read the historical state
Changed distribution of httpcfg.HttpCfg to be a pointer.
Added new flags:
rpc.slow.log - false by default; this flag enables logging of slow RPC
requests
rpc.slow.log.threshold - 100 by default; this flag specifies the slow
threshold in milliseconds
Updated the rpc handler to log slow requests:
- added a map[request id] {method, timestamp}
- every request's details are put into the map above
- request details are deleted from the map when the request finishes
- added a time-interval check for the elements in the map: if the time
difference is more than the given threshold, the request id and the
method are printed
- the app will print slow requests in the following cases:
1. As soon as a request takes more than the given threshold
2. Every 20 seconds if the request is still in process
3. After the request finishes, if it took more than the given threshold
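A condensed sketch of this tracking (illustrative names; the periodic sweep below stands in for cases 1 and 2, and the completion check for case 3):

```go
package rpc

import (
	"log"
	"sync"
	"time"
)

type reqInfo struct {
	method  string
	started time.Time
}

type slowLog struct {
	mu        sync.Mutex
	inflight  map[uint64]reqInfo // request id -> method + start time
	threshold time.Duration      // rpc.slow.log.threshold, default 100ms
}

func newSlowLog(threshold time.Duration) *slowLog {
	return &slowLog{inflight: map[uint64]reqInfo{}, threshold: threshold}
}

// add records a request when it starts being processed.
func (s *slowLog) add(id uint64, method string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inflight[id] = reqInfo{method: method, started: time.Now()}
}

// done removes the request and logs it if it exceeded the threshold.
func (s *slowLog) done(id uint64) {
	s.mu.Lock()
	info, ok := s.inflight[id]
	delete(s.inflight, id)
	s.mu.Unlock()
	if ok && time.Since(info.started) > s.threshold {
		log.Printf("slow rpc: id=%d method=%s took=%s", id, info.method, time.Since(info.started))
	}
}

// run periodically logs requests that are still in process and already
// over the threshold.
func (s *slowLog) run() {
	for range time.Tick(20 * time.Second) {
		s.mu.Lock()
		for id, info := range s.inflight {
			if time.Since(info.started) > s.threshold {
				log.Printf("slow rpc (in progress): id=%d method=%s", id, info.method)
			}
		}
		s.mu.Unlock()
	}
}
```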
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Reason:
- produce and seed snapshots earlier on the chain tip, reducing
dependency on "good peers with history" in the p2p network.
Some networks do not have many archive peers, and ConsensusLayer clients
are not good (not incentivised) at serving history.
- avoid having too many files:
more files (shards) means "more metadata", "more lookups for
non-indexed queries", "more dictionaries", "more bittorrent
connections", ...
fewer files means small files will be removed after a merge (there are
no peers for these files).
ToDo:
[x] Recent 500K - merge up to 100K
[x] Older than 500K - merge up to 500K
[x] Start seeding 100k files
[x] Stop seeding 100k files after merge (right before delete)
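A sketch of the merge rule in the first two items above, assuming `tip` is the current head block and a hypothetical helper name:

```go
package snapshots

// mergeTarget returns the target snapshot size (in blocks) for a file
// starting at `from`: within the most recent 500K blocks, merge only up
// to 100K so files can be produced and seeded quickly; older ranges are
// merged up to 500K. Sketch of the rule, not Erigon's implementation.
func mergeTarget(from, tip uint64) uint64 {
	const recentWindow = 500_000
	if tip >= from && tip-from <= recentWindow {
		return 100_000
	}
	return 500_000
}
```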
In the next PR:
[ ] An old version of Erigon must be able to download recent hashes. To
achieve this, at first start erigon will download the preverified hashes
.toml from s3; if it's newer than what we have (built-in), use it.
This introduces an _experimental_ RPC daemon run by the embedded
Silkworm library. The same notes as in PR #8353 apply here, plus the
following ones:
- activated if the `http` command-line option is enabled and the
`silkworm.path` option is present; nothing more is required (i.e.
currently, both block execution and the RPC daemon are run by Silkworm
when specifying `silkworm.path`, just to keep things as simple as
possible)
- only Execution API endpoints are implemented by the Silkworm
RPCDaemon, whilst Engine API endpoints are still served by the Erigon
RPCDaemon
- some features are still missing, in particular:
  - state change notification handling
  - custom JSON RPC settings (i.e. Erigon RPC settings are not passed to
Silkworm yet)
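For example, activation would look like this (the library path is a placeholder):

```
erigon --http --silkworm.path=/usr/local/lib/silkworm
```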
Add a check against the inbound headers before publishing a newly mined
block after the wait delay.
If the node received a block while it was processing transactions, or
waiting for its publish slot, do a final check that another node hasn't
already published a block.
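A sketch of that final check, with hypothetical names rather than Erigon's actual API:

```go
package mining

type Block struct{ Number uint64 }

// publishIfStillTip broadcasts our sealed block only if no inbound
// header at the same or higher height arrived while we were assembling
// transactions or waiting for our publish slot.
func publishIfStillTip(sealed *Block, highestInbound uint64, broadcast func(*Block)) bool {
	if highestInbound >= sealed.Number {
		return false // another node already published this height
	}
	broadcast(sealed)
	return true
}
```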