When the sync loop first runs, it suppresses block sync events both in the initial loop and when the number of blocks being processed is greater than 1000.
This fix removes the first check, because otherwise the first block received by the process is never sent to the tx pool, which means the node won't produce new blocks for polygon.
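As a rough illustration of the condition change, the sketch below contrasts the notification rule before and after the fix. `notifyBefore`, `notifyAfter` and `syncEventThreshold` are hypothetical names chosen for this example; only the 1000-block threshold and the removal of the initial-loop check come from the description above.

```go
package main

// syncEventThreshold is the 1000-block limit described above; batches larger
// than this do not generate tx pool notifications.
const syncEventThreshold = 1000

// notifyBefore models the behaviour before the fix (hypothetical helper):
// events were suppressed in the initial loop as well as for large batches.
func notifyBefore(initialLoop bool, blocksInCycle uint64) bool {
	return !initialLoop && blocksInCycle <= syncEventThreshold
}

// notifyAfter models the behaviour after the fix (hypothetical helper): only
// the size check remains, so the first block received reaches the tx pool.
func notifyAfter(blocksInCycle uint64) bool {
	return blocksInCycle <= syncEventThreshold
}
```

With only the size check left, a single block arriving in the first loop is forwarded, while bulk catch-up cycles are still kept away from the pool.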
As well as this fix, I have also moved the gas initialization into the txpool start method rather than triggering it with a 'synthetic block event'.
As the txpool start method has access to the core and tx DBs, it can find the current block and chain config internally, so it doesn't need to be activated externally; it can do this itself on start-up. This has the advantage of making the txpool more self-contained.
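A minimal sketch of a self-initializing start method, assuming the pool can read the current header and chain config through a reader interface. `Header`, `ChainConfig`, `ChainReader`, `staticReader`, and the `blockGasLimit`/`pendingBaseFee` fields are all hypothetical simplifications, not the real txpool types; the point is only that the values come from the DB at start-up instead of from an externally injected block event.

```go
package main

import "errors"

// Header and ChainConfig are simplified, hypothetical stand-ins for what the
// pool can read from the core DB.
type Header struct {
	Number, GasLimit, BaseFee uint64
}

type ChainConfig struct {
	LondonBlock uint64 // first block with EIP-1559 base fees
}

// ChainReader abstracts the core/tx DB access available at start-up.
type ChainReader interface {
	CurrentHeader() (*Header, error)
	ChainConfig() (*ChainConfig, error)
}

// staticReader is an in-memory ChainReader used for illustration only.
type staticReader struct {
	head *Header
	cfg  *ChainConfig
}

func (r staticReader) CurrentHeader() (*Header, error)     { return r.head, nil }
func (r staticReader) ChainConfig() (*ChainConfig, error) { return r.cfg, nil }

// TxPool holds only the fields needed for this sketch.
type TxPool struct {
	db             ChainReader
	blockGasLimit  uint64
	pendingBaseFee uint64
}

// Start initializes gas parameters from the DB itself instead of waiting for
// a synthetic block event to be pushed in from outside.
func (p *TxPool) Start() error {
	if p.db == nil {
		return errors.New("txpool: no chain reader configured")
	}
	head, err := p.db.CurrentHeader()
	if err != nil {
		return err
	}
	cfg, err := p.db.ChainConfig()
	if err != nil {
		return err
	}
	p.blockGasLimit = head.GasLimit
	if head.Number >= cfg.LondonBlock {
		p.pendingBaseFee = head.BaseFee
	}
	return nil
}
```

Because `Start` owns the lookup, callers no longer need to know which event primes the pool; they just start it.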
This fixes a couple of regressions when running the uploader for mumbai:
* Now that flags have moved to a higher context, they need to be set in the context rather than in the flag values
* Span 0 of mumbai has a header/span mismatch in sprint 0, so the check here needs to be suppressed
This PR contains 3 fixes for interactions between the Bor mining loop and the tx pool which were causing blocks with zero transactions to be created regularly.
* Mining/Tx pool block synchronization
The synchronization of the tx pool between the sync loop and the mining loop has been changed so that both are triggered by the same event and synchronized via a `sync.Cond` rather than a polling loop with a hard-coded loop limit. This means that mining now waits for the pool to be updated from the previous block before it starts the mining process.
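The wait-for-pool pattern can be sketched with the standard library's `sync.Cond`. `poolSync`, `blockProcessed` and `waitFor` are hypothetical names; the sketch only shows how a condition variable replaces a bounded polling loop: the miner sleeps until the pool reports that it has applied the block the miner is building on.

```go
package main

import "sync"

// poolSync lets the mining loop wait until the tx pool has processed a block.
type poolSync struct {
	mu       sync.Mutex
	cond     *sync.Cond
	lastSeen uint64 // highest block number the pool has applied
}

func newPoolSync() *poolSync {
	p := &poolSync{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// blockProcessed is called by the pool after it has applied block n.
func (p *poolSync) blockProcessed(n uint64) {
	p.mu.Lock()
	if n > p.lastSeen {
		p.lastSeen = n
	}
	p.mu.Unlock()
	p.cond.Broadcast() // wake every waiter so each can re-check its condition
}

// waitFor blocks until the pool has processed block n. There is no hard-coded
// retry limit: the caller simply sleeps until it is signalled.
func (p *poolSync) waitFor(n uint64) {
	p.mu.Lock()
	for p.lastSeen < n {
		p.cond.Wait()
	}
	p.mu.Unlock()
}
```

The `for` loop around `Wait` is the standard idiom: a waiter re-checks the condition after every wake-up, so spurious or unrelated broadcasts are harmless.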
* Txpool Startup consolidated into its MainLoop
Previously the tx pool start process was triggered dynamically at various points in the code. This has now all been moved to the start of the main loop. This is necessary to avoid a timing hole which could leave the mining loop hanging while waiting for a previous block broadcast that it missed due to its delayed start.
* Mining listens for block broadcast to avoid duplicate mining operations
The mining loop for bor has a recommit timer in case blocks are not produced on time. However, in the case of sprint transitions where the seal publication is delayed, this can lead to duplicate block production. This is suppressed by introducing a `waiting` state which is exited when the block is broadcast by the sealing operation.
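The `waiting` state can be pictured as a flag guarded by a mutex. `borMiner`, `sealSubmitted`, `blockBroadcast` and `shouldRecommit` are hypothetical names for this sketch; it only models the rule described above: once a block is handed to the sealer, recommit-timer ticks are ignored until that block is broadcast.

```go
package main

import "sync"

// borMiner tracks whether a sealed block is still awaiting broadcast.
type borMiner struct {
	mu      sync.Mutex
	waiting bool
}

// sealSubmitted is called when a block has been handed to the sealer.
func (m *borMiner) sealSubmitted() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.waiting = true
}

// blockBroadcast is called when the sealing operation broadcasts the block,
// which exits the waiting state.
func (m *borMiner) blockBroadcast() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.waiting = false
}

// shouldRecommit is consulted when the recommit timer fires; while waiting,
// a recommit would only produce a duplicate block.
func (m *borMiner) shouldRecommit() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return !m.waiting
}
```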
Mdbx now takes a logger, but this had not been pushed to all callers, meaning some of them ended up with an invalid logger.
This fixes the log propagation.
It also fixes a start-up issue for http.enabled and txpool.disable introduced by a previous merge.
Heimdall prepares the next span a number of sprints before the current span ends. Currently we always fetch the next span, regardless of which sprint of the current span we are in. This causes a liveness issue due to how the Heimdall client works (it retries indefinitely until it fetches a span; this issue will be fixed in a separate PR). This PR fixes this by matching what bor does: it fetches the next span only in the last sprint of the current span.
Changes:
- Adds a unit test for the above
- Adds a new function `BlockInLastSprintOfSpan`
- Some code reorganization and cleanup: moves the span-number-related functions from the `bor` package to the `span` subpackage for better logical grouping
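A possible shape for the last-sprint check, as a sketch only: the real `BlockInLastSprintOfSpan` signature is not given here, so `blockInLastSprintOfSpan` and the `Span` struct below are hypothetical. It assumes spans are inclusive block ranges and that sprints start at block numbers divisible by the sprint length.

```go
package main

// Span is a simplified, hypothetical view of a Heimdall span: an inclusive
// block range.
type Span struct {
	ID                   uint64
	StartBlock, EndBlock uint64
}

// blockInLastSprintOfSpan reports whether blockNum falls inside the final
// sprint of span, assuming sprints start at multiples of sprintLen.
func blockInLastSprintOfSpan(blockNum uint64, span Span, sprintLen uint64) bool {
	if sprintLen == 0 || blockNum < span.StartBlock || blockNum > span.EndBlock {
		return false
	}
	// Start of the sprint that contains the span's final block.
	lastSprintStart := (span.EndBlock / sprintLen) * sprintLen
	if lastSprintStart < span.StartBlock {
		lastSprintStart = span.StartBlock
	}
	return blockNum >= lastSprintStart
}
```

A caller would gate its Heimdall request on this check, so the request for the next span is only made once Heimdall is expected to have prepared it. That avoids the indefinite-retry stall described above.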
This change introduces additional processes to manage snapshot uploading
for E2 snapshots:
## erigon snapshots upload
The `snapshots uploader` command starts a version of erigon customized for uploading snapshot files to a remote location.
It breaks the stage execution process after the senders stage and then uses the snapshot stage to send headers, bodies and (in the case of polygon) bor spans and events to snapshot files. Because this process avoids execution, it runs significantly faster than a standard erigon configuration.
The uploader uses rclone to send seedable snapshot files (100K or 500K blocks) to a remote storage location specified in the rclone config file.
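To make "seedable" concrete, the sketch below decides whether a segment file covers a 100K or 500K block range based on its name. It assumes an E2-style name of the form `v1-<from>-<to>-<type>.seg` with `from`/`to` expressed in thousands of blocks; both that naming scheme and the `parseRange`/`isSeedable` helpers are assumptions for illustration, not the uploader's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange extracts the block range from a name such as
// "v1-000000-000500-headers.seg" (assumed format, units of 1000 blocks).
func parseRange(name string) (from, to uint64, err error) {
	parts := strings.SplitN(name, "-", 4)
	if len(parts) != 4 {
		return 0, 0, fmt.Errorf("unexpected segment name %q", name)
	}
	if from, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	if to, err = strconv.ParseUint(parts[2], 10, 64); err != nil {
		return 0, 0, err
	}
	return from * 1000, to * 1000, nil
}

// isSeedable reports whether a segment spans exactly 100K or 500K blocks.
func isSeedable(name string) bool {
	from, to, err := parseRange(name)
	if err != nil || to <= from {
		return false
	}
	size := to - from
	return size == 100_000 || size == 500_000
}
```

Smaller, not-yet-merged segments fail the check and would stay local until they are merged into a seedable range.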
The **uploader** is configured to minimize disk usage by doing the
following:
* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to
snapshots
In addition to this, it has the following performance-related features:
* It maximizes the workers allocated to snapshot processing to improve throughput
* It can be started from scratch by downloading the latest snapshots from the remote location to seed processing
## snapshots command
This is a standalone command for managing remote snapshots. It has the following subcommands:
* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot
locations
* **torrent** - manage snapshot torrent files
This adds a simulator object which implements the SentryServer API but takes objects from a pre-existing snapshot file.
If the snapshot is not available locally, it will download and index the .seg file for the header range being requested.
It is created as follows:
```go
sim, err := simulator.NewSentry(ctx, "mumbai", dataDir, 1, logger)
```
Where the arguments are:
* ctx - a cancellable context; cancelling it closes the simulator's torrent and file connections (the simulator also has a Close method)
* chain - the name of the chain to take the snapshots from
* datadir - a directory potentially containing snapshot .seg files. If no files exist in this directory, they will be downloaded
* num peers - the number of peers the simulator should create
* logger - the logger to log actions to
It can be attached to a client as follows:
```go
simClient := direct.NewSentryClientDirect(66, sim)
```
At the moment only very basic functionality is implemented:
* get headers will return headers by range or hash (lookup by hash assumes a pre-downloaded .seg file, as it needs an index)
* the header replay semantics need to be confirmed
* eth 65 and 66(+) messaging is supported
* For details see: `simulator_test.go`
More advanced peer behavior (e.g. header rewriting) can be added, as can bodies/transactions handling.
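One detail behind the header replay semantics is how a by-range request expands into block numbers. In the eth wire protocol, a GetBlockHeaders request carries a start block, a limit, a skip value and a reverse flag, and consecutive headers are `skip+1` blocks apart in the chosen direction. The sketch below shows only that expansion for a numeric origin. `requestedNumbers` is a hypothetical helper, not part of the simulator, and it does not clamp to the last header available in the snapshot.

```go
package main

// requestedNumbers lists the block numbers a GetBlockHeaders request asks for:
// origin, then origin±(skip+1), and so on, up to limit entries. A reverse walk
// stops early rather than underflowing past block 0.
func requestedNumbers(origin, limit, skip uint64, reverse bool) []uint64 {
	var nums []uint64
	step := skip + 1
	n := origin
	for i := uint64(0); i < limit; i++ {
		nums = append(nums, n)
		if reverse {
			if n < step {
				break
			}
			n -= step
		} else {
			n += step
		}
	}
	return nums
}
```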
* Chunked format -> blinded
* LZ4 -> ZSTD
* Implemented parent block root support for history download
* Rationale: this makes it easy to optimize garbage collection during state reconstruction, and it allows attestations to be read quickly in the historical states reader
During testing we ran into a "span 7813 not found (db)" error due to a very large unwind (1 million blocks).
This is because the block reader's `LastFrozenSpanID` and `LastFrozenEventID` returned results that were not consistent with `FrozenBorBlocks`. The latter takes into account the existence of `.idx` files, while the former two functions did not.
Note that such a large unwind is unlikely to happen normally, unless there is a bug in our unwind logic or an operator is manually unwinding very far back for reasons such as chain halts (e.g. the mumbai bug from a few months ago), development testing or anything else along these lines.
Regardless, it exposed the above discrepancy, which is worth fixing.
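The principle behind the fix is to derive every "last frozen" value from the same view of which segments are usable, meaning segments that have their `.idx` file. The sketch below illustrates that principle under stated assumptions. The `segment` type and the `frozenBlocks`, `spanIDAt` and `lastFrozenSpanID` helpers are hypothetical. The span constants (a zeroth span ending at block 255, followed by 6400-block spans) are assumed values. The real block reader works from the snapshot files themselves.

```go
package main

// segment is a hypothetical description of a .seg file: the half-open block
// range [From, To) it covers and whether its .idx file exists.
type segment struct {
	From, To uint64
	Indexed  bool
}

// frozenBlocks returns the end of the contiguous run of indexed segments
// starting at block 0; an unindexed segment ends the frozen range.
func frozenBlocks(segs []segment) uint64 {
	var end uint64
	for _, s := range segs {
		if s.From != end || !s.Indexed {
			break
		}
		end = s.To
	}
	return end
}

// Assumed span layout: span 0 ends at block 255, later spans are 6400 blocks.
const (
	zerothSpanEnd = 255
	spanLength    = 6400
)

// spanIDAt maps a block number to the span containing it.
func spanIDAt(block uint64) uint64 {
	if block <= zerothSpanEnd {
		return 0
	}
	return 1 + (block-zerothSpanEnd-1)/spanLength
}

// lastFrozenSpanID uses the same indexed-only view as frozenBlocks, so it can
// never point past what has actually been frozen and indexed.
func lastFrozenSpanID(segs []segment) (uint64, bool) {
	fb := frozenBlocks(segs)
	if fb == 0 {
		return 0, false
	}
	return spanIDAt(fb - 1), true
}
```

With segments `[0, 500000)` indexed and `[500000, 1000000)` still missing its `.idx`, this reports span 79 rather than span 157. Counting the unindexed segment would yield 157, which is the kind of mismatch that made the reader look for a span it could not serve.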