This PR
- increases the test time
- uses the updated Python scripts, which report a long exit time as an
issue
- changes the test name to state that it is run in the snapshot
downloading phase
(a later PR will add a similar test for the block downloading phase)
Integration tests CI is failing due to a port clash in devnet tests. I
believe this is because there are 2 packages of devnet integration tests
and Go can run tests from separate packages in parallel (by default it
does package-level parallelism). A simple fix is to keep all devnet
integration tests in 1 package and run them sequentially within that
package (i.e. not use `t.Parallel`). This PR moves all devnet
integration tests into 1 package.
`"ContextStart devnet start failed: private api: could not create
listener: listen tcp 127.0.0.1:10090: bind: address already in use,
addr=localhost:10090"`
![Screenshot 2024-01-10 at 13 38
37](https://github.com/ledgerwatch/erigon/assets/94537774/06bda987-45e5-46ef-9e0b-3876b3f85c01)
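The failure mode can be reproduced in isolation. The Go sketch below binds a listener and then tries to bind a second one on the same address, which is what happens when two test packages start a node on the same fixed port at the same time. The helper names `listenTwice` and `isAddrInUse` are hypothetical and not part of the devnet code, and the check assumes a Unix-like system where the error unwraps to `syscall.EADDRINUSE`.

```go
package main

import (
	"errors"
	"net"
	"syscall"
)

// listenTwice binds a TCP listener on a free loopback port and then tries
// to bind a second listener on the very same address. It returns the
// address and the error produced by the second attempt.
func listenTwice() (string, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer first.Close()

	addr := first.Addr().String()
	second, err := net.Listen("tcp", addr)
	if err == nil {
		// Not expected: the port is still held by the first listener.
		second.Close()
		return addr, nil
	}
	return addr, err
}

// isAddrInUse reports whether err is an "address already in use" error.
func isAddrInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}
```

Running the tests of a single package sequentially avoids this situation because only one node at a time holds the fixed port.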
This PR contains 3 fixes for the interaction between the Bor mining loop
and the TX pool, which were causing the regular creation of blocks with
zero transactions.
* Mining/Tx pool block synchronization
The synchronization of the tx pool between the sync loop and the mining
loop has been changed so that both are triggered by the same event and
synchronized via a sync.Cond rather than a polling loop with a hard
coded loop limit. This means that mining now waits for the pool to be
updated from the previous block before it starts the mining process.
* Txpool Startup consolidated into its MainLoop
Previously the tx pool start process was dynamically triggered at
various points in the code. This has all now been moved to the start of
the main loop. This is necessary to avoid a timing hole which can leave
the mining loop hanging, waiting for a previous block broadcast which
it missed due to its delayed start.
* Mining listens for block broadcast to avoid duplicate mining
operations
The mining loop for Bor has a recommit timer in case blocks are not
produced on time. However, in the case of sprint transitions where the
seal publication is delayed, this can lead to duplicate block production.
This is suppressed by introducing a `waiting` state which is exited upon
the block being broadcast from the sealing operation.
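The first fix above replaces polling with a `sync.Cond`. The following is a minimal Go sketch of that pattern, not the actual Erigon implementation: the type `poolSync` and its methods are hypothetical names chosen for illustration. The pool records the last block it has processed and wakes any waiters, and the mining side blocks until the pool has caught up with the block it needs.

```go
package main

import "sync"

// poolSync is a hypothetical helper that tracks the last block number
// the tx pool has finished processing.
type poolSync struct {
	mu        sync.Mutex
	cond      *sync.Cond
	processed uint64
}

func newPoolSync() *poolSync {
	p := &poolSync{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// markProcessed is called by the pool once it has applied block n.
// It wakes every goroutine waiting in waitFor.
func (p *poolSync) markProcessed(n uint64) {
	p.mu.Lock()
	if n > p.processed {
		p.processed = n
	}
	p.mu.Unlock()
	p.cond.Broadcast()
}

// waitFor blocks until the pool has processed at least block n and
// returns the last processed block number. There is no polling and no
// hard-coded retry limit: the caller sleeps until it is woken.
func (p *poolSync) waitFor(n uint64) uint64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.processed < n {
		p.cond.Wait()
	}
	return p.processed
}
```

In this sketch the mining goroutine would call `waitFor(parent)` before assembling a block, while the pool calls `markProcessed` after each block update. The condition is re-checked in a loop because `Wait` can return when an earlier block was processed.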
Logic: 1/42 of RAM, but not more than 2GB for chaindata and not more than
256MB for other databases.
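The rule can be written as a small Go function. This is only a sketch of the arithmetic: the name `dbSizeLimit` and its parameters are hypothetical, and the caps are interpreted here as binary units (2 GiB and 256 MiB), which is an assumption.

```go
package main

const (
	mib = uint64(1) << 20
	gib = uint64(1) << 30
)

// dbSizeLimit returns 1/42 of total RAM, capped at 2 GiB for chaindata
// and at 256 MiB for any other database.
func dbSizeLimit(totalRAM uint64, isChaindata bool) uint64 {
	limit := totalRAM / 42

	maxLimit := 256 * mib
	if isChaindata {
		maxLimit = 2 * gib
	}
	if limit > maxLimit {
		return maxLimit
	}
	return limit
}
```

For example, with 16 GiB of RAM the chaindata value stays at 1/42 of RAM (below the cap), while any other database is clamped to 256 MiB.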
---------
Co-authored-by: battlmonstr <battlmonstr@users.noreply.github.com>
This PR fixes an infinite recursion (stack overflow) error in devnet
integration tests that surfaced in our integration CI a week or more ago
and additionally enables some integration tests that are now fixed -
`TestStateSync` & `TestCallContract`.
![Screenshot 2024-01-09 at 17 10
24](https://github.com/ledgerwatch/erigon/assets/94537774/a5a8c9c9-9f68-4084-9e08-1bf3c1601cab)
Currently the mining loop is broken for the Polygon chain. This PR fixes
this.
High level changes:
- Introduces new Bor<->Heimdall stage specifically for the needs of the
mining flow
- Extracts out common logic from Bor<->Heimdall sync and mining stages
into shared functions
- Removes `mine` flag for the Bor<->Heimdall sync stage
- Extends the current `StartMining` function to prefetch span zero if
needed before the mining loop is started
- Fixes Bor to read span zero (instead of span 1) from Heimdall when the
span is not initially set in the local smart contract that the Spanner
uses
Test with devnet "state-sync" scenario:
![Screenshot 2024-01-05 at 17 41
23](https://github.com/ledgerwatch/erigon/assets/94537774/34ca903a-69b8-416a-900f-a32f2d4417fa)
While working on fixing the Bor mining loop I stumbled across a
"not implemented" panic in `ChainReader.BorSpan`. I also hit a few other
panics due to a missing logger in `ChainReaderImpl` struct
initialisations. This PR fixes both.
A crash on startup happens with --chain=mumbai, because I confused
chainConfig.Bor (from type chain.Config) and config.Bor (from type
ethconfig.Config) in the setup code.
ethconfig.Config.Bor property contained bogus values, and was used only
to check its type in CreateConsensusEngine(). Its value was never read
(before PR #9117).
This change removes the property to avoid confusion and fix the crash.
Devnet network.BorStateSyncDelay was implemented using
ethconfig.Config.Bor, but it had no effect. It should be fixed
separately in a different way.
Getting an error in one of the bor nodes in devnet when trying to run
the "state-sync" scenario:
```
[EROR] [01-03|16:55:44.179] cli.StartRpcServer error err="could not start separate Websocket RPC api at port 8546: listen tcp 127.0.0.1:8546: bind: address already in use"
```
This happens for scenarios with more than 1 node.
Digging further, this regression was introduced by this change:
https://github.com/ledgerwatch/erigon/pull/8909
This PR fixes this by updating the devnet `NodeArgs` struct to set the
corresponding `--ws.port` `arg` tag which now exists.
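To show why the tag matters, here is a minimal Go sketch of a struct whose `arg` tags are turned into command-line flags via reflection. The struct `exampleNodeArgs`, its fields, the tag format, and the `buildArgs` helper are all assumptions made for illustration; the real devnet `NodeArgs` and its argument builder may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// exampleNodeArgs is a hypothetical stand-in for the devnet NodeArgs struct.
type exampleNodeArgs struct {
	HTTPPort int    `arg:"--http.port"`
	WSPort   int    `arg:"--ws.port"`
	Internal string `arg:"-"` // "-" means: never passed on the command line
}

// buildArgs converts every field that carries a usable `arg` tag into a
// "--flag=value" string. Fields tagged "-" or left untagged are skipped.
func buildArgs(v any) []string {
	val := reflect.ValueOf(v)
	typ := val.Type()

	var out []string
	for i := 0; i < typ.NumField(); i++ {
		flag := typ.Field(i).Tag.Get("arg")
		if flag == "" || flag == "-" {
			continue
		}
		out = append(out, fmt.Sprintf("%s=%v", flag, val.Field(i).Interface()))
	}
	return out
}
```

Under these assumptions, a port field without a flag tag is never passed to the node, so every node would presumably fall back to the same default port and clash as soon as a scenario starts more than 1 node.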
Hi, I made three suggestions for this section:
- "devenet" should be "devnet" (typo).
- "are currently build" should be "are currently built" (grammatical
error).
- "sptep" should be "step" (typo).
Thanks.