erigon-pulse/consensus/bor
Manav Darji 1da4d3abbf
eth, consensus/bor: fixes and improvements related to mining (#6051)
In the context of https://github.com/ledgerwatch/erigon/issues/5694, this PR
adds some fixes and improvements to the mining flow. A related change
to the txpool (which lives in erigon-lib) is made here:
https://github.com/ledgerwatch/erigon-lib/pull/737

#### Changes in triggering mining in `startMining()`
The mining module did not honour the block time, because mining was
triggered by a simple 3-second timer and by a notifier from the txpool.
This could cause inconsistencies, at least with the bor consensus.
Hence, for simplicity, a geth-like approach is used instead. A new-head
channel subscription is added to the `startMining()` loop, which
notifies the loop whenever a new block is added. This makes sure that
the block time is honoured. Moreover, the fixed 3-second timer is
replaced by the `miner.recommit` value set using flags.

#### Changes in the arrangement of calls made post mining
When all the mining stages are completed, erigon writes all the data to
a cache. It then processes the block through all the stages, just as it
would process a block received over P2P. In this case, some of the
stages aren't really required. For example, the block header and body
download stages are not needed because the block was mined locally, and
even the execution stage is redundant because the block already went
through it in the mining stages.

Now, we encountered an issue where the chain halted and kept mining the
same block again and again (a liveness issue). The root cause was an
error in one of the stages while processing its parent block. This
turned out to be the 4th stage, the "Block body download" stage. This
stage tries to download the block body from peers using the headers. As
we mined this block locally, we don't really need to download (or
reprocess) anything. Hence, the stage instead reads the block body from
the cache where we store it.

Interestingly, that cache turned out to be empty for some blocks. This
was because, post mining, we called the broadcast method, which starts
the staged sync, before adding the block header and body to the cache.
So, technically, at any given stage it was uncertain whether the block
header and body had been written yet (see
[this](https://github.com/ledgerwatch/erigon/blob/devel/eth/backend.go#L553-L572)).
To achieve complete certainty, we rearranged the calls so that the
write to the cache happens first and the broadcast happens next. This
effectively solves the issue, as we can now be sure that the block body
is always in the cache when we reach the body download stage.
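
The reordering described above comes down to a happens-before relationship: the cache write must complete before anything that can start staged sync runs. The sketch below is a hypothetical illustration, not erigon's code. `blockCache`, `publishMinedBlock` and `bodyVisibleToSync` are invented names, and the broadcast is modelled as a callback that starts a goroutine reading the cache, the way a body stage would.

```go
package main

import (
	"fmt"
	"sync"
)

// Body is a hypothetical, minimal block body.
type Body struct{ Txs []string }

// blockCache is a hypothetical, mutex-protected cache of mined block bodies.
type blockCache struct {
	mu     sync.Mutex
	bodies map[uint64]Body
}

func newBlockCache() *blockCache { return &blockCache{bodies: make(map[uint64]Body)} }

func (c *blockCache) Put(num uint64, b Body) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bodies[num] = b
}

func (c *blockCache) Get(num uint64) (Body, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	b, ok := c.bodies[num]
	return b, ok
}

// publishMinedBlock writes to the cache first and broadcasts second. Because
// broadcast may start staged sync concurrently, swapping these two calls would
// let a stage read the cache before the body is written.
func publishMinedBlock(c *blockCache, num uint64, b Body, broadcast func(num uint64)) {
	c.Put(num, b)
	broadcast(num)
}

// bodyVisibleToSync simulates a broadcast that starts staged sync in another
// goroutine and reports whether that goroutine found the body in the cache.
func bodyVisibleToSync(num uint64) bool {
	c := newBlockCache()
	var wg sync.WaitGroup
	var found bool
	publishMinedBlock(c, num, Body{Txs: []string{"tx-a"}}, func(n uint64) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, found = c.Get(n)
		}()
	})
	wg.Wait() // the sync goroutine has finished, so reading found is safe
	return found
}

func main() {
	fmt.Println(bodyVisibleToSync(42))
}
```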

#### Misc changes
This PR also adds some logging to the bor consensus engine.
2022-11-18 02:39:16 +03:00
api.go Aggregator22.Unwind() (#5039) 2022-08-13 18:51:25 +07:00
bor.go eth, consensus/bor: fixes and improvements related to mining (#6051) 2022-11-18 02:39:16 +03:00
clerk.go Merging Turbo bor into devel (#3372) 2022-02-07 21:30:46 +00:00
errors.go Merging Turbo bor into devel (#3372) 2022-02-07 21:30:46 +00:00
genesis_contracts_client.go Use "err" key for logging errors. (#3632) 2022-03-01 15:40:51 +00:00
merkle.go Enable prealloc linter (#5177) 2022-08-26 10:04:36 +07:00
rest.go Aggregator22.Unwind() (#5039) 2022-08-13 18:51:25 +07:00
snapshot_test.go Merging Turbo bor into devel (#3372) 2022-02-07 21:30:46 +00:00
snapshot.go Merging Turbo bor into devel (#3372) 2022-02-07 21:30:46 +00:00
span.go Bor fixes (#3553) 2022-02-24 00:03:10 +00:00
validator_set.go go1.19 gofmt (#4988) 2022-08-10 19:04:13 +07:00
validator.go Remove capitalization and trailing newlines from err strings (#5186) 2022-08-26 13:20:19 +07:00