It turns out that "standard" BSC nodes based on Geth do not propagate
new block hashes and blocks, at least towards Erigon nodes. This is a
workaround.
---------
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
The `Read` method of the math package used in the `randomAddress`
function is not thread-safe. Instead, I've used the crand package, which
is used by matic and is a fork of the thread-safe crypto package.
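As a rough illustration of the idea, the sketch below fills an address from the standard library's `crypto/rand` rather than from a shared math-based generator. It is not the code from this change: the standard `crypto/rand` package is swapped in for the crand fork mentioned above, and the `Address` type and the `randomAddress` signature are stand-ins assumed for the example.

```go
package main

import (
	crand "crypto/rand"
	"encoding/hex"
	"fmt"
)

// Address is a hypothetical 20-byte account address used only in this sketch.
type Address [20]byte

// randomAddress fills an Address with bytes from crypto/rand.
// No generator state is shared with the caller, so nothing needs locking here.
func randomAddress() (Address, error) {
	var a Address
	if _, err := crand.Read(a[:]); err != nil {
		return Address{}, err
	}
	return a, nil
}

func main() {
	a, err := randomAddress()
	if err != nil {
		panic(err)
	}
	fmt.Println("0x" + hex.EncodeToString(a[:]))
}
```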
Embedded CL is not supported for Gnosis Chain, so it makes sense to set
`externalcl` to true by default for it.
Also, this PR sets `terminalTotalDifficultyPassed` for Gnosis Chain &
Chiado (see https://docs.gnosischain.com/updates/20221210-merge).
So there is an issue with tracing certain blocks/transactions on
Polygon, for example:
```
> '{"method": "trace_transaction","params":["0xb198d93f640343a98f90d93aa2b74b4fc5c64f3a649f1608d2bfd1004f9dee0e"],"id":1,"jsonrpc":"2.0"}'
```
gives the error `first run for txIndex 1 error: insufficient funds for
gas * price + value: address 0x10AD27A96CDBffC90ab3b83bF695911426A69f5E
have 16927727762862809 want 17594166808296934`
The reason is that this transaction is from the author of the block,
who doesn't have enough ETH to pay for the gas fee + tx value unless
they are treated as the block author receiving the transaction fees.
The issue is that the APIs currently use the `ethash.NewFaker()` Engine
for running traces, etc., which doesn't know how to get the author of a
specific block (this is consensus-dependent), as was noted in several
TODO comments.
The fix is to pass the Engine to the BaseAPI, which can then be used to
create the right Block Context. I chose to split the current Engine
interface into two, Reader and Writer, so that the BaseAPI only
receives the Reader one, which might be safer (even though it's only
used for getting the block Author).
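The split described above can be sketched roughly as follows. This is a simplified illustration, not the actual Erigon interfaces: `Header`, `Address`, `EngineReader`, `EngineWriter`, `coinbaseEngine` and `NewBaseAPI` are hypothetical stand-ins, and the only point is that the API layer holds the read-only half.

```go
package main

import "fmt"

// Address and Header are minimal stand-ins for the real types.
type Address [20]byte

type Header struct {
	Number   uint64
	Coinbase Address
}

// EngineReader is the read-only half: enough to resolve a block's author.
type EngineReader interface {
	Author(h *Header) (Address, error)
}

// EngineWriter is the half that produces blocks (illustrative method only).
type EngineWriter interface {
	Seal(h *Header) error
}

// Engine is the full interface, composed of both halves.
type Engine interface {
	EngineReader
	EngineWriter
}

// coinbaseEngine is a toy engine whose author is simply the coinbase field.
// A real consensus engine may derive the author differently.
type coinbaseEngine struct{}

func (coinbaseEngine) Author(h *Header) (Address, error) { return h.Coinbase, nil }
func (coinbaseEngine) Seal(h *Header) error              { return nil }

// BaseAPI only ever sees the reader half of the engine.
type BaseAPI struct {
	engine EngineReader
}

func NewBaseAPI(e EngineReader) *BaseAPI { return &BaseAPI{engine: e} }

// blockAuthor is what a block-context builder would call to find the fee recipient.
func (api *BaseAPI) blockAuthor(h *Header) (Address, error) {
	return api.engine.Author(h)
}

func main() {
	var full Engine = coinbaseEngine{}
	api := NewBaseAPI(full) // an Engine satisfies EngineReader
	author, err := api.blockAuthor(&Header{Number: 1, Coinbase: Address{0x10, 0xad}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("author: %x\n", author)
}
```

Because `BaseAPI` stores an `EngineReader`, code in the API layer cannot call `Seal` without an explicit type assertion.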
This PR includes the changes required for the Delhi hard fork, scheduled
at block `29638656` on the Mumbai testnet. It changes a few major
parameters.
1. Sprint length - the number of bor blocks after which a new validator
mines has been reduced from 64 to 16.
2. Block time - the block time, which was increased earlier to 5 seconds
for some experiments, has been reduced to 2 seconds (along with
backup multiplier and producer delay).
3. Base fee denominator - this field has been increased from 8 to 16 to
smooth the effect of EIP 1559.
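To make the effect of the base fee denominator concrete, here is a small sketch of the EIP-1559 update rule with the denominator as a parameter. The function name `nextBaseFee` and the sample numbers are made up for illustration, and plain `uint64` arithmetic is assumed to be sufficient for the values shown (real implementations use big integers).

```go
package main

import "fmt"

// nextBaseFee applies the EIP-1559 base fee update rule with a configurable
// change denominator. It assumes parentBaseFee * gas delta fits in a uint64.
func nextBaseFee(parentBaseFee, gasUsed, gasTarget, denominator uint64) uint64 {
	switch {
	case gasUsed == gasTarget:
		// Exactly on target: the base fee is unchanged.
		return parentBaseFee
	case gasUsed > gasTarget:
		// Above target: increase, by at least 1.
		delta := parentBaseFee * (gasUsed - gasTarget) / gasTarget / denominator
		if delta < 1 {
			delta = 1
		}
		return parentBaseFee + delta
	default:
		// Below target: decrease.
		delta := parentBaseFee * (gasTarget - gasUsed) / gasTarget / denominator
		return parentBaseFee - delta
	}
}

func main() {
	const parent, target = 1600, 15_000_000
	full := uint64(2 * target) // a completely full block
	for _, d := range []uint64{8, 16} {
		fmt.Printf("denominator %d: full block -> %d, empty block -> %d\n",
			d, nextBaseFee(parent, full, target, d), nextBaseFee(parent, 0, target, d))
	}
}
```

With a denominator of 8 a full block moves the base fee from 1600 to 1800 (12.5%), while with 16 it moves to 1700 (6.25%), so each block changes the fee by half as much.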
In the context of https://github.com/ledgerwatch/erigon/issues/5694, this
PR adds some fixes and improvements to the mining flow. Also, a relevant
change in txpool (present in erigon-lib) is made here:
https://github.com/ledgerwatch/erigon-lib/pull/737
#### Changes in triggering mining in `startMining()`
The mining module didn't honour the block time, because mining was
triggered by a simple 3-second timer and a notifier from txpool. This
would cause inconsistencies, at least with the bor consensus. Hence, a
geth-like approach is used instead for simplicity. A new head channel
subscription is added in the `startMining()` loop, which signals the
addition of a new block and thereby makes sure that the block time is
honoured. Moreover, the fixed 3-second timer is replaced by the
`miner.recommit` value set using flags.
#### Changes in the arrangement of calls made post mining
When all the mining stages are completed, erigon writes all the data to
a cache. It then processes the block through all the stages as it would
process a block received from P2P. In this case, some of the stages
aren't really required. For example, the block header and body download
stages are not required, as the block was mined locally. Even the
execution stage is not required, as the block already went through it in
the mining stages.
Now, we encountered an issue where the chain was halted and kept mining
the same block again and again (liveness issue). The root cause is an
error in a stage of its parent block. This stage turns out to be the 4th
stage, the "Block body download" stage. This stage tries to download the
block body from peers using the headers. As we mined this block locally,
we don't really need to download anything (or process anything again).
Hence, it reaches out to the cache in which we store the block body.
Interestingly, that cache turned out to be empty for some blocks. This
was because, post mining, before adding the block header and body to a
cache, we call the broadcast method, which starts the staged sync. So,
technically it’s a bit uncertain at any stage whether the block header
and body have been written or not (see
[this](https://github.com/ledgerwatch/erigon/blob/devel/eth/backend.go#L553-L572)).
To achieve complete certainty, we rearranged the calls so that the write
to cache is called first and the broadcast next. This pretty much solves
the issue, as now we’re sure that we’d always have a block body in the
cache when we reach the body download stage.
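The reordering can be pictured with the toy sketch below. It does not mirror the real code in `eth/backend.go`: `minedCache`, `onBlockMined` and the `broadcast` callback are hypothetical, and the callback stands in for the staged sync that the broadcast kicks off.

```go
package main

import (
	"fmt"
	"sync"
)

// minedCache is a toy stand-in for the cache of locally mined block bodies.
type minedCache struct {
	mu     sync.Mutex
	bodies map[uint64]string
}

func newMinedCache() *minedCache {
	return &minedCache{bodies: make(map[uint64]string)}
}

func (c *minedCache) put(num uint64, body string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bodies[num] = body
}

func (c *minedCache) get(num uint64) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	body, ok := c.bodies[num]
	return body, ok
}

// onBlockMined writes the body to the cache first and only then broadcasts,
// so anything started by the broadcast is guaranteed to find the body.
func onBlockMined(c *minedCache, num uint64, body string, broadcast func(num uint64)) {
	c.put(num, body) // 1. write to cache
	broadcast(num)   // 2. broadcast, which starts the staged sync
}

func main() {
	cache := newMinedCache()
	onBlockMined(cache, 42, "body-of-42", func(num uint64) {
		// Simulates the body download stage looking up the local cache.
		_, ok := cache.get(num)
		fmt.Println("body available when sync starts:", ok)
	})
}
```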
#### Misc changes
This PR also adds some logs in bor consensus.
BaseFee is required in AuRa headers when
[EIP-1559](https://eips.ethereum.org/EIPS/eip-1559) is activated.
Also:
- Basic AuRa header verification
- Extract some common RLP methods
- Tiny log clean-up
This fixes the following panic for Gnosis Chain on the validator switch
at block 9186425:
```
panic: method 'getValidators' not found
goroutine 90 [running]:
github.com/ledgerwatch/erigon/consensus/aura.(*ValidatorSafeContract).getListSyscall(0x14000ed9358, 0xd40004bf620)
github.com/ledgerwatch/erigon/consensus/aura/validators.go:634 +0x258
github.com/ledgerwatch/erigon/consensus/aura.(*ValidatorSafeContract).epochSet(0x16?, 0x20?, 0x8c2c79, {0xd4002d77180, 0x25f, 0x25f}, 0x11400fac7ee8?)
github.com/ledgerwatch/erigon/consensus/aura/validators.go:453 +0xdc
github.com/ledgerwatch/erigon/consensus/aura.(*ValidatorContract).epochSet(0x140006ae980?, 0x38?, 0x6f9d00000000c28e?, {0xd4002d77180?, 0x108acc108?, 0x40?}, 0x14000618000?)
```
Previously "in-memory" MDBX instances for fork validation and mining
were created inside `os.TempDir()`. We should create them inside
Erigon's datadir so that the file permissions and the disk are the same
as for the main database.
Prerequisite: https://github.com/ledgerwatch/erigon-lib/pull/676.
* Test GnosisGenesisStateRoot
* Delete obsolete allocations
* SysCallContract shouldn't increase nonce of SystemAddress
* Max gas limit in SysCallContract
* Restore error swallowing for Bor