Mdbx now takes a logger, but this had not been pushed to all callers,
meaning some callers were left with an invalid logger.
This fixes the log propagation.
It also fixes a start-up issue for http.enabled and txpool.disable
created by a previous merge.
Making the addReplyMatcher channel unbuffered sometimes makes the loop
too slow to serve parallel requests.
This is an alternative fix for keeping the channel buffered.
Problem:
Some goroutines are blocked on shutdown:
1. table close <-tab.closed // because table loop pending
2. table loop <-refreshDone // because lookup shutdown blocks doRefresh
3. lookup shutdown <-it.replyCh // because it.queryfunc (findnode -
ensureBond) is blocked, and not returning errClosed (if it returns and
pushes to it.replyCh, then shutdown() will unblock)
4. findnode - ensureBond <-rm.errc // because the related replyMatcher
was added after loop() exited, so there's nothing to push errClosed and
unlock it
If the addReplyMatcher channel is buffered, it is possible that
UDPv4.pending() adds a new reply matcher after closeCtx.Done().
Such a reply matcher's errc result channel will never be updated, because
UDPv4.loop() has exited at this point. Subsequent discovery
operations will deadlock.
Solution:
Revert to an unbuffered channel.
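For reference, a simplified sketch of how UDPv4.pending() behaves with the unbuffered channel (the names are the ones already mentioned above; treat this as illustrative, not the exact source):

```go
// With addReplyMatcher unbuffered, this send can only succeed while loop()
// is still receiving. After shutdown, the closeCtx case fires instead and the
// matcher is failed with errClosed right away, so no caller waits forever.
func (t *UDPv4) pending(id enode.ID, ip net.IP, ptype byte, callback replyMatchFunc) *replyMatcher {
	ch := make(chan error, 1)
	m := &replyMatcher{from: id, ip: ip, ptype: ptype, callback: callback, errc: ch}
	select {
	case t.addReplyMatcher <- m:
		// loop() now owns the matcher and will deliver the reply or a timeout.
	case <-t.closeCtx.Done():
		ch <- errClosed
	}
	return m
}
```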
This fixes an issue where mumbai testnet nodes struggle to find
peers. Before this fix, test peer counts were typically around
20 in total across eth66, eth67 and eth68. Some new nodes could
struggle to find even a single peer after days of operation.
These are the numbers after 12 hours of running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.
The root cause of this issue is the following:
- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods
These issues cause a significant number of discovery timeouts, and some
of the queries never receive a response.
This causes the discovery read loop to enter a channel deadlock, which
means that no responses are processed and no timeouts are fired. The
discovery process in the node then stops. From then on it just
re-requests handshakes from a relatively small number of peers.
This check-in fixes this situation with the following changes:
- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing (see the sketch
after this list).
- Allow the discovery reply matcher to match on port if no id match
can be established on the initial ping. This allows subsequent node
validation to proceed, and if the node proves to be valid via the
remainder of the look-up and handshake process it is used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination, continue to be ignored.
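A minimal sketch of the timer-in-its-own-goroutine idea (illustrative only; the function and channel names here are assumptions, not the actual change):

```go
// The timer lives in its own goroutine and talks to the main discovery loop
// only through channels, so resetting or firing it can never block reply
// processing in the main select (and vice versa).
func runReplyTimeoutTimer(ctx context.Context, reset <-chan time.Duration, fired chan<- time.Time) {
	timer := time.NewTimer(time.Hour) // placeholder until the first reset arrives
	defer timer.Stop()
	for {
		select {
		case d := <-reset:
			if !timer.Stop() {
				select {
				case <-timer.C: // drain a timer that already fired
				default:
				}
			}
			timer.Reset(d)
		case t := <-timer.C:
			select {
			case fired <- t: // main loop expires pending reply matchers
			case <-ctx.Done():
				return
			}
		case <-ctx.Done():
			return
		}
	}
}
```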
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction of 'Services' for the network config (see the sketch below)
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will run a single validator
with no heimdall service as before
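A rough sketch of the 'Services' idea (the interface and field names are assumptions for illustration, not the actual devnet API):

```go
// A network service, such as the local gRPC heimdall, is declared in the
// devnet network config and started alongside the validator nodes.
type Service interface {
	Start(ctx context.Context) error
	Stop()
}

type NetworkConfig struct {
	Chain    string    // e.g. "bor-devnet"
	Services []Service // e.g. a local heimdall when --bor.localheimdall is set
}
```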
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
@shyba hi, it seems this lib doesn't work with
ghcr.io/goreleaser/goreleaser-cross (which produces the release binaries).
Removing it for now; feel free to add it back in the future if you can
make it work with goreleaser-cross.
see: https://github.com/ledgerwatch/erigon/issues/7210
Hi,
I'm syncing Gnosis on a Celeron N5100 to get familiar with the codebase.
In the process I managed to optimize some things from profiling.
Since I'm not yet on the project Discord, I decided to open this PR as a
suggestion. This passes all tests here and gave me a nice boost on that
platform, although I didn't have time to benchmark it yet.
* reuse VM Memory objects with sync.Pool, starting at 4k, which the `evmone`
[code
suggests](0897edb001/lib/evmone/execution_state.hpp (L49))
is a good value (see the sketch after this list).
* set32 simplification: mostly cosmetic
* sha256-simd: Celeron has SHA instructions. We should probably do the
same for torrent later, but this already helped as it is very CPU bound
on such a low end processor. Maybe that helps ARM as well.
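A minimal sketch of the Memory pooling change (field names follow core/vm, but treat this as illustrative; the 4k starting capacity is the evmone-suggested value mentioned above):

```go
var memoryPool = sync.Pool{
	New: func() any {
		// start with a 4 KiB backing array, as evmone suggests
		return &Memory{store: make([]byte, 0, 4*1024)}
	},
}

// NewMemory hands out a pooled instance instead of allocating a fresh one
// for every call frame.
func NewMemory() *Memory {
	return memoryPool.Get().(*Memory)
}

// returnMemory resets the instance and puts it back for reuse.
func returnMemory(m *Memory) {
	m.store = m.store[:0] // keep the backing array, drop the contents
	m.lastGasCost = 0
	memoryPool.Put(m)
}
```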
The test fails sometimes on macOS CI with:
v5_udp_test.go:477: unexpected error: "RPC timeout"
Fix by increasing the timeout from 120ms to 700ms,
and moving the test to the integration suite, because it takes up to 1s now.
* TestTable_ReadRandomNodesGetAll: refactor to integration and examples
* TestTable_bumpNoDuplicates: refactor to integration and examples
* TestUDPv4_smallNetConvergence: speed up from 1.7s to 0.3s by applying the test config
* TestUPNP_DDWRT: move to integration tests
* TestFairMix: split in 2, do more iterations in integration tests
* TestDialSched: speed up from 1s to 0.2s by removing the unexpected dial check
(the check is kept during the integration tests)
* configure a 50 ms timeout for tests (like v4 tests)
* use in-memory DB (like v4 tests)
* TestUDPv5_callTimeoutReset: improve speed from 1.2s to 0.2s
* TestUDPv5_callTimeoutReset: reduce the likelihood of "RPC timeout"
* move lookup tests to the "integration" suite
* log details of unmatched packets and sends to non-existing nodes
* fix flaky TestUDPv5_findnodeHandling:
Table.nextRevalidateTime was random (from 0 to 10s).
Sometimes it triggered doRevalidate immediately, and it produced an unexpected ping.
Configure a high interval to not revalidate during the tests.
Time improved from 1.7s to 0.2s.
Test with:
go test ./p2p/discover -run TestUDPv5 -count 1
The test is flaky when the reply timeout is too low.
Increasing the timeout makes it slow.
Move the test to the integration suite.
Having a higher timeout is fine there.
The UDP test must be closed after serveTestnet exits.
If it is closed before, serveTestnet encounters an error,
because it tries to emulate packet reception after the transport is closed.
FindNode triggers a Ping in ensureBond.
This causes an extra Sleep for "ping back".
Don't wait for this in tests.
Close v5 tests.
The requests may also time out if a lot of them queue up in the udpTest.pipe,
and serveTestnet is slow to process them.
Increase replyTimeout a bit to prevent that.
The test was flaky, because of the "endpoint prediction".
The test starts 5 nodes one by one.
Node 0 is used as a bootstrap node for nodes 1-4.
When it is about to add, say, node 3, nodes 0 and 1 might already have had a chance to communicate,
and updateEndpoints() deletes the node 0 UDP port, because the fallback UDP port was not configured.
In this case node 3 would get bootstrap node 0 without a port, leading to an error:
v5_udp_test.go:110: bad bootstrap node "enr:...": missing UDP port
The problem was reproducible by this command:
go test ./p2p/discover -run TestUDPv5_lookupE2E -count 500
The test was slow, because it was trying to find
predefined nodeIDs (lookupTestnet) by generating random keys
and looking up their neighbours
until it hit all nodes of the lookupTestnet.
In addition, each FindNode response was waited on for up to 0.5 sec (respTimeout).
This could take up to 30 sec and fail the test suite.
A fake random key generator is now used during the test.
It issues the expected keys, and the lookup converges quickly.
The reply timeout is reduced for the test.
Now it normally takes less than 0.1 sec.
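A sketch of the fake key generator idea (the variable and helper names are hypothetical, for illustration only):

```go
// Production code generates a fresh random key per lookup; the test swaps in
// a deterministic sequence so lookups target the predefined testnet node IDs
// and converge quickly.
var generateLookupKey = func() (*ecdsa.PrivateKey, error) {
	return crypto.GenerateKey()
}

// In the test: cycle through the predefined lookupTestnet keys.
func useTestnetKeys(keys []*ecdsa.PrivateKey) {
	i := 0
	generateLookupKey = func() (*ecdsa.PrivateKey, error) {
		k := keys[i%len(keys)]
		i++
		return k, nil
	}
}
```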
This changes the definitions of Ping and Pong, adding an optional field
for the sequence number. This field was previously encoded/decoded using
the "tail" struct tag, but using "optional" is much nicer.
see https://github.com/ethereum/go-ethereum/pull/22842
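A sketch of the resulting v4wire definitions (following the upstream change referenced above; shown here for illustration):

```go
type Ping struct {
	Version    uint
	From, To   Endpoint
	Expiration uint64
	ENRSeq     uint64 `rlp:"optional"` // sequence number of the sender's record

	// Ignore additional fields (for forward compatibility).
	Rest []rlp.RawValue `rlp:"tail"`
}

type Pong struct {
	To         Endpoint
	ReplyTok   []byte // hash of the ping packet this is a reply to
	Expiration uint64
	ENRSeq     uint64 `rlp:"optional"` // sequence number of the sender's record

	Rest []rlp.RawValue `rlp:"tail"`
}
```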
Co-authored-by: Felix Lange <fjl@twurst.com>
Problem:
QuerySeeds pokes 150 random entries in the whole node DB, ignoring hits on "field" entries.
In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries,
and very few true "node record" entries.
After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries.
That's roughly a 1/16 chance of hitting a "node record" entry,
which means finding only about 10 of the 114 nodes on average from 150 attempts.
Solution:
Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits.
Encode and hash logic was duplicated in multiple places.
* Move encoding to p2p/discover/v4wire
* Move hashing to p2p/enode/idscheme
* Change newRandomLookup to create a proper random key on a curve.
* use "log" for struct fields
* use "logger" for function parameters and local vars
This is a compromise between:
1) using logger := log.New() to avoid aliasing (log := log.New())
2) and keeping it short when logging e.g.: srv.log.Info(...)
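A small illustration of the convention (type names are only for the example):

```go
type Server struct {
	log log.Logger // struct field: short name keeps call sites compact
}

func NewServer(logger log.Logger) *Server { // parameters and locals use "logger"
	logger.Trace("creating server")
	return &Server{log: logger}
}

func (srv *Server) Run() {
	srv.log.Info("running") // short at the call site, e.g. srv.log.Info(...)
}
```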
* migrated consensus and chain config files for bsc support
* migrated more files from bsc
* fixed consensus crashing
* updated erigon lib for parlia snapshot prefix
* added staticpeers for bsc
* [+] added system contracts
[*] fixed bug with loading snapshot
[+] enabled gas bailout
[+] added fix to prevent syncing more than 1000 headers (for testing only)
[*] fixed bug that sometimes crashed sender recovery
* migrated system contract calls
* [*] fixed bug with returning mutable balance object
[+] migrated lightclient contracts from bsc
[*] fixed parlia consensus config param
* [*] fixed tendermint deps
* [+] added some logs
* [+] enabled bsc forks
[*] fixed syscalls from coinbase
[*] more logging
* Fix call sys contract gas calculation
* [*] fixed executing system transactions
* [*] enabled receipt hash, gas and bloom filter checks
* [-] removed some logging scripts
[*] set header checkpoint to 10 million blocks (for testing forks)
* [*] fixed bug with committing dirty inter-block state after system transaction execution
[-] removed some extra logs and comments
* [+] added chapel and rialto testnet support
* [*] fixed chapel allocs
* [-] removed 6 mil block limit for headers sync
* Fix hardforks on chapel and other testnets
* [*] fixed header sync issue after merge
* [*] tiny code cleanup
* [-] removed some comments
* [*] increased mdbx map size to 4 TB
* [*] increased max chaindata size to 6 TB
* [*] brought more compatibility with original erigon and some code cleanup
* [+] added support of validator mode for BSC chain
* [*] enable private key load for bsc, rialto and chapel chains
* [*] fixed running BSC validator node
* Fix the branch list
* [*] tiny fixes for linter
* [*] formatted imports for core and parlia packages
* [*] fixed import rules in other files
* Revert "[*] formatted imports for core and parlia packages"
This reverts commit c764b58b34fedc2b14d69458583ba0dad114f227.
* [*] changed import rules in more packages
* [*] fixed type mismatch in hack command
* [*] fixed crash on new epoch, enabled bootstrap flags
* [*] fixed linter errors
* [*] fixed missing err check for syscalls
* [*] the BSC implementation is now fully compatible with the original erigon sources
* Revert "Add chain config and CLI changes for Binance Smart Chain support (#3131)"
This reverts commit 3d048b7f1a.
* Revert "Add Parlia consensus engine for Binance Smart Chain support (#3086)"
This reverts commit ee99f17fbe.
* [*] fixed several issues after merge
* [*] fixed integration compilation
* Revert "Fix the branch list"
This reverts commit 8150ca57e5f2707a84a9f6a1c5b809b7cc84547b.
* [-] removed receipt repair migration
* [*] fixed parlia fork numbers output
* [*] bring more devel compatibility, fixed bsc address list for access list calculation
* [*] fixed bug with committing state transition for bad blocks in BSC
* [*] fixed bsc changes apply for integration command and updated config print for parlia
* [*] fixed bug with applying bsc forks for chapel and rialto testnet chains
[*] use finalize and assemble for mining, to let the consensus engine know which block it is finalizing
* Fix compilation errors in hack.go
* Fix lint
* reset changes in erigon-snapshots to devel
* Remove unrelated changes
* Fix embed
* Remove more unrelated changes
* Remove more unrelated changes
* Restore clique and aura miner config
* Refactor interfaces not to use slice pointers
* Refactor parlia functions to return tx and receipt instead of dealing with slices
* Fix for header panic
* Fix lint, restore system contract addresses
* Remove more unrelated changes, unify GatherForks
Co-authored-by: Dmitry Ivanov <convexman18@gmail.com>
Co-authored-by: j75689 <j75689@gmail.com>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>