Current value: 16 was added by me 1 year ago and didn't mean anything.
Never seen this field holding much data, probably can increase.
Currently I see logs like (and 10x like this):
[DBUG] [11-24|06:59:38.353] slow peer or too many requests, dropping its old requests name=erigon/v2.54.0-aeec5...
# Background
Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.
We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.
This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.
# Summary of changes
- Adds missing `NewCounter`, `NewSummary`, `NewHistogram`,
`GetOrCreateHistogram` functions to `erigon-lib/metrics` similar to the
interface VictoriaMetrics lib provides
- Minor tidy up for consistency inside `erigon-lib/metrics/set.go`
around return types (panic vs err consistency for funcs inside the
file), error messages, comments
- Replace all remaining usages of `github.com/VictoriaMetrics/metrics`
with `github.com/ledgerwatch/erigon-lib/metrics` - seamless (only import
changes) since interfaces match
Making the addReplyMatcher channel unbuffered makes the loop
going too slow sometimes for serving parallel requests.
This is an alternative fix for keeping the channel buffered.
Problem:
Some goroutines are blocked on shutdown:
1. table close <-tab.closed // because table loop pending
1. table loop <-refreshDone // because lookup shutdown blocks doRefresh
1. lookup shutdown <-it.replyCh // because it.queryfunc (findnode -
ensureBond) is blocked, and not returning errClosed (if it returns and
pushes to it.replyCh, then shutdown() will unblock)
1. findnode - ensureBond <-rm.errc // because the related replyMatcher
was added after loop() exited, so there's nothing to push errClosed and
unlock it
If addReplyMatcher channel is buffered, it is possible that
UDPv4.pending() adds a new reply matcher after closeCtx.Done().
Such reply matcher's errc result channel will never be updated, because
the UDPv4.loop() has exited at this point. Subsequent discovery
operations will deadlock.
Solution:
Revert to an unbuffered channel.
This fixes an issue where the mumbai testnet node struggle to find
peers. Before this fix in general test peer numbers are typically around
20 in total between eth66, eth67 and eth68. For new peers some can
struggle to find even a single peer after days of operation.
These are the numbers after 12 hours or running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.
The root cause of this issue is the following:
- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods
These issues case a significant number of discovery timeouts, some of
the queries will never receive a response.
This causes the discovery read loop to enter a channel deadlock - which
means that no responses are processed, nor timeouts fired. This causes
the discovery process in the node to stop. From then on it just
re-requests handshakes from a relatively small number of peers.
This check in fixes this situation with the following changes:
- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing.
- Allow the discovery process matcher to match on port if no id match
can be established on initial ping. This allows subsequent node
validation to proceed and if the node proves to be valid via the
remainder of the look-up and handshake process it us used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination continue to be ignored.
-
* call getEnode before NodeStarted to make sure it is ready for RPC
calls
* fix connection error detection on macOS
* use a non-default p2p port to avoid conflicts
* disable bor milestones on local heimdall
* generate node keys for static peers config
Problem:
"Started P2P networking" log message contains port zero on startup,
e.g.: 127.0.0.1:0 because of the outdated localnodeAddrCache.
Solution:
Call updateLocalNodeStaticAddrCache after updating the port.
Closes#8078
This change is primarily intended to support go 1.21, but as a
side-effect requires updating libp2p, which in turn triggers an update
of golang.org/x/exp which creates quite a bit of (simple) churn in the
slice sorting.
This change introduces a new `cmp.Compare` function which can be used to
return an integer satisfying the compare interface for slice sorting.
In order to continue to support mplex for libp2p, the change references
github.com/libp2p/go-libp2p-mplex instead. Please see the PR at
https://github.com/libp2p/go-libp2p/pull/2498 for the official usptream
comment that indicates official support for mplex being moved to this
location.
Co-authored-by: Jason Yellick <jason@enya.ai>
This is a non functional change which consolidates the various packages
under metrics into the top level package now that the dead code is
removed.
It is a precursor to the removal of Victoria metrics after which all
erigon metrics code will be contained in this single package.
Improve p2p error handling to propagate errors
from the origin up the call chain the Server peer removal code
using a new PeerError type containing a DiscReason and a more detailed
description.
The origin can be tracked down using PeerErrorCode (code) and DiscReason
(reason)
which looks like this in the log:
> [TRACE] [08-28|16:33:40.205] Removing p2p peer peercount=0
url=enode://d399f4b...@1.2.3.4:30303 duration=6.901ms
err="PeerError(code=remote disconnect reason, reason=too many peers,
err=<nil>, message=Peer.run got a remote DiscReason)"
This is an update of:
https://github.com/ledgerwatch/erigon/pull/7846
which uses a local fork of victoria metrics to include the changes that
https://github.com/anshalshukla added to the original for we where
using.
It also includes code to address the duplicate metrics issue identified
here:
https://github.com/ledgerwatch/erigon/issues/8053
It has one more associated fix which is to correctly add a metadata
label to counters, these where previously labelled as gauges.
e.g.
```
# TYPE p2p_peers counter
p2p_peers 0
```
rather than
```
# TYPE p2p_peers gauge
p2p_peers 0
```
---------
Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
The disconnect message could either be a plain integer, or a list with
one integer element. We were encoding it as a plain integer, but
decoding as a list. Change this to be able to decode any format.
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction on 'Services' for the network config
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will sun a single validator
with no heimdall service as before
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
The peer ID in sentry.proto is a H512 / 64 bytes value, and
MarshalPubkey creates it from a public key.
There's no need to cut the first byte, because MarshalPubkey already
does it.
Doing so results in a 63 bytes value that is incompatible with silkworm
sentry.
@shyba hi, seems this lib doesn't work with
ghcr.io/goreleaser/goreleaser-cross (which producing release binaries)
removing it for now, feel free to add it in future - if can make it work
with goreleaser-cross
see: https://github.com/ledgerwatch/erigon/issues/7210
the log line here was the culprit for the race. made sense to just
capture this on localnode creation instead and hold onto it for when the
server is started.
ran test with `-race` and `-count=5000` to double check and all looks
good
Hi,
I'm syncing Gnosis on a Celeron N5100 to get familiar with the codebase.
In the process I managed to optimize some things from profiling.
Since I'm not yet on the project Discord, I decided to open this PR as a
suggestion. This pass all tests here and gave me a nice boost for that
platform, although I didn't have time to benchmark it yet.
* reuse VM Memory objects with sync.Pool. It starts with 4k as `evmone`
[code
suggested](0897edb001/lib/evmone/execution_state.hpp (L49))
as a good value.
* set32 simplification: mostly cosmetic
* sha256-simd: Celeron has SHA instructions. We should probably do the
same for torrent later, but this already helped as it is very CPU bound
on such a low end processor. Maybe that helps ARM as well.
Regarding https://github.com/ledgerwatch/erigon/issues/6260
added flag `--p2p.allowed-ports=<porta>,<portb>` to restrict which ports
to use for sentries for different protocol versions.
Default for this flag is `30303, 30304` (first port is inherited from
`--port` flag defaults.
If `--port` is changed and it's new value is not presented in allowed
port list, provided port will be allowed as well as list provided via
`--p2p.allowed-ports`
Port picking is straightforward, we create sentry gRPC server for
protocol over first allowed port that is not already taken.
If there are no allowed ports left, erigon exits with hint.