Improve p2p error handling to propagate errors
from the origin up the call chain the Server peer removal code
using a new PeerError type containing a DiscReason and a more detailed
description.
The origin can be tracked down using PeerErrorCode (code) and DiscReason
(reason)
which looks like this in the log:
> [TRACE] [08-28|16:33:40.205] Removing p2p peer peercount=0
url=enode://d399f4b...@1.2.3.4:30303 duration=6.901ms
err="PeerError(code=remote disconnect reason, reason=too many peers,
err=<nil>, message=Peer.run got a remote DiscReason)"
* use semaphore instead of a chan struct{}
* move MaxPendingPeers default value to DefaultConfig.P2P
* log Error if Accept fails
* replace quit channel with context
* Switch peerId from 256 to 512 bit (as in stable)
* go mod tidy
* Fix some tests
* Fixed
* Fixes
* Fix tests
* Update to erigon-lib main
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
* use "log" for struct fields
* use "logger" for function parameters and local vars
This is a compromise between:
1) using logger := log.New() to avoid aliasing (log := log.New())
2) and keeping it short when logging e.g.: srv.log.Info(...)
USB enumeration still occured. Make sure it will only occur if --usb is set.
This also deprecates the 'NoUSB' config file option in favor of a new option 'USB'.
# Conflicts:
# cmd/geth/accountcmd_test.go
# cmd/geth/consolecmd_test.go
# cmd/geth/dao_test.go
# cmd/geth/genesis_test.go
# cmd/geth/les_test.go
Changes:
Simplify nested complexity
If an if blocks ends with a return statement then remove the else nesting.
Most of the changes has also been reported in golint https://goreportcard.com/report/github.com/ethereum/go-ethereum#golint
# Conflicts:
# cmd/utils/flags.go
# console/bridge.go
# crypto/bls12381/g2.go
# les/benchmark.go
# les/lespay/server/balance.go
# les/lespay/server/balance_tracker.go
# les/lespay/server/prioritypool.go
# les/odr_requests.go
# les/serverpool.go
# les/serverpool_test.go
# p2p/nodestate/nodestate_test.go
# trie/committer.go
- Remove the ws:// prefix from the status endpoint since
the ws:// is already included in the stack.WSEndpoint().
- Don't register the services again in the node start.
Registration is already done in the initialization stage.
- Expose admin namespace via websocket.
This namespace is necessary for connecting the peers via websocket.
- Offer logging relevant options for exec adapter.
It's really painful to mix all log output in the single console. So
this PR offers two additional options for exec adapter in this case
testers can config the log output(e.g. file output) and log level
for each p2p node.
This adds a few tiny fixes for les and the p2p simulation framework:
LES Parts
- Keep the LES-SERVER connection even it's non-synced
We had this idea to reject the connections in LES protocol if the les-server itself is
not synced. However, in LES protocol we will also receive the connection from another
les-server. In this case even the local node is not synced yet, we should keep the tcp
connection for other protocols(e.g. eth protocol).
- Don't count "invalid message" for non-existing GetBlockHeadersMsg request
In the eth syncing mechanism (full sync, fast sync, light sync), it will try to fetch
some non-existent blocks or headers(to ensure we indeed download all the missing chain).
In this case, it's possible that the les-server will receive the request for
non-existent headers. So don't count it as the "invalid message" for scheduling
dropping.
- Copy the announce object in the closure
Before the les-server pushes the latest headers to all connected clients, it will create
a closure and queue it in the underlying request scheduler. In some scenarios it's
problematic. E.g, in private networks, the block can be mined very fast. So before the
first closure is executed, we may already update the latest_announce object. So actually
the "announce" object we want to send is replaced.
The downsize is the client will receive two announces with the same td and then drop the
server.
P2P Simulation Framework
- Don't double register the protocol services in p2p-simulation "Start".
The protocols upon the devp2p are registered in the "New node stage". So don't reigster
them again when starting a node in the p2p simulation framework
- Add one more new config field "ExternalSigner", in order to use clef service in the
framework.
# Conflicts:
# les/server_handler.go
* Added back fdlimit to increase number of file descriptors
* Fixing a test file that fails on Mac with too few file descriptors available
* Trying to fix ci
* Trying to fix ci
* Trying to fix ci 3
* Fixing ci
This PR significantly changes the APIs for instantiating Ethereum nodes in
a Go program. The new APIs are not backwards-compatible, but we feel that
this is made up for by the much simpler way of registering services on
node.Node. You can find more information and rationale in the design
document: https://gist.github.com/renaynay/5bec2de19fde66f4d04c535fd24f0775.
There is also a new feature in Node's Go API: it is now possible to
register arbitrary handlers on the user-facing HTTP server. In geth, this
facility is used to enable GraphQL.
There is a single minor change relevant for geth users in this PR: The
GraphQL API is no longer available separately from the JSON-RPC HTTP
server. If you want GraphQL, you need to enable it using the
./geth --http --graphql flag combination.
The --graphql.port and --graphql.addr flags are no longer available.
# Conflicts:
# cmd/faucet/faucet.go
# cmd/geth/chaincmd.go
# cmd/geth/config.go
# cmd/geth/consolecmd.go
# cmd/geth/main.go
# cmd/utils/flags.go
# cmd/wnode/main.go
# core/rawdb/freezer.go
# eth/api_backend.go
# eth/backend.go
# ethclient/ethclient_test.go
# ethstats/ethstats.go
# graphql/service.go
# internal/ethapi/backend.go
# les/api_backend.go
# les/api_test.go
# les/checkpointoracle/oracle.go
# les/client.go
# les/commons.go
# les/server.go
# miner/stresstest/stress_clique.go
# miner/stresstest/stress_ethash.go
# mobile/geth.go
# node/api.go
# node/node.go
# node/node_example_test.go
# node/node_test.go
# node/rpcstack.go
# node/rpcstack_test.go
# node/service.go
# node/service_test.go
# node/utils_test.go
# p2p/simulations/examples/ping-pong.go
# p2p/testing/peerpool.go
# p2p/testing/protocolsession.go
# p2p/testing/protocoltester.go
# whisper/mailserver/server_test.go
# whisper/whisperv6/api_test.go
# whisper/whisperv6/filter_test.go
# whisper/whisperv6/whisper.go
# whisper/whisperv6/whisper_test.go
* p2p: add low port check in dialer
We already have a check like this for UDP ports, add a similar one in
the dialer. This prevents dials to port zero and it's also an extra
layer of protection against spamming HTTP servers.
* p2p/discover: use errLowPort in v4 code
* p2p: change port check
* p2p: add comment
* p2p/simulations/adapters: ensure assigned port is in all node records
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
* build: use golangci-lint
This changes build/ci.go to download and run golangci-lint instead
of gometalinter.
* core/state: fix unnecessary conversion
* p2p/simulations: fix lock copying (found by go vet)
* signer/core: fix unnecessary conversions
* crypto/ecies: remove unused function cmpPublic
* core/rawdb: remove unused function print
* core/state: remove unused function xTestFuzzCutter
* core/vm: disable TestWriteExpectedValues in a different way
* core/forkid: remove unused function checksum
* les: remove unused type proofsData
* cmd/utils: remove unused functions prefixedNames, prefixFor
* crypto/bn256: run goimports
* p2p/nat: fix goimports lint issue
* cmd/clef: avoid using unkeyed struct fields
* les: cancel context in testRequest
* rlp: delete unreachable code
* core: gofmt
* internal/build: simplify DownloadFile for Go 1.11 compatibility
* build: remove go test --short flag
* .travis.yml: disable build cache
* whisper/whisperv6: fix ineffectual assignment in TestWhisperIdentityManagement
* .golangci.yml: enable goconst and ineffassign linters
* build: print message when there are no lint issues
* internal/build: refactor download a bit
* rpc: improve codec abstraction
rpc.ServerCodec is an opaque interface. There was only one way to get a
codec using existing APIs: rpc.NewJSONCodec. This change exports
newCodec (as NewFuncCodec) and NewJSONCodec (as NewCodec). It also makes
all codec methods non-public to avoid showing internals in godoc.
While here, remove codec options in tests because they are not
supported anymore.
* p2p/simulations: use github.com/gorilla/websocket
This package was the last remaining user of golang.org/x/net/websocket.
Migrating to the new library wasn't straightforward because it is no
longer possible to treat WebSocket connections as a net.Conn.
* vendor: delete golang.org/x/net/websocket
* rpc: fix godoc comments and run gofmt
Most of these changes are related to the Go 1.13 changes to test binary
flag handling.
* cmd/geth: make attach tests more reliable
This makes the test wait for the endpoint to come up by polling
it instead of waiting for two seconds.
* tests: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* crypto/ecies: remove useless -dump flag in tests
* p2p/simulations: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* build: remove workaround for ./... vendor matching
This workaround was necessary for Go 1.8. The Go 1.9 release changed
the expansion rules to exclude vendored packages.
* Makefile: use relative path for GOBIN
This makes the "Run ./build/bin/..." line look nicer.
* les: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
This change
- implements concurrent LES request serving even for a single peer.
- replaces the request cost estimation method with a cost table based on
benchmarks which gives much more consistent results. Until now the
allowed number of light peers was just a guess which probably contributed
a lot to the fluctuating quality of available service. Everything related
to request cost is implemented in a single object, the 'cost tracker'. It
uses a fixed cost table with a global 'correction factor'. Benchmark code
is included and can be run at any time to adapt costs to low-level
implementation changes.
- reimplements flowcontrol.ClientManager in a cleaner and more efficient
way, with added capabilities: There is now control over bandwidth, which
allows using the flow control parameters for client prioritization.
Target utilization over 100 percent is now supported to model concurrent
request processing. Total serving bandwidth is reduced during block
processing to prevent database contention.
- implements an RPC API for the LES servers allowing server operators to
assign priority bandwidth to certain clients and change prioritized
status even while the client is connected. The new API is meant for
cases where server operators charge for LES using an off-protocol mechanism.
- adds a unit test for the new client manager.
- adds an end-to-end test using the network simulator that tests bandwidth
control functions through the new API.
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.