Commit Graph

730 Commits

Author SHA1 Message Date
battlmonstr
04498180dc
p2p/discv4: revert gotreply handler change from #8661 (#9119) (#9195)
The handler had race conditions in the candidates processing goroutine.
2024-01-11 15:04:46 +00:00
Mark Holt
19bc328a07
Added db loggers to all db callers and fixed flag settings (#9099)
Mdbx now takes a logger - but this has not been pushed to all callers -
meaning it had an invalid logger

This fixes the log propagation.

It also fixed a start-up issue for http.enabled and txpool.disable
created by a previous merge
2023-12-31 17:10:08 +07:00
Mark Holt
79ed8cad35
E2 snapshot uploading (#9056)
This change introduces additional processes to manage snapshot uploading
for E2 snapshots:

## erigon snapshots upload

The `snapshots uploader` command starts a version of erigon customized
for uploading snapshot files to
a remote location.  

It breaks the stage execution process after the senders stage and then
uses the snapshot stage to send
uploaded headers, bodies and (in the case of polygon) bor spans and
events to snapshot files. Because
this process avoids execution in run signifigantly faster than a
standard erigon configuration.

The uploader uses rclone to send seedable (100K or 500K blocks) to a
remote storage location specified
in the rclone config file.

The **uploader** is configured to minimize disk usage by doing the
following:

* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to
snapshots

in addition to this it has the following performance related features:

* maximizes the workers allocated to snapshot processing to improve
throughput
* Can be started from scratch by downloading the latest snapshots from
the remote location to seed processing

## snapshots command

Is a stand alone command for managing remote snapshots it has the
following sub commands

* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot
locations
* **torrent** - manage snapshot torrent files
2023-12-27 22:05:09 +00:00
Mark Holt
df0699a12b
Added sentry simulator implementation (#9087)
This adds a simulator object with implements the SentryServer api but
takes objects from a pre-existing snapshot file.

If the snapshot is not available locally it will download and index the
.seg file for the header range being asked for.

It is created as follows: 

```go
sim, err := simulator.NewSentry(ctx, "mumbai", dataDir, 1, logger)
```

Where the arguments are:

* ctx - a callable context where cancel will close the simulator torrent
and file connections (it also has a Close method)
* chain - the name of the chain to take the snapshots from
* datadir - a directory potentially containing snapshot .seg files. If
not files exist in this directory they will be downloaded
 *  num peers - the number of peers the simulator should create
 *  logger - the loger to log actions to

It can be attached to a client as follows:

```go
simClient := direct.NewSentryClientDirect(66, sim)
```

At the moment only very basic functionality is implemented:

* get headers will return headers by range or hash (hash assumes a
pre-downloaded .seg as it needs an index
* the header replay semantics need to be confirmed
* eth 65 and 66(+) messaging is supported
* For details see: `simulator_test.go

More advanced peer behavior (e.g. header rewriting) can be added
Bodies/Transactions handling can be added
2023-12-27 14:56:57 +00:00
battlmonstr
c1146bda49
p2p: skip TestUDPv4_smallNetConvergence on Linux (#8731) (#8962) 2023-12-12 17:06:48 +07:00
Alex Sharov
427f2637d2
mdbx: hard-limit of small db's dirty_space (#8850)
it didn't cause problems yet. but it seems a good idea in-general.
2023-11-29 15:09:55 +01:00
milen
230b013096
metrics: separate usage of prometheus counter and gauge interfaces (#8793) 2023-11-24 16:15:12 +01:00
Alex Sharov
3db9467c94
increase peer tasks queue size (#8825)
Current value: 16 was added by me 1 year ago and didn't mean anything.
Never seen this field holding much data, probably can increase.

Currently I see logs like (and 10x like this): 
[DBUG] [11-24|06:59:38.353] slow peer or too many requests, dropping its old requests name=erigon/v2.54.0-aeec5...
2023-11-24 12:42:08 +01:00
Alex Sharov
23f23bc971
disable disc tests on Mac (#8822)
TestUDPv4_smallNetConvergence tests are often timeout on mac - disabling
this tests on mac CI
2023-11-23 16:00:42 +07:00
milen
34c0fe29ad
metrics: swap remaining VictoriaMetrics usages with erigon-lib/metrics (#8762)
# Background

Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.

We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.

This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.

# Summary of changes

- Adds missing `NewCounter`, `NewSummary`, `NewHistogram`,
`GetOrCreateHistogram` functions to `erigon-lib/metrics` similar to the
interface VictoriaMetrics lib provides
- Minor tidy up for consistency inside `erigon-lib/metrics/set.go`
around return types (panic vs err consistency for funcs inside the
file), error messages, comments
- Replace all remaining usages of `github.com/VictoriaMetrics/metrics`
with `github.com/ledgerwatch/erigon-lib/metrics` - seamless (only import
changes) since interfaces match
2023-11-20 12:23:23 +00:00
battlmonstr
a5ff524740
p2p: fix discovery shutdown (#8725) - alternative fix (#8757)
Making the addReplyMatcher channel unbuffered makes the loop
going too slow sometimes for serving parallel requests.
This is an alternative fix for keeping the channel buffered.
2023-11-17 11:02:28 +01:00
battlmonstr
3ca7fdf7e9
p2p: fix discovery shutdown (#8725) (#8735)
Problem:
Some goroutines are blocked on shutdown:
1. table close <-tab.closed // because table loop pending
1. table loop <-refreshDone // because lookup shutdown blocks doRefresh
1. lookup shutdown <-it.replyCh // because it.queryfunc (findnode -
ensureBond) is blocked, and not returning errClosed (if it returns and
pushes to it.replyCh, then shutdown() will unblock)
1. findnode - ensureBond <-rm.errc // because the related replyMatcher
was added after loop() exited, so there's nothing to push errClosed and
unlock it

If addReplyMatcher channel is buffered, it is possible that
UDPv4.pending() adds a new reply matcher after closeCtx.Done().
Such reply matcher's errc result channel will never be updated, because
the UDPv4.loop() has exited at this point. Subsequent discovery
operations will deadlock.

Solution:
Revert to an unbuffered channel.
2023-11-17 09:13:44 +07:00
Giulio rebuffo
274f84598c
Automation tool to automatically upload caplin's snapshot files to R2 (#8747)
Upload beacon snapshots to R2 every week by default
2023-11-16 20:59:43 +01:00
Alex Sharov
35bfffd621
sys deps up (#8695) 2023-11-11 15:04:18 +03:00
Mark Holt
509a7af26a
Discovery zero refresh timer (#8661)
This fixes an issue where the mumbai testnet node struggle to find
peers. Before this fix in general test peer numbers are typically around
20 in total between eth66, eth67 and eth68. For new peers some can
struggle to find even a single peer after days of operation.

These are the numbers after 12 hours or running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.

The root cause of this issue is the following:

- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods

These issues case a significant number of discovery timeouts, some of
the queries will never receive a response.

This causes the discovery read loop to enter a channel deadlock - which
means that no responses are processed, nor timeouts fired. This causes
the discovery process in the node to stop. From then on it just
re-requests handshakes from a relatively small number of peers.

This check in fixes this situation with the following changes:

- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing.
- Allow the discovery process matcher to match on port if no id match
can be established on initial ping. This allows subsequent node
validation to proceed and if the node proves to be valid via the
remainder of the look-up and handshake process it us used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination continue to be ignored.
-
2023-11-07 08:48:58 +00:00
battlmonstr
d92898a508
p2p: silkworm sentry (#8527) 2023-11-02 08:35:13 +07:00
Dmytro
9adf31b8eb
bytes transfet separated by capability and category (#8568)
Co-authored-by: Mark Holt <mark@distributed.vision>
2023-10-27 22:30:28 +03:00
battlmonstr
f1c81dc14e
devnet: fix node startup on macOS (#8569)
* call getEnode before NodeStarted to make sure it is ready for RPC
calls
* fix connection error detection on macOS
* use a non-default p2p port to avoid conflicts
* disable bor milestones on local heimdall
* generate node keys for static peers config
2023-10-26 12:58:01 +07:00
Dmytro
ec59be2261
Dvovk/sentinel and sentry peers data collect (#8533) 2023-10-23 17:33:08 +03:00
a
436493350e
Sentinel refactor (#8296)
1. changes sentinel to use an http-like interface

2. moves hexutil, crypto/blake2b, metrics packages to erigon-lib
2023-10-22 01:17:18 +02:00
battlmonstr
e04dee12fd
p2p: bad p2p server port in the log (#8493)
Problem:
"Started P2P networking" log message contains port zero on startup,
e.g.: 127.0.0.1:0 because of the outdated localnodeAddrCache.

Solution:
Call updateLocalNodeStaticAddrCache after updating the port.
2023-10-17 10:40:02 +07:00
Alex Sharov
6d9a4f4d94
rpcdaemon: must not create db - because doesn't know right parameters (#8445) 2023-10-12 14:11:46 +07:00
Alex Sharov
404719c292
Medbx: add label to error messages, UpdateForkChoice: add ctx to erorrs, MemDb: increase db-limit from 512Mb to 512Gb (#8434) 2023-10-11 12:53:34 +07:00
Jason Yellick
5654ba07c9
Upgrade libp2p (enables go 1.21 support) (#8288)
Closes #8078 

This change is primarily intended to support go 1.21, but as a
side-effect requires updating libp2p, which in turn triggers an update
of golang.org/x/exp which creates quite a bit of (simple) churn in the
slice sorting.

This change introduces a new `cmp.Compare` function which can be used to
return an integer satisfying the compare interface for slice sorting.

In order to continue to support mplex for libp2p, the change references
github.com/libp2p/go-libp2p-mplex instead. Please see the PR at
https://github.com/libp2p/go-libp2p/pull/2498 for the official usptream
comment that indicates official support for mplex being moved to this
location.

Co-authored-by: Jason Yellick <jason@enya.ai>
2023-09-29 22:11:13 +02:00
battlmonstr
d6df923dd8
p2p: limit ping requests from a single peer (#8113)
see:
https://github.com/ethereum/go-ethereum/pull/27887
2023-09-06 17:56:03 +02:00
Mark Holt
8ea0096d56
moved metrics sub packages types to metrics (#8119)
This is a non functional change which consolidates the various packages
under metrics into the top level package now that the dead code is
removed.

It is a precursor to the removal of Victoria metrics after which all
erigon metrics code will be contained in this single package.
2023-09-03 08:09:27 +07:00
battlmonstr
340b9811b0
p2p: refactor peer errors to propagate with a DiscReason (#8089)
Improve p2p error handling to propagate errors
from the origin up the call chain the Server peer removal code
using a new PeerError type containing a DiscReason and a more detailed
description.

The origin can be tracked down using PeerErrorCode (code) and DiscReason
(reason)
which looks like this in the log:

> [TRACE] [08-28|16:33:40.205] Removing p2p peer peercount=0
url=enode://d399f4b...@1.2.3.4:30303 duration=6.901ms
err="PeerError(code=remote disconnect reason, reason=too many peers,
err=<nil>, message=Peer.run got a remote DiscReason)"
2023-08-31 16:45:23 +01:00
Mark Holt
a4cfbe0d56
Heimdall metrics + Metrics HTTP server rationalization (#8094)
This is an update of:

https://github.com/ledgerwatch/erigon/pull/7846

which uses a local fork of victoria metrics to include the changes that
https://github.com/anshalshukla added to the original for we where
using.

It also includes code to address the duplicate metrics issue identified
here:

https://github.com/ledgerwatch/erigon/issues/8053

It has one more associated fix which is to correctly add a metadata
label to counters, these where previously labelled as gauges.

e.g. 

```
# TYPE p2p_peers counter
p2p_peers 0
```
rather than

```
# TYPE p2p_peers gauge
p2p_peers 0
```

---------

Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: Anshal Shukla <shukla.anshal85@gmail.com>
2023-08-31 09:04:27 +01:00
battlmonstr
bb2c2adbb6
p2p: fix RLPx disconnect message decoding (#8056)
The disconnect message could either be a plain integer, or a list with
one integer element. We were encoding it as a plain integer, but
decoding as a list. Change this to be able to decode any format.
2023-08-24 13:49:19 +02:00
Alex Sharov
2b6c21fddb
move mdbx to new org (#8061) 2023-08-24 18:00:24 +07:00
battlmonstr
6c017c33f9
p2p: log NAT ExternalIP error (#8026) 2023-08-22 10:51:00 +02:00
Mark Holt
529d359ca6
Bor span testing (#7897)
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.

In this initial release only span generation is supported.  

It has the following changes:

* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction on 'Services' for the network config

"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will sun a single validator
with no heimdall service as before

---------

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-07-18 09:47:04 +01:00
a
19bc41198e
[caplin] conn gater (#7900)
conn gater


so this is what prysm did to address the issue
https://github.com/prysmaticlabs/prysm/pull/8648/files
2023-07-16 08:31:06 +02:00
Alex Sharov
4adb7fd737
move TestUDPv5_callResend to integration suite (#7845) 2023-07-05 10:45:20 +07:00
ledgerwatch
067f695fff
[devnet tool] Separate logging (#7553)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-05-20 14:48:16 +01:00
ledgerwatch
2a872b4d54
[devnet] separare logging - headers download (#7551)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-05-20 07:00:19 +01:00
ledgerwatch
c919283b0c
[devnet] separate logging p2p (#7549)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-05-19 23:08:45 +01:00
ledgerwatch
b0117a7c30
[devnet] separate logging - p2p (#7547)
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
2023-05-19 18:41:53 +01:00
battlmonstr
404e395bb4
p2p: fix peer ID serialization (#7495)
The peer ID in sentry.proto is a H512 / 64 bytes value, and
MarshalPubkey creates it from a public key.

There's no need to cut the first byte, because MarshalPubkey already
does it.
Doing so results in a 63 bytes value that is incompatible with silkworm
sentry.
2023-05-11 17:19:47 +01:00
Alex Sharov
f23612bdfe
Enode logging broke when NAT Parameter set in 2.43.0 (#7480)
for https://github.com/ledgerwatch/erigon/issues/7472
2023-05-10 10:25:53 +07:00
alex.sharov
baa8572353 tests: less output 2023-05-05 13:59:59 +07:00
Alex Sharov
69a3396433
add flag --db.size.limit (#7325) 2023-04-17 12:48:57 +00:00
Alex Sharov
a8e8bf4528
remove simd lib, because it doesn't work with ghcr.io/goreleaser/goreleaser-cross (which producing release binaries) (#7229)
@shyba hi, seems this lib doesn't work with
ghcr.io/goreleaser/goreleaser-cross (which producing release binaries)
removing it for now, feel free to add it in future - if can make it work
with goreleaser-cross
see: https://github.com/ledgerwatch/erigon/issues/7210
2023-03-31 05:07:43 +00:00
Alex Sharov
201572c6f5
enable more linters #954 (#7179) 2023-03-25 05:13:27 +00:00
hexoscott
3b36d5d57a
get localnode address up front on creation to save potential data race (#7111)
the log line here was the culprit for the race. made sense to just
capture this on localnode creation instead and hold onto it for when the
server is started.

ran test with `-race` and `-count=5000` to double check and all looks
good
2023-03-16 03:44:00 +00:00
Alex Sharov
157a380be7
e3: history no auto-increment (#7097) 2023-03-15 08:03:57 +00:00
Alex Sharov
bbe56620a3
move more parts to lru2 (#7098) 2023-03-14 07:37:23 +00:00
Victor Shyba
158fb2b606
Optimize memory buffer, simplify set32, use sha256-simd (#7060)
Hi,

I'm syncing Gnosis on a Celeron N5100 to get familiar with the codebase.
In the process I managed to optimize some things from profiling.
Since I'm not yet on the project Discord, I decided to open this PR as a
suggestion. This pass all tests here and gave me a nice boost for that
platform, although I didn't have time to benchmark it yet.

* reuse VM Memory objects with sync.Pool. It starts with 4k as `evmone`
[code
suggested](0897edb001/lib/evmone/execution_state.hpp (L49))
as a good value.
* set32 simplification: mostly cosmetic
* sha256-simd: Celeron has SHA instructions. We should probably do the
same for torrent later, but this already helped as it is very CPU bound
on such a low end processor. Maybe that helps ARM as well.
2023-03-14 07:17:04 +00:00
hexoscott
efd541028c
read metrics config from yaml file (#7089) 2023-03-14 00:07:05 +00:00
Alex Sharov
288fe5dcfb
dnsdisc: backport geth latest version and fixes for go1.20 (#6925) 2023-02-22 08:29:34 +00:00