eventsource is required for the validator api. this implements the
eventsource sink/server handler
the implementation is based off of this document:
https://html.spec.whatwg.org/multipage/server-sent-events.html
note that this is a building block for the full eventsource server.
there still needs to be work done
prysm has their own custom solution based off of protobuf/grpc:
https://hackmd.io/@prysmaticlabs/eventstream-api using that would be not
good
existing eventsource implementations for golang are not good for our
situation. options are:
1. https://github.com/r3labs/sse - has most stars - this is the best
contender, since it uses []byte and not string, but it allocates and
copies extra times in the server (because of use of fprintf) and makes
an incorrect assumption about Last-Event-ID needing to be a number (i
can't find this in the specification).
2. https://github.com/antage/eventsource -requires full buffers, copies
many times, does not provide abstraction for headers. relatively
unmaintained
3. https://github.com/donovanhide/eventsource - missing functionality
around sending ids, requires full buffers, etc
4. https://github.com/bernerdschaefer/eventsource - 10 years old,
unmaintained.
additionally, implemetations other than r3labs/sse are very incorrect
because they do not split up the data field correctly when newlines are
sent. (parsers by specification will fail to encode messages sent by
most of these implementations that have newlines, as i understand it).
the implementation by r3labs/sse is also incorrect because it does not
respect \r
finally, all these implementations have very heavy implementation of the
server, which we do not need since we will use fixed sequence ids.
r3labs/sse for instance hijacks the entire handler and ties that to the
server, losing a lot of flexiblity in how we implement our server
for the beacon api, we need to stream:
```head, block, attestation, voluntary_exit, bls_to_execution_change, finalized_checkpoint, chain_reorg, contribution_and_proof, light_client_finality_update, light_client_optimistic_update, payload_attributes```
some of these are rather big json payloads, and the ability to simultaneously stream them from io.Readers instead of making a full copy of the payload every time we wish to rebroadcast it will save a lot of heap size for both resource constrained environments and serving at scale.
the protocol itself is relatively simple, there are just a few gotchas
This PR adds support to store the transaction dependency (generated by
the block producer) in the block header for bor. This transaction
dependency will then be used by the parallel processor
([Block-STM](https://github.com/ledgerwatch/erigon/pull/7812/)).
I have created another
[PR](https://github.com/ledgerwatch/erigon-lib/pull/1064) in the
erigon-lib repo which adds the `IsParallelUniverse()` function.
# Background
Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.
We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.
This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.
# Summary of changes
- Remove `UsePrometheusClient` boolean flag
- Remove `VictoriaMetrics` client lib and related code (simplifies
registry and prometheus http handler initialisation since now we have
only 1 registry and can use default `promhttp.Handler`)
# Background
Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.
We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.
This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.
# Summary of changes
- Adds missing `NewCounter`, `NewSummary`, `NewHistogram`,
`GetOrCreateHistogram` functions to `erigon-lib/metrics` similar to the
interface VictoriaMetrics lib provides
- Minor tidy up for consistency inside `erigon-lib/metrics/set.go`
around return types (panic vs err consistency for funcs inside the
file), error messages, comments
- Replace all remaining usages of `github.com/VictoriaMetrics/metrics`
with `github.com/ledgerwatch/erigon-lib/metrics` - seamless (only import
changes) since interfaces match
# Background
Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.
We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.
This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.
## Tests
### Functional
* Make sure that the format change int->float implied by VM to
Prometheus does not impact clients (pay particular attention to block
numbers)
* Check that the prometheus/grafana dashboards defined in cmd/prometheus
are functional after the change
(see docker-compose.yml for details and
https://github.com/ledgerwatch/erigon/tree/devel/cmd/prometheus#readme)
* Confirm that the underlying go metrics are still generated
* Confirm the following flags setting work:
--metrics, --metrics.addr, --metrics.port with the new code
* Confirm that --metrics and --proff settings and handlers configuration
still allow metrics and pprof to share a port
#### Float counters - scientific notation test case
![Screenshot_2023-11-07_at_15 57
21](https://github.com/ledgerwatch/erigon/assets/94537774/32f0a6f6-968b-477c-8ec8-bb1812f3e848)
![Screenshot 2023-11-15 at 16 26
56](https://github.com/ledgerwatch/erigon/assets/94537774/3f402b2e-e343-4928-9fbb-18fa4d077485)
#### Float counters - NaN test case
![Screenshot_2023-11-07_at_16 04
25](https://github.com/ledgerwatch/erigon/assets/94537774/cbf90d5d-3749-4bd7-971d-e2124e54267c)
![Screenshot 2023-11-15 at 16 28
36](https://github.com/ledgerwatch/erigon/assets/94537774/5924915e-1977-4b7f-8082-23f73d0957d5)
### Performance
* Check the performance of counters created by RPC calls measurements
created by rpc/metrics.go are not impacted by the change.
#### RPC
Performed tests on rpcdaemon & erigon on localhost using
`etc_blockNumber`.
Did tests with 100, 1000, 10000 requests. Got a steady 15 ms response
time.
#### Memory
![Screenshot 2023-11-16 at 09 58
39](https://github.com/ledgerwatch/erigon/assets/94537774/5dd956d7-903f-4bea-a460-d3644da56201)
This fixes an issue where the mumbai testnet node struggle to find
peers. Before this fix in general test peer numbers are typically around
20 in total between eth66, eth67 and eth68. For new peers some can
struggle to find even a single peer after days of operation.
These are the numbers after 12 hours or running on a node which
previously could not find any peers: eth66=13, eth67=76, eth68=91.
The root cause of this issue is the following:
- A significant number of mumbai peers around the boot node return
network ids which are different from those currently available in the
DHT
- The available nodes are all consequently busy and return 'too many
peers' for long periods
These issues case a significant number of discovery timeouts, some of
the queries will never receive a response.
This causes the discovery read loop to enter a channel deadlock - which
means that no responses are processed, nor timeouts fired. This causes
the discovery process in the node to stop. From then on it just
re-requests handshakes from a relatively small number of peers.
This check in fixes this situation with the following changes:
- Remove the deadlock by running the timer in a separate go-routine so
it can run independently of the main request processing.
- Allow the discovery process matcher to match on port if no id match
can be established on initial ping. This allows subsequent node
validation to proceed and if the node proves to be valid via the
remainder of the look-up and handshake process it us used as a valid
peer.
- Completely unsolicited responses, i.e. those which come from a
completely unknown ip:port combination continue to be ignored.
-
rlp2 is a package that aims to replace the existing erigon-lib/rlp
package and the erigon/common/rlp
it is called rlp2 for now because it requires breaking changes to
erigon-lib/rlp and i do not have the time right now to test all current
uses of such functions
however, the encoder/decoder characteristics of rlp2 might be desirable
for caplin, and also for execution layer parsing blob txns, so im
putting it in a folder called rlp2 (note that it exports package rlp for
easier switching later)
importantly, rlp2 is designed for single-pass decoding with the ability
to skip elements one does not care about. it also is zero alloc.
Reason:
- produce and seed snapshots earlier on chain tip. reduce depnedency on
"good peers with history" at p2p-network.
Some networks have no much archive peers, also ConsensusLayer clients
are not-good(not-incentivised) at serving history.
- avoiding having too much files:
more files(shards) - means "more metadata", "more lookups for
non-indexed queries", "more dictionaries", "more bittorrent
connections", ...
less files - means small files will be removed after merge (no peers for
this files).
ToDo:
[x] Recent 500K - merge up to 100K
[x] Older than 500K - merge up to 500K
[x] Start seeding 100k files
[x] Stop seeding 100k files after merge (right before delete)
In next PR:
[] Old version of Erigon must be able download recent hashes. To achieve
it - at first start erigon will download preverified hashes .toml from
s3 - if it's newer that what we have (build-in) - use it.
read transaction was opened before stream.Recv(), but stream.Recv() is
blocking infinity loop. so, this read transaction never rollback -
causing unlimited db grow.
---------
Merge PR #8596 into `devel`
---------
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>