This PR is to add the request rate limiter.
The solution is to count the request number for each peer for each
minute, if the peer exceeds the limit, block the requests for a
specified time.
Current limits:
- Request limited to `5000` requests per minute for each handler.
- Penalty blockage time `1-minute`
This PR has fixes for a number of instances in the bor heimdall stage
where nil headers are either ignored or inadvertently processed.
It also has a demotion of milestone related logging messages to debug
for missing blocks because the process is not at the head of the chain +
a general reduction in periodic logging to 30 secs rather than 20 to
reduce the log output on long runs.
In addition there is a refactor of persistValidatorSets to perform
validator set initiation in a seperate function. This is intended to
clarify the operation of persistValidatorSets - which is till performing
2 actions, persisting the snapshot and then using it to check the header
against synthesized validator set in the snapshot.
eventsource is required for the validator api. this implements the
eventsource sink/server handler
the implementation is based off of this document:
https://html.spec.whatwg.org/multipage/server-sent-events.html
note that this is a building block for the full eventsource server.
there still needs to be work done
prysm has their own custom solution based off of protobuf/grpc:
https://hackmd.io/@prysmaticlabs/eventstream-api using that would be not
good
existing eventsource implementations for golang are not good for our
situation. options are:
1. https://github.com/r3labs/sse - has most stars - this is the best
contender, since it uses []byte and not string, but it allocates and
copies extra times in the server (because of use of fprintf) and makes
an incorrect assumption about Last-Event-ID needing to be a number (i
can't find this in the specification).
2. https://github.com/antage/eventsource -requires full buffers, copies
many times, does not provide abstraction for headers. relatively
unmaintained
3. https://github.com/donovanhide/eventsource - missing functionality
around sending ids, requires full buffers, etc
4. https://github.com/bernerdschaefer/eventsource - 10 years old,
unmaintained.
additionally, implemetations other than r3labs/sse are very incorrect
because they do not split up the data field correctly when newlines are
sent. (parsers by specification will fail to encode messages sent by
most of these implementations that have newlines, as i understand it).
the implementation by r3labs/sse is also incorrect because it does not
respect \r
finally, all these implementations have very heavy implementation of the
server, which we do not need since we will use fixed sequence ids.
r3labs/sse for instance hijacks the entire handler and ties that to the
server, losing a lot of flexiblity in how we implement our server
for the beacon api, we need to stream:
```head, block, attestation, voluntary_exit, bls_to_execution_change, finalized_checkpoint, chain_reorg, contribution_and_proof, light_client_finality_update, light_client_optimistic_update, payload_attributes```
some of these are rather big json payloads, and the ability to simultaneously stream them from io.Readers instead of making a full copy of the payload every time we wish to rebroadcast it will save a lot of heap size for both resource constrained environments and serving at scale.
the protocol itself is relatively simple, there are just a few gotchas
Changed distribution of httpcfg.HttpCfg to be pointer.
Added new flags:
rpc.slow.log - which is false by default, this flag need to enable
logging slow RPC requests
rpc.slow.log.threshold - which is 100 by default, this flag specify slow
threshold in milliseconds
Updated rpc handler to log slow requests:
- added map[request id] {method, timestamp}
- put every request details to map above
- delete request details from map above
- added time interval check for elements in map and if time difference
is more than given threshold print request id and the method
- app will print slow requests in next cases:
1. As soon as request take more than given threshold
2. Every 20 seconds if request still in process
3. After request finished and it took more than give threshold
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Current value: 16 was added by me 1 year ago and didn't mean anything.
Never seen this field holding much data, probably can increase.
Currently I see logs like (and 10x like this):
[DBUG] [11-24|06:59:38.353] slow peer or too many requests, dropping its old requests name=erigon/v2.54.0-aeec5...
This PR adds support to store the transaction dependency (generated by
the block producer) in the block header for bor. This transaction
dependency will then be used by the parallel processor
([Block-STM](https://github.com/ledgerwatch/erigon/pull/7812/)).
I have created another
[PR](https://github.com/ledgerwatch/erigon-lib/pull/1064) in the
erigon-lib repo which adds the `IsParallelUniverse()` function.
# Background
Erigon currently uses a combination of Victoria Metrics and Prometheus
client for providing metrics.
We want to rationalize this and use only the Prometheus client library,
but we want to maintain the simplified Victoria Metrics methods for
constructing metrics.
This task is currently partly complete and needs to be finished to a
stage where we can remove the Victoria Metrics module from the Erigon
code base.
# Summary of changes
- Remove `UsePrometheusClient` boolean flag
- Remove `VictoriaMetrics` client lib and related code (simplifies
registry and prometheus http handler initialisation since now we have
only 1 registry and can use default `promhttp.Handler`)
I using `https://heimdall-api-testnet.polygon.technology/` and seems
5sec timeout is not enough sometime - even that remote service working
well (node syncing well)
most of timeouts comes from same endpoint:
```
[bor.heimdall] request canceled reason="context deadline exceeded" path=/milestone/lastNoAck attempt=2
```