The current logic is flawed, because it drops all peers that are less
synced.
It is valid to return empty responses by the eth spec.
A proper logic should penalize from the context of the sync process,
where enough "reputation" data is collected about a peer.
In order to be able to connect to erigon 2.48 peers that have
--sentry.drop-useless-peers enabled,
this adds a check to not reply with an empty headers list.
If we reply with an empty list, we're going to be considered useless and
kicked.
Once enough of erigon nodes are updated in the network past this commit,
this check should be removed,
because it is totally acceptable to return an empty list by the eth
spec.
Currently PropagateNewBlockHashes and BroadcastNewBlock
selects a subset of all sentries by taking a `Sqrt(len(sentries))`,
and then for each sentry SendMessageToRandomPeers
selects a subset of its peers by taking `Sqrt(len(peerInfos))`.
This behaviour limits the broadcast scope with a lot of peers, e.g. 100
becomes 10,
but is not great with very few peers, or if the message is very
important
to broadcast to everyone, which is the case of bor validator/proposer
nodes.
* send to all sentries in both BroadcastNewBlock and PropagateNewBlockHashes
* remove peerCountConstrained sqrt logic in SendMessageToRandomPeers
* add maxPeers provider func as a parameter to MultiClient
* default it to 10 for eth and 0 (unlimited) for bor validators
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
[txpool](https://github.com/ledgerwatch/erigon-lib/blob/main/txpool/pool.go)
expects an `OnNewBlock` update only after the DB transaction is
committed.
This fixes, for example, a nonce gap mis-detection in Hive test
"engine-cancun/Blob Transactions On Block 1, Cancun Genesis".
Otterscan API search methods allow the user to inform the page size.
This PR adds an internal max (default == 25 results) to cap the page
size, regardless of what the user asks.
It also adds a `--ots.search.max.pagesize` CLI args to override this max
(either in erigon and rpcdaemon binaries).
Miracoulously, hive tests pass first try. YIPPIE.
Also for the future, I added `--experimental.modular` which enables a
secondary engine API for consensus separation.
Now block building is responsibility of the execution module.
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction on 'Services' for the network config
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will sun a single validator
with no heimdall service as before
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
This request implements the insertion of Bor ephemeral transactions into
snapshot indexes.
I does this by taking the block hash from the header index and passing
it to the transaction indexer to add an additional index entry per block
into the transaction hash -> block index.
The passed entries are currently contained in an in memory array which
is (32 * number of blocks / sprint size) bytes.
In addition to the functional code there is also an update to the
`dump_test.go` so that it runs `DumpBlocks` to exercise the indexing
code. To facilitate this the `InsertChain` method in `mock_sentry` has
been modified so that it can process >128 blocks.
The code in this request also includes additional bor/consensus code
with the following functions:
`CalculateSprint`
`CalculateSprintCount`
The first function is a modification of the code in erigon-lib so that
the sprints are numerically rather than lexically ordered. This code
should be migrated to erigon-lib and should have its sprint set
calculated once from its underlying map rather than this process being
repeated every calculation.
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
Co-authored-by: ledgerwatch <akhounov@gmail.com>
Co-authored-by: Enrique Jose Avila Asapche <eavilaasapche@gmail.com>
Co-authored-by: Giulio <giulio.rebuffo@gmail.com>
This PR separates ENGINE from Ethbackend. It makes it so:
1) EthBackend not a god class
2) We can abstract away engine API so that we can make it CL-like and
enable Consensus-Execution driven design
3) Objective is Json-RPC -> Engine Consensus Module -> Execution module.
The fixes here fix a couple of issues related to devnet start-up
1. macos threading and syscall error return where causing multi node
start to both not wait and fail
2. On windows creating DB's with the default 2 TB mapsize causes the os
to reserve about 4GB of committed memory per DB. This may not be used -
but is reserved by the OS - so a default bor node reserves around 10GB
of storage. Starting many nodes causes the OS page file to become
exhausted.
To fix this the consensus DB's now use the node's OpenDatabase function
rather than their own, which means that the consensus DB's take notice
of the config.MdbxDBSizeLimit.
This fix leaves one 4GB committed memory allocation in the TX pool which
needs its own MapSize setting.
---------
Co-authored-by: Alex Sharp <akhounov@gmail.com>
Added support tunnel to the devnet cmd. In order to get this to run I
made the following changes:
* Create a public function
* Added non root logging
I have also added commentary to the readme to explain the additional
command line arguments needed to integrate with diagnostics. In summary,
if you set the --diagnostics.url the devenet will wait for diagnostic
requests rather than exiting
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- breaks dependency from staged_sync to package with block_reader
implementation
- breaks dependency from snap_sync to package with block_reader
implementation
- breaks dependency from mining to txpool implementation
This is an update to the devnet code which introduces the concept of
configurable scenarios. This replaces the previous hard coded execution
function.
The intention is that now both the network and the operations to run on
the network can be described in a data structure which is configurable
and composable.
The operating model is to create a network and then ask it to run
scenarios:
```go
network.Run(
runCtx,
scenarios.Scenario{
Name: "all",
Steps: []*scenarios.Step{
&scenarios.Step{Text: "InitSubscriptions", Args: []any{[]requests.SubMethod{requests.Methods.ETHNewHeads}}},
&scenarios.Step{Text: "PingErigonRpc"},
&scenarios.Step{Text: "CheckTxPoolContent", Args: []any{0, 0, 0}},
&scenarios.Step{Text: "SendTxWithDynamicFee", Args: []any{recipientAddress, services.DevAddress, sendValue}},
&scenarios.Step{Text: "AwaitBlocks", Args: []any{2 * time.Second}},
},
})
```
The steps here refer to step handlers which can be defined as follows:
```go
func init() {
scenarios.MustRegisterStepHandlers(
scenarios.StepHandler(GetBalance),
)
}
func GetBalance(ctx context.Context, addr string, blockNum requests.BlockNumber, checkBal uint64) {
...
```
This commit is an initial implementation of the scenario running - which
is working, but will need to be enhanced to make it more usable &
developable.
The current version of the code is working and has been tested with the
dev network, and bor withoutheimdall. There is a multi miner bor
heimdall configuration but this is yet to be tested.
Note that by default the scenario runner picks nodes at random on the
network to send transactions to. this causes the dev network to run very
slowly as it seems to take a long time to include transactions where the
nonce is incremented across nodes. It seems to take a long time for the
nonce to catch up in the transaction pool processing. This is yet to be
investigated.
attempt to address next issue:
> when I'm having a lot of websocket connections the node is freezing
and then it needs like 10 mins to sync. Then if I keep pushing requests
it falls out of sync all the time
This request adds an additional logging flag to change the name of the
logfiles produced by erigon to be prefixed by a name other than
'erigon'.
This allows multiple nodes to log to the same directory without
overwriting each others files. It is requires so that the devnet when
running can maintain all of its log files in a single consolidated
directory which survives devnet restarts.
---------
Co-authored-by: Mark Holt <mark@distributed.vision>
- allow store non-canonical blocks/senders
- optimize re-org: don't update/delete most of data
- allow mark chain as `Bad` - will be not visible by eth_getBlockByHash,
but can read if have hash+num
- stage_senders: don't re-calc existing senders
- stage_tx_lookup: prune less blocks per iteration - because
random-deletes are expensive. pruning must not slow-down sync.
- prune data even if --snap.stop is set
- "prune as-much-as-possible at startup" is not very good idea: at
initialCycle machine can be cold and prune will cause big downtime, no
reason to produce much freelist in 1 tx. People may also restart erigon
- because of some bug - and it will cause unexpected downtime (usually
Erigon startup very fast). So, I just remove all `initialSync`-related
logic in pruning.
- fix lost metrics about disk write byte/sec
it's step towards saving canonical and non-canonical bodies in same
table (and txs also in same own table). to reduce write amplification
(cheaper re-orgs)
PR change: reading BaseTxNum from existing snapshots instead of DB
DB will store in field body.BaseTxNum - non-canonical TxnID
Snapshots will store only canonical TxNum in field body.BaseTxNum
## What's this PR about?
- Added states to be sent to diagnostics system for header downloader
monitor
- Added the code for sending the states through the tunnel
- Code added for updating the states in the header_algos.go file
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
- always RLock all snapshots - to guarantee consistency
- introduce class View (analog of RoTx and MakeContext)
- move read methods to View object
- View object will be managed by temporal_tx
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
## What's this PR is about?
Minor fix in args usage message of support flag. The current message
says that the flag should be 'metrics.url' but it reality it should be
'metrics.urls'
Blob transactions are SSZ encoded, so it had to be added to decoding.
There are 2 encoding forms: `network` and `minimal` (usual). Network
encoded blob transactions include "wrapper data" which are `kzgs`,
`blobs` and `proofs`, and decoded by `DecodeWrappedTransaction`. For
previous types of transactions the network encoding is no different.
Execution-payloads / blocks use the minimal encoding of transactions. In
the transaction-pool and local transaction-journal the network encoding
is used.
Concerns:
1. Possible performance reduction caused by these changes, not sure if
streams are better then slices. Go slices in this modifications are
read-only, so they should be referred to the same underlying array and
passed by a reference.
2. If `DecodeWrappedTransaction` and `DecodeTransaction` will create
confusion and should be merged into one function.
This is the beginning of the series of changes to make it possible to
run multiple instances of erigon inside a single process (as devnet tool
does), with the logging from these processes going to respective log
files correctly.
This is the first part where the initial infrastructure is being
established
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
this first major move separates the transient beacon state cache from
the underlying tree.
leaf updates are enforced in the setters, which should make programming
easier.
all exported methods of the raw.BeaconState should be safe to call
(without disrupting internal state)
changes many functions to consume *raw.BeaconState in perparation for
interface
beyond refactor it also:
adds a pool for the leaves of the validator ssz hash
adds a pool for the snappy writers
removed the parallel hash experiment (high memory use)
To expose context flags through diagnostic endpoint
`/debug/metrics/flags`.
The current `/debug/metrics/cmdline` only provides the command run by
user. In the case when user runs Erigon using config file, `cmdline`’s
info does not truly reflect the ‘launch setting' and flags set that
Erigon is running on.
## Example
### Command
```
erigon --datadir /tmp/data --config test.yaml
```
### Pseudo config file (in yaml)
```
datadir : '/tmp/data'
chain : "sepolia"
http : true
metrics: true
private.api.addr : "localhost:9090"
http.api : ["eth","debug","net"]
```
### Output
```
SUCCESS
chain=sepolia
config=test.yaml
datadir=/tmp/data
http=true
http.api=eth,debug,net
metrics=true
private.api.addr=localhost:9090
```
This addresses the last known deficiency of the eth_getProof
implementation. The previous code would return an error in the event
that the element was not found in the trie. EIP-1186 allows for
'negative' proofs where a proof demonstrates that an element cannot be
in the trie, so this commit updates the logic to support that case.
Co-authored-by: Jason Yellick <jason@enya.ai>
This PR extends the function merged in #7337 to cover the proof
generation paths as well.
The first commit simply migrates the internal proof checking code from
the rpc `eth_getProof` test to be available externally exported under
`turbo/trie`. In the process, it translates the in place testing
assertions to return more normally as errors and adapts to use the
pre-existing trie node types which are defined slightly differently
(although this code is still intended only for testing purposes).
The second commit refactors the existing trie fuzzing tests from the
previous PR and adds new fuzzing tests for the proof generation. In
order to validate the proofs, these fuzzing tests fundamentally do two
things. Firstly, after bootstrapping the test (by seeding, and modifying
the db), for each key we compute the 'naive' proof utilizing the
existing code in `proof.go` and the in memory `trie.Trie` structure. The
`trie.Trie` code actually had a couple small bugs which are fixed in
this PR (not handling value nodes, and not honoring the `NewTestRLPTrie`
contract of pre-RLP encoded nodes in proof generation). Secondly, we
re-compute the same proof for the flatDB production variant, and verify
that it is exactly the same proof as computed by the naive
implementation.
This fuzzing has been run for ~72 hours locally with no errors. Although
this code hasn't identified any new bugs in the proof generation path,
it improves coverage and should help to prevent regressions. Additional
extensions will be provided in a subsequent PR.
---------
Co-authored-by: Jason Yellick <jason@enya.ai>
There are currently a number of bugs in the flat DB trie hash
computation. These bugs are improbable in 'real' scenarios (hence the
ability to sync the assorted large chains), and, the repercussions of
encountering them are low (generally re-computing the hash at a later
block will succeed). Still, they can cause the process to crash, or
deadlock, and a clever adversary could feasibly construct a block or
storage update designed to trigger these bugs. (Or, an unlucky block may
trigger them inadvertently). Based on the tracing in the code, it seems
that some of these bugs may have been witnessed in the wild, but not
reliably reproduced (for instance when `maxlen >= len(curr)`).
1. There is an infinite loop that can occur in
_nextSiblingOfParentInMem. This occurs in the account path when the
c.k[1] entry is 'nil' and the code will continuously retry resolving
this entry indefinitely.
2. When the next trie node is deeper in the trie than the previous, but
is not a descendent of the previous trie node, then the old trie node is
inadvertently left in memory instead of nil-ed. This results in an
incorrect hash being computed.
3. When the last trie node being processed is comprised entirely of 'f'
nibbles, the 'FirstNotCoveredPrefix' returns an empty byte array,
because there is no next nibble sub-tree. This causes keys to
inappropriately be reprocessed triggering either an index out of bounds
panic or incorrect hash results.
4. When the _nextSiblingInDB path is triggered, if the next nibble
subtree contains no trie table entries, then any keys with a prefix in
that subtree will be skipped resulting in incorrect hash results.
The fuzzing seeds included cover all four of these cases (for both
accounts and storage where appropriate). As fuzzing is baked into the
native go 1.18 toolchain, running 'go test' (as the Makefile currently
does) is adequate to cover these failures. To run additional fuzzing the
`-fuzz` flag may be provided as documented in the go test doc.
Co-authored-by: Jason Yellick <jason@enya.ai>
types.NewMessage now expects maxFeePerDataGas param, which will be used
in transaction verification (preCheck). GetPayloadV3 method added to
EngineAPI. Some cosmetic changes applied.
According to EIP-1186 the `proof` parts of the response to eth_getProof
should be returned "starting with the stateRoot-Node, following the path
of the SHA3 (address) as key." Currently, the proof is returned in
traversal order, rather than from the root.
Although all of the proof elements are there and correct, this is
contrary to the EIP and will cause problems for some clients. The
existing rpc test uses a map to lookup proof elements by hash, rather
than by index, so this bug was not initially caught.
This commit fixes the behavior, updates the existing test, and adds
additional checks to the rpc test.
Co-authored-by: Jason Yellick <jason@enya.ai>
In this PR Header's ExcessDataGas is set to the actual value, but it's
still unused. It will be used to compute data fee for eip-4844 data
blobs, logic of which will be added in later PRs. Also eip-4844 header
verification logic added.
`excessDataGas` has been partially made eip-4844 ready, so instead of
passing nils to functions, now it actually assigned to some value (it is
expected to be nil until cancun update).
Pre-Shanghai blocks should have `nil` withdrawals, while post-Shanghai
blocks should have non-nil withdrawals (empty or non-empty slice, but
not `nil`). Judging from Issue #6976, that's not always a case. PR #7279
attempted to fix the issue, but unfortunately it only masks the root
cause.
This reverts commit c60a6a2962.
This PR completes the implementation of `eth_getProof` by adding support
for storage proofs.
Because storage proofs are potentially overlapping, the existing
strategy of simply aggregating all proofs together into a single result
was challenging. Instead, this commit rewires things to introduce a
ProofRetainer, which aggregates proofs and their corresponding nibble
encoded paths in the trie. Once all of the proofs have been aggregated,
the caller requests the proof result, which then iterates over the
aggregated proofs, placing each into the relevant proof array into the
result.
Although there are tests for `eth_getProof` as an RPC and for the new
`ProofRetainer` code, the code coverage for the proof generation over
complex tries is lacking. But, since this is not a new problem I'll plan
to follow up this PR with an additional one adding more coverage into
`turbo/trie`.
---------
Co-authored-by: Jason Yellick <jason@enya.ai>
@shyba hi, seems this lib doesn't work with
ghcr.io/goreleaser/goreleaser-cross (which producing release binaries)
removing it for now, feel free to add it in future - if can make it work
with goreleaser-cross
see: https://github.com/ledgerwatch/erigon/issues/7210
Small change in core.NewEVMBlockContext and now it expects
excessDataGas. This will be used in state transition to compute data fee
for eip-4844 data blobs. The logic that computes it will be added in the
next PRs.
This PR contains very small EIP-4844 additions. GasPool is modified and
now it is a struct with 2 fields "gas" and "dataGas" (blobs are priced
in dataGas). ExcessDataGas block header field added. ExcessDataGas
needed to compute the data gas price. EIP-4844 helper functions are
added as well.
This PR starts with a few small commits of code cleanup. Reviewed
separately they should hopefully obviously be functionally no-ops. I'm
happy to strip these out and submit them separately if desired.
The final commit is to add support for older blocks as a parameter to
eth_getProof. In order to compute proofs, the function leverages the
staged sync unwinding code to bring the hashed state table back to its
historic state, as well as to build a list of trie nodes which need to
be invalidated/re-computed in the trie computation. Because these
operations could be expensive for very old blocks, it limits the
distance proofs are allowed from the head. It also adds some additional
checks for correctness, as well as tests which verify the
implementation.
This was discussed a bit on Discord in the db-format topic.
---------
Co-authored-by: Jason Yellick <jason@enya.ai>
Hi,
I'm syncing Gnosis on a Celeron N5100 to get familiar with the codebase.
In the process I managed to optimize some things from profiling.
Since I'm not yet on the project Discord, I decided to open this PR as a
suggestion. This pass all tests here and gave me a nice boost for that
platform, although I didn't have time to benchmark it yet.
* reuse VM Memory objects with sync.Pool. It starts with 4k as `evmone`
[code
suggested](0897edb001/lib/evmone/execution_state.hpp (L49))
as a good value.
* set32 simplification: mostly cosmetic
* sha256-simd: Celeron has SHA instructions. We should probably do the
same for torrent later, but this already helped as it is very CPU bound
on such a low end processor. Maybe that helps ARM as well.
This PR adds missing tests for eth_getProof and does some mild
refactoring with switching from strings to more strict types. It's
likely best/most easily reviewed commit by commit.
Note, the tests include quite a number of helper types and functions for
doing the proof validation. This is largely because unlike Geth,
Erigon's approach to trie computations only requires serializing the
trie nodes, not deserializing them. Consequently, it wasn't obvious how
to leverage the existing trie code for doing deserialization and proof
checks. I checked on Discord, but, there were no suggestions. Of course,
any feedback is welcome and I'd be happy to remove this code if it can
be avoided.
Additionally, I've opted to change the interface type for `GetProof` to
use a `common.Hash` for the storage keys instead of a `string`. I
_think_ this should be fairly safe, as until very recently it was
unimplemented. That being said, since it's an interface, it has the
potential to break other consumers, anyone who was generating mocks
against it etc. There's additionally a `GetStorageAt` that follows the
same parameter. I'd be happy to submit a PR modifying this one as well
if desired.
Also, as a small note, there is test code for checking storage proofs,
but, storage proofs aren't currently supported by the implementation. My
hope is to add storage proofs and historic proofs in a followup PR.
---------
Co-authored-by: Jason Yellick <jason@enya.ai>
I was walking the code to try to understand in a bit more detail how the
state root hash is constructed and stumbled across
`retain_list_builder.go` as a consumer of the retain list APIs. But, as
far as I can tell, this file doesn't seem to actually be used anywhere
(including tests), and, it's seen no development (other than import
fixes) since 2020 or so. All linting and tests still pass for me locally
without it, so, I believe it's safe to simply remove.
Co-authored-by: Jason Yellick <jason@enya.ai>
This is a partial implementation of eth_getProof (see issue #1349),
supporting only a request for the latest block and an empty list of
storage keys (i.e. Account proof only). I don't know if there's a better
way of implementing this, but this was what I could come up with.
Posting it here in case it's useful.
Example output:
```
> eth.getProof("0x67b1d87101671b127f5f8714789C7192f7ad340e", [], 'latest')
{
accountProof: ["0xf90131a0252c9d4ed347b4cf3fdccaea3ccef0a507e6bd4dbe4dcd98609b7195347c4062a0ab8cdb808c8303bb61fb48e276217be9770fa83ecf3f90f2234d558885f5abf18080a01a697e814758281972fcd13bc9707dbcd2f195986b05463d7b78426508445a04a0b5d7a91be5ee273cce27e2ad9a160d2faadd5a6ba518d384019b68728a4f62f4a0c2c799b60a0cd6acd42c1015512872e86c186bcf196e85061e76842f3b7cf86080a0e73919d9f472eec11f6da95518503f5527a98b9428f7a02c4f55bf51854214e480a06301b39b2ea8a44df8b0356120db64b788e71f52e1d7a6309d0d2e5b86fee7cb8080a01b7779e149cadf24d4ffb77ca7e11314b8db7097e4d70b2a173493153ca2e5a0a066a7662811491b3d352e969506b420d269e8b51a224f574b3b38b3463f43f0098080", "0xf8518080808080a0a00135c9ec2655cb6a47ab7ad27d6fc150d9cba8b3d4a702e879179116a68a60808080808080a02fb46956347985b9870156b5747712899d213b1636ad4fe553c63e33521d567a80808080", "0xf873a02056274a27dd7524955417c11ecd917251cc7c4c8310f4c7e4bd3c304d3d9a79b850f84e808a021e19e0c9bab2400000a056e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421a0c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"],
address: "0x67b1d87101671b127f5f8714789c7192f7ad340e",
balance: "0x21e19e0c9bab2400000",
codeHash: "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
nonce: "0x0",
storageHash: "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
storageProof: []
}
> eth.getBlock('latest').stateRoot
"0x6a0673c691edfa4c4528323986bb43c579316f436ff6f8b4ac70854bbd95340b"
```
That's an initial PR mostly for code review, not ready for production
use
Got basic GraphQL working when querying a single block
---------
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
No change to `go.mod` since `erigon-snapshot` has already been updated
in `devel`.
I'll keep my Chiado node up to serve the snapshots for a while, until
there are more erigon nodes in the network.
Resolves#6307
Fees going to the gas fee recipient should be based on the actual gas
used (available in the receipt) rather than the gas limit in a
transaction. This fixes Hive test "GetPayloadV2 Block Value".
Also `engine_getPayloadBodiesByRangeV1` params should be encoded as hex
strings.
Created to handle #6758.
Just takes the parameters and passes them down the pipe. A local test
against mainnet showed the trace size varied accordingly when passing
true/false values for `onlyTopCall`.
this pr does the following:
1. adds new function to ApiImpl `GetFilterLogs` which should implement
`eth_getFilterLogs` (eth_getFilterChanges except with only logs filters)
2. changes the ID generator of rpchelper.Filters to use crypto/rand.
3. switched logs subscriptions to use the secure ID instead of number
4. changes subcription ids from an 8 byte hex string to a 16 byte
It turns out that "standard" BSC nodes based on Geth, do not propagate
new block hashes and blocks, at least towards Erigon nodes. This is a
workaround
---------
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>