Integration tests CI is failing due to a port clash in devnet tests. I
believe this is because there are 2 packages of devnet integration tests
and go can run tests from separate packages in parallel (by default it
does package level parallelism). A simple fix would be to just have all
devnet integration tests in 1 package and run all these tests
sequentially within the package (ie not use t.Parallel). This PR moves
all devnet integration tests in 1 package.
`"ContextStart devnet start failed: private api: could not create
listener: listen top 127.0.0.1:10090: bind: address already in use,
addr=localhost:10090"`
![Screenshot 2024-01-10 at 13 38
37](https://github.com/ledgerwatch/erigon/assets/94537774/06bda987-45e5-46ef-9e0b-3876b3f85c01)
This PR contains 3 fixes for interaction between the Bor mining loop and
the TX pool which where causing the regular creation of blocks with zero
transactions.
* Mining/Tx pool block synchronization
The synchronization of the tx pool between the sync loop and the mining
loop has been changed so that both are triggered by the same event and
synchronized via a sync.Cond rather than a polling loop with a hard
coded loop limit. This means that mining now waits for the pool to be
updated from the previous block before it starts the mining process.
* Txpool Startup consolidated into its MainLoop
Previously the tx pool start process was dynamically triggered at
various points in the code. This has all now been moved to the start of
the main loop. This is necessary to avoid a timing hole which can leave
the mining loop hanging waiting for a previously block broadcast which
it missed due to its delay start.
* Mining listens for block broadcast to avoid duplicate mining
operations
The mining loop for bor has a recommit timer in case blocks re not
produced on time. However in the case of sprint transitions where the
seal publication is delayed this can lead to duplicate block production.
This is suppressed by introducing a `waiting` state which is exited upon
the block being broadcast from the sealing operation.
This PR fixes an infinite recursion (stack overflow) error in devnet
integration tests that surfaced in our integration CI a week or more ago
and additionally enables some integration tests that are now fixed -
`TestStateSync` & `TestCallContract`.
![Screenshot 2024-01-09 at 17 10
24](https://github.com/ledgerwatch/erigon/assets/94537774/a5a8c9c9-9f68-4084-9e08-1bf3c1601cab)
A crash on startup happens on --chain=mumbai , because I've confused
chainConfig.Bor (from type chain.Config) and config.Bor (from type
ethconfig.Config) in the setup code.
ethconfig.Config.Bor property contained bogus values, and was used only
to check its type in CreateConsensusEngine(). Its value was never read
(before PR #9117).
This change removes the property to avoid confusion and fix the crash.
Devnet network.BorStateSyncDelay was implemented using
ethconfig.Config.Bor, but it wasn't taking any effect. It should be
fixed separately in a different way.
Getting an error in one of the bor nodes in devnet when trying to run
the "state-sync" scenario:
```
[EROR] [01-03|16:55:44.179] cli.StartRpcServer error err="could not start separate Websocket RPC api at port 8546: listen tcp 127.0.0.1:8546: bind: address already in use"
```
This happens for scenarios with more than 1 node.
Digging further this regressions has happened due to this change:
https://github.com/ledgerwatch/erigon/pull/8909
This PR fixes this by updating the devnet `NodeArgs` struct to set the
corresponding `--ws.port` `arg` tag which now exists.
Hi, I made three suggestions for this section:
- "devenet" should be "devnet" (typo).
- "are currently build" should be "are currently built" (grammatical
error).
- "sptep" should be "step" (typo).
Thanks.
Mdbx now takes a logger - but this has not been pushed to all callers -
meaning it had an invalid logger
This fixes the log propagation.
It also fixed a start-up issue for http.enabled and txpool.disable
created by a previous merge
This change introduces additional processes to manage snapshot uploading
for E2 snapshots:
## erigon snapshots upload
The `snapshots uploader` command starts a version of erigon customized
for uploading snapshot files to
a remote location.
It breaks the stage execution process after the senders stage and then
uses the snapshot stage to send
uploaded headers, bodies and (in the case of polygon) bor spans and
events to snapshot files. Because
this process avoids execution in run signifigantly faster than a
standard erigon configuration.
The uploader uses rclone to send seedable (100K or 500K blocks) to a
remote storage location specified
in the rclone config file.
The **uploader** is configured to minimize disk usage by doing the
following:
* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to
snapshots
in addition to this it has the following performance related features:
* maximizes the workers allocated to snapshot processing to improve
throughput
* Can be started from scratch by downloading the latest snapshots from
the remote location to seed processing
## snapshots command
Is a stand alone command for managing remote snapshots it has the
following sub commands
* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot
locations
* **torrent** - manage snapshot torrent files
- changed communication tunnel to web socket in order to connect to
remote nodes
- changed diagnostics.url flag to diagnostics.addr as now user need to
enter only address and support command will connect to it through
websocket
- changed flag debug.urls to debug.addrs in order to have ability to
change connection type between erigon and support to websocket and don't
change user API
- added auto trying to connect to connect to ws if connection with was
failed
* fix "genesis hash does not match" when dev nodes connect
The "dev" nodes need to have the same --miner.etherbase in order to
generate the same genesis ExtraData by DeveloperGenesisBlock(). Override
DevnetEtherbase global var that's used if --miner.etherbase is not
passed. (for NonBlockProducer case)
* fix missing private key for the hardcoded DevnetEtherbase
Fixes panic if SigKey is not found. Bor non-producers will use a default
`DevnetEtherbase` while Dev nodes modify it. Save hardcoded
DevnetEtherbase/DevnetSignPrivateKey into accounts so that SigKey can
recover it.
* refactor devnet.node to contain Node config
This avoids interface{} type casts and fixes an error with
Heimdall.validatorSet == nil
* add connection retries to rpcCall and Subscribe of requestGenerator
Fixes "connection refused" errors due to node not ready to handle early
RPC requests.
* fix deadlock in Heimdall.NodeStarted
* fix GetBlockByNumber
Fixes "cannot unmarshal string into Go struct field body.transactions of
type jsonrpc.RPCTransaction"
* demote "no of blocks on childchain is less than confirmations
required" to Info (#8626)
* demote "mismatched pending subpool size" to Debug (#8615)
* revert wiggle testing code
* call getEnode before NodeStarted to make sure it is ready for RPC
calls
* fix connection error detection on macOS
* use a non-default p2p port to avoid conflicts
* disable bor milestones on local heimdall
* generate node keys for static peers config
## Description
Add the possibility to change the RPC address used in `devnet` without
requiring a rebuild.
## Test
- [x] `make devnet`
- [x] `./build/bin/devnet --datadir tmp`
- [x] `./build/bin/devnet --datadir tmp --rpc.host localhost --rpc.port
8546`
Fixes and issue with Polygon validators where locally mined blocks are
broadcast with invalid header hashes because the NewBlock message
constructor was removing the ReceiptHash which contributed to the header
hash.
The results in the bor header validation code not being able to
correctly identify the signer of the header - so header validation
fails.
This also likely fixes part of the bogon-block issue which was
identified by the polygon team.
Code to support react based UI for diagnostics:
* pprof, prometheus and diagnistics rationalized to use a single router
(i.e. they can all run in the same port)
* support_cmd updated to support node routing (was only first node)
* Multi content support in router tunnel (application/octet-stream &
appliaction/json)
* Routing requests changed from using http forms to rest + query params
* REST query requests can now be made against erigon base port and
diagnostics with the same url format/params
---------
Co-authored-by: dvovk <vovk.dimon@gmail.com>
Co-authored-by: Mark Holt <mark@disributed.vision>
This request is extending the devnet functionality to more fully handle
contract processing by adding support for the following calls:
* trace_call,
* trace_transaction
* debug_accountAt,
* eth_getCode
* eth_estimateGas
* eth_gasPrice
It also contains an initial rationalization of the devnet subscription
code to use the erigon client code directly rather than using its own
intermediate subscription management.
This is used to implement a general purpose block waiter - which can be
used in any scenario step - rather than being specific to transaction
processing.
This pull also contains an end to end tested sync processor for bor and
associated support services:
* Heimdall (supports sync event transfer)
* Faucet - allows the creation and funding of arbitary test specific
accounts (cross chain)
Notes and Caveats:
* Code generation for contracts requires `--evm-version paris`. For
chains which don't support push0 for solc over 0.8.19
* The bor log processing post the application of sync events causes a
panic - this will be the subject of a seperate smaller push as it is not
devnet specific
* The bor code seems to make repeated calls for the same sync events and
also reverts requests - this needs further investigation. This is the
behaviour of the current implementation and may be required - although
it does seem to generate repeat processing - which could be avoided.
This request implements an end to end Polygon state sync in the devnet.
It does this by deploying smart contracts ont the L2 & L2 chain which
follow the polygon fx portal model with security checks removed to
simplify the code. The sync events generated are routed through a local
mock heimdal - to avoid the consensus process for testing purposes.
The commit also includes support code to help the delivery of additional
contract based scenratios.
An update to the devnet to introduce a local heimdall to facilitate
multiple validators without the need for an external process, and hence
validator registration/staking etc.
In this initial release only span generation is supported.
It has the following changes:
* Introduction of a local grpc heimdall interface
* Allocation of accounts via a devnet account generator ()
* Introduction on 'Services' for the network config
"--chain bor-devnet --bor.localheimdall" will run a 2 validator network
with a local service
"--chain bor-devnet --bor.withoutheimdall" will sun a single validator
with no heimdall service as before
---------
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro-2.local>
The fixes here fix a couple of issues related to devnet start-up
1. macos threading and syscall error return where causing multi node
start to both not wait and fail
2. On windows creating DB's with the default 2 TB mapsize causes the os
to reserve about 4GB of committed memory per DB. This may not be used -
but is reserved by the OS - so a default bor node reserves around 10GB
of storage. Starting many nodes causes the OS page file to become
exhausted.
To fix this the consensus DB's now use the node's OpenDatabase function
rather than their own, which means that the consensus DB's take notice
of the config.MdbxDBSizeLimit.
This fix leaves one 4GB committed memory allocation in the TX pool which
needs its own MapSize setting.
---------
Co-authored-by: Alex Sharp <akhounov@gmail.com>
The check in catches errors in the node start-up code and makes sure
that the network is stopped if any node fails to start cleanly, and
that5 it returns an error - so that any calling code can take
appropriate action.
Added support tunnel to the devnet cmd. In order to get this to run I
made the following changes:
* Create a public function
* Added non root logging
I have also added commentary to the readme to explain the additional
command line arguments needed to integrate with diagnostics. In summary,
if you set the --diagnostics.url the devenet will wait for diagnostic
requests rather than exiting
---------
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
This is an update to the devnet code which introduces the concept of
configurable scenarios. This replaces the previous hard coded execution
function.
The intention is that now both the network and the operations to run on
the network can be described in a data structure which is configurable
and composable.
The operating model is to create a network and then ask it to run
scenarios:
```go
network.Run(
runCtx,
scenarios.Scenario{
Name: "all",
Steps: []*scenarios.Step{
&scenarios.Step{Text: "InitSubscriptions", Args: []any{[]requests.SubMethod{requests.Methods.ETHNewHeads}}},
&scenarios.Step{Text: "PingErigonRpc"},
&scenarios.Step{Text: "CheckTxPoolContent", Args: []any{0, 0, 0}},
&scenarios.Step{Text: "SendTxWithDynamicFee", Args: []any{recipientAddress, services.DevAddress, sendValue}},
&scenarios.Step{Text: "AwaitBlocks", Args: []any{2 * time.Second}},
},
})
```
The steps here refer to step handlers which can be defined as follows:
```go
func init() {
scenarios.MustRegisterStepHandlers(
scenarios.StepHandler(GetBalance),
)
}
func GetBalance(ctx context.Context, addr string, blockNum requests.BlockNumber, checkBal uint64) {
...
```
This commit is an initial implementation of the scenario running - which
is working, but will need to be enhanced to make it more usable &
developable.
The current version of the code is working and has been tested with the
dev network, and bor withoutheimdall. There is a multi miner bor
heimdall configuration but this is yet to be tested.
Note that by default the scenario runner picks nodes at random on the
network to send transactions to. this causes the dev network to run very
slowly as it seems to take a long time to include transactions where the
nonce is incremented across nodes. It seems to take a long time for the
nonce to catch up in the transaction pool processing. This is yet to be
investigated.
This branch is intended to allow the devnet to be used for testing
multiple consents types beyond the default clique. It is initially being
used to test Bor consensus for polygon.
It also has the following refactoring:
### 1. Network configuration
The two node arg building functions miningNodeArgs and nonMiningNodeArgs
have been replaced with a configuration struct which is used to
configure:
```go
network := &node.Network{
DataDir: dataDir,
Chain: networkname.DevChainName,
//Chain: networkname.BorDevnetChainName,
Logger: logger,
BasePrivateApiAddr: "localhost:9090",
BaseRPCAddr: "localhost:8545",
Nodes: []node.NetworkNode{
&node.Miner{},
&node.NonMiner{},
},
}
```
and start multiple nodes
```go
network.Start()
```
Network start will create a network of nodes ensuring that all nodes are
configured with non clashing network ports set via command line
arguments on start-up.
### 2. Request Routing
The `RequestRouter` has been updated to take a 'target' rather than
using a static dispatcher which routes to a single node on the network.
Each node in the network has its own request generator so command and
services have more flexibility in request routing and
`ExecuteAllMethods` currently takes the `node.Network` as an argument
and can pick which node (node 0 for the moment) to send requests to.