This change introduces additional processes to manage snapshot uploading for E2 snapshots:

## erigon snapshots upload

The `snapshots uploader` command starts a version of erigon customized for uploading snapshot files to a remote location.

It breaks the stage execution process after the senders stage and then uses the snapshot stage to send uploaded headers, bodies and (in the case of polygon) bor spans and events to snapshot files. Because this process avoids execution, it runs significantly faster than a standard erigon configuration.

The uploader uses rclone to send seedable snapshots (100K or 500K blocks) to a remote storage location specified in the rclone config file.

The **uploader** is configured to minimize disk usage by doing the following:

* It removes snapshots once they are loaded
* It aggressively prunes the database once entities are transferred to snapshots

In addition to this, it has the following performance-related features:

* Maximizes the workers allocated to snapshot processing to improve throughput
* Can be started from scratch by downloading the latest snapshots from the remote location to seed processing

## snapshots command

A standalone command for managing remote snapshots. It has the following subcommands:

* **cmp** - compare snapshots
* **copy** - copy snapshots
* **verify** - verify snapshots
* **manifest** - manage the manifest file in the root of remote snapshot locations
* **torrent** - manage snapshot torrent files
Downloader
Service to seed/download historical data (snapshots, immutable .seg files) via the BitTorrent protocol
Start Erigon with snapshots support
Like many other Erigon components (txpool, sentry, rpc daemon), it can be built into Erigon or run as a separate process.
# 1. By default, Downloader runs inside Erigon when the `--snapshots` flag is set:
erigon --snapshots --datadir=<your_datadir>
# 2. Downloader can also be started as an independent process, using the `--snapshots --downloader.api.addr=127.0.0.1:9093` flags:
make erigon downloader
# Start downloader (network usage can be limited to 512mb/sec: --torrent.download.rate=512mb --torrent.upload.rate=512mb)
downloader --downloader.api.addr=127.0.0.1:9093 --torrent.port=42068 --datadir=<your_datadir>
# --downloader.api.addr - used for internal communication with Erigon
# --torrent.port=42068 - public BitTorrent protocol listen port
# On startup Erigon sends the list of .torrent files to Downloader and waits until the download is 100% complete
erigon --snapshots --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>
Use --snap.keepblocks=true to keep retired blocks in the DB instead of deleting them
Any network/chain can start with snapshot sync:
- the node downloads only snapshots registered in this repo: https://github.com/ledgerwatch/erigon-snapshot
- the node moves old blocks from the DB into snapshots of 1K blocks, then merges those snapshots into larger ranges until they reach 500K blocks, then automatically starts seeding the new snapshot
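As a rough illustration of the 1K and 500K sizes mentioned above, the sketch below computes which fixed-size range a given block number falls into. The helper name `segment_range` is hypothetical, and the assumption that ranges are aligned to multiples of the range size is ours; this is plain arithmetic, not Erigon's merge logic.

```shell
# Hypothetical helper: print the "from to" bounds of the fixed-size
# range that contains a given block number.
# ASSUMPTION: ranges are aligned to multiples of the range size.
# $1 = block number, $2 = range size in blocks (defaults to 500000).
segment_range() {
    block=$1
    size=${2:-500000}
    from=$(( block / size * size ))   # round down to a multiple of size
    to=$(( from + size ))             # upper bound, exclusive in this sketch
    echo "$from $to"
}

segment_range 1234567        # 500K-block range
segment_range 1234567 1000   # 1K-block range
```

With these inputs the first call prints `1000000 1500000` and the second prints `1234000 1235000`.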
The --snapshots flag is compatible with the --prune flag
How to create a new network or bootnode
# New snapshots need to be created and then seeded
# Create new snapshots (the snapshot size can be changed with: --from=0 --to=1_000_000 --segment.size=500_000)
# This dumps blocks from the Database to .seg files:
erigon snapshots retire --datadir=<your_datadir>
# Create .torrent files (you can think of them as "checksums")
downloader torrent_create --datadir=<your_datadir>
# output format is compatible with https://github.com/ledgerwatch/erigon-snapshot
downloader torrent_hashes --datadir=<your_datadir>
# Start downloader (reads all .torrent files and downloads/seeds the data)
downloader --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>
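The commands above could be chained in a small wrapper. The sketch below is one possible way to do that: the `run` and `bootstrap_snapshots` function names and the `DRY_RUN` variable are hypothetical, and only the command lines themselves come from this readme. With `DRY_RUN=1` nothing is executed, so the sequence can be reviewed first.

```shell
# Hypothetical wrapper around the commands listed in this readme.
# With DRY_RUN=1 each command is only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = datadir. The && chain stops at the first failing step.
bootstrap_snapshots() {
    datadir=$1
    run erigon snapshots retire --datadir="$datadir" &&
    run downloader torrent_create --datadir="$datadir" &&
    run downloader torrent_hashes --datadir="$datadir" &&
    run downloader --downloader.api.addr=127.0.0.1:9093 --datadir="$datadir"
}

# Print the sequence without running anything:
DRY_RUN=1 bootstrap_snapshots /path/to/datadir
```

Note that the last step starts the long-running downloader process, so in a real run the function would not return until the downloader exits.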
Additional info:
# Snapshot creation does not require a fully-synced Erigon - the first few stages are enough. For example:
STOP_AFTER_STAGE=Senders ./build/bin/erigon --snapshots=false --datadir=<your_datadir>
# But for security it is better to have a fully-synced Erigon
# Erigon can use snapshots only after indexing them. Erigon indexes them automatically, but indexing can also be run manually (this step is not required for seeding):
erigon snapshots index --datadir=<your_datadir>
Architecture
Downloader works based on <your_datadir>/snapshots/*.torrent files. Such files can be created in 4 ways:
- Erigon makes the grpc call downloader.Download(list_of_hashes), which triggers creation of .torrent files
- Erigon creates a new .seg file; Downloader scans the .seg file and creates a .torrent
- an operator manually copies .torrent files (rsync from another server or restore from backup)
- an operator manually copies a .seg file; Downloader scans the .seg file and creates a .torrent
Erigon does:
- connects to Downloader
- shares the list of hashes (see https://github.com/ledgerwatch/erigon-snapshot )
- waits for all snapshots to download
- when .seg files are available, automatically creates .idx files - secondary indices, for example to find a block by hash
- then switches to normal staged sync (which doesn't require a connection to Downloader)
- ensures that snapshot downloading happens only once: even if a new Erigon version includes new pre-verified snapshot hashes, Erigon will not download them (to avoid unpredictable downtime) - but Erigon may produce them by itself.
Downloader does:
- reads .torrent files and downloads everything they describe
- uses trackers from https://github.com/ngosang/trackerslist (see ./trackers/embed.go)
- seeds automatically
Technical details:
- To prevent attacks, .idx files are created using a random seed, so all nodes have different .idx files (and the same .seg files)
- If you add/remove any .seg file manually, you also need to remove the <your_datadir>/downloader folder
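After copying .seg files by hand, it can be useful to see which of them do not yet have a .torrent file. The sketch below is one way to check this; the function name `segs_without_torrent` is hypothetical, and it assumes that the torrent for `x.seg` is stored next to it as `x.seg.torrent`, which this readme does not state explicitly.

```shell
# Hypothetical check: list .seg files in a snapshots directory that have
# no matching .torrent file next to them.
# ASSUMPTION: the torrent for "x.seg" is named "x.seg.torrent".
# $1 = snapshots directory, e.g. <your_datadir>/snapshots
segs_without_torrent() {
    dir=$1
    for seg in "$dir"/*.seg; do
        [ -e "$seg" ] || continue                  # glob matched nothing
        [ -e "$seg.torrent" ] || basename "$seg"   # print only the orphans
    done
}

# Usage: segs_without_torrent /path/to/datadir/snapshots
```

The function only reads the directory and never modifies it, so it is safe to run while Downloader is active.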
How to verify that .seg files have the same checksum as current .torrent files
# Use it if you see weird behavior, bugs, bans, hardware issues, etc...
downloader --verify --datadir=<your_datadir>
downloader --verify --verify.files=v1-1-2-transaction.seg --datadir=<your_datadir>
Create cheap seedbox
Usually Erigon's network is self-sufficient - peers automatically produce and seed snapshots. But a new network or a new type of snapshot needs a bootstrapping step, because no peers have these files yet.
Seedbox - a machine which only seeds archive files:
- Doesn't need a synced erigon
- Can work on very cheap disks, cpu, ram
- It works exactly like an Erigon node - it downloads archive files and seeds them
downloader --seedbox --datadir=<your> --chain=mainnet
Seedbox can fall back to Webseed - an HTTP url to centralized infrastructure. For example: a private S3 bucket with signed_urls, or any HTTP server with files. Main idea: erigon's decentralized infrastructure has higher priority than the centralized one (which is used as support/fallback).
# Erigon has default webseed urls - and you can create your own
downloader --datadir=<your> --chain=mainnet --webseed=<webseed_url>
# See also the `--webseed` flag in `downloader --help`. There is an option to pass it via the `datadir/webseed.toml` file
Utilities
downloader torrent_cat /path/to.torrent
downloader torrent_magnet /path/to.torrent
Faster rsync
rsync -aP --delete -e "ssh -T -o Compression=no -x" <src> <dst>
Release details
Start automatic commit of new hashes to branch master
crontab -e
@hourly cd <erigon_source_dir> && ./cmd/downloader/torrent_hashes_update.sh <your_datadir> <network_name> 1>&2 2>> ~/erigon_cron.log
It pushes to branch auto; before a release, merge auto into main manually