
Downloader

Service to seed and download historical data (snapshots, immutable .seg files) over the BitTorrent protocol.

How to Start Erigon in snapshot sync mode

Like many other Erigon components (txpool, sentry, rpc daemon), it can be built into Erigon or run as a separate process.

# 1. By default, Downloader runs inside Erigon when the `--syncmode=snap` flag is set:
erigon --syncmode=snap --datadir=<your_datadir>
# 2. Downloader can also run as an independent process, using the `--syncmode=snap --downloader.api.addr=127.0.0.1:9093` flags:
make erigon downloader

# Start Downloader (network usage can be limited to 512mb/sec: --torrent.download.rate=512mb --torrent.upload.rate=512mb)
downloader --downloader.api.addr=127.0.0.1:9093 --torrent.port=42068 --datadir=<your_datadir>
# --downloader.api.addr - for internal communication with Erigon
# --torrent.port=42068  - public listen port for the BitTorrent protocol

# On startup, Erigon sends the list of .torrent files to Downloader and waits until the download is 100% complete
erigon --syncmode=snap --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>

Use --snap.keepblocks=true to keep retired blocks in the DB instead of deleting them.

Any network/chain can start with snapshot sync:

  • the node downloads only snapshots registered in the repo https://github.com/ledgerwatch/erigon-snapshot
  • the node moves old blocks from the DB into snapshots of 1K blocks, then merges them into progressively larger ranges until they reach 500K blocks, and then automatically starts seeding the new snapshot
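
To make the 500K-block ranges concrete, here is a minimal sketch that maps a block number to the range that would contain it. The function name `segment_range` is hypothetical, and the sketch assumes that ranges are aligned to multiples of 500K and are half-open (`[start, end)`). The text above does not state either point.

```shell
#!/bin/sh
# Hypothetical helper, not part of Erigon.
# Assumption: merged snapshots cover ranges aligned to multiples of SEGMENT_SIZE.
SEGMENT_SIZE=500000

# Print "<start> <end>" of the range containing the given block number.
segment_range() {
    block=$1
    # Integer division rounds down to the start of the range.
    start=$(( block / SEGMENT_SIZE * SEGMENT_SIZE ))
    end=$(( start + SEGMENT_SIZE ))
    echo "$start $end"
}
```

For example, `segment_range 1234567` prints `1000000 1500000`.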

The --syncmode=snap flag is compatible with the --prune flag.

How to create a new network or bootnode

# Create new snapshots and start seeding them

# Create new snapshots (snapshot size can be changed with: --from=0 --to=1_000_000 --segment.size=500_000)
# This dumps blocks from the database to .seg files:
erigon snapshots create --datadir=<your_datadir>

# Create .torrent files (Downloader automatically seeds all .torrent files)
# The output format is compatible with https://github.com/ledgerwatch/erigon-snapshot
downloader torrent_hashes --rebuild --datadir=<your_datadir>

# Start Downloader (seeds automatically)
downloader --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>

# Erigon is not required for seeding snapshots

# However, Erigon can use snapshots only after indexing them (this step is not required for seeding)
erigon snapshots index --datadir=<your_datadir>

Architecture

Downloader works from the <your_datadir>/snapshots/*.torrent files. These files can be created in 4 ways:

  • Erigon can make the gRPC call downloader.Download(list_of_hashes), which triggers creation of .torrent files
  • Erigon can create a new .seg file; Downloader will scan the .seg file and create a .torrent
  • an operator can manually copy .torrent files (rsync from another server or restore from backup)
  • an operator can manually copy a .seg file; Downloader will scan the .seg file and create a .torrent
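
When copying files by hand, it can help to see which .seg files do not yet have a .torrent next to them. The sketch below is a hypothetical helper, not a Downloader command. It assumes the torrent for `x.seg` is stored as `x.seg.torrent` in the same directory, which this readme does not confirm.

```shell
#!/bin/sh
# Hypothetical helper, not part of Downloader.
# Assumption: the torrent for "x.seg" is named "x.seg.torrent".

# Print the name of every .seg file in the directory that has no .torrent beside it.
missing_torrents() {
    dir=$1
    for seg in "$dir"/*.seg; do
        [ -e "$seg" ] || continue            # the glob matched nothing
        [ -e "$seg.torrent" ] || basename "$seg"
    done
}
```

Usage: `missing_torrents <your_datadir>/snapshots`. Empty output means every .seg file already has a matching .torrent under the naming assumption above.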

Erigon does:

  • connect to Downloader
  • share the list of hashes (see https://github.com/ledgerwatch/erigon-snapshot )
  • wait for all snapshots to download
  • once .seg files are available, automatically create .idx files (secondary indices, for example to find a block by hash)
  • then switch to normal staged sync (which doesn't require a connection to Downloader)

Downloader does:

Technical details:

  • To prevent attacks, .idx files are created using a random seed, so every node has different .idx files (and the same .seg files)

How to verify that .seg files have the same checksum as the current .torrent files

# Use it if you see weird behavior, bugs, bans, hardware issues, etc...
downloader torrent_hashes --verify --datadir=<your_datadir>
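
As background on what such a check involves: in BitTorrent v1, a .torrent file stores a SHA-1 hash for each fixed-size piece of the data, and verification recomputes those hashes from the file on disk. The sketch below only illustrates that idea and is not the `--verify` implementation. The function name `piece_hashes` is hypothetical, and it relies on the `sha1sum` utility being installed.

```shell
#!/bin/sh
# Illustration only, not Downloader code.
# Print the SHA-1 hash of each fixed-size piece of a file, one hash per line.
piece_hashes() {
    file=$1
    piece_size=$2
    size=$(wc -c < "$file")
    # Number of pieces, rounding up so the shorter last piece is included.
    count=$(( (size + piece_size - 1) / piece_size ))
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if="$file" bs="$piece_size" skip="$i" count=1 2>/dev/null | sha1sum | cut -d' ' -f1
        i=$(( i + 1 ))
    done
}
```

A real client takes the piece size from the .torrent metadata and compares each computed hash with the stored one. A mismatch on any piece means the file on disk differs from what the torrent describes.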

Faster rsync

rsync -aP --delete -e "ssh -T -o Compression=no -x" <src> <dst>

Release details

Start automatic commit of new hashes to branch master

crontab -e
@hourly        cd <erigon_source_dir> && ./cmd/downloader/torrent_hashes_update.sh <your_datadir> <network_name> >> ~/erigon_cron.log 2>&1

The script pushes to the auto branch. Before a release, merge auto into main manually.