# Downloader
Service to seed/download historical data (snapshots, immutable .seg files) over the BitTorrent protocol.
## How to Start Erigon in snapshot sync mode
Like many other Erigon components (txpool, sentry, rpc daemon), it can be built into Erigon or run as a separate process.
```shell
# 1. By default, Downloader runs inside Erigon when the `--syncmode=snap` flag is set:
erigon --syncmode=snap --datadir=<your_datadir>
```
```shell
# 2. Downloader can also run as an independent process; Erigon then needs the `--syncmode=snap --downloader.api.addr=127.0.0.1:9093` flags:
make erigon downloader
# Start Downloader (network usage can be limited to 512mb/sec with: --torrent.download.rate=512mb --torrent.upload.rate=512mb)
downloader --downloader.api.addr=127.0.0.1:9093 --torrent.port=42068 --datadir=<your_datadir>
# --downloader.api.addr is for internal communication with Erigon
# --torrent.port=42068 is the public BitTorrent listen port
# On startup, Erigon sends the list of .torrent files to Downloader and waits until the download is 100% complete
erigon --syncmode=snap --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>
```
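The two-process setup above only works when both commands agree on the API address and datadir. A minimal sketch of a launcher that keeps these values in one place follows. The variable names (`DATADIR`, `API_ADDR`, `TORRENT_PORT`, `DRY_RUN`) and the `run`, `start_downloader` and `start_erigon` helpers are hypothetical and not part of Erigon; only flags shown in this README are used. By default it prints the commands instead of running them. To actually start the processes, set `DRY_RUN=0`, then call `start_downloader &` followed by `start_erigon`.

```shell
# Hypothetical launcher sketch: names and the dry-run switch are illustrative.
DATADIR="${DATADIR:-/path/to/datadir}"      # placeholder for <your_datadir>
API_ADDR="${API_ADDR:-127.0.0.1:9093}"      # internal Erigon <-> Downloader address
TORRENT_PORT="${TORRENT_PORT:-42068}"       # public BitTorrent listen port
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start Downloader as a separate process.
start_downloader() {
    run downloader --downloader.api.addr="$API_ADDR" --torrent.port="$TORRENT_PORT" --datadir="$DATADIR"
}

# Start Erigon pointing at the same Downloader address and datadir.
start_erigon() {
    run erigon --syncmode=snap --downloader.api.addr="$API_ADDR" --datadir="$DATADIR"
}
```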
Use `--snap.keepblocks=true` to keep retired blocks in the DB instead of deleting them.

Any network/chain can start with snapshot sync:
- the node downloads only snapshots registered in the following repo: https://github.com/ledgerwatch/erigon-snapshot
- the node moves old blocks from the DB into snapshots of 1K blocks, then merges them into larger ranges until they
  reach 500K blocks, and then automatically starts seeding the new snapshots
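
As a rough illustration of the sizes mentioned above, the sketch below splits a number of retired blocks into full 500K-block snapshots, leftover 1K-block snapshots, and blocks that do not yet fill a 1K snapshot. The function name `snapshot_layout` is hypothetical, and the model assumes every full 500K range has already been merged; any intermediate merge sizes are not modelled.

```shell
# Hypothetical helper: arithmetic only, it does not inspect a real datadir.
# Assumes every complete 500K range is already merged into one snapshot.
snapshot_layout() {
    blocks=$1
    big=$((blocks / 500000))               # fully merged 500K-block snapshots
    small=$(((blocks % 500000) / 1000))    # remaining 1K-block snapshots
    rest=$((blocks % 1000))                # blocks not yet filling a 1K snapshot
    echo "$big $small $rest"
}

snapshot_layout 1234567   # prints "2 234 567"
```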

The `--syncmode=snap` flag is compatible with the `--prune` flag.
## How to create a new network or bootnode
```shell
# You need to create new snapshots and start seeding them.
# Create new snapshots (the range and segment size can be changed with: --from=0 --to=1_000_000 --segment.size=500_000).
# This dumps blocks from the database to .seg files:
erigon snapshots create --datadir=<your_datadir>
# Create .torrent files (Downloader automatically seeds all .torrent files);
# the output format is compatible with https://github.com/ledgerwatch/erigon-snapshot
downloader torrent_hashes --rebuild --datadir=<your_datadir>
# Start Downloader (it seeds automatically):
downloader --downloader.api.addr=127.0.0.1:9093 --datadir=<your_datadir>
# Erigon is not required for seeding snapshots, but Erigon with --syncmode=snap also seeds.
```
Additional info:
```shell
# Creating snapshots does not require a fully synced Erigon; the first few stages are enough. For example:
STOP_BEFORE_STAGE=Execution ./build/bin/erigon --syncmode=full --datadir=<your_datadir>
# But for security, it is better to have a fully synced Erigon.
# Erigon can use snapshots only after indexing them. Erigon indexes them automatically, but you can also run the indexing manually (this step is not required for seeding):
erigon snapshots index --datadir=<your_datadir>
```
## Architecture
Downloader works based on `<your_datadir>/snapshots/*.torrent` files. Such files can be created in 4 ways:
- Erigon can make the gRPC call `downloader.Download(list_of_hashes)`, which triggers creation of .torrent files
- Erigon can create a new .seg file; Downloader will scan it and create a .torrent
- an operator can manually copy .torrent files (rsync from another server or restore from a backup)
- an operator can manually copy a .seg file; Downloader will scan it and create a .torrent
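
After copying .seg files by hand, it can be useful to see which of them do not yet have a .torrent next to them. A minimal sketch, assuming the torrent for `x.seg` is stored as `x.seg.torrent` in the same directory (this naming is an assumption, not stated in this README). The function name `segs_without_torrent` is hypothetical. Call it with the snapshots directory, for example `segs_without_torrent /path/to/datadir/snapshots`.

```shell
# Hypothetical helper: print the names of .seg files that have no .torrent yet.
# Assumes "<name>.seg.torrent" sits next to "<name>.seg" (naming is an assumption).
segs_without_torrent() {
    snap_dir=$1
    for seg in "$snap_dir"/*.seg; do
        [ -e "$seg" ] || continue                  # no .seg files: the glob stays literal
        [ -e "$seg.torrent" ] || basename "$seg"   # report .seg files lacking a .torrent
    done
}
```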

Erigon does:
- connects to Downloader
- shares the list of hashes (see https://github.com/ledgerwatch/erigon-snapshot)
- waits for all snapshots to download
- when .seg files are available, automatically creates .idx files (secondary indices, for example to find a block by hash)
- then switches to normal staged sync (which doesn't require a connection to Downloader)

Downloader does:
- reads .torrent files and downloads everything they describe
- uses trackers from https://github.com/ngosang/trackerslist, see [./trackers/embed.go](./trackers/embed.go)
- seeds automatically

Technical details:
- to prevent attacks, .idx files are created using a random seed, so every node has a different .idx file (and the same .seg files)
## How to verify that .seg files have the same checksum as the current .torrent files
```shell
# Use this if you see weird behavior, bugs, bans, hardware issues, etc.
downloader torrent_hashes --verify --datadir=<your_datadir>
```
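For repeated checks, the verification can be wrapped so that its output is kept in a log file. The sketch below assumes that `downloader torrent_hashes --verify` exits with a non-zero status on failure, which this README does not state. The function name `verify_snapshots` and the log path argument are hypothetical. Call it as `verify_snapshots /path/to/datadir /path/to/verify.log`.

```shell
# Hypothetical wrapper: run the verification and keep its output in a log file.
# Assumes the command exits non-zero on failure (not confirmed by this README).
verify_snapshots() {
    datadir=$1
    log=$2
    if downloader torrent_hashes --verify --datadir="$datadir" >>"$log" 2>&1; then
        echo "OK $datadir"
    else
        echo "FAILED $datadir (see $log)"
        return 1
    fi
}
```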
## Faster rsync
```shell
rsync -aP --delete -e "ssh -T -o Compression=no -x" <src> <dst>
```
## Release details
Start automatic commits of new hashes to branch `master`:
```
crontab -e
@hourly cd <erigon_source_dir> && ./cmd/downloader/torrent_hashes_update.sh <your_datadir> <network_name> >> ~/erigon_cron.log 2>&1
```
It pushes to the `auto` branch; before a release, merge `auto` into `main` manually.