erigon-pulse/state
Alex Sharov b683ed435c
Compress params change (#651)
Main target: reduce the RAM usage of the Huffman tables and, if possible,
improve decompression speed. Compression speed is less important.

Experiments on a 74Gb uncompressed file (bsc
012500-013000-transactions.seg):
Ram - RAM needed just to open the compressed file (Huffman tables, etc.)
dec_speed - loop with `word, _ = g.Next(word[:0])`
skip_speed - loop with `g.Skip()` 
```
| DictSize | Ram  | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M       | 70Mb | 35871Mb   | 4m06s     | 1m58s      |
| 512K     | 42Mb | 36496Mb   | 3m49s     | 1m51s      |
| 256K     | 21Mb | 37100Mb   | 3m44s     | 1m48s      |
| 128K     | 11Mb | 37782Mb   | 3m25s     | 1m44s      |
| 64K      | 7Mb  | 38597Mb   | 3m16s     | 1m34s      |
| 32K      | 5Mb  | 39626Mb   | 3m0s      | 1m29s      |
```
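
The two timing loops described above can be sketched as follows. This is a minimal sketch, not the benchmark that produced the table: `wordGetter` is a hypothetical interface modeled only on the two calls quoted in the text (`HasNext` is assumed for loop termination), and `sliceGetter` is a toy in-memory stand-in so the example runs without a `.seg` file.

```go
package main

import (
	"fmt"
	"time"
)

// wordGetter is a hypothetical interface modeled on the loops quoted in the
// commit message; it is an assumption, not the compress package's API.
type wordGetter interface {
	HasNext() bool
	Next(buf []byte) ([]byte, uint64)
	Skip() uint64
}

// sliceGetter is a toy in-memory stand-in for a getter over a compressed file.
type sliceGetter struct {
	words [][]byte
	pos   int
}

func (g *sliceGetter) HasNext() bool { return g.pos < len(g.words) }

// Next appends the current word to buf and advances to the next word.
func (g *sliceGetter) Next(buf []byte) ([]byte, uint64) {
	buf = append(buf, g.words[g.pos]...)
	g.pos++
	return buf, uint64(g.pos)
}

// Skip advances past the current word without copying it.
func (g *sliceGetter) Skip() uint64 {
	g.pos++
	return uint64(g.pos)
}

// measureDecode times the dec_speed loop: every word is decoded into a
// reused buffer, so the loop itself does not allocate per word.
func measureDecode(g wordGetter) (words int, elapsed time.Duration) {
	start := time.Now()
	var word []byte
	for g.HasNext() {
		word, _ = g.Next(word[:0])
		words++
	}
	return words, time.Since(start)
}

// measureSkip times the skip_speed loop: words are stepped over, not decoded.
func measureSkip(g wordGetter) (words int, elapsed time.Duration) {
	start := time.Now()
	for g.HasNext() {
		g.Skip()
		words++
	}
	return words, time.Since(start)
}

func main() {
	data := [][]byte{[]byte("tx-1"), []byte("tx-2"), []byte("tx-3")}

	n, d := measureDecode(&sliceGetter{words: data})
	fmt.Println("decoded", n, "words in", d)

	n, d = measureSkip(&sliceGetter{words: data})
	fmt.Println("skipped", n, "words in", d)
}
```

Passing `word[:0]` keeps the buffer's capacity between iterations, which is why the decode loop measures decompression rather than allocation.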
 
On sampling: skipping superstrings where superstringNumber % 4 != 0
reduces the compression ratio by 1%. Checked on the big BSC file and on a
small (1Gb) goerli file.
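
The sampling rule can be illustrated with a short sketch. The function name `sampleSuperstrings` and the slice-of-byte-slices representation are assumptions made for illustration; only the `superstringNumber % 4 != 0` condition comes from the text.

```go
package main

import "fmt"

// sampleSuperstrings keeps only the superstrings whose index is a multiple
// of samplingFactor and skips the rest. Hypothetical helper: the real
// compressor may apply the same check while streaming, not on a slice.
func sampleSuperstrings(superstrings [][]byte, samplingFactor int) [][]byte {
	if samplingFactor <= 1 {
		return superstrings // no sampling requested
	}
	kept := make([][]byte, 0, len(superstrings)/samplingFactor+1)
	for superstringNumber, s := range superstrings {
		if superstringNumber%samplingFactor != 0 {
			continue // skipped: not used for pattern collection
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	all := make([][]byte, 10)
	for i := range all {
		all[i] = []byte(fmt.Sprintf("superstring-%d", i))
	}
	kept := sampleSuperstrings(all, 4)
	fmt.Printf("kept %d of %d superstrings\n", len(kept), len(all))
}
```

With a factor of 4, roughly one superstring in four is fed to pattern collection, which is where the compression speedup would come from.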

So I think it is reasonable to use:
maxDictPatterns=64k
samplingFactor=4

Tradeoffs: sacrifice 5% of compression ratio for a 4x compression speedup
(I think even more), a 30% decompression speedup, and a 10x RAM reduction.

Release: I will not change existing snapshots. For now I will focus on
releasing new block snapshots and new history snapshots (Erigon3). If
there is time, I will re-compress the existing snapshots later.
2022-10-05 17:54:48 +07:00
aggregator22.go Compress params change (#651) 2022-10-05 17:54:48 +07:00
aggregator_test.go E3 agg commitment (#647) 2022-09-26 15:59:24 +01:00
aggregator.go erigon3: allow set workers amount for history compress and merge #657 2022-09-28 14:31:28 +07:00
domain_test.go E3 agg commitment (#647) 2022-09-26 15:59:24 +01:00
domain.go Compress params change (#651) 2022-10-05 17:54:48 +07:00
history_test.go erigon3: step toward background snapshots build #663 2022-10-02 10:03:49 +07:00
history.go Compress params change (#651) 2022-10-05 17:54:48 +07:00
inverted_index_test.go WithTablessCfg -> WithTableCfg (#601) 2022-08-24 11:02:47 +02:00
inverted_index.go agg22 madv helpers (#668) 2022-10-05 13:17:23 +07:00
merge_test.go reverted minHeap at elias-fano merge (#655) 2022-09-27 11:54:29 +01:00
merge.go erigon3: allow set workers amount for history compress and merge #657 2022-09-28 14:31:28 +07:00
read_indices.go Split aggregator to 2.2 and 2.3 versions (#539) 2022-07-23 09:06:52 +01:00
state_recon.go eliasfano32.Max() method on serialized bytes (#664) 2022-10-04 10:51:34 +01:00