Main target: reduce RAM usage of Huffman tables. If possible, improve
decompression speed. Compression speed is less important.
Experiments were run on a 74Gb uncompressed file (bsc
012500-013000-transactions.seg).
Ram - memory needed just to open the compressed file (Huffman tables, etc.)
dec_speed - time for a loop with `word, _ = g.Next(word[:0])`
skip_speed - time for a loop with `g.Skip()`
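A minimal sketch of the two measurement loops, assuming a getter with `HasNext`, `Next` and `Skip` methods. The `wordGetter` interface and the in-memory `sliceGetter` are hypothetical stand-ins so the example runs without a `.seg` file; the real decompressor's signatures may differ.

```go
package main

import (
	"fmt"
	"time"
)

// wordGetter is a hypothetical interface modelling only the calls
// used by the dec_speed and skip_speed loops.
type wordGetter interface {
	HasNext() bool
	Next(buf []byte) ([]byte, uint64)
	Skip()
}

// sliceGetter is a toy in-memory getter (an assumption for illustration).
type sliceGetter struct {
	words [][]byte
	pos   int
}

func (g *sliceGetter) HasNext() bool { return g.pos < len(g.words) }

// Next appends the current word to buf and advances.
func (g *sliceGetter) Next(buf []byte) ([]byte, uint64) {
	buf = append(buf, g.words[g.pos]...)
	g.pos++
	return buf, uint64(g.pos)
}

// Skip advances past the current word without copying it.
func (g *sliceGetter) Skip() { g.pos++ }

// decodeAll is the dec_speed loop: every word is decoded into a reused buffer.
func decodeAll(g wordGetter) (words, total int) {
	var word []byte
	for g.HasNext() {
		word, _ = g.Next(word[:0])
		words++
		total += len(word)
	}
	return words, total
}

// skipAll is the skip_speed loop: words are stepped over, not decoded.
func skipAll(g wordGetter) (words int) {
	for g.HasNext() {
		g.Skip()
		words++
	}
	return words
}

func main() {
	data := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}

	start := time.Now()
	n, b := decodeAll(&sliceGetter{words: data})
	fmt.Printf("decoded %d words (%d bytes) in %s\n", n, b, time.Since(start))

	start = time.Now()
	n = skipAll(&sliceGetter{words: data})
	fmt.Printf("skipped %d words in %s\n", n, time.Since(start))
}
```

Reusing the buffer via `word[:0]` keeps the decode loop free of per-word allocations, so the timing reflects decompression work and not the allocator.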
```
| DictSize | Ram | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M | 70Mb | 35871Mb | 4m06s | 1m58s |
| 512K | 42Mb | 36496Mb | 3m49s | 1m51s |
| 256K | 21Mb | 37100Mb | 3m44s | 1m48s |
| 128K | 11Mb | 37782Mb | 3m25s | 1m44s |
| 64K | 7Mb | 38597Mb | 3m16s | 1m34s |
| 32K | 5Mb | 39626Mb | 3m0s | 1m29s |
```
Also about small sampling: skipping superstrings when superstringNumber % 4 !=
0 reduces the compression ratio by 1%. Checked on a big BSC file and a
small (1gb) goerli file.
So I think it is a reasonable idea to use:
maxDictPatterns=64k
samplingFactor=4
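The sampling rule above can be sketched as follows. This is an illustration only: `sampleSuperstrings` is a hypothetical helper, not the compressor's actual code, and it assumes superstrings are numbered from 0 in the order they are produced.

```go
package main

import "fmt"

// samplingFactor mirrors the value proposed in the text.
const samplingFactor = 4

// sampleSuperstrings keeps only the superstrings whose sequence number is a
// multiple of factor; the rest are skipped during pattern discovery.
// Hypothetical helper for illustration.
func sampleSuperstrings(superstrings [][]byte, factor int) [][]byte {
	if factor <= 1 {
		return superstrings // no sampling
	}
	kept := make([][]byte, 0, len(superstrings)/factor+1)
	for superstringNumber, s := range superstrings {
		if superstringNumber%factor != 0 {
			continue // skipped: not part of the sample
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	all := make([][]byte, 10)
	for i := range all {
		all[i] = []byte(fmt.Sprintf("superstring-%d", i))
	}
	kept := sampleSuperstrings(all, samplingFactor)
	fmt.Printf("kept %d of %d superstrings\n", len(kept), len(all))
	// prints "kept 3 of 10 superstrings" (numbers 0, 4 and 8)
}
```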
Tradeoffs: sacrifice 5% of compression ratio for a 4x compression speedup (I
think even more), a 30% decompression speedup, and a 10x RAM reduction.
Release: I will not change existing snapshots. For now I will focus on
releasing new block snapshots and new history snapshots
(Erigon3). If there is time, I will re-compress existing snapshots later.
* added commitment to aggregator
* added commitment evaluation by updates, fixed mainnet roothash mismatch
* added ability to change starting state of hph
* replayable erigon23 with commitment
* possible fix for eliasfano index read after close
* fixed db pruning and restart
* Initial fixes
* Debug
* clear downHashedLen for branch nodes
* Fix key length, cleanup
* Cleanup
* Cleanup
* picked aggregator updates
* fixed empty cell hash for ProcessUpdate evaluation
* hashBuffer moved from Cell to HexPatriciaHashed
* fixed codeHash incorrect renewal
* lint
* removed valuemergefn from history
* fixed lint
* fixed test
* rewritten fuzz test on hph
* fix for Win tests - do not remove tmp dir after test
* win
* fixup after merge
* close aggregator after test
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
* Remove interfaces for replacement
* Squashed 'interfaces/' content from commit 041a3b20
git-subtree-dir: interfaces
git-subtree-split: 041a3b204cceee5348d54bfa683dec6c7cf30d14
* save
* save
* Domain
* First functions
* change year
* More on domain
* More to test
* More on test
* More on domains
* buildFiles
* More on domains
* Collation test
* Fix collate
* Add test for decompressors
* Restructure history tables
* Split history into 2 tables
* Fix lint
* Check index files in the test
* Close files
* Add file scanning
* Fix lint
* Fix lint
* Add readFromFiles
* Add ef history idx file
* Start cleanup
* More to cleanup, test for ef history
* More test
* Add prune to test
* Test for prune and fix
* Start history access
* History test
* Test for LastDup
* Fix one lint
* Workaround
* History tests
* Debug
* Fix
* Fix in history
* Fix lint
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@alexs-macbook-pro.home>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alex Sharp <alexsharp@alexs-mbp.lan>