mirror of
https://gitlab.com/pulsechaincom/erigon-pulse.git
synced 2025-01-01 00:31:21 +00:00
b683ed435c
Main target: reduce RAM usage of Huffman tables. If possible, improve decompression speed. Compression speed is not so important.

Experiments on a 74Gb uncompressed file (bsc 012500-013000-transactions.seg):

- Ram: needed just to open the compressed file (Huffman tables, etc.)
- dec_speed: loop with `word, _ = g.Next(word[:0])`
- skip_speed: loop with `g.Skip()`

```
| DictSize | Ram  | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M       | 70Mb | 35871Mb   | 4m06s     | 1m58s      |
| 512K     | 42Mb | 36496Mb   | 3m49s     | 1m51s      |
| 256K     | 21Mb | 37100Mb   | 3m44s     | 1m48s      |
| 128K     | 11Mb | 37782Mb   | 3m25s     | 1m44s      |
| 64K      | 7Mb  | 38597Mb   | 3m16s     | 1m34s      |
| 32K      | 5Mb  | 39626Mb   | 3m0s      | 1m29s      |
```

Also about small sampling: skipping superstrings if `superstringNumber % 4 != 0` reduces the compression ratio by 1%. Checked on the big BSC file and on a small (1Gb) goerli file.

So I feel it's not a bad idea to use:

maxDictPatterns=64k
samplingFactor=4

Tradeoffs: sacrifice 5% compression ratio for a 4x compression speedup (I think even more), a 30% decompression speedup, and a 10x RAM reduction.

Release: I will not change existing snapshots. For now I will focus on releasing new block snapshots and new history snapshots (Erigon3). If I have time, I will re-compress the existing snapshots later.
| File |
| --- |
| compress_fuzz_test.go |
| compress_test.go |
| compress.go |
| decompress_bench_test.go |
| decompress_test.go |
| decompress.go |
| parallel_compress.go |