Main target: reduce RAM usage of the Huffman tables. If possible, also
improve decompression speed. Compression speed is less important.
Experiments on a 74GB uncompressed file (bsc
012500-013000-transactions.seg)
RAM: memory needed just to open the compressed file (Huffman tables, etc.)
dec_speed: loop with `word, _ = g.Next(word[:0])`
skip_speed: loop with `g.Skip()`
```
| DictSize | RAM  | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M       | 70MB | 35871MB   | 4m06s     | 1m58s      |
| 512K     | 42MB | 36496MB   | 3m49s     | 1m51s      |
| 256K     | 21MB | 37100MB   | 3m44s     | 1m48s      |
| 128K     | 11MB | 37782MB   | 3m25s     | 1m44s      |
| 64K      | 7MB  | 38597MB   | 3m16s     | 1m34s      |
| 32K      | 5MB  | 39626MB   | 3m00s     | 1m29s      |
```
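For reference, the two measured loops (dec_speed and skip_speed) can be
sketched roughly as follows. This is a minimal illustration, not the
actual benchmark: the `wordGetter` interface, the `HasNext` method, the
return type of `Skip`, and the in-memory `sliceGetter` are assumptions
made so the example is self-contained; only the `g.Next(word[:0])` and
`g.Skip()` calls come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// wordGetter is a hypothetical stand-in for the decompressor's getter;
// only the methods used by the two measured loops are modelled.
type wordGetter interface {
	HasNext() bool
	Next(buf []byte) ([]byte, uint64)
	Skip() uint64
}

// sliceGetter is a toy in-memory implementation used only in this sketch.
type sliceGetter struct {
	words [][]byte
	pos   int
}

func (g *sliceGetter) HasNext() bool { return g.pos < len(g.words) }

// Next appends the current word to buf and advances to the next word.
func (g *sliceGetter) Next(buf []byte) ([]byte, uint64) {
	buf = append(buf, g.words[g.pos]...)
	g.pos++
	return buf, uint64(g.pos)
}

// Skip advances to the next word without copying it.
func (g *sliceGetter) Skip() uint64 {
	g.pos++
	return uint64(g.pos)
}

// decodeAll mirrors the dec_speed loop: every word is decoded into a
// buffer that is reused across iterations.
func decodeAll(g wordGetter) (words int, size int) {
	var word []byte
	for g.HasNext() {
		word, _ = g.Next(word[:0])
		words++
		size += len(word)
	}
	return words, size
}

// skipAll mirrors the skip_speed loop: words are stepped over only.
func skipAll(g wordGetter) (words int) {
	for g.HasNext() {
		g.Skip()
		words++
	}
	return words
}

func main() {
	data := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}

	start := time.Now()
	n, total := decodeAll(&sliceGetter{words: data})
	fmt.Println("decoded", n, "words,", total, "bytes in", time.Since(start))

	start = time.Now()
	fmt.Println("skipped", skipAll(&sliceGetter{words: data}), "words in", time.Since(start))
}
```

Reusing the buffer via `word[:0]` keeps the decode loop from allocating
per word, so the timing reflects decompression work rather than
allocation.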
Also about small sampling: skipping superstrings where
superstringNumber % 4 != 0 reduces the compression ratio by 1%. Checked
on the big BSC file and on a small (1GB) goerli file.
So I think it is reasonable to use:
maxDictPatterns=64k
samplingFactor=4
Tradeoffs: sacrifice 5% of compression ratio for a 4x compression
speedup (I think even more), a 30% decompression speedup, and a 10x RAM
reduction.
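The sampling rule mentioned above (keep a superstring only when
superstringNumber % samplingFactor == 0) could look roughly like the
sketch below. The function name `sampleSuperstrings` and the `[][]byte`
representation are hypothetical and chosen for illustration only; the
real compressor may apply the check inline while streaming superstrings.

```go
package main

import "fmt"

// sampleSuperstrings keeps only the superstrings whose index satisfies
// index % samplingFactor == 0 and drops the rest.
// Hypothetical helper: name and signature are assumptions for this sketch.
func sampleSuperstrings(superstrings [][]byte, samplingFactor int) [][]byte {
	if samplingFactor <= 1 {
		return superstrings // no sampling requested
	}
	kept := make([][]byte, 0, (len(superstrings)+samplingFactor-1)/samplingFactor)
	for superstringNumber, s := range superstrings {
		if superstringNumber%samplingFactor != 0 {
			continue // skipped by sampling
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	in := make([][]byte, 10)
	for i := range in {
		in[i] = []byte(fmt.Sprintf("superstring-%d", i))
	}
	// With samplingFactor=4, indices 0, 4 and 8 are kept.
	for _, s := range sampleSuperstrings(in, 4) {
		fmt.Println(string(s))
	}
}
```

With samplingFactor=4 only about a quarter of the superstrings are
processed, which is where the compression speedup comes from.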
Release: I will not change existing snapshots. For now I will focus on
releasing new block snapshots and new history snapshots (Erigon3). If
there is time, I will re-compress existing snapshots later.
* Enable unconvert linter
* Print filename and line number when linting fails
* Use same golangci-lint version in makefile as in ci
* Remove unnecessary conversions
* Remove unnecessary conversions
* dirty working equal dictionaries
* slow but working decompression
* much cleaner implementation with LRU words in dictionary with configurable condensity
* fixed comment
* removed tabs in comment line to fix lint
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
* Split History from Domain
* Add History.prune
* More on history
* Fix HistoryHistory test
* Merge history files
* Scan file test for history
* Add aggregator for erigon 2.2
* Change to generics, introduce contexts
* Delete to belong to Aggregator
* Fix lint
* Fix lint
* Fix lint
* Fix lint
* Use pointers to InvertedIndex again
* Remove prints
* Close embedded InvertedIndex
* Fix closing files
* Print
* Update ci.yml
* More printing
* Fix
* Make InvertedIndex pointer inside History
* Fix
* Update ci.yml
* Remove print
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
* Reduce allocations in domain and aggregator
* Make keys always uncompressed, values compressed only for code
* Functions to remake index
* Fix index recreation
* Test for reindex, fix
* Use uncompress vals in history
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
* dirty trie with direct reading of account/storage data from state
run with fixes
implemented trie with direct reading from state
* cleaner version without updates
* Elias fano search and merge
* Add first cut of search
* Iterator and test
* Changes in aggregator
* Elias fano bitmap
* Fix uncompress decompress
* Print
* Print
* No print
* Print
* Print
* Print
* Change to AppendBytes
* Print
* Fix NextUncompressed
* Remove print
* Fix history search
* Fix in history search
* More tracing
* More tracing
* Fix
* Print
* Print key
* More print
* Print
* No deletion for history records
* Remove print
* Fix
* Fix
* Fix test
* Fix lint
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
* Intermediate work
* Allow uncompressed words
* Fix
* Fix tests
* Add NextUncompressed, remove g.word buffer
* Code simplifications, no goroutines when workers == 1
* Fix lint
* Add test for MatchPrefix
* Work on patricia
* Beginning of new matcher
* Fuzz test for new longest match
* No skip
* Fixes
* Fixes
* More tracing
* Fixes
* Fixes
* Change back to old FindLongestMatches
* Switch to old match finder
* Print mismatches
* Fix
* After fix
* After fix
* After fix
* Print pointers
* Fixes and tests
* Print
* Print
* Print
* More tests
* Intermediate
* Fix
* Fix
* Prints
* Fix
* Fix
* Initialise matchStack
* Compute only once
* Compute only once
* Switch back
* Switch to old Find
* Introduce sais
* Switch patricia to sais
* Use sais in compressor
* Use sais in compressor
* Remove unused code
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
* Revert "unnecessary includes removed"
This reverts commit 76406bb78b.
* Revert "local dev setup"
This reverts commit ac06fd9400.
* Revert "compress/cgo-addition"
This reverts commit fae7683d46, reversing
changes made to e3e108c6c4.