Commit Graph

10 Commits

Author SHA1 Message Date
Alex Sharov
b683ed435c
Compress params change (#651)
Main Target: reduce RAM usage of huffman tables. If possible - improve
decompression speed. Compression speed not so important.

Experiments on 74Gb uncompressed file (bsc
012500-013000-transactions.seg)
Ram - needed just to open compressed file (Huff tables, etc...)
dec_speed - loop with `word, _ = g.Next(word[:0])`
skip_speed - loop with `g.Skip()` 
```
| DictSize | Ram  | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M       | 70Mb | 35871Mb   | 4m06s     | 1m58s      |
| 512K     | 42Mb | 36496Mb   | 3m49s     | 1m51s      |
| 256K     | 21Mb | 37100Mb   | 3m44s     | 1m48s      |
| 128K     | 11Mb | 37782Mb   | 3m25s     | 1m44s      |
| 64K      | 7Mb  | 38597Mb   | 3m16s     | 1m34s      |
| 32K      | 5Mb  | 39626Mb   | 3m0s      | 1m29s      |
```
 
Also about small sampling: skip superstrings if superstringNumber % 4 !=
0 does reduce compression ratio by 1% - checked on big BSC file and
small (1gb) goerli file.

so, I feel it's not so bad idea to use:
maxDictPatterns=64k
samplingFactor=4

Tradeoffs: sacrify 5% compression ratio to 4x compression speedup (i
think even more), 30% decompression speedup, 10x RAM reduction

Release: I will not change existing snapshots - now will focus on
releasing new block snapshots and releasing new history snapshots
(Erigon3). If have time will re-compress existing snapshots later.
2022-10-05 17:54:48 +07:00
Alex Sharov
1e029ac6d8
go1.19 gofmt (#576)
* save

* save
2022-08-10 19:00:19 +07:00
Alex Sharov
e824fdff60
remove fuzzbeta build tag, because now go1.18 is minimum requirement (#428) 2022-07-03 14:38:53 +06:00
Alex Sharov
b1dc1bfbbf
run go fix ./... (#453) 2022-05-17 14:48:16 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
Alex Sharov
c0fcdabf91
compress: less allocs (#361) 2022-03-12 15:33:01 +07:00
ledgerwatch
75b52ac25e
[compress] Allow uncompressed words (#350)
* Intermediate work

* Allow uncompressed words

* Fix

* Fix tests

* Add NextUncompressed, remove g.word buffer

* Code simplifications, no goroutines when workers == 1

* Fix lint|

* Add test for MatchPrefix

* Work on patricia

* Beginning of new matcher

* Fuzz test for new longest match

* No skip

* Fixes

* Fixes

* More tracing

* Fixes

* Fixes

* Change back to old FindLongestMatches

* Switch to old match finder

* Print mismatches

* Fix

* After fix

* After fix

* After fix

* Print pointers

* Fixes and tests

* Print

* Print

* Print

* More tests

* Intermediate

* Fix

* Fix

* Prints

* Fix

* Fix

* Initialise matchStack

* Compute only once

* Compute only once

* Switch back

* Switch to old Find

* Introduce sais

* Switch patricia to sais

* Use sais in compressor

* Use sais in compressor

* Remove unused code

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-03-09 17:25:22 +00:00
ledgerwatch
7ec016b160
Fixes in compress (#260)
* Fixes in compress

* Reuse outputFile also as uncompressed file

* Close file before renaming

* Trace

* Untrace

* Use 8 threads

* Print aggregations

* Print merge and timing

* Print merge and timing

* readonly mode for patricia

* Fix to infinite loop

* Fix file names

* Cleanup

* Cleanup

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-24 22:13:48 +00:00
ledgerwatch
083ee83906
Generalise patricia tree, initial compress (#103)
* Generalise patricia tree, initial compress

* Include tranform

* Generalise Insert

* More on compression

* Fix lint

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-10-11 18:31:49 +01:00
ledgerwatch
240f7f1212
Patricia tree (binary) as a memory efficient alternative to Aho-Corassick (#97)
* First commit

* Refactor

* Fixes

* Now with fuzz tests

* Add matching

* Fixes for empty key

* Reduce garbage during matching

* Swap state objects

* Find longest matches

* Simplify FindLongestMatches

* Simplify matches

* Switch from pointers to Match

* Use pointer for pt

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2021-10-06 15:28:56 +01:00