Commit Graph

94 Commits

Author SHA1 Message Date
Alex Sharov
0eab2a3dd1
e3: prevent files ranges overlap (kill -9 during merge handle) (#674) 2022-10-12 10:18:51 +07:00
Alex Sharov
b683ed435c
Compress params change (#651)
Main target: reduce RAM usage of Huffman tables. If possible, also improve
decompression speed. Compression speed is less important.

Experiments on a 74Gb uncompressed file (bsc
012500-013000-transactions.seg):
Ram - memory needed just to open the compressed file (Huffman tables, etc.)
dec_speed - loop with `word, _ = g.Next(word[:0])`
skip_speed - loop with `g.Skip()`
```
| DictSize | Ram  | file_size | dec_speed | skip_speed |
| -------- | ---- | --------- | --------- | ---------- |
| 1M       | 70Mb | 35871Mb   | 4m06s     | 1m58s      |
| 512K     | 42Mb | 36496Mb   | 3m49s     | 1m51s      |
| 256K     | 21Mb | 37100Mb   | 3m44s     | 1m48s      |
| 128K     | 11Mb | 37782Mb   | 3m25s     | 1m44s      |
| 64K      | 7Mb  | 38597Mb   | 3m16s     | 1m34s      |
| 32K      | 5Mb  | 39626Mb   | 3m0s      | 1m29s      |
```
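
For readers unfamiliar with the `word, _ = g.Next(word[:0])` idiom used for the dec_speed measurement: passing `word[:0]` hands the same backing array back on every call, so the loop does not allocate per word. The sketch below illustrates only that idiom. `toyGetter` and `totalBytes` are hypothetical stand-ins written for this example, not the decompressor's actual types.

```go
package main

import "fmt"

// toyGetter is a hypothetical in-memory stand-in for a word getter.
// It only mimics the shape of the calls quoted in the commit message.
type toyGetter struct {
	words [][]byte
	pos   int
}

// HasNext reports whether any words are left.
func (g *toyGetter) HasNext() bool { return g.pos < len(g.words) }

// Next appends the next word to buf and returns the extended slice
// together with the new position.
func (g *toyGetter) Next(buf []byte) ([]byte, int) {
	buf = append(buf, g.words[g.pos]...)
	g.pos++
	return buf, g.pos
}

// Skip advances past the next word without copying it.
func (g *toyGetter) Skip() int {
	g.pos++
	return g.pos
}

// totalBytes walks all words while reusing a single buffer:
// word[:0] keeps the capacity and resets the length to zero.
func totalBytes(g *toyGetter) int {
	var word []byte
	total := 0
	for g.HasNext() {
		word, _ = g.Next(word[:0])
		total += len(word)
	}
	return total
}

func main() {
	g := &toyGetter{words: [][]byte{[]byte("abc"), []byte("de"), []byte("f")}}
	fmt.Println(totalBytes(g)) // prints 6
}
```

A skip_speed style loop would call `g.Skip()` in place of `g.Next(...)`, which avoids copying word bytes altogether.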
 
Also about small sampling: skipping superstrings where superstringNumber % 4 !=
0 reduces the compression ratio by 1% - checked on a big BSC file and a
small (1gb) goerli file.
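
The sampling rule described above can be sketched as follows. This is a minimal illustration that assumes superstrings are numbered from 0. `sampled` and `countSampled` are hypothetical helper names, and only the constant 4 comes from the commit message.

```go
package main

import "fmt"

// samplingFactor is the value proposed in the commit message.
const samplingFactor = 4

// sampled reports whether superstring number n is kept.
// It is skipped when n % samplingFactor != 0.
func sampled(n uint64) bool {
	return n%samplingFactor == 0
}

// countSampled returns how many of the first total superstrings are kept.
func countSampled(total uint64) uint64 {
	var kept uint64
	for n := uint64(0); n < total; n++ {
		if sampled(n) {
			kept++
		}
	}
	return kept
}

func main() {
	fmt.Println(countSampled(16)) // prints 4: superstrings 0, 4, 8 and 12
}
```

With a factor of 4, only every fourth superstring is processed, which is in line with the roughly 4x compression speedup mentioned in the tradeoffs.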

So I feel it's not a bad idea to use:
maxDictPatterns=64k
samplingFactor=4

Tradeoffs: sacrifice 5% of compression ratio for a 4x compression speedup (I
think even more), a 30% decompression speedup, and a 10x RAM reduction.

Release: I will not change existing snapshots - for now I will focus on
releasing new block snapshots and new history snapshots
(Erigon3). If I have time, I will re-compress existing snapshots later.
2022-10-05 17:54:48 +07:00
Alex Sharov
0d6bc2eca4
Madv helpers (#667) 2022-10-04 10:51:51 +01:00
Alex Sharov
784b6cc904
erigon3: build .vi after downloading (#659) 2022-09-29 12:14:45 +07:00
Alex Sharov
ad0e8d47e9
remove sequential compressor #648 2022-09-22 13:59:22 +07:00
Alex Sharov
f05cd214bd
aggregator22: read dir without idx (#638) 2022-09-18 17:38:43 +07:00
Håvard Anda Estensen
f418be8e50
Enable unconvert linter (#609)
* Enable unconvert linter

* Print filename and line number when linting fails

* Use same golangci-lint version in makefile as in ci

* Remove unnecessary conversions

* Remove unnecessary conversions
2022-08-30 09:50:23 +07:00
Håvard Anda Estensen
3b0c5f75f8
Enable prealloc linter (#607) 2022-08-29 11:07:53 +07:00
Alex Sharov
c7cf5b6530
clean (#599) 2022-08-22 15:56:18 +07:00
Alex Sharov
09dba54e27
Compress: limit patternMaxDepth (#598)
* save

* save

* save

* save

* save

* save
2022-08-22 13:04:01 +07:00
Artem Tsebrovskiy
db7322ef87
compress: implemented condensed huffman pattern tables (#536)
* dirty working equal dictionaries

* slow but working decompression

* much cleaner implementation with LRU words in dictionary with configurable condensity

* fixed comment

* removed tabs in comment line to fix lint

Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
2022-08-22 09:11:56 +07:00
Alex Sharov
71d14b3d85
enable some linters (#577) 2022-08-10 19:08:09 +07:00
Alex Sharov
1e029ac6d8
go1.19 gofmt (#576)
* save

* save
2022-08-10 19:00:19 +07:00
Alex Sharov
127d1bac5b
decompress: catch maxDepth underflow 2022-08-01 12:37:10 +07:00
Håvard Anda Estensen
ad2344a6cc
Replace ioutil with io and os (#560) 2022-08-01 11:03:48 +07:00
ledgerwatch
fadc9b21d1
[erigon2.2] Split 2.2 and 2.3 prototype (#548)
* Introduce access functions to history

* Add missing functions

* Add missing functions

* Add missing functions

* Changeover in the aggregator

* Intermediate

* Fix domain tests

* Fix lint

* Fix lint

* Fix lint

* Close files

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-07-28 08:47:13 +01:00
ledgerwatch
596d10ea2e
Split aggregator to 2.2 and 2.3 versions (#539)
* Split History from Domain

* Add History.prune

* More on history

* Fix HistoryHistory test

* Merge history files

* Scan file test for history

* Add aggregator for erigon 2.2

* Change to generics, introduce contexts

* Delete to belong to Aggregator

* Fix lint

* Fix lint

* Fix lint

* Fix lint

* Use pointers to InvertedIndex again

* Remove prints

* Close embedded InvertedIndex

* Fix closing files

* Print

* Update ci.yml

* More printing

* Fix

* Make InvertedIndex pointer inside History

* Fix

* Update ci.yml

* Remove print

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-07-23 09:06:52 +01:00
Alex Sharov
f23061eed9
compressor: generic sort (#524) 2022-07-18 17:12:39 +07:00
Alex Sharov
e824fdff60
remove fuzzbeta build tag, because now go1.18 is minimum requirement (#428) 2022-07-03 14:38:53 +06:00
ledgerwatch
707a89842d
Add function to get history without state (#501)
* Add function to get history without state

* Add recon functions

* Expose endMinimax

* Recon prints

* Add NoState access methods

* MaxTxNum functions

* MaxTxNum functions

* MaxTxNum functions

* MaxTxNum functions

* History iterator

* Iterator

* history iterators to aggregator

* Print

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* Print

* Print

* Print

* Fix

* Fix

* Fix

* Fix

* Fix

* Print

* Print

* Print

* Print

* Print

* Add stats

* Remove time measurement

* Contexts for thread safety

* Partial iterators

* Fix

* Fix

* Not use SkipUncompressed

* Print

* Print

* Pass empty vals

* Parallel bitmap collection

* Print

* ReconTx iterator

* ReconTx iterator

* ReconTx iterator

* ReconTx iterator

* Print

* Print

* Remove print

* Print

* Print

* Print

* Print

* Print

* Print

* Dedicated getter for Iterate

* For for storage 0

* Remove print

* do not perform unnecessary changes

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-07-02 19:38:34 +01:00
Alex Sharov
ceafdded8f
Compress: reduce etl buffers to save RAM (#502) 2022-06-25 19:39:36 +06:00
ledgerwatch
df49481ddc
[erigon 2.2] Make keys always uncompressed, values compressed only for code (#492)
* Reduce allocations in domain and aggregator

* Make keys always uncompressed, values compressed only for code

* Functions to remake index

* Fix index recreation

* Test for reindex, fix

* Use uncompress vals in history

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-06-17 12:39:49 +01:00
Artem Tsebrovskiy
f8bdadf3e0
HPH with direct reading from state by plainKey (#472)
* dirty trie with direct reading of account/storage data from state

run with fixes

implemented trie with direct reading from state

* cleaner version without updates
2022-06-09 13:46:11 +01:00
Alex Sharov
fdf7c6598b
compress.Count() method (#478) 2022-06-03 12:14:58 +07:00
Artem Tsebrovskiy
49e3522a05
added print of decompressed file at panic (#468)
* added print of decompressed file at panic

* more info for recovered decompressing
2022-05-27 08:20:53 +07:00
Artem Tsebrovskiy
6de4ac4ba9
reduced memory footprint on building huffman table (#459) 2022-05-20 11:23:05 +07:00
Alex Sharov
7908982ed9
MatchPrefix: limit 2nd loop iterations (#458)
* sf

* sf

* save
2022-05-19 12:27:36 +07:00
Alex Sharov
a8ce14e8cc
option to disable runtime.ReadMemStats (#457)
* save

* save

* save

* save
2022-05-19 11:46:55 +07:00
Alex Sharov
e304418d5a
MatchPrefix: working version (#456) 2022-05-18 14:36:01 +07:00
Alex Sharov
b4776607dc
MatchPrefix: don't compare if prefix longer than word (#455)
* save

* save

* save

* save

* save

* fd
2022-05-18 10:29:19 +07:00
Artem Tsebrovskiy
6d2181968a
reduce memory footprint during decompression (#452) 2022-05-17 12:38:48 +07:00
Alex Sharov
a86660187d
Test: support of nil value for prefixMatch (#450)
* save

* save

* save

* save
2022-05-16 20:59:29 +01:00
Alex Sharov
91f7d84e60
Generic sort of slices (no allocs, inlinable) (#449)
* save

* save
2022-05-16 08:23:43 +01:00
Alex Sharov
d882a11c67
up linter version (#443)
* save

* save

* save

* save
2022-05-10 10:14:02 +07:00
ledgerwatch
dd3e7fd537
Update decompress.go (#439) 2022-05-06 14:55:11 +01:00
Artem Tsebrovskiy
abd93fe9c9
implement bin_patricia_hashed trie (#430)
* commitment: implemented semi-working bin patricia trie

* commitment: added initialize function to select commitment implementation

* deleted reference implementation of binary trie

* added branch merge function selection in accordance with current commitment type

* smarter branch prefix convolution to reduce disk usage

* implemented DELETE update

* commitment/bin-trie: fixed merge processing and storage encoding

* added changed hex to bin patricia trie

* fixed trie variant select

* allocate if bufPos larger than buf size

* added tracing code

* Fix lint

* Skip test

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-05-05 13:08:58 +01:00
Alex Sharov
04337fd090
Compress: reduce maxlen to 512 (#416) 2022-04-17 07:59:29 +07:00
ledgerwatch
f18e05186d
Compact huffman representation in files (#414)
* More compact huffman representation

* Intermediate

* Intermediate

* fix

* Fix lint

* Fix lint

* Fix lint

* Change min file size

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-04-13 12:55:15 +01:00
Alex Sharov
75b64f01a3
compressor: log lvl #408 2022-04-01 10:44:25 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
ledgerwatch
f93ea948d0
[erigon2] Optimise Huffman decoder (#374)
* Update

* Intermediate

* Huffman decoding

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-18 09:10:18 +00:00
ledgerwatch
77eb94b53e
Elias fano search and merge (#357)
* Elias fano search and merge

* Add first cut of search

* Iterator and test

* Changes in aggregator

* Elias fano bitmap

* Fix uncompress decompress

* Print

* Print

* No print

* Print

* Print

* Print

* Change to AppendBytes

* Print

* Fix NextUncompressed

* Remove print

* Fix history search

* Fix in history search

* More tracing

* More tracing

* Fix

* Print

* Print key

* More print

* Print

* No deletion for history records

* Remove print

* Fix

* Fix

* Fix test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-13 22:46:17 +00:00
Alex Sharov
c1f1365f92
cancel compress (#362) 2022-03-12 16:34:58 +07:00
Alex Sharov
c0fcdabf91
compress: less allocs (#361) 2022-03-12 15:33:01 +07:00
Alex Sharov
6512e3c941
add emptyWordsCount field to .seg file header (breaking .seg format) (#355)
* up torrent

* save

* save

* save

* save

* save

* save

* save
2022-03-10 07:48:37 +00:00
ledgerwatch
75b52ac25e
[compress] Allow uncompressed words (#350)
* Intermediate work

* Allow uncompressed words

* Fix

* Fix tests

* Add NextUncompressed, remove g.word buffer

* Code simplifications, no goroutines when workers == 1

* Fix lint

* Add test for MatchPrefix

* Work on patricia

* Beginning of new matcher

* Fuzz test for new longest match

* No skip

* Fixes

* Fixes

* More tracing

* Fixes

* Fixes

* Change back to old FindLongestMatches

* Switch to old match finder

* Print mismatches

* Fix

* After fix

* After fix

* After fix

* Print pointers

* Fixes and tests

* Print

* Print

* Print

* More tests

* Intermediate

* Fix

* Fix

* Prints

* Fix

* Fix

* Initialise matchStack

* Compute only once

* Compute only once

* Switch back

* Switch to old Find

* Introduce sais

* Switch patricia to sais

* Use sais in compressor

* Use sais in compressor

* Remove unused code

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-03-09 17:25:22 +00:00
racytech
7763945374
Reverted 3 last commits (#348)
* Revert "unnecessary includes removed"

This reverts commit 76406bb78b.

* Revert "local dev setup"

This reverts commit ac06fd9400.

* Revert "compress/cgo-addition"

This reverts commit fae7683d46, reversing
changes made to e3e108c6c4.
2022-02-24 14:39:42 +00:00
Kairat Abylkasymov
76406bb78b unnecessary includes removed 2022-02-24 06:21:25 -05:00
Kairat Abylkasymov
ac06fd9400 local dev setup 2022-02-24 06:15:14 -05:00
Alex Sharov
3205770ee0
snapshots: fix test (#346) 2022-02-24 08:35:13 +07:00