71 Commits

Author SHA1 Message Date
Alex Sharov
fdf7c6598b
compress.Count() method (#478) 2022-06-03 12:14:58 +07:00
Artem Tsebrovskiy
49e3522a05
added print of decompressed file at panic (#468)
* added print of decompressed file at panic

* more info for recovered decompressing
2022-05-27 08:20:53 +07:00
Artem Tsebrovskiy
6de4ac4ba9
reduced memory footprint on building huffman table (#459) 2022-05-20 11:23:05 +07:00
Alex Sharov
7908982ed9
MatchPrefix: limit 2nd loop iterations (#458)
* sf

* sf

* save
2022-05-19 12:27:36 +07:00
Alex Sharov
a8ce14e8cc
option to disable runtime.ReadMemStats (#457)
* save

* save

* save

* save
2022-05-19 11:46:55 +07:00
Alex Sharov
e304418d5a
MatchPrefix: working version (#456) 2022-05-18 14:36:01 +07:00
Alex Sharov
b4776607dc
MatchPrefix: don't compare if prefix longer than word (#455)
* save

* save

* save

* save

* save

* fd
2022-05-18 10:29:19 +07:00
Artem Tsebrovskiy
6d2181968a
reduce memory footprint during decompression (#452) 2022-05-17 12:38:48 +07:00
Alex Sharov
a86660187d
Test: support of nil value for prefixMatch (#450)
* save

* save

* save

* save
2022-05-16 20:59:29 +01:00
Alex Sharov
91f7d84e60
Generic sort of slices (no allocs, inlinable) (#449)
* save

* save
2022-05-16 08:23:43 +01:00
Alex Sharov
d882a11c67
up linter version (#443)
* save

* save

* save

* save
2022-05-10 10:14:02 +07:00
ledgerwatch
dd3e7fd537
Update decompress.go (#439) 2022-05-06 14:55:11 +01:00
Artem Tsebrovskiy
abd93fe9c9
implement bin_patricia_hashed trie (#430)
* commitment: implemented semi-working bin patricia trie

* commitment: added initialize function to select commitment implementation

* deleted reference implementation of binary trie

* added branch merge function selection in accordance with current commitment type

* smarter branch prefix convolution to reduce disk usage

* implemented DELETE update

* commitment/bin-trie: fixed merge processing and storage encoding

* added changed hex to bin patricia trie

* fixed trie variant select

* allocate if bufPos larger than buf size

* added tracing code

* Fix lint

* Skip test

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-05-05 13:08:58 +01:00
Alex Sharov
04337fd090
Compress: reduce maxlen to 512 (#416) 2022-04-17 07:59:29 +07:00
ledgerwatch
f18e05186d
Compact huffman representation in files (#414)
* More compact huffman representation

* Intermediate

* Intermediate

* fix

* Fix lint

* Fix lint

* Fix lint

* Change min file size

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-04-13 12:55:15 +01:00
Alex Sharov
75b64f01a3
compressor: log lvl #408 2022-04-01 10:44:25 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
ledgerwatch
f93ea948d0
[erigon2] Optimise Huffman decoder (#374)
* Update

* Intermediate

* Huffman decoding

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-18 09:10:18 +00:00
ledgerwatch
77eb94b53e
Elias fano search and merge (#357)
* Elias fano search and merge

* Add first cut of search

* Iterator and test

* Changes in aggregator

* Elias fano bitmap

* Fix uncompress decompress

* Print

* Print

* No print

* Print

* Print

* Print

* Change to AppendBytes

* Print

* Fix NextUncompressed

* Remove print

* Fix history search

* Fix in history search

* More tracing

* More tracing

* Fix

* Print

* Print key

* More print

* Print

* No deletion for history records

* Remove print

* Fix

* Fix

* Fix test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-13 22:46:17 +00:00
Alex Sharov
c1f1365f92
cancel compress (#362) 2022-03-12 16:34:58 +07:00
Alex Sharov
c0fcdabf91
compress: less allocs (#361) 2022-03-12 15:33:01 +07:00
Alex Sharov
6512e3c941
add emptyWordsCount field to .seg file header (breaking .seg format) (#355)
* up torrent

* save

* save

* save

* save

* save

* save

* save
2022-03-10 07:48:37 +00:00
ledgerwatch
75b52ac25e
[compress] Allow uncompressed words (#350)
* Intermediate work

* Allow uncompressed words

* Fix

* Fix tests

* Add NextUncompressed, remove g.word buffer

* Code simplifications, no goroutines when workers == 1

* Fix lint

* Add test for MatchPrefix

* Work on patricia

* Beginning of new matcher

* Fuzz test for new longest match

* No skip

* Fixes

* Fixes

* More tracing

* Fixes

* Fixes

* Change back to old FindLongestMatches

* Switch to old match finder

* Print mismatches

* Fix

* After fix

* After fix

* After fix

* Print pointers

* Fixes and tests

* Print

* Print

* Print

* More tests

* Intermediate

* Fix

* Fix

* Prints

* Fix

* Fix

* Initialise matchStack

* Compute only once

* Compute only once

* Switch back

* Switch to old Find

* Introduce sais

* Switch patricia to sais

* Use sais in compressor

* Use sais in compressor

* Remove unused code

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-03-09 17:25:22 +00:00
racytech
7763945374
Reverted 3 last commits (#348)
* Revert "unnecessary includes removed"

This reverts commit 76406bb78b.

* Revert "local dev setup"

This reverts commit ac06fd9400.

* Revert "compress/cgo-addition"

This reverts commit fae7683d46, reversing
changes made to e3e108c6c4.
2022-02-24 14:39:42 +00:00
Kairat Abylkasymov
76406bb78b
unnecessary includes removed 2022-02-24 06:21:25 -05:00
Kairat Abylkasymov
ac06fd9400
local dev setup 2022-02-24 06:15:14 -05:00
Alex Sharov
3205770ee0
snapshots: fix test (#346) 2022-02-24 08:35:13 +07:00
ledgerwatch
c71ac02a0f
[erigon2] Optimisations in etl collector and compressor (#339)
* Optimisations in etl collector and compressor

* Not copy k and v in the collector

* Fix lint

* Optimisations

* Change Load1 back to Load

* Reduce allocations for tests

* preallocate inv

* counting hits and misses

* Try to fix

* Try to fix

* Relaxation 1

* Relaxation 2

* Add arch tables

* Fix

* Update arch tables and use them

* Not to override larger value

* Increase arch table size

* Increase arch table size

* Fixes to arch

* Print

* Off by one

* Print

* Fix

* Remove print

* Perform update of arch in the background

* Build up huffman tree

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-02-20 22:14:06 +00:00
Alex Sharov
1f5a1ab9cd
fuzz cases (#328) 2022-02-14 11:53:20 +07:00
Alex Sharov
6f85066c7e
path -> filepath (path package is for urls) (#321) 2022-02-12 20:11:30 +07:00
Alex Sharov
e649f7ea91
Less alloc etl recsplit (#307)
* less allocs recsplit

* save

* save
2022-02-09 13:22:45 +07:00
Alex Sharov
567d9ddfed
ParallelCompressor: Remove intermediate ETL collectors (#302) 2022-02-04 16:48:02 +07:00
ledgerwatch
55080d5c01
Proper reset of decompressor getter (#299)
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-02-03 17:58:56 +00:00
Alex Sharov
0feb7fd591
Decompressor.WithReadAhead (#290) 2022-02-01 11:19:11 +07:00
ledgerwatch
4e8840256e
[erigon2] Use shorter references instead of full plain keys in the commitment files (#289)
* Rearrange aggregations

* More rearranging before introducing 3 threads

* Background aggregation

* Concurrency fixes

* Remove files under lock

* Better logging

* Remove files without lock

* Fix lint

* Fix locking

* Try

* Fix background Merge

* Log merging

* Log merging

* Less logging

* Millisecond

* Add Stats function

* Log merge only after 1m

* Wrong counting

* plain key extract and replace functions

* Insert valTransform function

* Not parse first byte

* Not parse first byte

* Fix lint

* Switch to thin state references

* Fix lint

* Fix lint

* Debug print

* Fix decoding

* Turn off valTransform

* Not to reuse transformer

* Print

* Print

* Print

* Derive hashed keys later

* Fix

* Fix log

* Fix

* Debug

* Another fix

* Fix

* Fix

* Print

* Print

* Data race

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-31 22:32:00 +00:00
ledgerwatch
586ab3e6b3
Separate state btree files (#287)
* Separate state file btrees, fix Match in the decompressor

* fix match

* Fix to match

* Switch back from Match

* Try to use match, close indices

* Fixing Match

* Use Skip

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-01-29 11:12:38 +00:00
Alex Sharov
dfdf7c8a66
[wip] parallel compress: less read of dat file (#284)
* save

* save

* save
2022-01-27 17:13:26 +07:00
Alex Sharov
ec11eb3d91
parallel compressor: don't save dict (#283)
* save

* save
2022-01-27 12:54:38 +07:00
ledgerwatch
7ec016b160
Fixes in compress (#260)
* Fixes in compress

* Reuse outputFile also as uncompressed file

* Close file before renaming

* Trace

* Untrace

* Use 8 threads

* Print aggregations

* Print merge and timing

* Print merge and timing

* readonly mode for patricia

* Fix to infinite loop

* Fix file names

* Cleanup

* Cleanup

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-24 22:13:48 +00:00
primal_concrete_sledge
d8a33270e8
issue/issue-249-add_index_reader (#273)
* issue/issue-249-add_index_reader

* Add licence
2022-01-24 20:39:04 +00:00
primal_concrete_sledge
e69a5da702
Issue 248 refinements for decompressor api (#271)
* issue/ISSUE-248-refinements_for_decompressor_api

* Fix match test expectations

* Remove unneeded comment
2022-01-24 09:18:08 +00:00
ledgerwatch
340195df93
Less verbose parallel compressor (#247)
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-01-18 14:20:05 +00:00
Alex Sharov
11ab5bdbb8
Parallel compressor - allow empty words (#245)
* save

* save

* Fix lint

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-01-18 13:57:35 +00:00
Alex Sharov
0f80e9941f
Switch to parallel compressor (#244) 2022-01-18 12:55:20 +07:00
Alex Sharov
7c2104e2e1
fix to no prealloc (because max size unknown) 2022-01-17 17:05:37 +07:00
Alex Sharov
51220cfe43
ParallelCompressor class, DecompressedFile class (#234)
* save

* save

* save

* remove major jump check

* remove major jump check

* log

* log

* save

* format docs

* format docs

* issue-260

* issue-260

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save
2022-01-17 08:50:42 +00:00
Alex Sharov
01a6417505
snapshots: same workers amount #233 2022-01-15 11:23:19 +07:00
Alex Sharov
1647faec37
Fix bigChunk helper (#229) 2022-01-12 10:46:26 +07:00
Alex Sharov
f92c12855d
Decompressor: fast .Count method (#226) 2022-01-09 17:32:56 +07:00
alex.sharov
a8c2481967
create huffman_codes.txt in tmpdir 2022-01-09 14:52:52 +07:00