Commit Graph

28 Commits

Author SHA1 Message Date
Alex Sharov
b995bd7540
recsplit: configurable etl limit #474 2022-05-30 09:06:11 +07:00
Alex Sharov
8c4a3df7f1
etl: collector log level (#385) 2022-03-21 11:22:17 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
ledgerwatch
77eb94b53e
Elias fano search and merge (#357)
* Elias fano search and merge

* Add first cut of search

* Iterator and test

* Changes in aggregator

* Elias fano bitmap

* Fix uncompress decompress

* Print

* Print

* No print

* Print

* Print

* Print

* Change to AppendBytes

* Print

* Fix NextUncompressed

* Remove print

* Fix history search

* Fix in history search

* More tracing

* More tracing

* Fix

* Print

* Print key

* More print

* Print

* No deletion for history records

* Remove print

* Fix

* Fix

* Fix test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-13 22:46:17 +00:00
Alex Sharov
6512e3c941
add emptyWordsCount field to .seg file header (breaking .seg format) (#355)
* up torrent

* save

* save

* save

* save

* save

* save

* save
2022-03-10 07:48:37 +00:00
ledgerwatch
1c6e82c2b6
[erigon2] Thin commitment (2nd attempt) (#329)
* Another fix for history files

* Half way through

* Another fix

* Correct closing sequence

* Remove first byte insert marker

* More on thin commitments

* Fixes

* Fixes

* Print

* Skip touchMap

* Merge branchData from trees and from files

* Fill branch commitment

* Fill branch commitment

* Print

* Fix?

* Merge branchData when updating in the tree

* Better panic

* Prints

* Prints

* Prints

* Create complete branch data if it did not exist before

* Cleanup printing

* Fix merge

* Fix merge use

* Fix transform

* Better startBlock panic

* Preserve touchMap

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Include fieldbits during transform

* Fix history reads

* Print

* Print

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Print largestMerge, lock files in branchFn

* Add storage lock

* Prints

* prefixLen fix

* prefixLen fix

* Fixes

* Remove print

* Remove print

* Set changesets and commitments flags upfront

* Logging instead of printing

* Fix history merge, recsplit panic

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-02-16 16:44:00 +00:00
Alex Sharov
79aa17d297
Recsplit: use crypto rand seed if user not set (#325) 2022-02-13 16:14:04 +07:00
ledgerwatch
441a4c3cde
[erigon2] Chain history and bitmap indices - part 2 (#308)
* correctly shutdown history goroutine

* Different final merge for history files

* Skip value

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Handle collision

* Handle collision

* Debug

* Debug

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-02-10 15:30:55 +00:00
Alex Sharov
3c4f8a759c
recsplit: reset offset collector, etl: faster flush and load (#310) 2022-02-10 14:40:24 +07:00
Alex Sharov
e649f7ea91
Less alloc etl recsplit (#307)
* less allocs recsplit

* save

* save
2022-02-09 13:22:45 +07:00
Alex Sharov
79eb27d3f1
Helper to prohibit cli flag changes (#262) 2022-01-22 10:48:22 +07:00
alex.sharov
8a0d41693f create idx in tmpdir 2022-01-07 14:38:38 +07:00
alex.sharov
6aa0a5f08e create idx in tmpdir 2022-01-07 14:37:27 +07:00
alex.sharov
68b0fe6030 create dir automatically 2022-01-07 14:27:26 +07:00
Alex Sharov
bb3f510d16
RecSplit: store BaseDataID in .idx file (helps to navigate over non blockNum-based entries) (#180)
* save

* save

* save

* save
2021-11-21 14:52:23 +00:00
Alex Sharov
bb1d712834
Hack: dump bodies and headers (#177)
* save

* save

* save

* save
2021-11-19 22:00:55 +07:00
Alex Sharov
5b7f67deae
Snapshot naming (#163)
* save

* save

* save

* save

* save

* save
2021-11-15 14:19:56 +00:00
Alex Sharov
d79f87a0e9
Recsplit: single offset bucket (#152) 2021-11-08 14:27:21 +07:00
Alex Sharov
f6b0a0c969
Recsplit: collision typed error (#150) 2021-11-07 09:54:48 +07:00
Alex Sharov
3c86aa6290
ETL: use logPrefix as suffix of tmp files (#146) 2021-11-05 17:04:17 +07:00
Alex Sharov
b50cb37fa8
Recsplit: call ef.Build and set ef.prevOffset (#140) 2021-11-01 09:23:38 +07:00
Alex Sharov
78e3f747f4
recsplit: bigger bufio buffer (#129) 2021-10-26 11:19:26 +07:00
Alex Sharov
ba51a5966a
etl.collector - move logPrefix to constructor (#128) 2021-10-25 09:12:00 +07:00
ledgerwatch
967937151d
Fixes for compress, decompressor, and tests (#110)
* Fixes for compress, and first test

* Add decompressor and memory mapping

* Add decompressor and memory mapping

* Fix for windows

* Fix lint

* Fix compile for windows

* More on decompressor

* Fix lint

* Decompress

* Fix lint

* Use decompressor in tests, fixes

* Introduce Index for RecSplit

* Fix compilation on Windows

* close index file on failure

* Fixes to the tests

* Add single Elias Fano, fix recsplit fuzz test

* Fix elias fano

* Add two layer index

* Add two level index to the tests

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2021-10-16 10:43:41 +01:00
ledgerwatch
47490aa942
Optimise RecSplit (#82)
* not allocate count

* Print timings

* More time measurement

* See time with fanout=2

* Less branching?

* Revert

* Split functions for fingerprint and bucket separation

* Save indices

* Fix limits

* Use original split formula

* Revert

* uint16

* Correctly measure 2

* Less branching again?

* No time measurements

* Cleanup

* Fix lint

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-20 17:39:32 +01:00
ledgerwatch
f2549ad6ec
Integration of recsplit (#79)
* Integration of recsplit

* Add tables

* Print bucket by bucket

* Not to print all keys

* Print correct bitSize

* switch to []byte

* Optimisation

* Fix

* Fix lint

* Performance improvements

* Print bucket info

* Add tracing

* Fixed split

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-20 12:14:49 +01:00
ledgerwatch
312d43aa88
Recsplit encoding (#69)
* Recsplit encoding

* Added Golomb-Rice encoding

* More on encoding

* More

* Fix compile errors

* Fix fuzz test, add corpus

* Integrated Elias-Fano

* Fix lint

* Add select64

* More

* Add fuzz test for elias fano

* Debugging elias fano

* Fuzz test for elias fano

* More elias fano debugging

* Fix elias fano

* More fixes

* Fix to golombRiceLength

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-18 22:59:27 +01:00
ledgerwatch
6dce34ac32
Initial recsplit (#67)
* Initial recsplit

* Move licence

* Fix bucket count and key count

* Check for duplicate keys

* More recsplit implementation

* Skeleton of recsplit, fuzz test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-13 18:31:09 +01:00