Commit Graph

40 Commits

Author SHA1 Message Date
Alex Sharov
b995bd7540
recsplit: configurable etl limit #474 2022-05-30 09:06:11 +07:00
Alex Sharov
b1dc1bfbbf
run go fix ./... (#453) 2022-05-17 14:48:16 +07:00
Alex Sharov
8c4a3df7f1
etl: collector log level (#385) 2022-03-21 11:22:17 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
ledgerwatch
77eb94b53e
Elias fano search and merge (#357)
* Elias fano search and merge

* Add first cut of search

* Iterator and test

* Changes in aggregator

* Elias fano bitmap

* Fix uncompress decompress

* Print

* Print

* No print

* Print

* Print

* Print

* Change to AppendBytes

* Print

* Fix NextUncompressed

* Remove print

* Fix history search

* Fix in history search

* More tracing

* More tracing

* Fix

* Print

* Print key

* More print

* Print

* No deletion for history records

* Remove print

* Fix

* Fix

* Fix test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-13 22:46:17 +00:00
Alex Sharov
6512e3c941
add emptyWordsCount field to .seg file header (breaking .seg format) (#355)
* up torrent

* save

* save

* save

* save

* save

* save

* save
2022-03-10 07:48:37 +00:00
ledgerwatch
c71ac02a0f
[erigon2] Optimisations in etl collector and compressor (#339)
* Optimisations in etl collector and compressor

* Not copy k and v in the collector

* Fix lint

* Optimisations

* Change Load1 back to Load

* Reduce allocations for tests

* preallocate inv

* counting hits and misses

* Try to fix

* Try to fix

* Relaxation 1

* Relaxation 2

* Add arch tables

* Fix

* Update arch tables and use them

* Not to override larger value

* Increase arch table size

* Increase arch table size

* Fixes to arch

* Print

* Off by one

* Print

* Fix

* Remove print

* Perform update of arch in the background

* Build up huffman tree

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-02-20 22:14:06 +00:00
ledgerwatch
1c6e82c2b6
[erigon2] Thin commitment (2nd attempt) (#329)
* Another fix for history files

* Half way through

* Another fix

* Correct closing sequence

* Remove first byte insert marker

* More on thin commitments

* Fixes

* Fixes

* Print

* Skip touchMap

* Merge branchData from trees and from files

* Fill branch commitment

* Fill branch commitment

* Print

* Fix?

* Merge branchData when updating in the tree

* Better panic

* Prints

* Prints

* Prints

* Create complete branch data if it did not exist before

* Cleanup printing

* Fix merge

* Fix merge use

* Fix transform

* Better startBlock panic

* Preserve touchMap

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments during aggregation

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Merge commitments

* Include fieldbits during transform

* Fix history reads

* Print

* Print

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Prints

* Print largestMerge, lock files in branchFn

* Add storage lock

* Prints

* prefixLen fix

* prefixLen fix

* Fixes

* Remove print

* Remove print

* Set changesets and commitments flags upfront

* Logging instead of printing

* Fix history merge, recsplit panic

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-02-16 16:44:00 +00:00
Alex Sharov
79aa17d297
Recsplit: use crypto rand seed if user not set (#325) 2022-02-13 16:14:04 +07:00
Alex Sharov
6f85066c7e
path -> filepath (path package is for urls) (#321) 2022-02-12 20:11:30 +07:00
ledgerwatch
441a4c3cde
[erigon2] Chain history and bitmap indices - part 2 (#308)
* correctly shut down history goroutine

* Different final merge for history files

* Skip value

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Bitmap production bug

* Handle collision

* Handle collision

* Debug

* Debug

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-02-10 15:30:55 +00:00
Alex Sharov
3c4f8a759c
recsplit: reset offset collector, etl: faster flush and load (#310) 2022-02-10 14:40:24 +07:00
Alex Sharov
e649f7ea91
Less alloc etl recsplit (#307)
* less allocs recsplit

* save

* save
2022-02-09 13:22:45 +07:00
Alex Sharov
ec354d1615
Fuzz fixes (#295)
* fuzz tests fixes

* fuzz tests fixes
2022-02-02 09:18:04 +00:00
ledgerwatch
4e8840256e
[erigon2] Use shorter references instead of full plain keys in the commitment files (#289)
* Rearrange aggregations

* More rearranging before introducing 3 threads

* Background aggregation

* Concurrency fixes

* Remove files under lock

* Better logging

* Remove files without lock

* Fix lint

* Fix locking

* Try

* Fix background Merge

* Log merging

* Log merging

* Less logging

* Millisecond

* Add Stats function

* Log merge only after 1m

* Wrong counting

* plain key extract and replace functions

* Insert valTransform function

* Not parse first byte

* Not parse first byte

* Fix lint

* Switch to thin state references

* Fix lint

* Fix lint

* Debug print

* Fix decoding

* Turn off valTransform

* Not to reuse transformer

* Print

* Print

* Print

* Derive hashed keys later

* Fix

* Fix log

* Fix

* Debug

* Another fix

* Fix

* Fix

* Print

* Print

* Data race

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-31 22:32:00 +00:00
primal_concrete_sledge
d8a33270e8
issue/issue-249-add_index_reader (#273)
* issue/issue-249-add_index_reader

* Add licence
2022-01-24 20:39:04 +00:00
Alex Sharov
79eb27d3f1
Helper to prohibit cli flag changes (#262) 2022-01-22 10:48:22 +07:00
ledgerwatch
e5c07ec901
[erigon2] Resumable prototype (#236)
* Not warn about files that don't match at all

* Not warn about files that don't match at all

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Add trace

* Cleanup

* Cleanup

* Cleanup

* Cleanup

* Cleanup

* Cleanup

* Cleanup

* Cleanup

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-15 22:09:06 +00:00
alex.sharov
8a0d41693f
create idx in tmpdir 2022-01-07 14:38:38 +07:00
alex.sharov
6aa0a5f08e
create idx in tmpdir 2022-01-07 14:37:27 +07:00
alex.sharov
68b0fe6030
create dir automatically 2022-01-07 14:27:26 +07:00
Alex Sharov
bb3f510d16
RecSplit: store BaseDataID in .idx file (helps to navigate over non blockNum-based entries) (#180)
* save

* save

* save

* save
2021-11-21 14:52:23 +00:00
Alex Sharov
bb1d712834
Hack: dump bodies and headers (#177)
* save

* save

* save

* save
2021-11-19 22:00:55 +07:00
Alex Sharov
5b7f67deae
Snapshot naming (#163)
* save

* save

* save

* save

* save

* save
2021-11-15 14:19:56 +00:00
ledgerwatch
fd19ad8148
State aggregator (#114)
* State aggregator

* Compile fix

* More

* Add

* More

* More on aggregator

* Writes (still incorrect)

* Move table names

* More

* Start of aggregation

* Change files instead of db

* More on change files

* More

* More

* Dealing with state and change files

* More

* More

* More boilerplate

* More

* More

* Iteration over storage

* More boilerplate

* More fixes

* Insert flag

* More

* Unit test

* Add more to the test

* Expand the test a bit

* More testing

* Keep fixing the test

* More fixes to the test

* Clean up DB tables upon aggregation

* More fixes

* Remove update/insert indicator from returned values

* Add assertions

* close files before deleting

* close files before deleting

* close files before deleting

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2021-11-13 12:12:29 +00:00
Alex Sharov
d79f87a0e9
Recsplit: single offset bucket (#152) 2021-11-08 14:27:21 +07:00
alex.sharov
377bc94675
save 2021-11-08 07:59:18 +07:00
Alex Sharov
f6b0a0c969
Recsplit: collision typed error (#150) 2021-11-07 09:54:48 +07:00
Alex Sharov
3c86aa6290
ETL: use logPrefix as suffix of tmp files (#146) 2021-11-05 17:04:17 +07:00
Alex Sharov
47f8ac208a
EliasFano: fix jump calculation, fuzzing to trigger jump logic (#145) 2021-11-04 13:25:23 +07:00
Alex Sharov
b50cb37fa8
Recsplit: call ef.Build and set ef.prevOffset (#140) 2021-11-01 09:23:38 +07:00
Alex Sharov
8213f020f0
recsplit add MustOpen method (#138) 2021-10-31 09:38:10 +07:00
Alex Sharov
78e3f747f4
recsplit: bigger bufio buffer (#129) 2021-10-26 11:19:26 +07:00
Alex Sharov
ba51a5966a
etl.collector - move logPrefix to constructor (#128) 2021-10-25 09:12:00 +07:00
Alex Sharov
70c39cd195
Pool intrinsic gas check (#126) 2021-10-25 08:49:04 +07:00
ledgerwatch
967937151d
Fixes for compress, decompressor, and tests (#110)
* Fixes for compress, and first test

* Add decompressor and memory mapping

* Add decompressor and memory mapping

* Fix for windows

* Fix lint

* Fix compile for windows

* More on decompressor

* Fix lint

* Decompress

* Fix lint

* Use decompressor in tests, fixes

* Introduce Index for RecSplit

* Fix compilation on Windows

* close index file on failure

* Fixes to the tests

* Add single Elias Fano, fix recsplit fuzz test

* Fix elias fano

* Add two layer index

* Add two level index to the tests

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2021-10-16 10:43:41 +01:00
ledgerwatch
47490aa942
Optimise RecSplit (#82)
* not allocate count

* Print timings

* More time measurement

* See time with fanout=2

* Less branching?

* Revert

* Split functions for fingerprint and bucket separation

* Save indices

* Fix limits

* Use original split formula

* Revert

* uint16

* Correctly measure 2

* Less branching again?

* No time measurements

* Cleanup

* Fix lint

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-20 17:39:32 +01:00
ledgerwatch
f2549ad6ec
Integration of recsplit (#79)
* Integration of recsplit

* Add tables

* Print bucket by bucket

* Not to print all keys

* Print correct bitSize

* switch to []byte

* Optimisation

* Fix

* Fix lint

* Performance improvements

* Print bucket info

* Add tracing

* Fixed split

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-20 12:14:49 +01:00
ledgerwatch
312d43aa88
Recsplit encoding (#69)
* Recsplit encoding

* Added Golomb-Rice encoding

* More on encoding

* More

* Fix compile errors

* Fix fuzz test, add corpus

* Integrated Elias-Fano

* Fix lint

* Add select64

* More

* Add fuzz test for elias fano

* Debugging elias fano

* Fuzz test for elias fano

* More elias fano debugging

* Fix elias fano

* More fixes

* Fix to golombRiceLength

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-18 22:59:27 +01:00
ledgerwatch
6dce34ac32
Initial recsplit (#67)
* Initial recsplit

* Move licence

* Fix bucket count and key count

* Check for duplicate keys

* More recsplit implementation

* Skeleton of recsplit, fuzz test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2021-09-13 18:31:09 +01:00