Commit Graph

30 Commits

Author SHA1 Message Date
Alex Sharov
f23061eed9
compressor: generic sort (#524) 2022-07-18 17:12:39 +07:00
Alex Sharov
ceafdded8f
Compress: reduce etl buffers to save RAM (#502) 2022-06-25 19:39:36 +06:00
Alex Sharov
a8ce14e8cc
option to disable runtime.ReadMemStats (#457)
* save

* save

* save

* save
2022-05-19 11:46:55 +07:00
Alex Sharov
91f7d84e60
Generic sort of slices (no allocs, inlinable) (#449)
* save

* save
2022-05-16 08:23:43 +01:00
Alex Sharov
04337fd090
Compress: reduce maxlen to 512 (#416) 2022-04-17 07:59:29 +07:00
ledgerwatch
f18e05186d
Compact huffman representation in files (#414)
* More compact huffman representation

* Intermediate

* Intermediate

* fix

* Fix lint

* Fix lint

* Fix lint

* Change min file size

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-04-13 12:55:15 +01:00
Alex Sharov
75b64f01a3
compressor: log lvl #408 2022-04-01 10:44:25 +07:00
Alex Sharov
83951a1d62
Enable more linters (#381) 2022-03-19 11:38:37 +07:00
ledgerwatch
77eb94b53e
Elias fano search and merge (#357)
* Elias fano search and merge

* Add first cut of search

* Iterator and test

* Changes in aggregator

* Elias fano bitmap

* Fix uncompress decompress

* Print

* Print

* No print

* Print

* Print

* Print

* Change to AppendBytes

* Print

* Fix NextUncompressed

* Remove print

* Fix history search

* Fix in history search

* More tracing

* More tracing

* Fix

* Print

* Print key

* More print

* Print

* No deletion for history records

* Remove print

* Fix

* Fix

* Fix test

* Fix lint

Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-03-13 22:46:17 +00:00
Alex Sharov
c1f1365f92
cancel compress (#362) 2022-03-12 16:34:58 +07:00
Alex Sharov
c0fcdabf91
compress: less allocs (#361) 2022-03-12 15:33:01 +07:00
Alex Sharov
6512e3c941
add emptyWordsCount field to .seg file header (breaking .seg format) (#355)
* up torrent

* save

* save

* save

* save

* save

* save

* save
2022-03-10 07:48:37 +00:00
ledgerwatch
75b52ac25e
[compress] Allow uncompressed words (#350)
* Intermediate work

* Allow uncompressed words

* Fix

* Fix tests

* Add NextUncompressed, remove g.word buffer

* Code simplifications, no goroutines when workers == 1

* Fix lint

* Add test for MatchPrefix

* Work on patricia

* Beginning of new matcher

* Fuzz test for new longest match

* No skip

* Fixes

* Fixes

* More tracing

* Fixes

* Fixes

* Change back to old FindLongestMatches

* Switch to old match finder

* Print mismatches

* Fix

* After fix

* After fix

* After fix

* Print pointers

* Fixes and tests

* Print

* Print

* Print

* More tests

* Intermediate

* Fix

* Fix

* Prints

* Fix

* Fix

* Initialise matchStack

* Compute only once

* Compute only once

* Switch back

* Switch to old Find

* Introduce sais

* Switch patricia to sais

* Use sais in compressor

* Use sais in compressor

* Remove unused code

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-03-09 17:25:22 +00:00
racytech
7763945374
Reverted 3 last commits (#348)
* Revert "unnecessary includes removed"

This reverts commit 76406bb78b.

* Revert "local dev setup"

This reverts commit ac06fd9400.

* Revert "compress/cgo-addition"

This reverts commit fae7683d46, reversing
changes made to e3e108c6c4.
2022-02-24 14:39:42 +00:00
Kairat Abylkasymov
ac06fd9400
local dev setup 2022-02-24 06:15:14 -05:00
ledgerwatch
c71ac02a0f
[erigon2] Optimisations in etl collector and compressor (#339)
* Optimisations in etl collector and compressor

* Not copy k and v in the collector

* Fix lint

* Optimisations

* Change Load1 back to Load

* Reduce allocations for tests

* preallocate inv

* counting hits and misses

* Try to fix

* Try to fix

* Relaxation 1

* Relaxation 2

* Add arch tables

* Fix

* Update arch tables and use them

* Not to override larger value

* Increase arch table size

* Increase arch table size

* Fixes to arch

* Print

* Off by one

* Print

* Fix

* Remove print

* Perform update of arch in the background

* Build up huffman tree

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-02-20 22:14:06 +00:00
Alex Sharov
567d9ddfed
ParallelCompressor: Remove intermediate ETL collectors (#302) 2022-02-04 16:48:02 +07:00
Alex Sharov
ec11eb3d91
parallel compressor: don't save dict (#283)
* save

* save
2022-01-27 12:54:38 +07:00
ledgerwatch
7ec016b160
Fixes in compress (#260)
* Fixes in compress

* Reuse outputFile also as uncompressed file

* Close file before renaming

* Trace

* Untrace

* Use 8 threads

* Print aggregations

* Print merge and timing

* Print merge and timing

* readonly mode for patricia

* Fix to infinite loop

* Fix file names

* Cleanup

* Cleanup

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
Co-authored-by: Alex Sharp <alexsharp@Alexs-MacBook-Pro.local>
2022-01-24 22:13:48 +00:00
primal_concrete_sledge
d8a33270e8
issue/issue-249-add_index_reader (#273)
* issue/issue-249-add_index_reader

* Add licence
2022-01-24 20:39:04 +00:00
ledgerwatch
340195df93
Less verbose parallel compressor (#247)
Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-01-18 14:20:05 +00:00
Alex Sharov
11ab5bdbb8
Parallel compressor - allow empty words (#245)
* save

* save

* Fix lint

Co-authored-by: Alexey Sharp <alexeysharp@Alexeys-iMac.local>
2022-01-18 13:57:35 +00:00
Alex Sharov
0f80e9941f
Switch to parallel compressor (#244) 2022-01-18 12:55:20 +07:00
Alex Sharov
51220cfe43
ParallelCompressor class, DecompressedFile class (#234)
* save

* save

* save

* remove major jump check

* remove major jump check

* log

* log

* save

* format docs

* format docs

* issue-260

* issue-260

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save

* save
2022-01-17 08:50:42 +00:00
Alex Sharov
01a6417505
snapshots: same workers amount #233 2022-01-15 11:23:19 +07:00
Alex Sharov
1647faec37
Fix bigChunk helper (#229) 2022-01-12 10:46:26 +07:00
alex.sharov
a8c2481967
create huffman_codes.txt in tmpdir 2022-01-09 14:52:52 +07:00
alex.sharov
8bc0f26a49
create .seg in tmpdir 2022-01-09 14:49:56 +07:00
Alex Sharov
0d5d8975d9
Snapshots: create .dat in tmpdir (#225) 2022-01-09 14:43:55 +07:00
Alex Sharov
f5733d438f
Parallel compression (#223) 2022-01-06 14:13:03 +07:00