Answer the question
In order to leave comments, you need to log in
What is the best way to store sorted integers in a binary file?
I store large sets of 32-bit integers, sorted in ascending order. Now, without hesitation, I just write 4 bytes for each. With the growth of the system, the volume of file data has become tangible, backups to the clouds have become worth a noticeable penny.
Because the only task is sequential reading of these integers, then I imagine that you can significantly reduce the amount of data if you write only the first value in its entirety + consecutive increments that can fit in 1-2-3 bytes.
Questions:
01010101 01010101 01010101 01010101 - начальное значение
10110101 - b - сколько байт в следующих 4 блоках
(10 11 01 01 = 2, 3, 1, 1)
01010101 01010101 - inc - инкремент
01010101 01010101 01010101 - inc
01010101 - inc
01010101 - inc
Следующий блок из 4 инкрементов:
11011101 - b - (11 01 11 01 = 3, 1, 3, 1)
01010101 01010101 01010101 - inc
01010101 - inc
01010101 01010101 01010101 - inc
01010101 - inc
...
Answer the question
In order to leave comments, you need to log in
At first I wanted to talk about storing WAV files in PCM format, but changed my mind. You are going to save on cookies - not the right approach.
I offer the best option + its modification.
1. Since the data is written and read sequentially, then God ordered you to use streaming compression (this is if the file does not fit into the RAM, otherwise it would be possible to zip stupidly). Without knowing the programming language, I can not suggest a library. In C++, you can decorate a file stream with an archiver to pack data on the fly. You should not write the archiver yourself, it is better to look for a ready-made implementation.
2. Since the numbers are ordered, you correctly suggested storing the difference between neighboring numbers, this will provide a lot of repeating values that should be perfectly compressed by any archiver.
As a rule, archivers consider the checksum of the encoded data, so additional verification should not be required. If the data gets corrupted when uploading to/from the cloud, then this should be cleared up during unpacking.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question