S
S
Sergey Sokolov2014-12-06 14:51:11
Data storage
Sergey Sokolov, 2014-12-06 14:51:11

What is the best way to store sorted integers in a binary file?

I store large sets of 32-bit integers, sorted in ascending order. Now, without hesitation, I just write 4 bytes for each. With the growth of the system, the volume of file data has become tangible, backups to the clouds have become worth a noticeable penny.
Because the only task is sequential reading of these integers, then I imagine that you can significantly reduce the amount of data if you write only the first value in its entirety + consecutive increments that can fit in 1-2-3 bytes.
Questions:

  1. surely there is a well-known compression format/algorithm suitable for the described situation - which one?
  2. whether it is necessary, and how - to bother with data integrity control?
Upd. While this option seems optimal, in blocks of 4 increments with one byte at the beginning to indicate the length of these increments. This is how the granulation to whole bytes is preserved:
01010101 01010101 01010101 01010101 - начальное значение
10110101                            - b - сколько байт в следующих 4 блоках
                                      (10 11 01 01 = 2, 3, 1, 1)
01010101 01010101                   - inc - инкремент
01010101 01010101 01010101          - inc
01010101                            - inc
01010101                            - inc

                                      Следующий блок из 4 инкрементов:
11011101                            - b - (11 01 11 01 = 3, 1, 3, 1)
01010101 01010101 01010101          - inc
01010101                            - inc
01010101 01010101 01010101          - inc
01010101                            - inc
...

At the end of the file, block zero-length increments, if those are superfluous.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
M
maaGames, 2014-12-06
@maaGames

At first I wanted to talk about storing WAV files in PCM format, but changed my mind. You are going to save on cookies - not the right approach.
I offer the best option + its modification.
1. Since the data is written and read sequentially, then God ordered you to use streaming compression (this is if the file does not fit into the RAM, otherwise it would be possible to zip stupidly). Without knowing the programming language, I can not suggest a library. In C++, you can decorate a file stream with an archiver to pack data on the fly. You should not write the archiver yourself, it is better to look for a ready-made implementation.
2. Since the numbers are ordered, you correctly suggested storing the difference between neighboring numbers, this will provide a lot of repeating values ​​that should be perfectly compressed by any archiver.
As a rule, archivers consider the checksum of the encoded data, so additional verification should not be required. If the data gets corrupted when uploading to/from the cloud, then this should be cleared up during unpacking.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question