A
A
alex_ak12016-12-12 19:11:17
Algorithms
alex_ak1, 2016-12-12 19:11:17

How to determine the number of bytes in an arithmetic compression iteration?

I am implementing an arithmetic compression algorithm.
Let there be a file with 1k characters and 20 different characters. I count the statistics and intervals of each - and here I already have some doubt - the intervals are too narrow, that is, after 3-4 compression iterations, my interval boundary practically does not change. That is, 4 bytes can be stuffed into the interval normally, and then - not very much, you need to output the left part to the file, reset the interval and start again.
How to determine when it is necessary to save the interval to the file - after 4 bytes, after 5, or maybe someday it is possible to save it after 10 bytes?

Answer the question

In order to leave comments, you need to log in

1 answer(s)
V
vasiliev, 2016-12-18
@alex_ak1

You should check the length of the current interval so that it is not less than some value.
For example, in the binary MQ encoder from JPEG2000, the interval size is guaranteed to be between 0.75 and 1.5. If the current interval becomes less than 0.75, then renormalization occurs: the interval and codeword are increased by 2 times, and the bit counter is increased by 1 iteratively until the interval becomes greater than 0.75. When the bit counter becomes greater than 8+{a few extra bits}, the upper bits of the codeword are considered to have "settled down" and can be emitted to the output stream. However, sometimes situations arise when the remaining word overflows and it is required to increase by 1 bytes already issued to the output stream, this must also be taken into account. Potentially it may be necessary to change all previous bytes; in the MQ encoder, this moment is bypassed due to the bit-stuffing procedure.
painted by Amir Said, he also has a good reference implementation .

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question