Arithmetic compression/encoding?

AndreySlimus, 2014-12-26 15:17:17

Algorithms

Hello!
Using arithmetic compression/encoding, I need to encode the message "информационный" and then represent the result in binary form.
The method does not seem complicated, but I have run into a problem. Every example I can find on the Internet uses the interval [0; 1), and in every example the message fits neatly into that interval. It doesn't work out that way with "информационный", since only three characters repeat in it: "н" (3 times), "и" (2 times), and "о" (2 times).
What should I do in this case?

2 answer(s)

Rsa97, 2014-12-26
@AndreySlimus

What's the problem?
We determine the alphabet: (а, и, й, м, н, о, р, ф, ц, ы).
We count the occurrences of each symbol (the frequency distribution):

а | и | й | м | н | о | р | ф | ц | ы
1 | 2 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1

We build the cumulative sums:

а | и | й | м | н | о  | р  | ф  | ц  | ы
1 | 3 | 4 | 5 | 8 | 10 | 11 | 12 | 13 | 14

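We map these onto the interval [0, 1) by dividing by the total count (14). Each value is the upper bound of that symbol's sub-interval, and the previous value (or 0) is its lower bound: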

а     | и     | й     | м     | н     | о     | р     | ф     | ц     |  ы
0.071 | 0.214 | 0.286 | 0.357 | 0.571 | 0.714 | 0.786 | 0.857 | 0.929 | 1
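The three tables above can be reproduced with a few lines of Python. This is only a sketch: it assumes the message is the Russian word "информационный" (which is what the counts in the tables add up to), that symbols are ordered by code point as in the tables, and the function name `cumulative_bounds` is made up for illustration. It uses `Fraction` so that no rounding happens.

```python
from collections import Counter
from fractions import Fraction

def cumulative_bounds(message):
    """Return {symbol: (low, high)}: each symbol's sub-interval of [0, 1)."""
    counts = Counter(message)          # occurrences of each symbol
    total = len(message)               # 14 for the message below
    bounds = {}
    running = 0                        # cumulative count so far
    for symbol in sorted(counts):      # same symbol order as the tables
        low = Fraction(running, total)
        running += counts[symbol]
        bounds[symbol] = (low, Fraction(running, total))
    return bounds

bounds = cumulative_bounds("информационный")
for symbol, (low, high) in bounds.items():
    print(symbol, round(float(low), 3), round(float(high), 3))
```

The upper bounds printed here correspond to the last table; the lower bound of each symbol is simply the upper bound of the previous one.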

We encode the message, narrowing the interval with each symbol:

start - [0, 1)
и - [0.071, 0.214)
н - [0.122, 0.153)
ф - [0.1465, 0.1486)
...
й - [0.147980532221506, 0.147980532221545)

We take the midpoint of the final interval and get the encoded message: 0.147980532221526
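The narrowing steps above can be sketched in Python as follows. The assumptions are the same as in the tables: the message is "информационный", symbols are ordered by code point, and the name `encode` is hypothetical. Exact `Fraction` arithmetic is used so the final interval is not distorted by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

def encode(message):
    """Narrow [0, 1) once per symbol and return the final (low, high)."""
    counts = Counter(message)
    total = len(message)
    # Sub-interval of [0, 1) for every symbol, built from cumulative counts.
    bounds, running = {}, 0
    for symbol in sorted(counts):
        bounds[symbol] = (Fraction(running, total),
                          Fraction(running + counts[symbol], total))
        running += counts[symbol]
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = bounds[symbol]
        # Keep only the part of the current interval that belongs to symbol.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

low, high = encode("информационный")
code = (low + high) / 2                # midpoint of the final interval
print(float(low), float(high), float(code))
```

The width of the final interval equals the product of the symbol probabilities, here 432 / 14^14, which is about 3.9e-14 and agrees with the interval shown above.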

maaGames, 2014-12-26
@maaGames

The interval [0; 1) does not have to be represented by a float or a double. Any message will "squeeze" into a sufficiently long number; only the compression ratio will be poor.
You can also compress not symbol by symbol but in 4-bit chunks, for example, or even bit by bit. The result may be better, or it may be worse.
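To make the "long number" point concrete, and to get the binary form the question asks for, here is a minimal sketch that finds the shortest binary fraction lying inside a given interval. It takes the final interval from the answer above as exact decimal strings; the function name `shortest_binary_code` is an assumption for illustration, not part of any library.

```python
import math
from fractions import Fraction

def shortest_binary_code(low, high):
    """Return the shortest bits b1..bk with low <= 0.b1..bk (base 2) < high."""
    k = 0
    while True:
        k += 1
        n = math.ceil(low * 2**k)      # smallest multiple of 2**-k >= low
        if Fraction(n, 2**k) < high:   # does it still fall inside the interval?
            return format(n, f"0{k}b") # n written as exactly k binary digits

low = Fraction("0.147980532221506")
high = Fraction("0.147980532221545")
bits = shortest_binary_code(low, high)
print(len(bits), bits)
```

An interval about 3.9e-14 wide needs at most 45 bits this way, which still fits in the 53-bit mantissa of a double. A noticeably longer message would not fit, which is why exact or integer arithmetic is used instead of float/double.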