Arithmetic coding implementation in Python?

coldunox, 2018-03-18 18:01:09

Python

Example of the encoding worked out in Excel (screenshot)

I already have a frequency count

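import collections

# Count how often each character (lower-cased) occurs in the file
co = collections.Counter()
with open("test.txt", "r", encoding="utf-8") as file_txt:
    for line in file_txt:
        co.update(line.lower())

# Give each character a sub-interval of [0, 1) proportional to its count
total, lo = sum(co.values()), 0
for k, v in co.most_common():
    hi = lo + v
    print('%f\t%c\t%f' % (lo / total, k, hi / total))
    lo = hi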

Console output

0.000000	ш	0.272727
0.272727	у	0.545455
0.545455	м	0.727273
0.727273	р	0.818182
0.818182	 	0.909091
0.909091	о	1.000000

As you can see from the screenshot, the text is laid out letter by letter on the right.
Each step then updates the lower and upper bounds of the interval for the encoded character according to the rule:
High = Low_old + (High_old - Low_old) * RangeHigh(x),
Low = Low_old + (High_old - Low_old) * RangeLow(x),
where Low_old is the lower bound of the current interval,
High_old is its upper bound,
and RangeHigh(x) and RangeLow(x) are the upper and lower bounds of the range of the symbol x being encoded.
The result is the number on the left.
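
To make the update rule concrete, here is a minimal sketch of the encoding loop. The helper names build_ranges and encode and the sample string "abac" are made up for illustration, and fractions.Fraction is swapped in for floats (an assumption, not part of the original code) so that the bounds stay exact and do not lose precision on longer inputs.

```python
from collections import Counter
from fractions import Fraction


def build_ranges(text):
    """Map each symbol to (RangeLow, RangeHigh) as exact fractions."""
    counts = Counter(text)
    total = sum(counts.values())
    ranges, lo = {}, 0
    for symbol, count in counts.most_common():
        hi = lo + count
        ranges[symbol] = (Fraction(lo, total), Fraction(hi, total))
        lo = hi
    return ranges


def encode(text, ranges):
    """Narrow the interval [low, high) once per symbol of the text."""
    low, high = Fraction(0), Fraction(1)
    for x in text:
        range_low, range_high = ranges[x]
        width = high - low                 # High_old - Low_old
        high = low + width * range_high    # uses Low_old
        low = low + width * range_low      # Low_old is replaced last
    return low, high


# Hypothetical sample input, not the text from the question
ranges = build_ranges("abac")
low, high = encode("abac", ranges)
print(low, high)
```

For "abac" the table is a: [0, 1/2), b: [1/2, 3/4), c: [3/4, 1), and the final interval is [19/64, 5/16). Any number inside that interval identifies the text, provided the decoder has the same table and knows the text length.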
1) What is the best way to store the frequency table, since it will be needed again when recalculating the values? A dictionary, JSON?
2) How do I read a text file letter by letter and apply the arithmetic coding algorithm to it?

1 answer(s)

Alexander, 2018-03-19
@Survtur

JSON is fast enough and human-readable. You only read it once, not on every iteration, so that is fine.
To read the file letter by letter:
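
A minimal sketch of the JSON route, assuming the table comes from a collections.Counter as in the question. The function names save_counts and load_ranges are hypothetical. It stores the integer counts rather than the float bounds, as a list of [symbol, count] pairs, so the symbol order (which decides each symbol's interval) is kept explicitly.

```python
import json
from collections import Counter


def save_counts(counts, path):
    """Write the counts as a JSON list of [symbol, count] pairs."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Cyrillic letters readable in the file
        json.dump(counts.most_common(), f, ensure_ascii=False, indent=1)


def load_ranges(path):
    """Read the pairs back and rebuild symbol -> (low, high) bounds."""
    with open(path, encoding="utf-8") as f:
        pairs = json.load(f)
    total = sum(count for _, count in pairs)
    ranges, lo = {}, 0
    for symbol, count in pairs:
        hi = lo + count
        ranges[symbol] = (lo / total, hi / total)
        lo = hi
    return ranges
```

The file is written once after counting and read once before encoding, so every later lookup is an ordinary dictionary access.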

FILE = "test.txt"  # placeholder path; here the file name from the question

# Open the file for reading in text mode with UTF-8 encoding
with open(FILE, mode='rt', encoding='utf8') as f:
    # Loop forever
    while True:
        # read one character
        c = f.read(1)
        # if nothing was read, leave the loop
        if not c:
            break

        # process the character c
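
The same loop can also be wrapped in a generator, which keeps the reading separate from the processing. This is only a sketch: iter_chars is a hypothetical helper name, and it relies on the two-argument form of the built-in iter(), which keeps calling f.read(1) until it returns the empty string.

```python
from functools import partial


def iter_chars(path, encoding="utf-8"):
    """Yield the characters of a text file one at a time, in order."""
    with open(path, encoding=encoding) as f:
        # iter(callable, sentinel) stops when f.read(1) returns ''
        yield from iter(partial(f.read, 1), "")
```

A caller can then write `for c in iter_chars("test.txt"):` and update the interval for each character c inside the loop body.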