C
C
coldunox2018-03-18 18:01:09
Python
coldunox, 2018-03-18 18:01:09

Arithmetic coding implementation in python?

Sample coding in Excel
5aae7c228c188431167972.png
I already have a frequency count

import collections

co = collections.Counter()
file_txt = open("test.txt","r", encoding='utf-8')
for line in file_txt:
    co.update(line.lower())

total, lo = sum(co.values()), 0
for k, v in co.most_common():
    hi = lo + v
    print('%f\t%c\t%f' % (lo / total, k, hi / total))
    lo = hi

Console output
0.000000	ш	0.272727
0.272727	у	0.545455
0.545455	м	0.727273
0.727273	р	0.818182
0.818182	 	0.909091
0.909091	о	1.000000

As you can see from the screenshot, the text is letter-by-letter on the right.
And the next step changes the values ​​of the initial and final encoded character according to the rule:
High=Lowold+(Highold-Lowold)*RangeHigh(x),
Low=Lowold+(Highold-Lowold)*RangeLow(x),
where Lowold is the lower bound of the interval,
Highold is the upper bound of the RangeHigh interval
, and RangeLow are the upper and lower bounds of the encoded symbol.
The result is the number on the left.
1) What is the best way to save the sample, since it will be needed when recalculating the values? Dictionary, json ?
2) How to read a text file letter by letter, applying the AK algorithm to it

Answer the question

In order to leave comments, you need to log in

1 answer(s)
A
Alexander, 2018-03-19
@Survtur

json is fast enough and human readable. You only read it once, and not at each iteration. So everything is ok.
Read file letter by letter:

# Открыть файл для чтения в текстовом формате с кодировкой UTF-8
with open(ФАЙЛ, mode='tr', encoding='utf8') as f:
    # Повторять вечно
    while True:
        # считать один символ
        c = f.read(1)
        # если ничего не считано, выходим из повторения
        if not c:
            break
        
        # обработка символа с

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question