An efficient implementation of the Huffman algorithm?

A

agorkov2012-01-04 12:38:34

Search Engine Optimization

agorkov, 2012-01-04 12:38:34

In the course of my scientific work, it became necessary to calculate the efficiency of the Huffman code for an alphabet of 300 million elements.
The algorithm itself is trivial, and its implementation is not difficult. But on a real problem, the algorithm works for an unacceptably long time. In this regard, I have two questions:
1. Are there any speed optimizations for the algorithm?
2. To what extent will the use of Shannon-Fano codes be an adequate estimate from above?

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

W

Weageoo, 2012-01-04
@Weageoo

1) Arithmetic coding is more efficient.
2) The Huffman algorithm (or other entropy encoding algorithm) is used at the actual compression step in the statistical lossless data compression algorithm PPM (this algorithm already has many modifications; for example, PPMd is used in Rar, 7zip, WinZip).
3) There is an adaptive Huffman algorithm . There is also its implementation in C . If you test (compare with a regular huffman), then write a few words about the results.

M

mayorovp, 2012-01-04
@mayorovp

The classic way to speed up the Huffman algorithm is to merge with a virtual list.
Namely, two lists are created - the original list and the queue.
At the beginning of the algorithm, the initial list is sorted in ascending order of frequencies, the queue is empty.
During operation, the following invariant is maintained: the element with the minimum frequency lies either at the beginning of the queue or at the beginning of the initial list.
The step of the algorithm is as follows: two minimal elements are extracted (both from the original list, both from the queue, or one from the original, and the second from the queue) and “glued together”, the result is placed at the end of the queue.
When the last element is left in both lists, the required tree is built.