Programming
Nikita Gusakov, 2013-05-24 00:11:33

What is wrong with the implementation of the Huffman algorithm?

Here is the code.
Everything works except for the last character: it simply doesn't get decoded, and I don't understand why, although I suspect I'm not working with the files quite correctly.
P.S. Also, would a tutorial-style description of this algorithm's implementation be useful for beginners? I've seen several articles, but none of them describe the reverse (decompression) process.

6 answers
mayorovp, 2013-05-24
@hell0w0rd

So, which files count as "large"?
Offhand, I see two problems: files over 2 GB (int overflow) and null characters (to which a code is erroneously never assigned).
The first problem is fixed by using a larger data type, the second by a patch like this:

void Compressor::buildCodeTable(Node *root)
{
    if (root->hasChild())
    {
        // Internal node: descend into both subtrees, extending the code path.
        code.push_back(0);
        buildCodeTable(root->getChild(true));
        code.pop_back();

        code.push_back(1);
        buildCodeTable(root->getChild(false));
        code.pop_back();
    } else {
        // Leaf: record the accumulated code for this character,
        // including '\0', which the original version skipped.
        codeTable[root->getLetter()] = code;
    }
}
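
For context, here is a minimal sketch of the declarations this patch assumes; the Node interface and the field types are my reconstruction from the snippet, not the author's exact code:

#include <map>
#include <vector>

// Assumed shape of the tree node used in the patch (a sketch):
class Node {
public:
    bool hasChild() const;            // true for internal nodes
    Node *getChild(bool left) const;  // child selector used above
    char getLetter() const;           // character stored in a leaf
};

// Assumed Compressor fields referenced by buildCodeTable:
std::vector<bool> code;                       // bits accumulated while descending
std::map<char, std::vector<bool>> codeTable;  // character -> its Huffman code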

barmaley_exe, 2013-05-24
@barmaley_exe

For the last byte, you need to know the number of significant bits: different Huffman-compressed characters have different bit lengths, but you can only write whole bytes.
Perhaps you have already taken this case into account, but at a superficial glance I did not see it.
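
One common way to handle this, as a sketch; the one-byte pad header, the writeWithPadHeader name, and the byte-oriented writer are illustrative choices, not taken from the question's code:

#include <cstdint>
#include <fstream>
#include <vector>

// Sketch: store how many padding bits the last byte carries, so the
// decoder knows where the significant bits end.
void writeWithPadHeader(std::ofstream &out, const std::vector<bool> &bits)
{
    uint8_t pad = static_cast<uint8_t>((8 - bits.size() % 8) % 8);
    out.put(static_cast<char>(pad));  // header: decoder drops this many trailing bits

    uint8_t buf = 0;
    int count = 0;
    for (bool bit : bits) {
        buf = static_cast<uint8_t>((buf << 1) | bit);
        if (++count == 8) { out.put(static_cast<char>(buf)); buf = 0; count = 0; }
    }
    if (count > 0)                    // left-align the incomplete last byte, zero-padded
        out.put(static_cast<char>(buf << (8 - count)));
}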

mayorovp, 2013-05-24
@mayorovp

This piece should be added to the compress function before closing the stream:

if (count > 0)
    out << buf;

This is because, after the loop, buf may still contain significant bits that do not form a full byte and therefore have not yet been written to the stream.

mayorovp, 2013-05-24
@mayorovp

Oh, and another structural note: I don't think it's a good idea to combine the compress and decompress methods in a single class that stores the input file as a field. It seems to me that already when creating an instance, the user of the library knows what is to be done with the file: pack it or unpack it.
IMHO it is better to move the charMap, charTree and inputFile fields into a base class and split the rest between Compressor and Decompressor classes. Better still, limit yourself to two global functions.
Another option is to remove the "input file" entity from the class fields entirely. That would let you collect statistics on one stream and compress another, which can be useful for on-the-fly compression. Or you could store the statistics separately from the compressed data, which can also be useful.
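
A sketch of the suggested split; the field names follow this answer, the stream-based signatures fold in the "remove the input file from the fields" option mentioned above, and all signatures are my assumptions:

#include <istream>
#include <map>
#include <ostream>

class Node;  // the Huffman tree node from the question

// Shared state lives in a base class:
class HuffmanBase {
protected:
    std::map<char, int> charMap;  // character frequency statistics
    Node *charTree = nullptr;     // tree built from charMap
};

class Compressor : public HuffmanBase {
public:
    void compress(std::istream &in, std::ostream &out);
};

class Decompressor : public HuffmanBase {
public:
    void decompress(std::istream &in, std::ostream &out);
};

// Or the "two global functions" variant:
void compress(std::istream &in, std::ostream &out);
void decompress(std::istream &in, std::ostream &out);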

mayorovp, 2013-05-24
@mayorovp

Two more errors: you never delete the Node objects you create, which causes a memory leak. Also, calling tree.sort inside a loop is a terrible idea. A correct implementation exploits the fact that merged Node objects are produced in ascending order of frequency, so a single sort of the source nodes at the start is enough.
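
For illustration, a sketch of the classic two-queue construction that needs only one initial sort; the Node layout and the manual ownership handling are my assumptions, not the question's code:

#include <algorithm>
#include <queue>
#include <vector>

struct Node {
    int freq;
    char letter;
    Node *left = nullptr, *right = nullptr;
};

// Build the tree with one sort and two FIFO queues. This works because
// each merged node has a frequency no smaller than any node merged before
// it, so q2 stays sorted automatically. Assumes at least one leaf.
Node *buildTree(std::vector<Node *> leaves)
{
    std::sort(leaves.begin(), leaves.end(),
              [](const Node *a, const Node *b) { return a->freq < b->freq; });

    std::queue<Node *> q1, q2;  // q1: sorted leaves, q2: merged nodes
    for (Node *n : leaves) q1.push(n);

    auto popMin = [&]() {
        bool fromQ1 = q2.empty() ||
                      (!q1.empty() && q1.front()->freq <= q2.front()->freq);
        std::queue<Node *> &q = fromQ1 ? q1 : q2;
        Node *n = q.front();
        q.pop();
        return n;
    };

    while (q1.size() + q2.size() > 1) {
        Node *a = popMin();
        Node *b = popMin();
        q2.push(new Node{a->freq + b->freq, 0, a, b});
    }
    return popMin();  // the root
}

// And the leak fix: free the whole tree when you are done with it.
void freeTree(Node *root)
{
    if (!root) return;
    freeTree(root->left);
    freeTree(root->right);
    delete root;
}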

mayorovp, 2013-05-24
@mayorovp

Ah, just noticed: the decompress method can currently produce garbage characters at the end of the file (try packing a file that contains only one character). You need to write the length of the original stream in front of the compressed stream, or at least its last three bits (i.e. the length modulo 8).
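
One way to do that, as a sketch (the fixed uint64_t length header is an illustrative choice, not the only possible format):

#include <cstdint>
#include <istream>
#include <ostream>

// Prefix the compressed stream with the original length so the decoder
// knows exactly how many characters to emit, then stops.
void writeLength(std::ostream &out, uint64_t originalBytes)
{
    out.write(reinterpret_cast<const char *>(&originalBytes), sizeof originalBytes);
}

uint64_t readLength(std::istream &in)
{
    uint64_t originalBytes = 0;
    in.read(reinterpret_cast<char *>(&originalBytes), sizeof originalBytes);
    return originalBytes;  // decode exactly this many characters
}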
