Programming
mrskam, 2013-08-15 16:17:12

Recommend a decoder for encodings unknown to science

Could you recommend an open-source Java library or a command-line utility (anything else will do, too) that can recover text mangled by a chain of incorrectly applied encodings, for example KOI8-R -> UTF-8 -> Windows-1251? Simply put, I need an analogue of the Lebedev decoder, but one that runs on a server. Thanks in advance.

3 answers
Vladimir Martyanov, 2013-08-15
@vilgeforce

enca+enconv?

rozhik, 2013-08-15
@rozhik

Since "anything else will do", here is how it works:
Take the text, split it into words, and look up the first few words, decoded with each candidate encoding, in ispell dictionaries. As soon as a couple of words match, profit: that is the right encoding.
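A minimal Java sketch of the basic idea, under stated assumptions: the candidate charset list, the breadth-first search over chains of up to two "undo" steps, the thresholds, and every name here (`DictionaryRecoder`, `undo`, `score`, `recover`) are hypothetical, and a tiny `Set<String>` stands in for a real ispell dictionary. One undo step reverses one misapplied encoding: re-encode the text with the charset it was wrongly displayed as, then decode the resulting bytes with the charset they were really in.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryRecoder {

    // Hypothetical list of candidate charsets; extend as needed.
    static final List<Charset> CHARSETS = List.of(
            Charset.forName("UTF-8"), Charset.forName("KOI8-R"),
            Charset.forName("windows-1251"), Charset.forName("IBM866"));

    // Undo one misinterpretation: the bytes were really in `actual`
    // but were decoded (displayed) as `shownAs`.
    static String undo(String text, Charset shownAs, Charset actual) {
        return new String(text.getBytes(shownAs), actual);
    }

    // Count dictionary hits among the first `maxWords` words of the text.
    static int score(String text, Set<String> dictionary, int maxWords) {
        int hits = 0, seen = 0;
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (w.isEmpty()) continue;
            if (dictionary.contains(w)) hits++;
            if (++seen == maxWords) break;
        }
        return hits;
    }

    // Breadth-first search over chains of up to `maxDepth` undo steps.
    // Returns the first candidate with at least `minHits` dictionary words, or null.
    static String recover(String garbled, Set<String> dictionary, int maxDepth, int minHits) {
        List<String> level = List.of(garbled);
        for (int depth = 0; ; depth++) {
            for (String candidate : level)
                if (score(candidate, dictionary, 5) >= minHits) return candidate;
            if (depth == maxDepth) return null;
            List<String> next = new ArrayList<>();
            for (String candidate : level)
                for (Charset shownAs : CHARSETS)
                    for (Charset actual : CHARSETS)
                        if (!shownAs.equals(actual)) next.add(undo(candidate, shownAs, actual));
            level = next;
        }
    }

    public static void main(String[] args) {
        // Tiny stand-in for an ispell dictionary (hypothetical).
        Set<String> dict = Set.of("привет", "мир", "как", "дела");
        String original = "привет мир как дела";
        // Simulate KOI8-R bytes that were displayed as Windows-1251.
        String garbled = new String(original.getBytes(Charset.forName("KOI8-R")),
                Charset.forName("windows-1251"));
        System.out.println(garbled);
        System.out.println(recover(garbled, dict, 2, 2));
    }
}
```

Here the garbled string is recovered after one step (`windows-1251` -> `KOI8-R`); a doubly mangled UTF-8 string needs two. Steps are lossy wherever a byte has no mapping (Java decodes 0x98 in Windows-1251 as U+FFFD, and `getBytes` writes the charset's replacement byte, usually `?`, for unmappable characters), so such text cannot be fully restored this way.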
The idea can be improved in several ways:
1) Use only the first 6 letters of each word, so that inflected forms still match.
2) Use frequency-analysis data to get a list of encoding transformations sorted by likelihood.
3) Use chains over the list of encodings (look for frequently occurring syllables).
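The improvements could be combined roughly as in the following hypothetical Java sketch. Dictionary words are cut to their first 6 letters (improvement 1), and candidates are sorted by how common their letter pairs are in a reference sample text, a simple bigram stand-in (an assumption, not necessarily what the answer meant) for the frequency analysis and syllable chains of improvements 2 and 3. The sample text, charset list, and all names (`RankedRecoder`, `prefix`, `bigrams`, `rank`, `recover`) are assumptions, and only single-step candidates are generated to keep it short.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RankedRecoder {

    // Hypothetical candidate charsets.
    static final List<Charset> CHARSETS = List.of(
            Charset.forName("UTF-8"), Charset.forName("KOI8-R"),
            Charset.forName("windows-1251"), Charset.forName("IBM866"));

    // First 6 letters of a lower-cased word (the whole word if shorter).
    static String prefix(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return w.substring(0, Math.min(6, w.length()));
    }

    // Improvement 1: the dictionary keeps only 6-letter prefixes.
    static Set<String> prefixes(Collection<String> words) {
        Set<String> out = new HashSet<>();
        for (String w : words) out.add(prefix(w));
        return out;
    }

    // Letter-bigram counts from a reference sample (stand-in for frequency data).
    static Map<String, Integer> bigrams(String sample) {
        Map<String, Integer> counts = new HashMap<>();
        String s = sample.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < s.length(); i++)
            if (Character.isLetter(s.charAt(i)) && Character.isLetter(s.charAt(i + 1)))
                counts.merge(s.substring(i, i + 2), 1, Integer::sum);
        return counts;
    }

    // Mean bigram count over all adjacent character pairs; higher looks more natural.
    static double bigramScore(String text, Map<String, Integer> table) {
        String s = text.toLowerCase(Locale.ROOT);
        if (s.length() < 2) return 0;
        long total = 0;
        for (int i = 0; i + 1 < s.length(); i++)
            total += table.getOrDefault(s.substring(i, i + 2), 0);
        return (double) total / (s.length() - 1);
    }

    // Improvement 2: the text as-is plus every single undo step, sorted best-first.
    static List<String> rank(String garbled, Map<String, Integer> table) {
        List<String> out = new ArrayList<>();
        out.add(garbled);
        for (Charset shownAs : CHARSETS)
            for (Charset actual : CHARSETS)
                if (!shownAs.equals(actual))
                    out.add(new String(garbled.getBytes(shownAs), actual));
        out.sort(Comparator.comparingDouble((String t) -> bigramScore(t, table)).reversed());
        return out;
    }

    // Count words whose 6-letter prefix is in the dictionary.
    static int prefixHits(String text, Set<String> dictPrefixes) {
        int hits = 0;
        for (String w : text.split("[^\\p{L}]+"))
            if (!w.isEmpty() && dictPrefixes.contains(prefix(w))) hits++;
        return hits;
    }

    // Check candidates in ranked order; return the first with enough prefix hits.
    static String recover(String garbled, Map<String, Integer> table,
                          Set<String> dictPrefixes, int minHits) {
        for (String candidate : rank(garbled, table))
            if (prefixHits(candidate, dictPrefixes) >= minHits) return candidate;
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = bigrams("программа проверяет текст и кодировку на сервере");
        Set<String> dict = prefixes(List.of("программа", "проверка", "кодировка", "текст"));
        String original = "программист проверяет кодировку текста";
        String garbled = new String(original.getBytes(Charset.forName("KOI8-R")),
                Charset.forName("windows-1251"));
        System.out.println(recover(garbled, table, dict, 2));
    }
}
```

Thanks to the prefix trick, "программист" matches the dictionary entry "программа". With a bigram table built from a large corpus rather than one sentence, the correct candidate should usually come first, so the dictionary is consulted only a few times even when multi-step chains make the candidate list long.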

Kaigorodov Alexey, 2013-08-15
@rfq

Search Google for "mail decoder" and look at the first and third links.
