How to automatically detect that the text is in the wrong encoding?
There is a database that a third-party program writes to, and my task is to pull data from it for reports. Everything is written and works, except for one annoyance: text is periodically saved to the table in the wrong encoding. For example, values that should read
Microsoft PowerPoint - Презентация ремонты
or
Закупка ноябрь расходники.docx
come out garbled, and are fixed by the usual recoding from 1251 to UTF.
How can I detect automatically that a string needs recoding? By looking for characters like °ЂЃ? Maybe there is another, smarter way?
https://www.codeproject.com/Articles/17201/Detect-...
For example, you can look at how Far Manager implements encoding auto-detection, or search for something similar. Such detectors usually store the character codes that are statistically characteristic of each encoding: they read the file until the statistics become more or less unambiguous, then assume an encoding. Far determines the encoding quite successfully in most cases.
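As a rough illustration of the statistical idea (a minimal sketch, not how Far actually does it): decode the raw bytes under a few candidate encodings and keep the one whose result has the largest share of the most common Russian letters. The name `guess_encoding`, the candidate list, the letter set and the 0.3 threshold are all assumptions made up for this example. Valid UTF-8 is accepted up front, since text in a legacy single-byte Cyrillic encoding rarely happens to be valid UTF-8.

```python
from typing import Optional

# Assumption: ten letters commonly cited as the most frequent in Russian text.
COMMON_LETTERS = set("оеаинтсрвл")

# Assumption: the single-byte encodings worth trying for this data.
SINGLE_BYTE_CANDIDATES = ("cp1251", "koi8-r", "cp866")


def guess_encoding(raw: bytes, min_share: float = 0.3) -> Optional[str]:
    """Guess the encoding of raw bytes from simple letter statistics."""
    # Strict UTF-8 decoding is a strong signal on its own, so check it first.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    best, best_share = None, 0.0
    for encoding in SINGLE_BYTE_CANDIDATES:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # ASCII letters decode identically everywhere, so they carry no signal.
        letters = [c for c in text.lower() if c.isalpha() and not c.isascii()]
        if not letters:
            continue
        share = sum(c in COMMON_LETTERS for c in letters) / len(letters)
        if share > best_share:
            best, best_share = encoding, share

    # Refuse to guess when even the best candidate looks unlike Russian text.
    return best if best_share >= min_share else None


if __name__ == "__main__":
    sample = "Закупка ноябрь расходники".encode("cp1251")
    print(guess_encoding(sample))
```

On very short strings the statistics are weak, so a real detector would want more text or a finer model than a single letter set.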
Or, when you have a hint, such as knowing that the file starts with Russian text, you can simply count how many characters fall within the list of Russian letters under several transcoding options and pick the option that scores best.
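For strings that have already been read from the database, the counting idea could look roughly like this. It is only a sketch: `fix_mojibake`, `russian_share` and `RECODINGS` are hypothetical names, and the list of (wrong, right) encoding pairs is an assumption to adjust to your data. It compares the share of Russian letters rather than the raw count, because a UTF-8 string misread as 1251 is about twice as long and is itself full of the letters Р and С.

```python
RUSSIAN_LETTERS = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")

# Assumption: (encoding the text was wrongly read as, encoding it really was in).
RECODINGS = (
    ("cp1251", "utf-8"),   # the "1251 to UTF" fix from the question
    ("cp1252", "cp1251"),  # 1251 text misread as Western European
)


def russian_share(text: str) -> float:
    """Fraction of non-space characters that are Russian letters."""
    chars = [c for c in text.lower() if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in RUSSIAN_LETTERS for c in chars) / len(chars)


def fix_mojibake(text: str) -> str:
    """Return the recoding of text that looks most like Russian."""
    best, best_share = text, russian_share(text)
    for wrong, right in RECODINGS:
        try:
            candidate = text.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This pair cannot explain the string, so skip it.
            continue
        share = russian_share(candidate)
        # Strict comparison: on a tie the original string is kept.
        if share > best_share:
            best, best_share = candidate, share
    return best


if __name__ == "__main__":
    good = "Закупка ноябрь расходники.docx"
    broken = good.encode("utf-8").decode("cp1251")  # simulate the bad rows
    print(fix_mojibake(broken) == good)
    print(fix_mojibake(good) == good)
```

Rows that are already correct pass through unchanged, so the function can be applied to every value without first checking which ones are broken.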