Java
Therapyx, 2016-03-09 14:43:00

How to determine file encoding format?

(More precisely, this is not only about Unicode; I have updated the title of the question.)
The task is as follows: when reading a file, I need to determine its encoding, for example ANSI, UTF-8, UTF-8-BOM, and three others. So far I only have the following idea (correct me if I'm wrong).
Take at least the first line of the file, split it into 8-bit groups, and work from there.
1) But how?
So far, inside my function I read the incoming file into a byte array
byte[] bFile = new byte[(int) file.length()];
and print each byte b to the screen with

System.out.println(Integer.toBinaryString(b & 255 | 256).substring(1));

Take, for example, a file with the single letter d: the output is 01100100. With D the output is 01000100 (Notepad++ reports the encoding as ANSI).
I create a new file, this time typing the Russian letter Д: the output is 11010000 10010100.
I then append the English letter a to it, giving "Дa" (with a Latin a), and the output is
11010000
10010100
01100001
Notepad++ reports this as UTF-8.
And for UTF-8-BOM it already looks like 4 bytes per character.
2) There are a lot of different characters: they can be Turkish, Russian, or German letters like äüö, and a single character can stretch to as many as 4 bytes. How can I build a scheme that determines which encoding the file uses? (Without magic libraries, if there are any...)
3) Or, at the very least, does anyone know how to split the output of 8-bit strings so that, for a start, I can see by eye that these 4 bytes, 11010000 10010100 11010000 10010100, are 2 letters rather than 1, and so on? Something like this:
11010000 10010100
11010000 10010100
or like this :)
11010000
10010100
11010000
10010100
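
For point 3, one possible way to get that grouping is to look at the top bits of each byte: in UTF-8 a byte starting with 0 is a whole character, 110 starts a 2-byte character, 1110 a 3-byte one, 11110 a 4-byte one, and 10 marks a continuation byte. The sketch below is only an illustration under the assumption that the data really is UTF-8; the class and method names (Utf8Groups, sequenceLength, bits, group) are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8Groups {
    // Length of the UTF-8 sequence announced by this lead byte.
    // Anything that is not a multi-byte lead byte is treated as a group of 1.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if ((b >> 5) == 0b110) return 2;    // 110xxxxx
        if ((b >> 4) == 0b1110) return 3;   // 1110xxxx
        if ((b >> 3) == 0b11110) return 4;  // 11110xxx
        return 1;                           // 0xxxxxxx, or a stray byte shown on its own
    }

    // Same formatting trick as in the question: always 8 binary digits.
    static String bits(byte b) {
        return Integer.toBinaryString(b & 255 | 256).substring(1);
    }

    // One string per character, bytes separated by spaces.
    static List<String> group(byte[] data) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int len = sequenceLength(data[i]);
            StringBuilder line = new StringBuilder(bits(data[i]));
            int j = i + 1;
            // Accept only real continuation bytes (10xxxxxx), so broken input
            // cannot swallow the start of the next character.
            while (j < i + len && j < data.length && (data[j] & 0xC0) == 0x80) {
                line.append(' ').append(bits(data[j]));
                j++;
            }
            lines.add(line.toString());
            i = j;
        }
        return lines;
    }

    public static void main(String[] args) {
        // "\u0414" is the Cyrillic letter from the question, followed by a Latin a.
        byte[] data = "\u0414\u0414a".getBytes(StandardCharsets.UTF_8);
        for (String line : group(data)) {
            System.out.println(line);
        }
    }
}
```

For the two-letter example from the question this gives two lines of 11010000 10010100, and the Latin a ends up on a line of its own as 01100001.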

2 answer(s)
Anton Fedoryan, 2016-03-09
@AnnTHony

Look here
I also found such an algorithm.
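
To give an idea of what a library-free check might look like (this is not necessarily the algorithm linked above, just a rough sketch): look for a byte order mark first, and if there is none, test whether the bytes decode as strict UTF-8. The class name EncodingGuess, its methods, and the returned labels are invented for this example; only the java.nio.charset calls are standard.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {
    // Returns a rough label; the label strings are illustrative only.
    static String guess(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8-BOM";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (isAscii(data)) return "ASCII";       // Notepad++ shows such files as ANSI
        if (isValidUtf8(data)) return "UTF-8";
        return "ANSI";                           // some single-byte code page, unknown which
    }

    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Bytes 0x80..0xFF are negative when stored in a Java byte.
    static boolean isAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) return false;
        }
        return true;
    }

    // Strict decode: any malformed sequence raises an exception instead of
    // being silently replaced.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xD0, (byte) 0x94, (byte) 0x61};
        System.out.println(guess(sample));
    }
}
```

Keep in mind the limits: a file without a BOM can only be guessed at. A short text in a single-byte code page can happen to be valid UTF-8, and this sketch makes no attempt to tell one single-byte code page from another.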

nirvimel, 2016-03-09
@nirvimel

juniversalchardet
