Java
DmitryPros, 2017-08-17 22:11:58

How to get the site encoding?

BufferedReader body = new BufferedReader(new InputStreamReader(con.getInputStream(), "utf-8"));
String tempLine, outString = "";
while ((tempLine = body.readLine()) != null)
     outString += tempLine + " ";
body.close();
return outString;

By default I read the stream as UTF-8, but some sites use a different encoding. How can I determine it when its name is not returned in the headers?
I tried writing a separate method that reads the first line and extracts the encoding name from the document body, but then body.readLine() continues not from the second line but from the middle of the document. The screenshot shows that tempLine clearly does not contain the next line:
8e1e5ab251574de8b6756c02b6996812.PNG


2 answer(s)
Labunsky, 2017-08-17
@DmitryPros

By the time you have a String, it is already stored in Java's internal UTF-16, whatever the source encoding was. That is why the encoding is passed to the reader before reading begins.
If a line has already been read with the wrong encoding and "corrupted", and the real encoding is still unknown, nothing can be done with it.
To determine the source encoding of the page you are downloading, first get it as an array of bytes rather than as a string. Then you can work with the bytes and determine the encoding either with your own homegrown code or with an existing third-party library.
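
A minimal sketch of the byte-array approach, in case it helps. It assumes the page declares its encoding in a meta tag within the first 4096 bytes. The class and method names (CharsetSniffer, readAll, sniff, decode) are made up for illustration, and the regular expression is a rough heuristic rather than a full HTML parser:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CharsetSniffer {
    // Rough heuristic: finds charset=... inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([\\w.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Read the whole stream as raw bytes, without decoding anything yet.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Look for a declared charset in the first bytes of the document.
    static Charset sniff(byte[] raw, Charset fallback) {
        int len = Math.min(raw.length, 4096);
        // ISO-8859-1 maps each byte to exactly one char, so ASCII markup
        // stays readable whatever the real encoding is (assumes an
        // ASCII-compatible encoding, so not UTF-16).
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: use the fallback.
            }
        }
        return fallback;
    }

    // Decode the bytes once, with the sniffed charset (UTF-8 if none found).
    static String decode(byte[] raw) {
        return new String(raw, sniff(raw, StandardCharsets.UTF_8));
    }
}
```

With the con object from the question, this could be used as CharsetSniffer.decode(CharsetSniffer.readAll(con.getInputStream())). Because the bytes are kept in memory, nothing is lost to a reader's internal buffer, and the text is decoded only once the encoding is known.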

AStek, 2017-08-17
@AStek

From the response header.
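
For the case where the server does send the encoding, here is a rough sketch of pulling it out of a Content-Type value such as "text/html; charset=windows-1251". The class and method names (HeaderCharset, fromContentType) are hypothetical, and the parsing is deliberately simple:

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
class HeaderCharset {
    // Returns the charset named in a Content-Type value,
    // or the fallback if none is given or the name is not usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

Assuming con in the question is a java.net.URLConnection, the header value is available as con.getContentType(), which returns null when the header is missing. In that case the fallback is returned and the encoding has to be found some other way, for example from the document bytes.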
