How do I convert a string's encoding to the standard one?
Hey!
I'm parsing an HTML page with Jsoup. The page is encoded in windows-1251 (a tag on the page itself declares this).
The problem is that when I convert the parsed fragment to a string and then call String.contains("виды"), it returns false, even though the string does contain that substring.
By experimenting, I found that the word "виды" ("kinds") has the byte representation {-30, -24, -28, -5}.
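Those four bytes are exactly what you get by encoding "виды" in windows-1251, which uses one byte per Cyrillic letter; UTF-8 would need two bytes per letter. A minimal sketch to check this, assuming only the JDK (the class and method names are illustrative, and the word is written with Unicode escapes so the result does not depend on how the .java file is saved):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Cp1251Bytes {
    // Encodes a string with an explicitly named charset instead of the platform default.
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The word from the question, spelled with Unicode escapes so the compiled
        // literal is the same whatever source encoding javac assumes.
        String word = "\u0432\u0438\u0434\u044b";
        // One byte per letter:
        System.out.println(Arrays.toString(encode(word, "windows-1251"))); // [-30, -24, -28, -5]
        // Two bytes per letter:
        System.out.println(Arrays.toString(encode(word, "UTF-8"))); // [-48, -78, -48, -72, -48, -76, -47, -117]
    }
}
```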
What should I do?
The code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document page = Jsoup.connect(URL + urlShop)
        .timeout(20000)
        .get(); // fetch the HTML page
Elements row = page.select("div.comp"); // select the divs with class "comp" from the page
String print = row.text(); // strip all tags and convert the markup to text
System.out.println(print.contains("виды")); // returns false
String regex = new String(new byte[]{-30, -24, -28, -5});
System.out.println(print.contains(regex)); // returns true
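A hedged diagnosis, since the question does not say which JDK or source-file encoding is in use: `new String(byte[])` decodes with the platform default charset, so the `true` on the last line only shows that the Jsoup text contains whatever that default makes of the four bytes. If the default is windows-1251, that is the correct word, which would point at the literal `"виды"` being mangled at compile time (the .java file saved in one encoding but compiled by `javac` assuming another; `javac -encoding UTF-8` sets it explicitly). If instead the page text itself is mis-decoded, jsoup's `Jsoup.parse(InputStream in, String charsetName, String baseUri)` lets you force `windows-1251`. The sketch below avoids both defaults: the search word is written with Unicode escapes, and bytes are decoded with an explicitly named charset. The class, the `decode` helper, and the sample text are illustrative assumptions, not part of the original code.

```java
import java.nio.charset.Charset;

public class CharsetSafeContains {
    // Decodes bytes with an explicit charset instead of the platform default
    // that new String(byte[]) silently uses.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The search word from the question, written as Unicode escapes so the
        // compiled literal is the same whatever encoding javac assumes for this file.
        String needle = "\u0432\u0438\u0434\u044b";

        // The bytes found in the question, decoded explicitly as windows-1251.
        String fromBytes = decode(new byte[]{-30, -24, -28, -5}, "windows-1251");
        System.out.println(needle.equals(fromBytes)); // true

        // Hypothetical stand-in for row.text(); in the question this string comes from Jsoup.
        String text = "\u0432\u0441\u0435 \u0432\u0438\u0434\u044b";
        System.out.println(text.contains(needle)); // true
    }
}
```

With the escaped literal, `print.contains(needle)` no longer depends on the compiler's source encoding, so a remaining `false` would point at the page decoding instead.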