Java
Olzhas Ilyubayev, 2015-05-11 13:22:25

How do I convert a string's encoding to the standard one?

Hey!
I'm parsing an HTML page with Jsoup. The page is encoded in windows-1251 (a tag on the page itself says so).
The problem: when I convert the parsed fragment to a string and call String.contains("виды") on it, it returns false, even though the substring is there.
By experiment, I found that the word "виды" has the byte representation {-30, -24, -28, -5}.
What should I do?
The code:

Document page = Jsoup.connect(URL + urlShop)
                .timeout(20000)
                .get(); // fetch the HTML page
        Elements row = page.select("div.comp"); // select the divs with class "comp" from the page
        String print = row.text(); // strip all tags and convert the markup to text
        
        System.out.println(print.contains("виды")); // returns false

        String regex = new String(new byte[]{-30, -24, -28, -5});
        System.out.println(print.contains(regex)); // returns true


1 answer(s)
one pavel, 2015-05-11
@ilyubayev

Is the word "виды" in your source code also encoded in windows-1251?
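
One way to check this, as a rough sketch: write the word with Unicode escapes, so the literal no longer depends on the encoding the compiler assumes for the source file, and always name the charset when converting between bytes and strings. The class and method names below (EncodingCheck, encodeWord, decode) are made up for illustration, and the sketch assumes the JRE ships the windows-1251 charset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class EncodingCheck {
    // Charset that the page in the question declares.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // The Cyrillic word from the question, spelled with Unicode escapes so
    // the literal is the same no matter how the source file is saved.
    static final String WORD = "\u0432\u0438\u0434\u044b";

    // Encode the word as windows-1251 bytes.
    static byte[] encodeWord() {
        return WORD.getBytes(CP1251);
    }

    // Decode bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes) {
        return new String(bytes, CP1251);
    }

    public static void main(String[] args) {
        byte[] fromQuestion = {-30, -24, -28, -5};
        System.out.println(Arrays.toString(encodeWord()));      // [-30, -24, -28, -5]
        System.out.println(decode(fromQuestion).equals(WORD));  // true
    }
}
```

If this prints the bytes from the question, those bytes are simply the windows-1251 encoding of the word, and the literal in the original code was probably read by the compiler in a different encoding than the file was saved in. In that case, the escaped literal should match in print.contains(...). Another option, if the source file is saved as UTF-8, is to compile with javac -encoding UTF-8.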
