Java
Thymomenos Gata, 2018-02-25 18:40:36

Why is the site not being parsed?

I'm trying to scrape the blocks with the final results for universities, etc. from
egecalc.ru/?rus=100&mat=100&soc=100&phy=100&his=10...
but nothing comes back.
Other sites parse fine, but this one does not.
Here is the code:

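import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseEgeCalc {
    private static final String URL = "http://egecalc.ru/?rus=100&mat=100&soc=100&phy=100&his=100&bio=100&che=100&lan=100&ict=100&geo=100&lit=100&sort_by=salary&city=all&page=1";

    public static void parser() throws IOException {
        // Download and parse the HTML exactly as the server returns it
        Document doc = Jsoup.connect(URL).get();
        // getElementsByClass expects a single class name; a CSS selector can require both classes
        Elements el = doc.select(".card.card-outline-info");
        for (Element e : el) {
            System.out.println(e.text());
        }
    }
}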

If anyone knows the solution, please tell me.


1 answer
Sergey Gornostaev, 2018-02-25
@prumin

A look at the page source is enough to see that it contains no blocks with the card class, which means they are created by JavaScript. Jsoup does not execute JavaScript. Either work out what the JavaScript code does and make the same requests to the backend yourself, or use Selenium.
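
The "look at the page source" step can also be done from code. Below is a minimal sketch that uses only the JDK (Java 11+); StaticHtmlCheck, fetch and hasClass are hypothetical names made up for this illustration, not part of Jsoup. It downloads the markup the way Jsoup receives it, without running any scripts, and checks whether any class attribute lists the given class. The check is rough: markup embedded as a string inside a script would also count as a match.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or of the question's code.
class StaticHtmlCheck {
    // Matches class="..." or class='...' and captures the attribute value in group 2.
    private static final Pattern CLASS_ATTR = Pattern.compile(
            "\\bclass\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Downloads the response body as text; no JavaScript is executed. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns true if any class attribute in the raw markup lists className. */
    static boolean hasClass(String html, String className) {
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            // A class attribute may hold several space-separated names.
            for (String token : m.group(2).trim().split("\\s+")) {
                if (token.equals(className)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

If StaticHtmlCheck.hasClass(StaticHtmlCheck.fetch(url), "card") comes back false while the browser clearly shows the cards, the blocks are most likely added by a script after the page loads, and no Jsoup selector will find them in the downloaded document.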
