Why is the site not being parsed?

Java

Thymomenos Gata, 2018-02-25 18:40:36

I'm trying to parse the site
egecalc.ru/?rus=100&mat=100&soc=100&phy=100&his=10...
specifically the blocks with the final results for universities and so on,
but I get nothing back.
Other sites scrape fine with the same approach; this one does not.
Here is the code:

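import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseEgeCalc {
    private static final String URL = "http://egecalc.ru/?rus=100&mat=100&soc=100&phy=100&his=100&bio=100&che=100&lan=100&ict=100&geo=100&lit=100&sort_by=salary&city=all&page=1";

    public static void parser() throws IOException {
        Document doc = Jsoup.connect(URL).get();
        // getElementsByClass expects a single class name; a CSS selector matches elements carrying both classes
        Elements el = doc.select(".card.card-outline-info");
        for (Element e : el) {
            System.out.println(e.text());
        }
    }
}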

If anyone knows the solution, please tell me.

1 answer

Sergey Gornostaev, 2018-02-25
@prumin

Just look at the page source: it contains no blocks with the card class, which means they are created by JavaScript after the page loads. Jsoup does not execute JavaScript, so it only ever sees the HTML the server sends. Either work out what the JavaScript code does and make the same requests to the backend yourself, or use Selenium, which drives a real browser.
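
The "look at the page source" step can also be done from code. Below is a minimal sketch, assuming Java 11 or newer; the class name RawHtmlCheck and the helpers hasClassInSource and fetch are made up for this illustration and are not part of Jsoup or any other library. It uses the JDK's own HTTP client instead of Jsoup so that it has no dependencies, and the class-attribute scan is a rough regex heuristic, not a real HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
public class RawHtmlCheck {
    // Roughly matches class="..." or class='...' attribute values in raw markup.
    private static final Pattern CLASS_ATTR = Pattern.compile(
            "class\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns true if some class attribute in the raw HTML lists exactly this class name. */
    public static boolean hasClassInSource(String html, String className) {
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            // A class attribute is a whitespace-separated list of names.
            for (String token : m.group(2).trim().split("\\s+")) {
                if (token.equals(className)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Downloads the page as the server sends it; no scripts are executed. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the question.
        String url = "http://egecalc.ru/?rus=100&mat=100&soc=100&phy=100&his=100&bio=100&che=100&lan=100&ict=100&geo=100&lit=100&sort_by=salary&city=all&page=1";
        String html = fetch(url);
        boolean found = hasClassInSource(html, "card-outline-info");
        System.out.println(found
                ? "Class is present in the raw HTML; a static parser should be able to see it."
                : "Class is absent from the raw HTML; it is probably added by JavaScript.");
    }
}
```

If the check reports that the class is absent, no selector will help Jsoup find the blocks. What remains is the browser's network tab, to find the request the page's script sends, or a browser-driving tool such as Selenium.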