Java
Evgeny Tarnov, 2019-07-05 13:43:12

How to parse (Jsoup) an already fully loaded VK feed?

How can I make Jsoup load and parse the VK feed all the way to the end?

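import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        List<Article> articleList = new ArrayList<>();
        try {
            // A plain GET returns only the initial HTML; posts added later by JavaScript are not in it
            Document doc = Jsoup.connect("https://vk.com/team").get();
            Elements links = doc.getElementsByAttributeValue("class", "page_post_sized_thumbs  clear_fix");
            links.forEach(link -> {
                Element url = link.child(0);
                String a = url.attr("style");
                // ".jpg" is a CSS class selector (class="jpg"), not a file extension,
                // so v ends up empty (see the output below)
                Elements pngs = url.select(".jpg");
                String v = pngs.text();
                System.out.println(a);
                articleList.add(new Article(v));
            });
            articleList.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}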

class Article {
    private String url;
    public Article(String url) {
        this.url = url;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }
    @Override
    public String toString() {
        return "Article{" +
                "url='" + url + '\'' +
                '}';
    }
}

Console:
width: 510px; height: 350px;background-image: url(https://sun1-83.userapi.com/c850020/v850020403/1a6...
width: 230px; height: 120px;background-image: url(https://pp.userapi.com/c635104/v635104578/46fbf/g_...
width: 510px; height: 340px;background-image: url(https://sun1-21.userapi.com/c846122/v846122555/1f3...
width: 510px; height: 392px;background-image: url(https://sun1-24.userapi.com/c846217/v846217285/1eb...
width: 191px; height: 120px;background-image: url(https://pp.userapi.com/c852132/v852132841/ff91a/Cb...
width: 510px; height: 340px;background-image: url(https://sun1-83.userapi.com/c849416/v849416678/151...
width: 510px; height: 318px;background-image: url(https://sun1-22.userapi.com/c849032/v849032241/13a...
width: 115px; height: 120px;background-image: url(https://pp.userapi.com/c849524/v849524452/1297b3/J...
width: 496px; height: 297px;background-image: url(https://sun1-18.userapi.com/c851224/v851224688/9f0...
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
Article{url=''}
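The style strings printed above already contain each image address, while the Article URLs are empty. A minimal sketch of pulling the address out of the inline style with a regular expression instead of `url.select(".jpg")`; the class `StyleUrl` and method `extract` are hypothetical names, and the pattern assumes the style uses `background-image: url(...)` as in the output above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup.
public class StyleUrl {
    // Matches background-image: url(...) with optional single or double quotes around the address
    private static final Pattern BACKGROUND_URL =
            Pattern.compile("background-image:\\s*url\\(\\s*['\"]?([^'\")]+)['\"]?\\s*\\)");

    /** Returns the background-image URL from an inline style attribute, if there is one. */
    public static Optional<String> extract(String style) {
        Matcher m = BACKGROUND_URL.matcher(style);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String style = "width: 510px; height: 350px;background-image: url(https://example.com/photo.jpg);";
        System.out.println(extract(style).orElse("")); // prints https://example.com/photo.jpg
    }
}
```

In the question's loop this would replace the `.jpg` selector, e.g. `articleList.add(new Article(StyleUrl.extract(a).orElse("")));`. It fixes the empty URLs only; it does not make a plain GET return more posts.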


2 answers
Cheypnow, 2019-07-05
@etarnov

No way, not with Jsoup alone. A simple GET request does not return elements that the page loads dynamically with JavaScript. For that you need Selenium, PhantomJS, or a similar tool that actually runs the page.
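A minimal sketch of the scroll-until-the-end loop such a tool would drive, assuming the feed appends more posts whenever the page is scrolled to the bottom. The class `FeedScroller` and its parameters are hypothetical; the two lambdas are where Selenium's `JavascriptExecutor.executeScript` calls would go, as shown in the comment.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Jsoup or Selenium.
public class FeedScroller {

    /**
     * Scrolls until the page height stops growing or maxRounds is reached.
     * With Selenium, assuming js = (JavascriptExecutor) driver:
     *   heightProbe = () -> ((Number) js.executeScript("return document.body.scrollHeight")).longValue()
     *   scroller    = () -> js.executeScript("window.scrollTo(0, document.body.scrollHeight)")
     * Returns the number of scroll steps performed.
     */
    public static int scrollUntilStable(LongSupplier heightProbe, Runnable scroller,
                                        long pauseMillis, int maxRounds) {
        long lastHeight = heightProbe.getAsLong();
        int rounds = 0;
        while (rounds < maxRounds) {
            scroller.run();
            rounds++;
            try {
                Thread.sleep(pauseMillis); // give the page time to fetch and render the next batch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop scrolling
                break;
            }
            long newHeight = heightProbe.getAsLong();
            if (newHeight == lastHeight) {
                break; // nothing was appended: assume the end of the feed (or a slow response)
            }
            lastHeight = newHeight;
        }
        return rounds;
    }
}
```

After the loop, `Jsoup.parse(driver.getPageSource())` hands Jsoup the DOM as the browser now sees it, and the selectors from the question can run on it. A height that stops changing can also just mean a slow network, so `pauseMillis` and `maxRounds` are tuning knobs rather than guarantees.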

sergey kuzmin, 2019-07-05
@sergueik

Evgeny Tarnov's code is almost correct. Here is what I got after a few minor tweaks:

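import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.notNullValue;
import static org.hamcrest.Matchers.startsWith;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

    // pageSource: the HTML of the loaded page
    Document jsoupDocument = Jsoup.parse(pageSource);
    List<Article> articleList = new ArrayList<>();
    Elements jsoupElements = jsoupDocument.getElementsByAttributeValue("class", "people_cell");
    assertThat(jsoupElements.size(), greaterThan(0));
    jsoupElements.forEach(link -> {
      // selectFirst returns null instead of throwing when there is no <img>
      Element imgElement = link.selectFirst("img");
      assertThat(imgElement, notNullValue());
      // NOTE: html() of an <img> is an empty string; the URL is in the src attribute
      String attributeValue = imgElement.attr("src");
      assertThat(attributeValue, startsWith("https://"));
      // strip the ".jpg" extension and any query string after it
      String url = attributeValue.replaceAll("\\.jpg(\\?.*)?$", "");
      articleList.add(new Article(url));
    });
    articleList.forEach(System.err::println);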

Article{url='https://sun6-16.userapi.com/c628420/v628420404/e2ca/wKt7vWFzMOA'}
Article{url='https://pp.userapi.com/c841228/v841228591/113bc/UDcryzU7E6I'}
Article{url='https://pp.userapi.com/c9267/u00190/e_d11db660'}
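The extension-stripping step can be checked on its own. A minimal sketch with a hypothetical helper `stripJpg`, escaping the dot so that only a literal `.jpg` (plus an optional query string) is removed:

```java
// Hypothetical helper for illustration.
public class ThumbUrl {
    /** Removes a trailing ".jpg" and any "?..." query string after it; other URLs are returned unchanged. */
    public static String stripJpg(String src) {
        return src.replaceAll("\\.jpg(\\?.*)?$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripJpg("https://example.com/c1/v1/abc/XYZ.jpg?ava=1")); // prints https://example.com/c1/v1/abc/XYZ
        System.out.println(stripJpg("https://example.com/ajpeg/x.png"));            // prints https://example.com/ajpeg/x.png
    }
}
```

With an unescaped pattern such as `.jpg?.*$`, any character followed by `jp` starts a match, so a path segment like `/ajpeg/` would cut the URL short at `https://example.com/`.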
