How to parse HTML in Java using HtmlUnit or JSOUP?

Evgeniy Kornyshev, 2019-07-30 22:48:28

Java

Hello. I ran into the following problem when parsing sites: the get method in JSOUP and the corresponding mechanism in HtmlUnit return the source code of the page. The text content that I see in the browser is embedded somewhere in that source code, but I don't know how to extract it from there. Is it possible, using Java tools, to get the final HTML page with all of its text content in readable form? Thanks in advance, I hope I wrote clearly.

1 answer(s)

Sergey c0re, 2019-07-31
@Kornyshev

I think you need "headless" Chrome, which renders the page the way a browser does before you read its HTML; see "Introduction to Headless Chrome".
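
For illustration, here is a minimal sketch of how that could look from Java using only the standard library. It assumes a Chrome binary is installed and reachable under the name `google-chrome` (that name, the class name `HeadlessChromeDump`, and the example URL are assumptions, not something from the question). It relies on Chrome's `--headless` and `--dump-dom` command-line flags, which print the rendered DOM to standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class HeadlessChromeDump {

    // Builds the command line for a headless Chrome run that prints the rendered DOM.
    // chromeBinary is whatever the Chrome executable is called on your system (assumption).
    static List<String> buildCommand(String chromeBinary, String url) {
        return List.of(chromeBinary, "--headless", "--disable-gpu", "--dump-dom", url);
    }

    // Runs Chrome and returns everything it wrote to stdout, i.e. the rendered HTML.
    static String dumpDom(String chromeBinary, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(chromeBinary, url))
                // Discard Chrome's log output so it cannot block the process.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String html = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Chrome exited with code " + exitCode);
        }
        return html;
    }

    public static void main(String[] args) throws Exception {
        // "google-chrome" and the URL are placeholders; adjust both for your setup.
        String html = dumpDom("google-chrome", "https://example.com");
        System.out.println(html);
    }
}
```

The returned string is ordinary HTML, so it could then be handed to JSOUP with `Jsoup.parse(html)` and queried as usual. Whether the text you need is present still depends on the page finishing its scripts before Chrome dumps the DOM.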