How do I parse a page after its JavaScript has fully loaded?

Python

xdgadd, 2017-06-25 01:35:17

I want to parse arhivach.org to build a dataset.
The main content loads fine, but the JavaScript-generated parts do not. Specifically, I'm interested in everything inside <span class="post_replies"></span> for each post. This part of the page is generated dynamically, but neither requests nor selenium loads the additional scripts. By trial and error, I found that custom.js and jquery.js are responsible for loading it.
How can I get the page with the scripts loaded and executed?

2 answer(s)

rPman, 2017-06-25
@xdgadd

Don't torture yourself: run a full-fledged browser (WebKit is available for all platforms). You then have full access to the page as it loads, you can inject your own code, and you can simply retrieve the document as XML (not the file, but the DOM as it is assembled, including whatever JavaScript adds to it).
And most importantly, the website cannot do anything to prevent your attempts to automate work with it (apart from statistical detection, of course, but that comes down to your implementation and your requirements).
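
A minimal sketch of this approach in Python, assuming Selenium 4 with Firefox and geckodriver installed. The names `fetch_rendered_html`, `extract_post_replies` and `PostRepliesParser` are made up for illustration; only the `post_replies` class comes from the question. The idea is to wait until the scripts have filled at least one span, then serialize the live DOM rather than the original HTML file.

```python
from html.parser import HTMLParser


class PostRepliesParser(HTMLParser):
    """Collect the text inside every <span class="post_replies">."""

    def __init__(self):
        super().__init__()
        self.replies = []  # one string per post_replies span
        self._depth = 0    # > 0 while inside a post_replies span

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested spans so the matching </span> is found correctly.
            if tag == "span":
                self._depth += 1
        elif tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            if "post_replies" in classes:
                self._depth = 1
                self.replies.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.replies[-1] += data


def extract_post_replies(html):
    """Return the whitespace-normalized text of each post_replies span."""
    parser = PostRepliesParser()
    parser.feed(html)
    return [" ".join(text.split()) for text in parser.replies]


def fetch_rendered_html(url, timeout=15):
    """Load url in a real browser and return the DOM after scripts have run."""
    # Imported lazily so the parsing helpers above work without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Assumption: at least one post on the page has replies. If none does,
        # this wait raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            lambda d: any(
                (el.get_attribute("innerHTML") or "").strip()
                for el in d.find_elements(By.CSS_SELECTOR, "span.post_replies")
            )
        )
        return driver.page_source  # serialized current DOM, not the raw file
    finally:
        driver.quit()


# Usage (needs a browser, so it is left commented out; the URL is a placeholder):
# html = fetch_rendered_html("https://arhivach.org/thread/.../")
# print(extract_post_replies(html))
```

If Selenium seemed not to load the scripts, a likely cause is reading the page before they finished running, which is what the explicit wait is meant to address.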

Anton, 2017-06-25
Reytarovsky @Antonchik

Look at where the JavaScript sends its requests to fetch the content, then send the same requests yourself to get the data you need for <span class="post_replies"></span>.
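
A rough sketch of what replaying such a request could look like, using only the standard library (requests would work just as well). The endpoint, parameter names and JSON response format are unknown here and purely hypothetical; the real ones have to be read from the Network tab of the browser's developer tools. `build_xhr_request` and `fetch_json` are illustrative names. If the tab shows no extra request at all, the scripts are building the block from data already on the page, and this approach does not apply.

```python
import json
import urllib.parse
import urllib.request


def build_xhr_request(endpoint, params, referer):
    """Build a GET request that resembles a jQuery AJAX call from the page."""
    url = endpoint + "?" + urllib.parse.urlencode(params)
    headers = {
        # jQuery adds this header to same-origin AJAX requests.
        "X-Requested-With": "XMLHttpRequest",
        # Some sites check that the request appears to come from their page.
        "Referer": referer,
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=15):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage with a HYPOTHETICAL endpoint and parameter (replace with what the
# Network tab actually shows):
# req = build_xhr_request(
#     "https://example.invalid/ajax/replies",
#     {"thread": "123"},
#     referer="https://example.invalid/thread/123/",
# )
# data = fetch_json(req)
```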