How do I parse a page after its JavaScript has fully loaded?
I want to parse arhivach.org to build a dataset.
The main content loads fine, but the content produced by the JS scripts doesn't. Specifically, I'm interested in everything inside <span class="post_replies"></span> for each post. This part of the page is generated dynamically, but neither requests nor selenium loads the additional scripts. By trial and error I found out that custom.js and jquery.js are responsible for loading it.
How can I get the page with the scripts already executed?
Don't torture yourself: run a full-fledged browser (WebKit is available on all platforms). You get full access to the page as it loads, you can inject your own code, and you can simply take the document as XML (not the file, but the DOM as it was actually assembled, including everything JavaScript generated).
Most importantly, the site can't do anything to prevent you from automating your work with it (apart from statistical detection, of course, but that is a matter of your implementation and requirements).
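As a rough illustration of the above, here is a minimal sketch in Python. It assumes Selenium 4 driving Chrome (not WebKit, that is just what I had at hand), and it assumes the scripts eventually put child nodes into at least one span.post_replies; I have not verified that against arhivach.org. The names fetch_rendered_html and extract_post_replies are made up for the example. The idea is to wait until the scripts have filled the spans, then read the live DOM instead of the original HTML file.

```python
from html.parser import HTMLParser


class PostRepliesParser(HTMLParser):
    """Collects the text inside every <span class="post_replies"> element."""

    def __init__(self):
        super().__init__()
        self.replies = []   # one string per post_replies span
        self._depth = 0     # > 0 while we are inside a post_replies span
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1    # nested span inside a post_replies span
        elif "post_replies" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so each span yields one clean string.
                self.replies.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_post_replies(html):
    """Return the text of every span.post_replies in the given HTML."""
    parser = PostRepliesParser()
    parser.feed(html)
    parser.close()
    return parser.replies


def fetch_rendered_html(url, timeout=15):
    """Load url in a real browser and return the DOM after scripts have run.

    Assumption: the page's scripts add child nodes to span.post_replies.
    """
    # Imported here so the parsing code above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll the live DOM until at least one span has been filled in.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script(
                "return Array.from(document.querySelectorAll("
                "'span.post_replies')).some(e => e.childNodes.length > 0);"
            )
        )
        # Serialize the DOM as it is now, including script-generated nodes.
        return driver.execute_script(
            "return document.documentElement.outerHTML;"
        )
    finally:
        driver.quit()
```

Usage would be something like extract_post_replies(fetch_rendered_html(thread_url)), where thread_url is whatever thread you want to scrape. If the wait times out, the assumption about how the spans get filled is probably wrong for this site, and the condition in the wait needs adjusting.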