JavaScript
3143dec, 2016-10-31 12:54:16

How to parse a dynamic page in C#?

I need to parse a web page whose content is loaded dynamically: entire parts of the page are built by JS code. There is a lot of this code and it is obfuscated, so analyzing it and rewriting it in C# is not an option. There is plenty of information on the Internet, but little of substance. I have a good understanding of how JS works, so I assume I need something that will render the page, wait for it to load fully, and give me the resulting content. There are many candidates: various WebDrivers, PhantomJS, Selenium, Awesomium, etc., but little information about the tools themselves. The program must do this in multiple threads, that is, n "browsers" can be open at the same time, each with its own cookies and so on. A graphical display is not needed; in other words, I need a headless browser.

Please advise how to do this. In the worst case, I will have to implement all of this through an embedded browser, for example Awesomium.

3 answer(s)
3143dec, 2016-11-02
@3143dec

Solution: in my case the page itself contained a lot of obfuscated JS code. Otherwise, the simpler approaches suggested in the other answers would have worked.
I used PhantomJS + Selenium for my task. These technologies are still developing: some things are not implemented yet, and clear documentation is scarce. If you have any questions, feel free to ask :)
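
For illustration, here is a minimal sketch of how the PhantomJS + Selenium combination might look in C#. It is not the code from the actual project. It assumes the Selenium.WebDriver and Selenium.Support NuGet packages in a version that still ships `PhantomJSDriver` (2.x/3.x; later releases removed it) and a `phantomjs` executable that the driver service can find. The URLs and the `#content` selector are hypothetical. Each `PhantomJSDriver` starts its own PhantomJS process, so, as far as I know, every "browser" keeps its own in-memory cookies unless a shared cookies file is configured.

```csharp
using System;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;   // assumption: Selenium.WebDriver 2.x/3.x
using OpenQA.Selenium.Support.UI;  // WebDriverWait, from Selenium.Support

static class HeadlessScraper
{
    // Loads one page in its own PhantomJS process and returns the HTML
    // as it looks after the page's scripts have run.
    static string Render(string url, string readySelector)
    {
        var service = PhantomJSDriverService.CreateDefaultService();
        service.HideCommandPromptWindow = true;

        using (var driver = new PhantomJSDriver(service))
        {
            driver.Navigate().GoToUrl(url);

            // Wait (up to 30 s) until an element that only the page's JS
            // creates is present; readySelector is a hypothetical example.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
            wait.Until(d => d.FindElements(By.CssSelector(readySelector)).Count > 0);

            return driver.PageSource;
        }
    }

    static void Main()
    {
        // Hypothetical URLs; at most 4 headless browsers run at once.
        var urls = new[] { "https://example.com/a", "https://example.com/b" };
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(urls, options, url =>
        {
            string html = Render(url, "#content");
            Console.WriteLine($"{url}: {html.Length} characters");
        });
    }
}
```

Waiting for a specific element instead of a fixed delay is a design choice: "the page is fully loaded" is hard to define for script-driven pages, while "the element I need exists" is easy to check.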

Rou1997, 2016-10-31
@Rou1997

Of the headless browsers, PhantomJS is one of the best; it is designed specifically for this.
But it may not be necessary! And you do not know how JS works as well as you think: if the data is loaded using AJAX, there is no need to execute the JS at all. You can simply simulate those HTTP requests, and to find out what to simulate there are sniffers (Fiddler, Wireshark, Charles) and, finally, DevTools in the browser.
What you want is akin to reverse engineering, so look for the answers first of all in your own head, in the form of skills and knowledge. Practice!
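
As a rough sketch of what "simulating the HTTP requests" could look like in C#, the class below wraps an `HttpClient` with its own `CookieContainer`, so each instance behaves like a separate session. The class name `AjaxSession`, the URLs, and all header values are hypothetical; the real ones would be copied from what the sniffer shows for the actual page.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// One instance = one isolated "session" with its own cookie jar.
sealed class AjaxSession : IDisposable
{
    private readonly HttpClient _client;

    public CookieContainer Cookies { get; } = new CookieContainer();

    public AjaxSession()
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = Cookies,
            UseCookies = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        _client = new HttpClient(handler);
    }

    // Builds a GET request that mimics the browser's AJAX call.
    // The header values are illustrative assumptions, not the site's real ones.
    public static HttpRequestMessage BuildRequest(string ajaxUrl, string refererUrl)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, ajaxUrl);
        request.Headers.Referrer = new Uri(refererUrl);
        request.Headers.TryAddWithoutValidation("X-Requested-With", "XMLHttpRequest");
        request.Headers.TryAddWithoutValidation("Accept", "application/json, text/javascript, */*; q=0.01");
        request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        return request;
    }

    // Sends the request and returns the raw response body.
    public async Task<string> GetAsync(string ajaxUrl, string refererUrl)
    {
        using (var request = BuildRequest(ajaxUrl, refererUrl))
        using (var response = await _client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    public void Dispose() => _client.Dispose();
}
```

Creating n instances of such a class gives n independent cookie stores without starting any browser, which is usually far cheaper than n headless browsers. It only works when the request can be reproduced without running the page's scripts (for example, when no token is computed by the obfuscated JS).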

Alexander Zaitsev, 2016-11-01
@nithrous

If the data is loaded by JS through an AJAX request, you can use Fiddler to find the URLs that return the data you need and request them directly, without parsing the page.
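
Such endpoints often return JSON rather than HTML, in which case no HTML parsing is needed at all. The sketch below assumes a hypothetical response shape, `{"items":[{"title":"..."}]}`, and uses `System.Text.Json` (built into .NET Core 3.0 and later; on older frameworks a library such as Json.NET would play the same role). The real property names have to be read from the response captured in Fiddler.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class AjaxResponseParser
{
    // Returns the "title" of every object in the top-level "items" array.
    // "items" and "title" are hypothetical names used only for illustration.
    public static List<string> ExtractTitles(string json)
    {
        var titles = new List<string>();

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;

            if (root.ValueKind == JsonValueKind.Object &&
                root.TryGetProperty("items", out JsonElement items) &&
                items.ValueKind == JsonValueKind.Array)
            {
                foreach (JsonElement item in items.EnumerateArray())
                {
                    // Skip entries that are not objects or have no string title.
                    if (item.ValueKind == JsonValueKind.Object &&
                        item.TryGetProperty("title", out JsonElement title) &&
                        title.ValueKind == JsonValueKind.String)
                    {
                        titles.Add(title.GetString());
                    }
                }
            }
        }

        return titles;
    }
}
```

The method takes the response body as a string, so it can be tried on a body saved from the sniffer before any network code is written.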
