Python
slowking, 2012-10-30 18:20:02

Parsing an ASP.NET site (using Python)

I ran into the following problem: I need to scrape the suppliers' INNs (taxpayer identification numbers) from the gzakupki website. Go to gzakupki.ru/guide/supplier.aspx, type, for example, "0" into the search box, and you get 40k pages of results. How do I move from page to page? How do I sniff the request correctly (or, more precisely, which request should I sniff)?


4 answer(s)
batalex, 2012-10-30
@slowking

I also recently parsed an ASP.NET application. IMHO the procedure is complex and tedious, and there is no guarantee it can be done in a reasonable time. In my case, I got lucky. Here is what I did: I took Firefox, installed Firebug, turned on request logging on the "Network" tab, and looked at which requests were sent and which variables were passed. I analyzed them and then carefully reproduced the same requests by hand. The problem, as already mentioned, is all the ViewState machinery, but in my case it was possible to do without it.
As an alternative, I suggest Firefox + Selenium WebDriver. Then you don't have to worry about how pagination is implemented at all: you just tell Selenium to "click through" the pages one after another, and that's it.
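
The "click through" idea could look roughly like the sketch below. It is only an illustration under several assumptions: the `http://` scheme in `START_URL`, the "Next" link text, and the helper names `walk_pages` and `click_next` are all hypothetical, and the step that types the search query is left out because the form's field names are not known here. Selenium is imported inside `main()` so the paging loop itself can be read and tried without a browser.

```python
START_URL = "http://gzakupki.ru/guide/supplier.aspx"  # scheme is an assumption
NEXT_LINK_TEXT = "Next"  # hypothetical; check the real pager markup


def walk_pages(driver, click_next, handle_page, max_pages=5):
    """Pass each page's HTML to handle_page, then move on via click_next.

    Stops after max_pages pages or when click_next reports there is no
    further page. Returns the number of pages handled.
    """
    visited = 0
    while True:
        handle_page(driver.page_source)
        visited += 1
        if visited >= max_pages or not click_next(driver):
            return visited


def main():
    # Imported here so the loop above does not require Selenium to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def click_next(drv):
        links = drv.find_elements(By.LINK_TEXT, NEXT_LINK_TEXT)
        if not links:
            return False  # no pager link: assume this was the last page
        links[0].click()
        # Wait until the old link is detached, i.e. the postback reloaded the page.
        WebDriverWait(drv, 10).until(EC.staleness_of(links[0]))
        return True

    driver = webdriver.Chrome()
    try:
        driver.get(START_URL)
        pages = []
        walk_pages(driver, click_next, pages.append, max_pages=3)
        print(f"collected {len(pages)} pages")
    finally:
        driver.quit()
```

Calling `main()` opens a real browser window. With 40k result pages, `max_pages` should stay small until the selectors are confirmed against the live site.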

Alex Shkor, 2012-10-30
@AlexShkor

You're out of luck, though. A page written in ASP.NET, and not a very high-quality one at that, is the best protection against parsing =)
Almost every action on an ASP.NET page triggers a form POST that sends the ViewState from the client to the server.
You will have to send POST requests to supplier.aspx while reproducing the ViewState and however it is encoded or encrypted.
In theory the problem is solvable, but I would advise you to find another way to solve your original problem.
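
For illustration, here is a minimal standard-library sketch of such a POST. It assumes the common WebForms pattern in which the hidden fields (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) can be echoed back unchanged from the previous response, with `__EVENTTARGET` and `__EVENTARGUMENT` naming the control that was "clicked". Whether this is enough for supplier.aspx is not confirmed, and the target and argument values in the usage note are hypothetical; the real ones have to be read from the pager links in the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build the form payload for the next postback from the current page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)  # echo __VIEWSTATE etc. back unchanged
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


def post_back(url, payload):
    """Send the payload as a form POST and return the response HTML."""
    data = urlencode(payload).encode("utf-8")
    request = Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call might look like `post_back(url, build_postback(html, "ctl00$Grid", "Page$2"))`, where both strings are made-up placeholders. The site may also require session cookies and the visible form fields (such as the search text), which this sketch does not handle.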

slowking, 2012-11-01
@slowking

I ran ChromeDriver + python-selenium, and it works great! Thanks, everyone, for the replies.

logan, 2012-10-31
@logan

Scrapy is here to help.
