S
S
Sergey Nizhny Novgorod2015-09-30 16:41:11
Python
Sergey Nizhny Novgorod, 2015-09-30 16:41:11

Parser in Python, how to implement?

Guys, has the following task:
1) A person enters a special page, enters a login and password into the form.
2) This data is used to enter another service.
3) The internal page of the service with data opens, where data begins to be collected.
4) When the analysis of one page ends, there is a transition to the next page, and so on until the stop conditions of the iteration.
5) The collected data is uploaded in the form of a table on a special website.
Can you guide me on how best to implement this?
___________________________________
Guys, in general, I found an approximate way, you can use beautifulsoup.
The question remains: How to pass the login and password to the parser from the form to enter the admin panel so that it enters?

Answer the question

In order to leave comments, you need to log in

6 answer(s)
U
un1t, 2015-09-30
@un1t

grab, scrapy are all nonsense, request + lxml is just right. In a more complex case, selenium and phantomjs are needed.

S
stenhot, 2015-09-30
@stenhot

Use the GRAB
Library and MySQL or SQLite Grab has a lot of possibilities. You can also see an example of a parser HERE

D
Dimonchik, 2015-09-30
@dimonchik2013

scrapy is the de facto standard now, but Grab is easier to learn

C
Cobalt the Terrible, 2015-09-30
@CobaltTheTerrible

You can also look at Scrapy.
From my own experience, I’ll immediately say that it makes sense to save all loaded pages in some kind of key-value storage. Helps a lot with debugging

T
Tanker John, 2015-10-01
@Kuzmichik

The question remains: How to pass the login and password to the parser from the form to enter the admin panel so that it enters?

doc.scrapy.org/en/1.0/topics/request-response.html...

A
alegast, 2015-10-01
@alegast

the parser is only half the work, beautifulsoup will do just fine.
The 2nd half requires authorization on the site, saving cookies (which will come in response headers) and transferring them with each subsequent request after authorization

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question