Creating a spider bot to collect data - where to look for information?
My current task is to develop a spider that collects data from websites.
It needs to crawl sites, extract data, and add it to a database.
Are there ready-made solutions and frameworks, so I don't have to reinvent the wheel?
How have problems like this been solved before?
If you don't mind going the Python way: Flask + BeautifulSoup + SQLAlchemy.
A book dedicated to your question
Flask guide on Habré
BeautifulSoup guide in Russian
SQLAlchemy guide in Russian
For my purposes it was enough to import bs4 and pull the data directly in views.py:
from flask import Flask, render_template
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

app = Flask(__name__)

@app.route("/links/")
def parse():
    try:
        html = urlopen("http://www.site.ru/").read()
    except HTTPError as e:
        # Bail out here: otherwise html would be undefined below
        print(e)
        return render_template('template.html', links=[])
    soup = BeautifulSoup(html, 'lxml')
    links = soup.find_all('a')  # all <a> tags on the page
    return render_template('template.html', links=links)
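Since the question also asks about writing the results to a database, here is a minimal sketch of how the scraped links could be persisted with SQLAlchemy. The Link model, the 'links' table name, the sqlite file, and the target URL are all assumptions for illustration; adapt them to your schema and database.

from urllib.request import urlopen
from bs4 import BeautifulSoup
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# Hypothetical model for illustration; add the fields your spider actually extracts
class Link(Base):
    __tablename__ = 'links'
    id = Column(Integer, primary_key=True)
    url = Column(String)
    text = Column(String)

engine = create_engine('sqlite:///spider.db')  # any SQLAlchemy-supported DB works
Base.metadata.create_all(engine)

html = urlopen('http://www.site.ru/').read()  # placeholder URL from the answer above
soup = BeautifulSoup(html, 'lxml')

with Session(engine) as session:
    for a in soup.find_all('a', href=True):
        session.add(Link(url=a['href'], text=a.get_text(strip=True)))
    session.commit()

In a real spider you would run this kind of code in a background job or a standalone script rather than inside a Flask view, so that a slow or failing fetch doesn't block a web request.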