What are the approaches for extracting data from websites?

Vetal Matitskiy, 2015-05-21 13:19:22

Python

Good afternoon, dear development gurus. Could you tell me what the general approaches are for programmatically extracting data from websites? I need to write a script (for example, in Python or Groovy) that scans the en.wiktionary.org resource and saves its nouns to a file.

2 answer(s)

sim3x, 2015-05-21
@sim3x

Scrapy (scrapy.org) is used to collect pages (crawling, scraping). It has a "built-in" HTML parser, lxml (https://pypi.python.org/pypi/lxml/, lxml.de).
But for wiki-like resources this is not necessary: ready-made dumps are published at https://dumps.wikimedia.org/
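
To illustrate the dump route, here is a minimal sketch that streams a MediaWiki XML export and yields the titles of pages that look like nouns. It rests on several assumptions that are not stated in this thread: that the dump uses the usual `<page>`, `<title>`, `<ns>` and `<text>` elements, that main-namespace entries have `<ns>0</ns>`, and that a noun entry carries a `===Noun===` heading (three or more `=` signs) in its wikitext. The names `iter_nouns`, `local_name` and `SAMPLE_DUMP` are made up for the example, and the sample data is hand-written, not taken from a real dump.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: a noun section is marked by a heading such as "===Noun===".
NOUN_HEADING = re.compile(r"^={3,}\s*Noun\s*={3,}\s*$", re.MULTILINE)

# Tiny hand-written sample in the shape of a MediaWiki export (hypothetical data).
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>cat</title><ns>0</ns><revision><text>==English==
===Noun===
{{en-noun}}</text></revision></page>
  <page><title>run</title><ns>0</ns><revision><text>==English==
===Verb===
{{en-verb}}</text></revision></page>
  <page><title>Template:en-noun</title><ns>10</ns><revision><text>===Noun===</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_nouns(xml_stream):
    """Yield titles of main-namespace pages whose wikitext has a Noun heading."""
    title, ns, text = None, "0", ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = (elem.text or "").strip()
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title and ns == "0" and NOUN_HEADING.search(text):
                yield title
            # Reset per-page state and drop the page's contents to limit memory use.
            title, ns, text = None, "0", ""
            elem.clear()


if __name__ == "__main__":
    # Write one noun per line; "nouns.txt" is an arbitrary output name.
    with open("nouns.txt", "w", encoding="utf-8") as out:
        for noun in iter_nouns(io.BytesIO(SAMPLE_DUMP)):
            out.write(noun + "\n")
```

For a real dump, the `io.BytesIO(SAMPLE_DUMP)` argument could be replaced by a file object, for example one returned by `bz2.open(path, "rb")` if the downloaded file is bz2-compressed. Note that this heading check does not distinguish languages, so it would also pick up nouns from non-English sections of a page.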

lPolar, 2015-05-21
@lPolar

The collection process is called web scraping.
Take Grab (or bs4, requests, mechanize) and read the articles on Habr; everything is described there.
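
The basic scraping loop behind those libraries is "fetch a page, parse the HTML, pull out what you need". Below is a rough sketch of that loop using only the Python standard library (`urllib.request` and `html.parser`) as a stand-in for requests and bs4, so it runs without extra installs. The names `LinkCollector`, `fetch` and `extract_links`, as well as the User-Agent string, are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def fetch(url):
    """Download a page and return its body as text."""
    # Hypothetical User-Agent; identify your own script here.
    req = Request(url, headers={"User-Agent": "noun-scraper-example/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

A call such as `extract_links(fetch(url), url)` would return the links on a page, which a crawler can then queue for the next round of fetching. With requests and bs4 the same two steps would be a `requests.get` call followed by a `BeautifulSoup` parse.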