Python
Vetal Matitskiy, 2015-05-21 13:19:22

What are the approaches for extracting data from websites?

Good afternoon, dear development gurus! Could you tell me what the general approaches are for programmatically extracting data from websites? I need to write a script (for example, in Python or Groovy) that scans the en.wiktionary.org resource and saves the nouns it finds to a file.

2 answers
sim3x, 2015-05-21

Scrapy (scrapy.org) is used to collect pages (crawling, scraping); it has a "built-in" HTML parser based on lxml (https://pypi.python.org/pypi/lxml/, lxml.de).
But for wiki-like resources this isn't necessary: use the dumps at https://dumps.wikimedia.org/

lPolar, 2015-05-21

This process is called web scraping.
Use Grab (or bs4, requests, mechanize) and read the articles on Habr; everything is described there.
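A minimal sketch of the scraping route. To stay runnable without extra installs, it swaps requests and bs4 for the standard library's urllib.request and html.parser (with bs4 the parsing step would be shorter). It assumes that https://en.wiktionary.org/wiki/Category:English_nouns lists its members as `<li><a title="...">` links inside `<div id="mw-pages">`, which is how MediaWiki category pages are usually rendered. The class and function names and the User-Agent string are hypothetical, and only the first page of the category is fetched; following the "next page" links is left out.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class CategoryLinkParser(HTMLParser):
    """Collect link titles from <li> items inside <div id="mw-pages">."""

    def __init__(self) -> None:
        super().__init__()
        self.div_depth = 0  # nesting depth of <div>s inside mw-pages (0 = outside)
        self.li_depth = 0   # nesting depth of <li>s inside mw-pages
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1
            elif attrs.get("id") == "mw-pages":
                self.div_depth = 1
        elif tag == "li" and self.div_depth:
            self.li_depth += 1
        elif tag == "a" and self.li_depth and attrs.get("title"):
            # Only links inside list items: skips the "next page" navigation links.
            self.titles.append(attrs["title"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li" and self.li_depth:
            self.li_depth -= 1


def extract_members(html: str) -> list[str]:
    """Return the page titles listed on one rendered category page."""
    parser = CategoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.titles


def fetch(url: str) -> str:
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "noun-collector-example/0.1 (you@example.com)"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Usage (network required): `extract_members(fetch("https://en.wiktionary.org/wiki/Category:English_nouns"))`. As the first answer notes, for a wiki the dumps avoid hammering the live site at all.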
