Python
JRazor, 2014-01-30 10:52:33

Python web: multiprocessing vs. threads. Which is better to use for parsing?

I apparently don't fully understand the difference between the two: which is better to use for web parsing? Is there any reading in Russian on this topic?


2 answer(s)
Sardar, 2014-01-30
@Sardar

You can use Scrapy. Then you don't have to think about parallel processes, locks, or I/O at all. You just write the logic for parsing the page. Scrapy itself is built on Twisted.

zxmd, 2014-01-30
@zxmd

One thing I want to say about lxml: do not pass it a URL as the source to parse. It is better to download the page yourself, for example with requests, and feed the result to lxml via document_fromstring. You will save yourself a lot of nerve cells.
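
Roughly, the download-then-parse approach could look like the sketch below. The helper names `fetch` and `extract_links`, the 10-second timeout, the example URL, and the choice to extract link targets are assumptions made for illustration.

```python
import requests
from lxml import html


def fetch(url, timeout=10):
    """Download the page with requests, so that timeouts and HTTP errors
    are handled explicitly rather than inside the parser."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def extract_links(page_source):
    """Parse already-downloaded HTML and return every href found in it."""
    doc = html.document_fromstring(page_source)
    return doc.xpath("//a/@href")


if __name__ == "__main__":
    # Hypothetical URL, used only as a placeholder.
    print(extract_links(fetch("https://example.com/")))
```

Keeping the download separate from the parsing also makes it easy to swap in threads, processes, or anything else for the network part without touching the lxml code.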
