How to speed up parsing in Python?
Good evening, Toster! For research purposes I want to scrape bashorg, ithappens and zadolbali. I wrote a simple parser with bs4 (BeautifulSoup), and with 4 threads it handles about 10 posts per second. The server has a 100 Mbit/s connection and is located in Montreal; ping to bashorg is 82 ms. `time` shows that loading one page in a single thread takes 365 ms. At this rate bashorg alone will take about a day. Are there ways to speed up the process?
You don't need to speed up parsing.
You need to speed up grabbing, i.e. downloading the pages.
You can do that by using a caching DNS resolver close to the grabber and by placing the grabber closer to the donor host (the site being scraped).
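To check where the time actually goes, you can time the download and the parse of a single page separately. This is a minimal sketch: `profile_page` and `http_get` are hypothetical helper names, and you pass in your own parse function. With a 365 ms page load, if the advice above is right, the download share should dominate.

```python
import time
import urllib.request
from typing import Callable, List, Tuple


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Plain stdlib download (hypothetical helper); swap in your own HTTP client."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def profile_page(url: str,
                 fetch: Callable[[str], bytes],
                 parse: Callable[[bytes], List[str]]) -> Tuple[float, float, List[str]]:
    """Return (seconds spent downloading, seconds spent parsing, parsed items)."""
    t0 = time.perf_counter()
    raw = fetch(url)        # network: DNS lookup, connection setup, transfer
    t1 = time.perf_counter()
    items = parse(raw)      # CPU only: HTML parsing and extraction
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, items
```

Usage: `fetch_s, parse_s, _ = profile_page(PAGE_URL, http_get, my_parse)`, where `PAGE_URL` and `my_parse` stand for one real page and your existing bs4 code.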
And if you parse with lxml, there is nothing left to speed up there: it is already written in C.
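A minimal sketch of parsing a page with `lxml.html` directly. The function name `extract_posts` and the default XPath `//div[@class="quote"]` are hypothetical placeholders; inspect the real pages and use the selector that matches one post.

```python
from typing import List

from lxml import html


def extract_posts(page: bytes, xpath: str = '//div[@class="quote"]') -> List[str]:
    """Parse one HTML page with lxml and return the text of each matching post.

    The default XPath is a hypothetical placeholder, not the real site markup.
    """
    tree = html.fromstring(page)          # libxml2-backed HTML parser
    return [el.text_content().strip() for el in tree.xpath(xpath)]
```

If you prefer to keep the existing bs4 code, `BeautifulSoup(page, "lxml")` tells bs4 to use lxml as its underlying parser instead of the pure-Python `html.parser`.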
If you stay with plain multithreading, you can run 20-40 threads or more, as long as the CPU keeps up; the threads spend most of their time waiting on the network.
Keep increasing the number of threads until you saturate either the 100 Mbit/s link or the CPU.
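A minimal sketch of a configurable thread pool using the standard library's `concurrent.futures.ThreadPoolExecutor`, plus a best-case throughput estimate (threads divided by per-request latency). The names `http_get`, `fetch_all` and `expected_rate` are hypothetical.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Blocking stdlib download; each worker thread runs one of these at a time."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], bytes] = http_get,
              workers: int = 32) -> Dict[str, bytes]:
    """Download every URL using `workers` threads; returns {url: body} in input order."""
    url_list = list(dict.fromkeys(urls))      # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs fetch concurrently but yields results in input order;
        # the first exception raised by fetch propagates here.
        return dict(zip(url_list, pool.map(fetch, url_list)))


def expected_rate(workers: int, latency_s: float) -> float:
    """Best-case pages/s if every thread is always waiting on exactly one request."""
    return workers / latency_s
```

With the numbers from the question, `expected_rate(4, 0.365)` is about 11 pages/s, close to the observed ~10 per second, and `expected_rate(40, 0.365)` is about 110 pages/s. That is only an upper bound: it holds only while the server, the link and the CPU all keep up, which is why the answer suggests raising `workers` step by step.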
Download the sites with a multi-threaded downloader (such as WinHTTrack) and then parse them from your local disk.
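The same two-stage idea can also be done in Python instead of with WinHTTrack: first download everything into a local directory with a thread pool, then parse the saved files with no network involved. A minimal sketch; `cache_path`, `download_to_disk` and `iter_cached_pages` are hypothetical names, and the `example.invalid` URLs and stand-in fetch in the demo are placeholders.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple


def cache_path(cache_dir: Path, url: str) -> Path:
    """Stable file name per URL, so a re-run skips pages already downloaded."""
    return cache_dir / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html")


def _save(url: str, fetch: Callable[[str], bytes], cache_dir: Path) -> bool:
    path = cache_path(cache_dir, url)
    if path.exists():
        return False                      # already on disk from an earlier run
    tmp = path.with_suffix(".part")
    tmp.write_bytes(fetch(url))           # write to a temporary name first...
    tmp.replace(path)                     # ...then rename, so no half-written .html
    return True


def download_to_disk(urls: Iterable[str], fetch: Callable[[str], bytes],
                     cache_dir: Path, workers: int = 16) -> int:
    """Stage 1: multi-threaded download into cache_dir; returns pages newly fetched."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    unique = list(dict.fromkeys(urls))    # duplicates would race on the same file
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda u: _save(u, fetch, cache_dir), unique))


def iter_cached_pages(cache_dir: Path) -> Iterator[Tuple[Path, bytes]]:
    """Stage 2: read pages back from disk for parsing."""
    for path in sorted(cache_dir.glob("*.html")):
        yield path, path.read_bytes()


if __name__ == "__main__":
    # Offline demo with a stand-in fetch; pass a real HTTP getter in practice.
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        urls = [f"https://example.invalid/page/{i}" for i in range(3)]
        print(download_to_disk(urls, lambda u: f"<p>{u}</p>".encode(), cache))
        for path, body in iter_cached_pages(cache):
            print(path.name, len(body))
```

Splitting the stages means a parser bug or a selector change only costs a re-parse from disk, not a second full crawl.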