Python
Astrohas, 2017-08-01 22:09:35

How to speed up parsing in Python?

Good evening, Toster users! I want to scrape bashorg, ithappens and zadolbali for research purposes. I wrote a simple parser with bs4; with 4 threads it handles about 10 posts per second. The network is 100 Mbit (Montreal), and ping to bashorg is 82 ms. Timing shows that loading one page in a single thread takes about 365 ms. At this rate, bashorg alone will take about a day. Are there ways to speed up the process?


4 answer(s)
sim3x, 2017-08-01
@Astrohas

You don't need to speed up the parsing; you need to speed up the fetching.
You can speed up fetching by using a caching DNS resolver close to the grabber and by placing the grabber closer to the host you are scraping.
And if you parse with lxml, there is nothing left to speed up there: it is already written in C.
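One way to check whether fetching or parsing dominates is to time the two steps separately for a single page. The sketch below is only an illustration: `timed` and `profile_page` are hypothetical helper names, and `fetch` and `parse` are callables you supply yourself.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def profile_page(url, fetch, parse):
    """Return (fetch_seconds, parse_seconds) for one page.

    fetch(url) should return the page HTML and parse(html) should do
    whatever extraction you need; both are assumptions of this sketch.
    """
    html, fetch_seconds = timed(fetch, url)
    _, parse_seconds = timed(parse, html)
    return fetch_seconds, parse_seconds
```

For example, with requests and bs4 installed, something like `profile_page(url, lambda u: requests.get(u).text, lambda h: BeautifulSoup(h, "lxml"))` would show how the roughly 365 ms per page splits between the network and the parser.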

asd111, 2017-08-03
@asd111

If you use only multithreading, you can run 20-40 threads, or more if the processor allows.
Keep increasing the number of threads until you saturate either the 100 Mbit link or the processor.
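A minimal sketch of that idea with the standard library's thread pool. `fetch_all` is a hypothetical helper name, `fetch` is whatever download function you already have, and the default of 20 workers is just the lower end of the range suggested above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, workers=20):
    """Fetch every URL with a pool of worker threads.

    Returns a dict mapping each URL to the result of fetch(url). If a
    fetch raises, the exception object is stored instead, so one bad
    page does not abort the whole run.
    """
    urls = list(urls)  # allow generators; we iterate twice below

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs
        results = pool.map(safe_fetch, urls)
        return dict(zip(urls, results))
```

Raise `workers` step by step and watch throughput; once posts per second stops growing, either the link or the CPU is the limit.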

Roman Mindlin, 2017-08-02
@kgbplus

Download the sites with a multi-threaded downloader (such as WinHTTrack) and then parse them from your own disk.
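Once the pages are on disk, the parsing step becomes a plain loop over files. This is a rough sketch: `parse_mirror` is a hypothetical name, the `*.html` pattern and the UTF-8 default are assumptions (a mirror may use other extensions, and older sites are often in a different encoding), and `parse` is your own function.

```python
from pathlib import Path


def parse_mirror(root, parse, pattern="*.html", encoding="utf-8"):
    """Yield (path, parse(html)) for every saved page under root.

    Files are visited in sorted order; undecodable bytes are replaced
    rather than raising, so a single odd page does not stop the run.
    """
    for path in sorted(Path(root).rglob(pattern)):
        html = path.read_text(encoding=encoding, errors="replace")
        yield path, parse(html)
```

Because nothing touches the network here, the parser can be rerun as often as needed while you adjust the extraction logic.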

Nitrius, 2018-02-07
@Nitrus

bashorg has 1300+ pages; they download quickly enough, and after that it is just simple parsing. The same goes for the other sites.
