Will a repository work?
Good day.
Suppose I deploy one local repository for RHEL 8 and one for RHEL 7.
Can those repositories be used by specific minor releases such as RHEL 8.1 and RHEL 7.4?
Or do I need to deploy a separate repository for each minor version?
You need to increase the cache:
>>> import re
>>> re._MAXCACHE
100
>>> re._MAXCACHE = 3000
>>> re._MAXCACHE
3000
Are the regular expressions precompiled?
If not, precompile them.
If they are, do not expect a tenfold gain from switching to C++.
The Python re engine is written in C and is comparable in performance to the regex libraries of other languages.
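In case it helps, here is a minimal sketch of what "precompiled" means in practice. The pattern list and the `match_all` helper are made up for illustration and stand in for the real 3000 patterns. Calling methods on compiled objects skips the lookup in re's internal cache that `re.search(string, ...)` performs on every call.

```python
import re

# Hypothetical pattern sources standing in for the real 3000.
PATTERN_SOURCES = [r"jquery(?:/([\d.]+))?", r"bootstrap(?:/([\d.]+))?"]

# Compile once, at import time, and reuse the compiled objects everywhere.
COMPILED = [re.compile(src, re.IGNORECASE) for src in PATTERN_SOURCES]

def match_all(html):
    """Return {pattern source: group 1 of the first match} for each pattern that matches."""
    found = {}
    for pattern in COMPILED:
        m = pattern.search(html)
        if m:
            found[pattern.pattern] = m.group(1)
    return found
```

With patterns held this way, the size of `re._MAXCACHE` should matter much less, since the cache is only consulted when a pattern is passed as a string.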
So:
1. Preprocess the HTML: you can probably cut out unnecessary pieces and whole blocks before running 3000 patterns over it.
2. Preprocess even more deeply: split the HTML into atomic fragments so that, once a fragment is identified, it does not have to be scanned again.
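A rough sketch of both steps, under assumptions that may not hold for your data: that `<style>` blocks and HTML comments are irrelevant to your patterns, and that a "fragment" is a single tag or a run of text between tags. The names `preprocess`, `fragments` and `first_hits` are invented here. This is one possible reading of the early exit: a pattern that has already matched is not tried on later fragments.

```python
import re

# Assumption: <style> blocks and HTML comments never contain anything the patterns look for.
_NOISE = re.compile(r"<style\b.*?</style>|<!--.*?-->", re.DOTALL | re.IGNORECASE)
# Assumption: a fragment is either one tag or one run of text between tags.
_FRAGMENT = re.compile(r"<[^>]+>|[^<]+")

def preprocess(html):
    """Step 1: drop blocks that are assumed to be irrelevant."""
    return _NOISE.sub("", html)

def fragments(html):
    """Step 2: split into small pieces, skipping whitespace-only ones."""
    return [f for f in _FRAGMENT.findall(html) if f.strip()]

def first_hits(html, compiled_patterns):
    """Map each pattern to the first fragment it matches; matched patterns are retired."""
    remaining = list(compiled_patterns)
    hits = {}
    for frag in fragments(preprocess(html)):
        still_open = []
        for p in remaining:
            if p.search(frag):
                hits[p.pattern] = frag
            else:
                still_open.append(p)
        remaining = still_open
        if not remaining:  # every pattern has matched, stop scanning
            break
    return hits
```

Whether this pays off depends on the patterns: anything that needs to match across tag boundaries will break when the document is split this way.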
Use asynchronous code execution in 1*x threads. The performance gain should be good.
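For what it is worth, a thread-based version could look roughly like this. The two patterns, the `scan` and `scan_many` helpers and the worker count are placeholders, not a recommendation.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical precompiled patterns.
PATTERNS = [re.compile(r"nginx"), re.compile(r"apache")]

def scan(html):
    """Return the sources of all patterns found in one document."""
    return [p.pattern for p in PATTERNS if p.search(html)]

def scan_many(documents, workers=8):
    """Scan documents in a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan, documents))
```

One caveat: as far as I know, CPython's re does not release the GIL while matching, so threads mainly help when reading or downloading documents overlaps with the matching itself.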
"str(?:/([\\d.]+))"
It looks like you forgot to escape the \. Two backslashes before d also look strange: did you really want to match a literal backslash, or did you mean the digit class \d?
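To see the difference, here is a small check of how the same pattern text behaves as a normal Python string literal and as a raw one. The sample text "str/1.2.3" is made up for the test.

```python
import re

text = "str/1.2.3"

# Normal literal: "\\d" is one backslash plus "d", so the regex engine sees \d (a digit).
normal = re.compile("str(?:/([\\d.]+))")
# Raw literal with two backslashes: the class now means "backslash, letter d, or dot".
raw_doubled = re.compile(r"str(?:/([\\d.]+))")
# Raw literal with one backslash: the same regex as the normal literal above.
raw_single = re.compile(r"str(?:/([\d.]+))")

print(normal.pattern == raw_single.pattern)   # True
print(normal.search(text).group(1))           # 1.2.3
print(raw_doubled.search(text))               # None
```

So the doubled backslash is harmless in an ordinary quoted string, but changes the meaning if the pattern is pasted into a raw string or read unescaped from a file.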
I didn't write the regular expressions. They were originally written for JS; I ported them to Python by changing the characters.
The code itself runs in hundreds of threads.
The expressions are precompiled.
The task is to run 3000 regular expressions over 100 million HTML documents as quickly as possible.