E
E
Eldrich2018-06-26 04:30:23
Python
Eldrich, 2018-06-26 04:30:23

How to speed up the Python word substitution process?

Good day.
It is necessary to replace words in a small array of text (slightly more than 1 million words), according to the dictionary.
Now I do it clumsily:

text = re.sub(r'\bз\b', '', text)
text = re.sub(r'\bзап\b', '', text)

The task is that it is necessary to make about 20,000 such replacements on the entire array of text, and this solution is clearly not optimal, since it takes an unforgivably long time to process.
Are there ways to speed up this process?
Thanks

Answer the question

In order to leave comments, you need to log in

3 answer(s)
E
Eugene, 2018-06-26
@immaculate

You can start simple:

bad_words_re = re.compile(r'\b(з(ап)?)\b')
bad_words_re.sub('', text)

Firstly, the expression is compiled once, and secondly, both words are replaced in one pass.

V
vikholodov, 2018-06-26
@vikholodov

I would add multithreading, give each thread its own part of the "array" to eat. Well, it’s not entirely clear in what form you store all this goodness, if an array means a list, then maybe you can use a generator with a check for entering the list of search words?

I
Igor Nikolaev, 2018-06-26
@nightvich

Try to abandon regular expressions in favor of replace.
replace is much faster.

str.replace() should be used whenever it's possible to. It's more explicit, simpler, and faster.

In [1]: import re

In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""

In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop

In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop

https://stackoverflow.com/questions/5668947/use-py...

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question