Python
sir_Jack, 2012-09-03 13:24:22

How to search for several regular expressions in the text at the same time?

There are N texts and M regular expressions.
For each expression, I need to find the texts that match it.
M and N are too large to run REGEX_i against TEXT_j sequentially for every pair and inspect the result.
I would like to somehow check all the regexes at the same time.
I would also appreciate recommendations for suitable modules.

6 answer(s)
Alexander Davydov, 2012-09-03
@nyddle

github.com/dprokoptsev/pire
This library is aimed at checking a huge amount of text against
relatively many regular expressions.

MikhailEdoshin, 2012-09-03
@MikhailEdoshin

I have not researched this specifically, but my impression is that there is no ready-made tool, although the construction seems obvious: there are M finite automata (the regexes); we combine them into one (the methods are known), and in the initial and/or final state of each source automaton we set a callback that receives the identifier of that automaton and the position in the text.
Maybe take some finite-automaton library and try to build something out of it? For C there is libfa (a port of the Java dk.brics.automaton); it will handle the first half of the task, but I don't know whether it supports marked states and callbacks.

Pavel Tyslyatsky, 2012-09-03
@tbicr

It is unlikely that there are ready-made solutions, but you can run the checks in parallel rather than sequentially, across several processes or threads, since these operations are independent and parallelize well. If you haven't considered this option yet, I think the multiprocessing module will help.
If this is still too slow, you can try rewriting it in C.
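
To make the multiprocessing suggestion concrete, here is a minimal sketch. The sample patterns and the names PATTERNS, matching_patterns and match_all are made up for illustration; it assumes the patterns fit in memory and can be compiled once in every worker process.

```python
import re
from multiprocessing import Pool

# Hypothetical sample patterns; compiled once when each worker imports the module.
PATTERNS = [r"\d+", r"[A-Z]{2,}", r"foo|bar"]
COMPILED = [re.compile(p) for p in PATTERNS]


def matching_patterns(text):
    """Return the indices of PATTERNS that match somewhere in the text."""
    return [i for i, rx in enumerate(COMPILED) if rx.search(text)]


def match_all(texts, processes=None):
    """Check every text against every pattern using a pool of worker processes.

    The result is a list aligned with `texts`; each item is the list of
    matching pattern indices for that text.
    """
    with Pool(processes) as pool:
        return pool.map(matching_patterns, texts, chunksize=64)
```

On platforms that start workers by spawning a fresh interpreter, match_all(texts) should be called from under an `if __name__ == "__main__":` guard. The total work is still M x N searches; it is only spread over the available cores.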

Alexander Korotkov, 2012-09-03
@smagen

I'm working on a similar task: an index search by regular expressions in a database. You can read and watch the presentation here:
www.pgcon.org/2012/schedule/events/383.en.html
Applied to your case, the algorithm would be as follows:
1. Load the N texts into the database.
2. Build an index.
3. Search the index M times.
It would be interesting to see what texts and expressions you have, to judge how applicable my approach is to this task.
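
The three steps above can be imitated in plain Python as a rough sketch. Two things here are assumptions and are not stated in the answer: the index is a trigram inverted index, and every regex comes with a caller-supplied literal (the hypothetical required_literal parameter) that any match must contain. A real database index would derive such information from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(s):
    """Return the set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(texts):
    """Steps 1 and 2: map each trigram to the ids of the texts containing it."""
    index = defaultdict(set)
    for text_id, text in enumerate(texts):
        for gram in trigrams(text):
            index[gram].add(text_id)
    return index


def search(index, texts, pattern, required_literal):
    """Step 3: return the sorted ids of the texts that match pattern.

    required_literal is a hypothetical hint: a substring that every match
    of pattern must contain. It narrows the candidates via the index; the
    regex itself is then run only on those candidates.
    """
    candidates = None
    for gram in trigrams(required_literal):
        ids = index.get(gram, set())
        candidates = ids if candidates is None else candidates & ids
    if candidates is None:
        # Literal shorter than 3 characters: the index cannot filter anything.
        candidates = set(range(len(texts)))
    rx = re.compile(pattern)
    return sorted(i for i in candidates if rx.search(texts[i]))
```

For example, with texts = ["error: disk full", "all good", "error 42: timeout"], the call search(build_index(texts), texts, r"error \d+", "error") runs the regex on only two of the three texts.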

Alukardd, 2012-09-03
@Alukardd

"I would like to somehow check all the regexes at the same time"
Well then, cook up one complex regexp with multiple ORs.

Mithgol, 2012-09-03
@Mithgol

A compound regular expression?
I don't know Python, but in JavaScript it would look like this:

/(regex1|regex2|regex3|regex4|regex5|regex6|regex7|regex8|…|regexM)/
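
In Python the same idea could be sketched as below. The function names are made up for illustration, and the sketch assumes the patterns contain no numbered backreferences or global inline flags, since joining them would change their meaning. Because a single alternation only reports that some alternative matched, it is used here as a prefilter: texts that fail it are skipped, and only the remaining texts are checked against each pattern individually.

```python
import re


def build_prefilter(patterns):
    """Join all patterns into one alternation: (?:p1)|(?:p2)|..."""
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def match_table(texts, patterns):
    """Return {pattern_index: [text_index, ...]} for all matching pairs."""
    prefilter = build_prefilter(patterns)
    compiled = [re.compile(p) for p in patterns]
    result = {i: [] for i in range(len(patterns))}
    for t, text in enumerate(texts):
        if not prefilter.search(text):
            continue  # none of the patterns matches this text
        for i, rx in enumerate(compiled):
            if rx.search(text):
                result[i].append(t)
    return result
```

Note that Python's re module is a backtracking engine, so the alternatives are still tried one after another internally; the gain comes mainly from quickly discarding texts that match nothing.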
