D
D
dadaniel2020-11-25 15:06:37
Python
dadaniel, 2020-11-25 15:06:37

Which algorithm for finding substrings in a list of tuples will be faster than comprehension?

Given: A list of about 100k tuples. Each tuple has two elements, the first is an id and the second is a string. It is necessary to select all id from those tuples where the strings contain a certain substring.

tuples = [(id1, 'cheese trees'), (id2, 'freezy breeze'),...]
vals = ['cheese', 'flees']
ids = {i[0] for i in tuples if any(val in i[1] for val in vals)}

output: {id1}

Now I do it through comprehension, but there must be a faster algorithmic solution! Here is what I am looking for.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
W
Wataru, 2020-11-25
@dadaniel

Aho-Korasik algorithm .
Put all your keys into the burr, as described in the link. Then for each tuple, search in the i[1] line, if found, then add your id to the answer.

D
Dr. Bacon, 2020-11-25
@bacon

To create a "search" index of these lines, you can get confused yourself, but it's easier to put it in the database and create an index there.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question