Python
vante_scribaxxi, 2018-04-07 12:56:38

How to select articles for 1000 keywords?

There are 1,000 key phrases of at most 5-6 words each, and 20 thousand articles in which these phrases occur. For each phrase I need to find its pair, i.e. the article that uses that phrase the most.
I wrote a function that first splits every phrase into words, removes duplicates, and stores them in a dict in the format {phrase: keywords}. Then each article is taken in turn and split into words, the Levenshtein distance is used to compute the similarity of each keyword to each word in the article, the per-word scores are summed into a total for that article, and once all the texts have been processed, the article with the maximum score is selected; that is how a pair is found.
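The approach described above can be sketched roughly as follows (a minimal reconstruction, not the asker's actual code; `best_article` and the 0.8 similarity threshold are illustrative assumptions):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two words.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity(w1: str, w2: str) -> float:
    # Normalized similarity in [0, 1]; 1.0 means identical words.
    m = max(len(w1), len(w2))
    return 1.0 - levenshtein(w1, w2) / m if m else 1.0

def best_article(keywords, articles, threshold=0.8):
    # Score each article by counting near-matches of every keyword,
    # then return the index of the highest-scoring article.
    best_idx, best_score = -1, -1.0
    for idx, text in enumerate(articles):
        words = text.lower().split()
        score = sum(1 for kw in keywords for w in words
                    if similarity(kw, w) >= threshold)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx
```

Each call to `best_article` does roughly len(keywords) × total-words Levenshtein computations, which is why the full 1,000-phrase × 20k-article run is so slow.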
This function is too slow: I moved the text processing into a multiprocessing.Pool and still got at most 3-4 articles per second. I need to process 20k × 1,000 = 20 million phrase/article combinations, and within 15 minutes at most.
Help me please.


1 answer
Dimonchik, 2018-04-07
@dimonchik2013

Act like a sledgehammer first, then tune.
As a sledgehammer you can use:
1) Sphinxsearch (or analogues such as Elasticsearch) with SPH_MATCH_ALL
2) Full-text search in MySQL / PostgreSQL
Strictly speaking, a properly configured (1) can make further tuning unnecessary.
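Even before reaching for a search engine, the per-pair Levenshtein loop can be replaced by precomputation: tokenize every article once into word counts, then scoring a phrase is a handful of dictionary lookups per article instead of millions of edit-distance calls. A minimal sketch (exact word matching only; the function names are illustrative):

```python
from collections import Counter

def build_index(articles):
    # Tokenize every article exactly once; afterwards each
    # word-count lookup is O(1) per keyword per article.
    return [Counter(text.lower().split()) for text in articles]

def best_article_fast(phrase, index):
    # Score each article by the total occurrences of the
    # phrase's words, and return the best article's index.
    keywords = set(phrase.lower().split())
    scores = [sum(counts[w] for w in keywords) for counts in index]
    return max(range(len(scores)), key=scores.__getitem__)
```

This drops the fuzzy matching that Levenshtein provides, which is the trade-off: a full-text engine such as Sphinxsearch or PostgreSQL's tsvector gives you the precomputed index plus stemming, so near-matches like singular/plural forms still count.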
