How to select articles for 1000 keywords?
There are 1000 key phrases, each 5-6 words at most. There are also 20 thousand articles in which these phrases are scattered. I need to find a match for each phrase, i.e. the article that uses that phrase the most.
I wrote a function that first splits every phrase into words, removes duplicates and stores them in a dict in the format {phrase: keywords}. Then each article is taken in turn and split into words, and the Levenshtein distance is used to compute the similarity of each keyword to each word of the article. The similarities are summed and the total is appended to a list. Once all the texts have been processed, the article with the highest score is selected, and that gives the pair.
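To make the description concrete, here is a minimal sketch of how I read the approach above. It is not my exact code: the function names (`levenshtein`, `similarity`, `phrase_keywords`, `best_article_per_phrase`), the normalization of the distance into a 0..1 similarity, and the lowercase whitespace tokenization are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical words, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def phrase_keywords(phrases):
    """Build {phrase: keywords}: split each phrase into unique lowercase words."""
    return {phrase: sorted(set(phrase.lower().split())) for phrase in phrases}


def score_article(keywords, article_words):
    """Sum the similarity of every keyword against every word of the article."""
    return sum(similarity(k, w) for k in keywords for w in article_words)


def best_article_per_phrase(phrases, articles):
    """Return {phrase: index of the highest-scoring article}."""
    tokenized = [article.lower().split() for article in articles]
    result = {}
    for phrase, keywords in phrase_keywords(phrases).items():
        scores = [score_article(keywords, words) for words in tokenized]
        result[phrase] = max(range(len(scores)), key=scores.__getitem__)
    return result


pairs = best_article_per_phrase(
    ["red apple"],
    ["the sky is blue", "a red apple fell"],
)
```

Every keyword is compared with every word of every article, so the cost grows with phrases × articles × words per article, which is where the time goes.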
This function is too slow. I moved the text processing into multiprocessing.Pool and got at most 3-4 articles per second. And I need to do 20k * 1000 = 20 million comparisons, and this should take 15 minutes at most.
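To put the numbers side by side, here is a small back-of-the-envelope calculation. It assumes that "3-4 articles per second" means each article is scored against all 1000 phrases, and that 15 minutes is the time budget for the whole run; the helper names are made up for this estimate.

```python
ARTICLES = 20_000
PHRASES = 1_000


def total_pairs(articles: int, phrases: int) -> int:
    """Number of (phrase, article) pairs that have to be scored."""
    return articles * phrases


def minutes_needed(articles: int, articles_per_second: float) -> float:
    """Wall-clock minutes to process all articles at a given throughput."""
    return articles / articles_per_second / 60


def required_rate(articles: int, budget_minutes: float) -> float:
    """Articles per second needed to finish within the time budget."""
    return articles / (budget_minutes * 60)


pairs_to_score = total_pairs(ARTICLES, PHRASES)   # 20 million pairs
slow_case = minutes_needed(ARTICLES, 3)           # at 3 articles per second
fast_case = minutes_needed(ARTICLES, 4)           # at 4 articles per second
target_rate = required_rate(ARTICLES, 15)         # rate that fits in 15 minutes
```

Under these assumptions the current speed means roughly 83 to 111 minutes for the whole set, while a 15-minute budget needs about 22 articles per second, so a speed-up of around 6-7x is required.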
Help me please.