How to automate the selection of tags for an article?
Colleagues, I would appreciate advice on the following:
We run a publication that puts out 5+ articles a day; the archive currently holds more than 1,500.
Tags are currently assigned to articles by hand. We periodically review the results, and the reviews show that this approach is not working well: many obvious tags are simply missed (and some authors would rather skip this step altogether).
The question: are there systems that simplify this process, for example by suggesting tags based on an article's content? And how do major media outlets (RIA, etc.) handle it?
P.S. We use Elastic as our search engine. From a video I gathered that it can somehow help with this task, but I don't have enough knowledge (or rather, I have none).
Thank you!
Yes, such systems exist.
Reuters seems to have been the pioneer. The solution relies on machine learning. First, a classifier is trained on a set of articles that have already been labeled. It is then used to assign new articles to one or more rubrics, which is exactly the tagging task.
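To make the classifier idea concrete, here is a minimal sketch in pure Python. It is not how Reuters does it; it assumes one simple binary Naive Bayes model per tag, and the class name `TagClassifier`, the tokenizer, and the toy training articles are all made up for illustration. A real system would use far more labeled articles and proper lemmatization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase word characters, no stemming.
    return re.findall(r"\w+", text.lower())


class TagClassifier:
    """One binary multinomial Naive Bayes model per tag (one-vs-rest)."""

    def fit(self, articles):
        # articles: list of (text, set_of_tags) pairs labeled by hand.
        self.tags = sorted({t for _, tags in articles for t in tags})
        self.vocab = {w for text, _ in articles for w in tokenize(text)}
        self.models = {}
        for tag in self.tags:
            counts = {True: Counter(), False: Counter()}
            docs = {True: 0, False: 0}
            for text, tags in articles:
                has_tag = tag in tags
                docs[has_tag] += 1
                counts[has_tag].update(tokenize(text))
            self.models[tag] = (counts, docs)
        return self

    def _log_odds(self, tag, words):
        counts, docs = self.models[tag]
        v = len(self.vocab)
        total_pos = sum(counts[True].values())
        total_neg = sum(counts[False].values())
        # Prior ratio and word likelihoods, both with add-one smoothing.
        score = math.log((docs[True] + 1) / (docs[False] + 1))
        for w in words:
            if w not in self.vocab:
                continue  # words never seen in training carry no evidence
            p_pos = (counts[True][w] + 1) / (total_pos + v)
            p_neg = (counts[False][w] + 1) / (total_neg + v)
            score += math.log(p_pos / p_neg)
        return score

    def suggest(self, text):
        # Suggest every tag whose model leans towards "has this tag".
        words = tokenize(text)
        return [t for t in self.tags if self._log_odds(t, words) > 0]


# Toy training set (hypothetical data).
train = [
    ("bank raises interest rates", {"economy"}),
    ("inflation hits bank profits", {"economy"}),
    ("team wins football match", {"sports"}),
    ("striker scores in football final", {"sports"}),
]
clf = TagClassifier().fit(train)
print(clf.suggest("Inflation worries push bank rates higher"))  # ['economy']
print(clf.suggest("football team reaches final"))               # ['sports']
```

Because each tag gets its own yes/no model, an article can receive several tags at once, or none if no model is confident.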
Offhand, here is one write-up on the topic:
https://towardsdatascience.com/applying-machine-le...
Elastic has very little to do with this; it would serve only as a store for the data.
By the way, Reuters boasted that implementing this method saves it millions, mainly on the salaries of a bloated department of almost a hundred employees who used to tag the news by hand.
I do not know exactly how this is implemented in practice, but I would do the following:
1. Define a fixed, finite set of tags.
2. Compile a dictionary of keywords for each of these tags: synonyms, terms from the subject area, etc.
3. Scan each article for those keywords and, if there are enough matches, suggest adding the tag to the article.
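The three steps above can be sketched roughly as follows. The tag names, keyword lists, the `min_matches` cutoff, and the function name `suggest_tags` are all illustrative assumptions; in practice the dictionaries would need lemmatization or stemming so that different forms of the same word match.

```python
import re

# Steps 1-2: a fixed set of tags, each with a hand-made keyword dictionary
# (hypothetical examples).
TAG_KEYWORDS = {
    "economy": {"inflation", "bank", "rates", "budget", "market"},
    "sports": {"football", "match", "team", "goal", "championship"},
}


def suggest_tags(text, tag_keywords=TAG_KEYWORDS, min_matches=2):
    """Step 3: suggest a tag when enough of its keywords occur in the text.

    Returns a dict mapping each suggested tag to the keywords that matched,
    so an editor can see why the tag was proposed.
    """
    words = set(re.findall(r"\w+", text.lower()))
    suggestions = {}
    for tag, keywords in tag_keywords.items():
        hits = words & keywords
        if len(hits) >= min_matches:
            suggestions[tag] = sorted(hits)
    return suggestions


article = "The central bank kept rates unchanged despite inflation"
print(suggest_tags(article))
# {'economy': ['bank', 'inflation', 'rates']}
```

Returning the matched keywords rather than just the tag keeps the tool in a "suggest" role: the author or editor still confirms the final choice.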
Such a system is very easy to build.
1. Split the article into words and convert them to lower case.
2. Build an index: a list of these words and the percentage of matching word groups for each article.
3. Keep labeling articles by hand until the match percentage rises above a threshold value.
4. When the next article comes in, compare it against the index and assign tags automatically.
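The description above leaves some details open, so the following is only one possible reading of it: the index maps each tag to the words seen in articles hand-labeled with that tag, and a new article receives a tag when a large enough share of its words appears in that tag's word list. The function names and the 50% threshold are assumptions.

```python
import re
from collections import defaultdict


def words_of(text):
    # Step 1: split into words and lower-case them.
    return set(re.findall(r"\w+", text.lower()))


def build_index(tagged_articles):
    # Step 2: for every tag, collect the words of articles labeled with it.
    index = defaultdict(set)
    for text, tags in tagged_articles:
        for tag in tags:
            index[tag] |= words_of(text)
    return dict(index)


def match_percent(article_words, tag_words):
    # Share of the article's distinct words that the tag's word list covers.
    if not article_words:
        return 0.0
    return 100.0 * len(article_words & tag_words) / len(article_words)


def auto_tag(text, index, threshold=50.0):
    # Step 4: compare a new article against the index and assign tags.
    words = words_of(text)
    return sorted(
        tag for tag, tag_words in index.items()
        if match_percent(words, tag_words) >= threshold
    )


# Hand-labeled articles (hypothetical data), i.e. the output of step 3.
index = build_index([
    ("bank raises interest rates", {"economy"}),
    ("team wins football match", {"sports"}),
])
print(auto_tag("bank rates rise", index))   # ['economy']
print(auto_tag("weather forecast", index))  # []
```

In real text, very common words ("the", "in", "and") would inflate every match percentage, so a stop-word list or frequency weighting would be needed before relying on the threshold.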