What are the methods for extracting the Surname Name Patronymic from the text?
Of course, a universal algorithm is unlikely to exist; I'm more interested in practical work and articles that give an idea of which directions to pursue.
I'm interested in extracting full names in forms such as "Putin V.V. / Putin Vladimir Vladimirovich / Vladimir Vladimirovich" (in different grammatical cases), etc.
Well, here's an off-the-cuff template for writing a regexp:
1. Two or three words, separated by one or more characters that are neither letters nor digits.
2. The first letter of every word is capitalized.
3. At least one word is longer than one letter.
4. If a word consists of a single letter, the next character must be "." (a dot).
5. Check the Levenshtein distance against a dictionary of first names, surnames and patronymics (and their combinations).
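Rules 1 to 4 above could be sketched roughly like this in Python. This is only one possible reading of the template, not a tested production pattern: the character ranges (Latin plus Russian Cyrillic) and the names `NAME_RE` and `find_name_candidates` are my own assumptions. The dictionary check from rule 5 is left out here, so ordinary capitalized words at the start of a sentence will also come out as candidates.

```python
import re

# Assumed alphabets: basic Latin plus Russian Cyrillic.
UP = "A-ZА-ЯЁ"
LOW = "a-zа-яё"
ALNUM = f"0-9{UP}{LOW}"

# A full word: capital letter, then lowercase letters, ending at a word edge.
WORD = rf"[{UP}][{LOW}]+(?![{ALNUM}])"
# An initial: a single capital letter that must be followed by a dot (rule 4).
INITIAL = rf"[{UP}]\."
# A separator character: anything that is not a letter or a digit (rule 1).
SEP = rf"[^{ALNUM}]"

NAME_RE = re.compile(
    rf"(?<![{ALNUM}])"                         # start at a word edge
    rf"(?=(?:{INITIAL}{SEP}*){{0,2}}{WORD})"   # rule 3: at least one full word
    rf"(?:{WORD}{SEP}+|{INITIAL}{SEP}*){{1,2}}"  # first one or two tokens
    rf"(?:{WORD}|{INITIAL})"                   # last token (2 or 3 in total)
)


def find_name_candidates(text):
    """Return substrings that look like 'Surname N.P.' or 'Name Patronymic'."""
    return [m.group(0) for m in NAME_RE.finditer(text)]


if __name__ == "__main__":
    print(find_name_candidates("yesterday Putin V.V. met Ivanov I.I. in Moscow"))
```

Because the separator is "any non-letter, non-digit", the pattern will happily join words across punctuation, so in practice it is probably worth narrowing the separator or filtering the candidates afterwards.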
Google is working on semantic search, in which each name and surname is assigned its own opaque identifier (for example, "/m/05qt0" is Politics; Putin has one in Russian as well, and even Mizulina does). Search over other words already works this way; I see the same for several expressions. For English this is easy, but for Russian only a few words are covered so far. Supposedly this can be looked up there in v3. In 10 years, such a problem probably won't arise.
There should be corpora listing all first names, surnames and patronymics. At a minimum, such a list can be collected automatically from Wikipedia or DBpedia (structured data from Wikipedia).
1. Find such corpora or build our own.
2. Search the text for a match with at least one word (preferably a partial match, to allow for typos and declensions).
3. On a hit, take the neighborhood of the word (a couple of words to the left and right) and analyze it with heuristics.
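As a rough illustration of steps 2 and 3, here is a minimal Python sketch. Everything in it is an assumption made for the example: the toy dictionary `KNOWN_NAMES` stands in for a real list built from Wikipedia or DBpedia, the partial match is done with a plain Levenshtein distance and a threshold of 2, and the only "heuristic" applied to the neighborhood is to keep adjacent capitalized words.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical toy dictionary; a real one would be far larger.
KNOWN_NAMES = {"putin", "vladimir", "vladimirovich"}


def is_known(word, max_dist=2):
    """Partial match: tolerate typos and case endings up to max_dist edits."""
    w = word.rstrip(".").lower()
    return any(levenshtein(w, name) <= max_dist for name in KNOWN_NAMES)


def extract_around_hits(text, window=2):
    """Find dictionary hits, then grow each hit over capitalized neighbors."""
    tokens = re.findall(r"[^\W\d_]+\.?", text)  # letter runs, optional dot
    found = []
    for i, tok in enumerate(tokens):
        if not is_known(tok):
            continue
        lo = hi = i
        # Look at most `window` words to the left and to the right.
        while lo > 0 and i - lo < window and tokens[lo - 1][0].isupper():
            lo -= 1
        while hi < len(tokens) - 1 and hi - i < window and tokens[hi + 1][0].isupper():
            hi += 1
        span = " ".join(tokens[lo:hi + 1])
        if span not in found:
            found.append(span)
    return found


if __name__ == "__main__":
    print(extract_around_hits("we met Putina Vladimira yesterday"))
```

A fixed edit-distance threshold is crude for short words, so a real implementation would more likely scale it with word length or use a morphological analyzer for the declensions.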
If the task is serious (not a hobby), then, since Tomita has already been mentioned, I will also mention another heavy-duty text-processing tool: ABBYY Tagger. Dictionaries and rules are included, but it is not a budget option.
This is known as the named entity recognition problem; in your case, the entities are names of persons. Our product, Textocat API, can do this: see for yourself on the demo page for the Russian language, or get a free API key after registering on our website.