Are there algorithms for automatically extracting similar numeric values from a set of texts?
Texts often contain all sorts of numeric values: dates, phone numbers, and unique identifiers such as passport numbers. Their format also varies. A user may write a date with a dot, a comma, or a slash as the separator. Phone numbers vary even more. The same passport number may be written as one word or as two, and may include the words "series" and "number" or only the "#" symbol. A user may also add an extra space in one place and leave one out in another.
Are there any algorithms for automatically clustering similar values?
So far I have been experimenting with n-grams and word2vec, after first replacing every digit with the same digit, but the results are not good.
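The idea of masking digits before comparing values can be taken a step further without n-grams or word2vec: reduce each candidate value to a coarse "shape" and group values whose shapes are identical. This is a minimal sketch, assuming the candidate values have already been cut out of the texts; the `shape` encoding (`D` plus the length of each digit run, `_` for any run of separators) and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def shape(value: str) -> str:
    """Coarse shape of a value (hypothetical scheme): each run of digits
    becomes D<run length>, each run of separators becomes '_'."""
    s = re.sub(r"\d+", lambda m: f"D{len(m.group())}", value)
    s = re.sub(r"[\s.,/\-]+", "_", s)
    return s.strip("_")


def group_by_shape(values):
    """Group values whose shapes are identical, keeping input order."""
    groups = defaultdict(list)
    for v in values:
        groups[shape(v)].append(v)
    return dict(groups)


values = ["12.05.2021", "12/05/2021", "12, 05, 2021", "4510 123456", "4510123456"]
print(group_by_shape(values))
```

Here the three dates fall into one group, but `4510 123456` and `4510123456` get different shapes (`D4_D6` and `D10`). That is exactly the skipped-space problem from the question: shape grouping separates broad formats, but it does not by itself recognize the same value written in two different ways.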
Overall, this sounds like a task for regular expressions; there is no need to throw neural networks at everything.
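A minimal sketch of the regular-expression approach, assuming Russian-style formats: day-month-year dates, 11-digit phone numbers starting with +7 or 8, and a 4-digit passport series followed by a 6-digit number. The patterns, the `extract` and `canonical` helpers, and the canonical forms are all hypothetical and would need tuning on real data. Each pattern captures only the digits, and the captured groups are rewritten into one fixed string, so differently written copies of the same value become equal and can be grouped with an ordinary set or dictionary.

```python
import re

# Hypothetical patterns assuming Russian-style formats; tune them on real data.
PATTERNS = [
    # Day, month and 4-digit year separated by '.', ',' or '/'.
    ("date", re.compile(r"\b(\d{1,2})\s*[.,/]\s*(\d{1,2})\s*[.,/]\s*(\d{4})\b")),
    # 11-digit phone starting with +7 or 8; spaces, hyphens, parentheses allowed.
    ("phone", re.compile(
        r"(?:\+7|\b8)[\s\-()]*(\d{3})[\s\-()]*(\d{3})[\s\-]*(\d{2})[\s\-]*(\d{2})\b")),
    # 4-digit series + 6-digit number, optionally with "series", "number" or "#".
    ("passport", re.compile(
        r"(?:series\s*)?\b(\d{2})\s?(\d{2})\s*(?:number\s*|#\s*)?(\d{6})\b",
        re.IGNORECASE)),
]


def canonical(kind, groups):
    """Rewrite the captured digit groups into one fixed format per kind."""
    if kind == "date":
        day, month, year = groups
        return f"{int(day):02d}.{int(month):02d}.{year}"
    if kind == "phone":
        return "7" + "".join(groups)
    return "".join(groups)  # passport: series and number without separators


def extract(text):
    """Return (kind, canonical value) pairs in the order they occur in text."""
    found = []
    for kind, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), kind, canonical(kind, m.groups())))
        # Blank out matches with spaces of the same length (offsets stay valid)
        # so that later patterns cannot match the same characters again.
        text = pattern.sub(lambda m: " " * len(m.group()), text)
    return [(kind, value) for _, kind, value in sorted(found)]


sample = ("Born 5.3.1990, passport series 4510 number 123456, "
          "phone +7 (912) 345-67-89; also 05/03/1990 and 4510123456, "
          "tel. 8 912 3456789.")
print(extract(sample))
print(sorted(set(extract(sample))))
```

In this sample, `set(extract(sample))` collapses the six matches to three distinct values. The patterns run in a fixed order (dates, phones, passports), so when a string could fit more than one pattern, the earlier pattern wins; that order is a design choice to revisit for real data.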