Lemmatization in PHP?
What are the prospects for lemmatization in PHP today? Has really nothing appeared other than phpMorphy? Is it possible to reduce a word to its dictionary form (lemma) without having a dictionary entry for every single word? Could the existing Porter stemmer scripts be adapted for lemmatization?
Of course, I would prefer something ready-made, but I'm also open to theory and ideas.
The task itself is trivial for a human: find all the words that are repeated in a text and write them down in their dictionary form.
This cannot be done without a dictionary. Without one, all you get is a stemmer with low accuracy.
No stemmer without a dictionary will understand that the Russian word "кровать" ("bed") is a noun and not a verb, even though it ends like a verb infinitive.
And what about homonyms? Even the dictionary lists two readings for the Russian word "при":
при - preposition ("at", "by")
при - verb, imperative of "переть" ("to shove")
Or take the word "простой": it is a noun ("простой в работе", downtime at work), an adjective ("простой человек", a simple person) and even a verb ("простой тут полдня на ногах - помрешь", "stand here on your feet for half a day and you'll drop dead").
And there are thousands of such examples.
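To make the point concrete, here is a minimal sketch of what a dictionary lookup returns for such words. The homonym examples above only work in Russian, so the sketch uses the Russian forms. The three-entry dictionary and the function name `lookupLemmas` are made up for illustration; a real morphological dictionary holds hundreds of thousands of forms. It assumes the mbstring extension is available.

```php
<?php
// Hypothetical toy dictionary: surface form => list of [lemma, part of speech].
// A real dictionary is far larger; these three entries are illustrative only.
$dictionary = [
    'при'     => [['при', 'preposition'], ['переть', 'verb']],
    'простой' => [['простой', 'noun'], ['простой', 'adjective'], ['простоять', 'verb']],
    'кровать' => [['кровать', 'noun']],
];

/**
 * Return every [lemma, part of speech] reading the dictionary knows
 * for a word, or an empty array if the word is not in the dictionary.
 */
function lookupLemmas(array $dictionary, string $word): array
{
    // Normalise case so that "При" and "при" hit the same entry.
    $key = mb_strtolower($word, 'UTF-8');
    return $dictionary[$key] ?? [];
}

// One surface form, several candidate lemmas: the lookup alone cannot
// choose between them, that takes context.
foreach (lookupLemmas($dictionary, 'При') as [$lemma, $pos]) {
    echo $lemma, ' (', $pos, ')', PHP_EOL;
}
```

Even with a dictionary, the lookup only yields candidates. Choosing between "при" the preposition and "при" from "переть" takes context, which is the part a suffix-stripping stemmer has no access to at all.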
Lemmatization is no longer an open problem. It has all been solved already.
A counter-question: there are ready-made solutions in C, so why not turn one of them into a PHP module, if the project really needs it?
If you only need to analyze Russian, you can try MyStem with a small PHP wrapper.
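A rough sketch of what such a wrapper might look like, applied to the original task of listing repeated words in dictionary form. Everything here is an assumption to check against your MyStem version: the binary path `./mystem`, the options `-d` (contextual disambiguation) and `--format json`, and the JSON shape shown in the comments. The function names are made up for illustration. It needs PHP 7.4+ for the array form of `proc_open` and for arrow functions.

```php
<?php
/**
 * Extract one lemma per analysed token from a line of MyStem JSON output.
 * Assumed shape: [{"text":"маму","analysis":[{"lex":"мама"}]}, {"text":" "}, ...]
 */
function lemmasFromMystemJson(string $jsonLine): array
{
    $tokens = json_decode($jsonLine, true);
    if (!is_array($tokens)) {
        return [];
    }
    $lemmas = [];
    foreach ($tokens as $token) {
        // Tokens with no analysis (spaces, punctuation, unknown words) are skipped.
        if (!empty($token['analysis'][0]['lex'])) {
            $lemmas[] = $token['analysis'][0]['lex'];
        }
    }
    return $lemmas;
}

/** Keep only the lemmas that occur more than once, most frequent first. */
function repeatedLemmas(array $lemmas): array
{
    $counts = array_count_values($lemmas);
    arsort($counts);
    return array_filter($counts, fn (int $n): bool => $n > 1);
}

/** Run the mystem binary on $text and return its raw output. */
function runMystem(string $text, string $binary = './mystem'): string
{
    // Assumed options: -d = disambiguate by context, --format json = JSON output.
    $cmd  = [$binary, '-d', '--format', 'json'];
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start mystem');
    }
    // Send the text as a single line so that a single JSON array comes back.
    fwrite($pipes[0], str_replace(["\r", "\n"], ' ', $text) . "\n");
    fclose($pipes[0]);
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
    return trim((string) $out);
}
```

With the binary in place, usage would be along the lines of `repeatedLemmas(lemmasFromMystemJson(runMystem($text)))`. Writing all of the input before reading any output is fine for short texts; for large ones the two pipes can block each other, so feed the text in chunks or through a temporary file.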