PHP
posters, 2019-08-03 21:45:45

Lemmatization in PHP?

What are the prospects for lemmatization in PHP today? Has nothing appeared besides phpMorphy? Is it possible to reduce a word to its initial (dictionary) form without having a dictionary entry for every word? Could the existing Porter stemmer scripts somehow be adapted for lemmatization?
Of course, I'd prefer something ready-made, but I'm happy to hear theory and thoughts too.
Basically, my task is trivial for a human: find all the repeated words in a text and write them out in their initial (dictionary) form.
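Once some lemmatizer exists, the rest of this task is simple: split the text into words, map each word to its base form, and count. A minimal PHP sketch, assuming a hypothetical hand-made `$lemmaDict` array in place of a real morphological dictionary (phpMorphy, MyStem or similar); unknown words just fall back to their lowercased form:

```php
<?php
// Minimal sketch: find words that occur more than once and report them in
// their base (dictionary) form. $lemmaDict is a hypothetical toy stand-in
// for a real morphological dictionary, not data from any library.

/** Map a surface form to its lemma; unknown words fall back to lowercase. */
function lemmatize(string $word, array $lemmaDict): string
{
    $lower = mb_strtolower($word, 'UTF-8');
    return $lemmaDict[$lower] ?? $lower;
}

/** Return lemma => count for every lemma that appears at least twice. */
function repeatedLemmas(string $text, array $lemmaDict): array
{
    // Split on anything that is not a letter or digit (Unicode-aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        $lemma = lemmatize($word, $lemmaDict);
        $counts[$lemma] = ($counts[$lemma] ?? 0) + 1;
    }

    // Keep only lemmas that occur more than once, most frequent first.
    $repeated = array_filter($counts, fn(int $n): bool => $n > 1);
    arsort($repeated);
    return $repeated;
}

// Hypothetical toy dictionary: word form => lemma.
$lemmaDict = [
    'cats'    => 'cat',
    'ran'     => 'run',
    'runs'    => 'run',
    'running' => 'run',
];

print_r(repeatedLemmas('The cat ran. Cats like running, and the cat runs fast.', $lemmaDict));
```

Only `lemmatize()` would change when switching to a real library; the counting logic stays the same.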

3 answers
Developer, 2019-08-04
@samodum

This can't be done without a dictionary. Otherwise you end up with a low-accuracy stemmer.
No dictionary-free stemmer will figure out that "bed" is a noun and not a verb.
And what about homonyms? The dictionary even has two entries for the Russian word "при":
при - preposition ("at")
при - imperative of the verb "переть" ("to shove")
Or take "простой", which is a noun ("downtime at work"), an adjective ("a simple person") and even a verb, the imperative of "простоять" ("stand here half a day on your feet and you'll drop").
There are thousands of such examples.
Lemmatization is not an open problem today. It has all been solved already.
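To make the homonym point concrete: a dictionary can return several analyses for one word form, while a suffix-stripping stemmer returns a single string with no part of speech at all. In this sketch the `$analyses` table and the ending list in `naiveStem()` are hypothetical toy data, not taken from any real library:

```php
<?php
// Sketch: why homonyms need a dictionary. $analyses and the ending list in
// naiveStem() are hypothetical illustrations, not data from a real library.

// Surface form => list of possible analyses [lemma, part of speech].
$analyses = [
    'при' => [
        ['lemma' => 'при',    'pos' => 'PREP'], // preposition "at"
        ['lemma' => 'переть', 'pos' => 'VERB'], // imperative of "to shove"
    ],
    'простой' => [
        ['lemma' => 'простой',   'pos' => 'NOUN'], // "downtime"
        ['lemma' => 'простой',   'pos' => 'ADJ'],  // "simple"
        ['lemma' => 'простоять', 'pos' => 'VERB'], // imperative of "to stand (a while)"
    ],
];

/** Every distinct lemma the dictionary knows for this form (empty if unknown). */
function dictionaryLemmas(string $form, array $analyses): array
{
    return array_values(array_unique(array_column($analyses[$form] ?? [], 'lemma')));
}

/** Crude stemmer: strip the first matching ending from a short hypothetical list. */
function naiveStem(string $form): string
{
    foreach (['ой', 'ый', 'ий', 'ать', 'ить'] as $ending) {
        // The /u flag makes the regex work on UTF-8 characters, not bytes.
        $stem = preg_replace('/' . $ending . '$/u', '', $form);
        if ($stem !== $form) {
            return $stem;
        }
    }
    return $form;
}

foreach (array_keys($analyses) as $form) {
    printf("%s: dictionary -> %s; stemmer -> %s\n",
        $form,
        implode(', ', dictionaryLemmas($form, $analyses)),
        naiveStem($form));
}
```

Even with a dictionary, choosing the right analysis among several still depends on the surrounding words.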

Alexey Prikazchikov, 2019-08-03
@alexprik07

To answer with a counter-question: there are ready-made solutions in C, so why not wrap them in a PHP module, if the project really needs it?

Semyon Ancherbak, 2019-08-05
@s_ancherbak

If you only need to analyze Russian, you could try MyStem with a small PHP wrapper.
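A wrapper of that kind might look roughly like this. It is a sketch under assumptions: a `mystem` binary on `PATH`, and the `-n`, `-l`, `-d` and `-e` flags behaving as described in the comments (check them against `mystem -h` for your version). The function names are hypothetical.

```php
<?php
// Minimal sketch of a thin PHP wrapper around the MyStem command-line tool.
// Assumptions (verify with `mystem -h`): the binary is on PATH, -n prints one
// word per line, -l prints lemmas only, -d enables contextual disambiguation,
// and -e sets the text encoding.

/**
 * Turn MyStem "-n -l" output into a flat list of lemmas.
 * Lines are expected to look like "{lemma}" or "{lemma1|lemma2}", with "?"
 * marking guessed words; only the first alternative is kept.
 */
function parseMystemLemmas(string $output): array
{
    $lemmas = [];
    foreach (preg_split('/\R/u', $output, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        // "{мыть|мыло}" -> "мыть|мыло"; trim only strips ASCII bytes, so Cyrillic is safe.
        $line = trim($line, " \t{}");
        if ($line === '') {
            continue;
        }
        // Keep the first alternative and drop any "?" guess markers.
        $lemmas[] = rtrim(explode('|', $line)[0], '?');
    }
    return $lemmas;
}

/**
 * Run MyStem on $text and return its lemmas.
 * Output is read only after stdin is closed: fine for short texts, but very
 * large inputs should be streamed to avoid filling the pipes.
 */
function mystemLemmas(string $text, string $binary = 'mystem'): array
{
    $descriptors = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    // Array command form (PHP 7.4+) runs the binary directly, without a shell.
    $proc = proc_open([$binary, '-n', '-l', '-d', '-e', 'utf-8'], $descriptors, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("Cannot start $binary");
    }
    fwrite($pipes[0], $text);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    if (proc_close($proc) !== 0) {
        throw new RuntimeException("mystem failed: $errors");
    }
    return parseMystemLemmas($output);
}
```

Usage would be `mystemLemmas('мама мыла раму')`. Parsing is kept in its own function because the exact output format may differ between MyStem versions.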