Python
nv_vasilencov, 2019-01-24 09:49:25

How to group a list of English words into parts of speech and save them into separate files?

Good afternoon. Previously I had to split a Russian word list by part of speech, and I used pymorphy2 with a script like this:

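from itertools import groupby
from pathlib import Path

from pymorphy2 import MorphAnalyzer

infile = Path(r"C:\Temp\slovar.txt")
# The dictionary file holds one word per line
words = infile.read_text(encoding="utf-8").splitlines()
print(words)
#['каждый', 'охотник', 'желает', 'знать', 'где', 'сидит', 'фазан']
morph = MorphAnalyzer()
# (part of speech, word) pairs, using the most probable parse of each word
items = [(str(morph.parse(w)[0].tag.POS), w) for w in words]

print(items)
#[('ADJF', 'каждый'), ('NOUN', 'охотник'), ('VERB', 'желает'), ('INFN', 'знать'), ('ADVB', 'где'), ('VERB', 'сидит'), ('NOUN', 'фазан')]
# groupby only merges adjacent items, so sort by part of speech first
for g, it in groupby(sorted(items), key=lambda x: x[0]):
    outfile = infile.parent / f"{g}.txt"  # e.g. NOUN.txt next to the input
    outfile.write_text("\n".join(word for pos, word in it),
                       encoding="utf-8")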

Unfortunately, from this post https://toster.ru/q/305279 I learned that pymorphy2 does not support English.
I don't know what to do, please help. The dictionary contains one word per line: https://drive.google.com/file/d/1K9YGgGY1Nk86bhIGW...


2 answers
Evgeny Akulinin, 2019-01-24
@forkhammer

Try the NLTK library, which works with English: www.nltk.org/index.html
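A minimal sketch of that suggestion, assuming NLTK's nltk.pos_tag takes the place of MorphAnalyzer in the script from the question. tag_words and write_groups are hypothetical helper names; the tagger is a parameter so it can be swapped out (or stubbed for testing), and by default it uses pos_tag with tagset="universal", which gives coarse tags such as NOUN, VERB and ADJ. It assumes the tagger model (averaged_perceptron_tagger, named averaged_perceptron_tagger_eng in newer NLTK releases) and universal_tagset have been fetched with nltk.download().

```python
from itertools import groupby
from pathlib import Path


def tag_words(words, tagger=None):
    """Return (tag, word) pairs, like the `items` list in the pymorphy2 script."""
    if tagger is None:
        # Assumption: nltk is installed and its tagger model plus the
        # "universal_tagset" mapping were downloaded with nltk.download().
        import nltk

        def tagger(word):
            # pos_tag takes a token list and returns [(token, tag), ...];
            # tagset="universal" maps Penn Treebank tags to NOUN, VERB, ADJ, ...
            return nltk.pos_tag([word], tagset="universal")[0][1]

    words = [w.strip() for w in words if w.strip()]  # skip blank lines
    return [(tagger(w), w) for w in words]


def write_groups(items, out_dir):
    """Write one <TAG>.txt file per tag into out_dir and return {tag: path}."""
    out_dir = Path(out_dir)
    written = {}
    # groupby only merges adjacent items, so sort by tag first
    for tag, group in groupby(sorted(items), key=lambda x: x[0]):
        path = out_dir / f"{tag}.txt"
        path.write_text("\n".join(word for _, word in group), encoding="utf-8")
        written[tag] = path
    return written
```

With the file from the question, the calls would be roughly items = tag_words(Path(r"C:\Temp\slovar.txt").read_text(encoding="utf-8").splitlines()) and then write_groups(items, r"C:\Temp"). Since every word is tagged on its own, without a sentence around it, an ambiguous word such as "run" gets only the tagger's single out-of-context guess.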

First Name Last Name, 2019-01-24
@tommygain

If this is a one-off task where a quick hack is acceptable and performance doesn't matter, try translating the words through the Yandex Translate API and then determining the part of speech with pymorphy2. Or use any dictionary that records parts of speech. Of course, this will take longer than you would probably like.
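The "any dictionary" route could look like the sketch below, assuming WordNet (shipped with NLTK as nltk.corpus.wordnet, after nltk.download("wordnet")) is the dictionary. wordnet_pos and group_words are hypothetical helpers, and choosing the part of speech with the most senses is a heuristic of this sketch, not something WordNet defines. WordNet only covers nouns, verbs, adjectives and adverbs, so articles, prepositions and other function words end up under "unknown".

```python
from collections import Counter


def wordnet_pos(word, synsets=None):
    """Return the WordNet POS ('n', 'v', 'a', 'r') with the most senses, or None."""
    if synsets is None:
        # Assumption: nltk is installed and nltk.download("wordnet") was run.
        from nltk.corpus import wordnet as wn
        synsets = wn.synsets
    # Fold 's' (adjective satellite) into 'a' (adjective) before counting.
    counts = Counter("a" if s.pos() == "s" else s.pos() for s in synsets(word))
    if not counts:
        return None  # the word is not in WordNet
    return counts.most_common(1)[0][0]


def group_words(words, synsets=None):
    """Map each POS (or 'unknown') to the list of words assigned to it."""
    groups = {}
    for w in words:
        groups.setdefault(wordnet_pos(w, synsets) or "unknown", []).append(w)
    return groups
```

The resulting groups can then be written out one file per key, the same way the pymorphy2 script does it.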
