How to group a list of English words into parts of speech and save them into separate files?


Python

nv_vasilencov, 2019-01-24 09:49:25

Good afternoon. Previously, my task was to sort a Russian dictionary by part of speech. I used pyMorphy2 with a script like this:


from itertools import groupby
from pathlib import Path

from pymorphy2 import MorphAnalyzer

infile = Path(r"C:\Temp\slovar.txt")
words = infile.read_text(encoding="utf-8").splitlines()
print(words)
# ['каждый', 'охотник', 'желает', 'знать', 'где', 'сидит', 'фазан']
morph = MorphAnalyzer()
# Take the first parse of each word and keep its part-of-speech tag
items = [(str(morph.parse(w)[0].tag.POS), w) for w in words]

print(items)
# [('ADJF', 'каждый'), ('NOUN', 'охотник'), ('VERB', 'желает'), ('INFN', 'знать'), ('ADVB', 'где'), ('VERB', 'сидит'), ('NOUN', 'фазан')]
# groupby only merges adjacent items, so sort by POS first
for g, it in groupby(sorted(items), key=lambda x: x[0]):
    outfile = infile.parent / f"{g}.txt"
    outfile.write_text("\n".join(word for pos, word in it),
                       encoding="utf-8")

Unfortunately, I learned from this post, https://toster.ru/q/305279, that pyMorphy2 does not support English.
I don't know what to do, please help. The dictionary contains one word per line: https://drive.google.com/file/d/1K9YGgGY1Nk86bhIGW...


2 answer(s)


Evgeny Akulinin, 2019-01-24
@forkhammer

Try the NLTK library for English: www.nltk.org/index.html
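For illustration, here is a minimal sketch of how the NLTK suggestion might look. It is not a tested solution for the asker's file. The helper names (nltk_tag, group_by_pos, save_groups) are made up for this example. It assumes NLTK is installed and its English tagger data has been downloaded with nltk.download (the resource is called "averaged_perceptron_tagger" in older releases, and the exact name depends on the NLTK version). Note that nltk.pos_tag returns Penn Treebank tags such as NN or VB, and that tagging isolated words without sentence context is less reliable than tagging running text.

```python
from collections import defaultdict
from pathlib import Path


def nltk_tag(word):
    """Return the Penn Treebank tag NLTK assigns to a single word."""
    # Imported lazily so the grouping helpers work without NLTK installed.
    import nltk
    # pos_tag expects a list of tokens and returns (token, tag) pairs.
    return nltk.pos_tag([word])[0][1]


def group_by_pos(words, tag_word=nltk_tag):
    """Map each tag to the list of words that received it (input order kept)."""
    groups = defaultdict(list)
    for word in words:
        groups[tag_word(word)].append(word)
    return dict(groups)


def save_groups(groups, out_dir):
    """Write one <TAG>.txt file per tag, one word per line."""
    out_dir = Path(out_dir)
    for tag, tagged_words in groups.items():
        (out_dir / f"{tag}.txt").write_text("\n".join(tagged_words),
                                            encoding="utf-8")


# Hypothetical usage, with the same file layout as in the question:
# infile = Path(r"C:\Temp\slovar.txt")
# words = infile.read_text(encoding="utf-8").splitlines()
# save_groups(group_by_pos(words), infile.parent)
```

The tagger is passed in as a parameter so that a different one can be swapped in without touching the grouping or file-writing code.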


First Name Last Name, 2019-01-24
@tommygain

If this is a one-off "do it and forget it" task where a quick hack is enough and performance does not matter, try translating the words through the Yandex translator API and then determining the part of speech with pyMorphy2. Alternatively, use any dictionary that provides part-of-speech information. Of course, this will take more time than you would probably like.
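As a rough sketch of the dictionary route, one option (my assumption, not something named above) is WordNet, which ships with NLTK after nltk.download("wordnet"). The functions below are hypothetical helpers: dictionary_pos picks the part of speech that occurs most often among a word's dictionary senses, and wordnet_senses is one possible lookup. WordNet uses the codes n, v, a, s and r (noun, verb, adjective, satellite adjective, adverb) and does not cover function words such as prepositions, so those fall into "UNKNOWN".

```python
from collections import Counter


def dictionary_pos(word, lookup_senses):
    """Return the most frequent POS among the word's dictionary senses."""
    counts = Counter(lookup_senses(word))
    if not counts:
        return "UNKNOWN"  # word is not in the dictionary
    return counts.most_common(1)[0][0]


def wordnet_senses(word):
    """List the POS code of every WordNet sense of the word."""
    # Imported lazily; requires NLTK and the "wordnet" corpus.
    from nltk.corpus import wordnet
    return [synset.pos() for synset in wordnet.synsets(word)]


# Hypothetical usage:
# print(dictionary_pos("pheasant", wordnet_senses))
```

Counting senses is a crude heuristic: a word that is usually a noun but has many rare verb senses may be labelled as a verb.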