How to group a list of English words into parts of speech and save them into separate files?
Good afternoon. Previously, my task was to parse a Russian dictionary, and I used a pyMorphy2 script like the one below. How can I do the same for a list of English words?
from itertools import groupby
from pathlib import Path

from pymorphy2 import MorphAnalyzer

infile = Path(r"C:\Temp\slovar.txt")
words = infile.read_text(encoding="utf-8").splitlines()
print(words)
#['каждый', 'охотник', 'желает', 'знать', 'где', 'сидит', 'фазан']
morph = MorphAnalyzer()
# Tag each word with the part of speech of its first (most probable) parse
items = [(str(morph.parse(w)[0].tag.POS), w) for w in words]
print(items)
#[('ADJF', 'каждый'), ('NOUN', 'охотник'), ('VERB', 'желает'), ('INFN', 'знать'), ('ADVB', 'где'), ('VERB', 'сидит'), ('NOUN', 'фазан')]
# groupby only merges adjacent items, so sort by tag first
for g, it in groupby(sorted(items), key=lambda x: x[0]):
    # One output file per part of speech, next to the input file
    outfile = infile.parent / f"{g}.txt"
    outfile.write_text("\n".join(word for pos, word in it), encoding="utf-8")
Try the NLTK library for English: www.nltk.org/index.html
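For what it's worth, here is a rough sketch of how that could look. It assumes NLTK is installed and its English tagger data has been downloaded with nltk.download (the exact resource names differ between NLTK versions, so check the docs for yours). The helper names group_by_pos, write_groups and nltk_tag_word are made up for this example, and the input path in the usage comment is hypothetical. Keep in mind that tagging isolated words without sentence context is often ambiguous ("run" can be a noun or a verb), so treat the result as a first pass.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def group_by_pos(words: Iterable[str], tag_word: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words by whatever tag the tag_word callable returns for each one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        word = word.strip()
        if not word:  # skip blank lines from the input file
            continue
        groups[tag_word(word)].append(word)
    return dict(groups)


def write_groups(groups: Dict[str, List[str]], out_dir: Path) -> None:
    """Write one <TAG>.txt file per group, one word per line."""
    for tag, members in groups.items():
        (out_dir / f"{tag}.txt").write_text("\n".join(members), encoding="utf-8")


def nltk_tag_word(word: str) -> str:
    """Tag a single word with NLTK's default English tagger, universal tagset."""
    import nltk  # imported lazily so the helpers above work without NLTK

    # pos_tag returns a list of (token, tag) pairs; we pass one token
    return nltk.pos_tag([word], tagset="universal")[0][1]


# Usage (hypothetical path, requires NLTK and its tagger data):
#   infile = Path(r"C:\Temp\dictionary.txt")
#   words = infile.read_text(encoding="utf-8").splitlines()
#   write_groups(group_by_pos(words, nltk_tag_word), infile.parent)
```

The tagger is passed in as a function, so you can swap in a different one later without touching the grouping or file-writing code.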
If this is a one-off task where a quick hack is acceptable and performance does not matter, try translating the words through the Yandex translator API and then determining the part of speech with pyMorphy2. Alternatively, use any dictionary that records the part of speech for each word. Either way, it will probably take more time than you would like.
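To illustrate the dictionary option only (not the translator one), here is a minimal sketch. It assumes you have some word list in a "word<TAB>part of speech" format, which is a made-up format for this example, as are the function names and the sample data. Words missing from the dictionary get an UNKNOWN tag rather than a guess.

```python
from typing import Dict, Iterable, List, Tuple


def load_pos_dictionary(text: str) -> Dict[str, str]:
    """Parse 'word<TAB>POS' lines into a lowercase word -> POS mapping."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if "\t" not in line:  # ignore blank or malformed lines
            continue
        word, pos = line.split("\t", 1)
        mapping[word.strip().lower()] = pos.strip()
    return mapping


def tag_with_dictionary(
    words: Iterable[str], mapping: Dict[str, str], unknown: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Return (pos, word) pairs, the same shape as the items list in the question."""
    return [(mapping.get(w.lower(), unknown), w) for w in words]


# Made-up sample data standing in for a real dictionary file
SAMPLE = "hunter\tNOUN\nknow\tVERB\nevery\tADJ\n"
items = tag_with_dictionary(["every", "hunter", "sits"], load_pos_dictionary(SAMPLE))
print(items)
# [('ADJ', 'every'), ('NOUN', 'hunter'), ('UNKNOWN', 'sits')]
```

Since the result is a list of (pos, word) pairs, it can go straight into the same sorted/groupby loop from the question.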