How to properly submit strings for lemmatization?

RockyMotion, 2019-07-19 18:42:43

Python

I have tokenized a large text and am now trying to lemmatize the resulting lines. I am using pymorphy2 for lemmatization, and the library accepts only one word at a time. I can't figure out how to feed it a sentence word by word while still saving the results to the dataframe sentence by sentence.

import pandas as pd
import pymorphy2

data_clear = pd.read_csv('C:\\Users\\ugrobug\\Desktop\\out_token.csv', sep='\t', encoding='utf-8')

def lemma(data_clear):
    morph = pymorphy2.MorphAnalyzer()
    # One output column; rows are appended inside the loop below
    final_data = pd.DataFrame(columns=['Question'])

    for i in data_clear['0']:
        # Note: the whole line is passed to morph.parse() here, not a single word
        c = morph.parse(i)[0]
        lemmas = c.normal_form
        print(lemmas)
        # Append one row per input line
        final_data.loc[len(final_data)] = [lemmas]

    final_data.to_csv('C:\\Users\\ugrobug\\Desktop\\out_lemma.csv', sep='\t', encoding='utf-8')

lemma(data_clear)

1 answer(s)

SideWest, 2019-07-22
@SideWest

Did anyone understand anything?
I didn't!
Show what exactly is in data_clear.
And what does data_clear['0'] mean?
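
If data_clear['0'] turns out to be a column of strings with one whitespace-tokenized sentence per row, a rough sketch along these lines may help. This is only an assumption about the data, not something confirmed in the question. The names lemmatize_sentences, normalize and make_pymorphy2_normalizer are hypothetical helpers made up for illustration; only morph.parse(word)[0].normal_form is taken from the question's code. The idea is to split each sentence into words, lemmatize the words one at a time, and join them back so that one input row still gives one output row.

```python
from typing import Callable, Iterable, List


def lemmatize_sentences(sentences: Iterable[str],
                        normalize: Callable[[str], str]) -> List[str]:
    """Lemmatize each sentence word by word; return one string per sentence."""
    result = []
    for sentence in sentences:
        # Assumption: tokens within a row are separated by whitespace
        words = str(sentence).split()
        # Each word is passed to the lemmatizer separately, then re-joined
        result.append(" ".join(normalize(word) for word in words))
    return result


def make_pymorphy2_normalizer() -> Callable[[str], str]:
    """Hypothetical helper: wrap pymorphy2 as a word -> lemma function."""
    import pymorphy2  # imported lazily so the rest works without it

    morph = pymorphy2.MorphAnalyzer()

    def normalize(word: str) -> str:
        # Take the first (most probable) parse, as in the question's code
        return morph.parse(word)[0].normal_form

    return normalize
```

With pandas this could then be used roughly as data_clear['Question'] = lemmatize_sentences(data_clear['0'], make_pymorphy2_normalizer()), followed by the same to_csv call as in the question. Passing the lemmatizer in as a function keeps the sentence-splitting logic easy to check separately from pymorphy2.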