D_DeYneko, 2018-05-24 20:13:41

How to use R for lemmatization, tokenization, stopword removal, and subsequent LDA topic modeling?

Hello!
I have a database dump of a news site covering roughly 20 years, in CSV format: a heading, the article text, and a date for each entry. It is about a gigabyte in size.
I would like to process it somehow, but I don't know where to start. My only command-line experience is downloading archives from GitHub, so this is a difficult task for me.
Has anyone done something similar in R? I chose it because it at least has some kind of interface and handles ~700k rows of data reasonably well. Could you suggest a sequence of steps?
Or perhaps there are applications in which this could be done with less pain?
I also looked at TopicMiner from HSE, but it refuses to process CSV and wants to be fed a separate TXT file for each document, which is impossible in my situation.
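As an illustration of one possible sequence of steps, here is a minimal sketch using the tidytext, stopwords, and topicmodels packages. The file name news.csv and the column names title, text, and date are assumptions; adjust them to whatever your CSV actually contains, and pick the stopword language to match the corpus.

library(readr)
library(dplyr)
library(tidytext)
library(stopwords)
library(topicmodels)

# Read the CSV and give each article a numeric id.
# Column names ("title", "text", "date") are assumptions.
news <- read_csv("news.csv") %>%
  mutate(doc_id = row_number())

# Tokenize the article text and drop stopwords
# ("ru" here is an assumption; use "en" etc. as appropriate).
tokens <- news %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stopwords::stopwords("ru"))

# Build a document-term matrix of word counts.
dtm <- tokens %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

# Fit LDA with an arbitrary number of topics (k = 20 here)
# and inspect the top 10 terms per topic.
lda_fit <- LDA(dtm, k = 20, control = list(seed = 42))
terms(lda_fit, 10)

This sketch skips lemmatization: for that, the udpipe package can annotate the texts and return lemmas, which you would then feed into the document-term matrix instead of raw tokens. Also note that on a ~1 GB corpus memory may become a problem; fitting the model on a sample first, or switching to the more memory-efficient text2vec package, are options worth considering.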
