D_DeYneko, 2018-05-24 20:13:41

How to use R for lemmatization, tokenization, stopword removal, and subsequent LDA topic modeling?

Hello!
I have a database dump of a news site covering roughly 20 years, in CSV format: a heading, the article text, and a date for each entry. It is about a gigabyte in size.
I would like to process it somehow, but I don't know where to start. My only command-line experience is downloading archives from GitHub, so this is a difficult task for me.
Has anyone done something similar in R? I chose it because it at least has some kind of interface and handles ~700k rows of data reasonably well. Could you suggest a sequence of steps?
Or perhaps there are applications in which this could be done with less pain?
I also looked at TopicMiner from HSE, but it refuses to process CSV and wants to be fed a separate TXT file for each document, which is impossible in my situation.
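As an illustration of one possible sequence of steps, here is a minimal sketch using the tidytext, stopwords, and topicmodels packages. The file name news.csv and the column names title, text, and date are assumptions; adjust them to whatever your CSV actually contains, and pick the stopword language to match the corpus.

library(readr)
library(dplyr)
library(tidytext)
library(stopwords)
library(topicmodels)

# Read the CSV and give each article a numeric id.
# Column names ("title", "text", "date") are assumptions.
news <- read_csv("news.csv") %>%
  mutate(doc_id = row_number())

# Tokenize the article text and drop stopwords
# ("ru" here is an assumption; use "en" etc. as appropriate).
tokens <- news %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stopwords::stopwords("ru"))

# Build a document-term matrix of word counts.
dtm <- tokens %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

# Fit LDA with an arbitrary number of topics (k = 20 here)
# and inspect the top 10 terms per topic.
lda_fit <- LDA(dtm, k = 20, control = list(seed = 42))
terms(lda_fit, 10)

This sketch skips lemmatization: for that, the udpipe package can annotate the texts and return lemmas, which you would then feed into the document-term matrix instead of raw tokens. Also note that on a ~1 GB corpus memory may become a problem; fitting the model on a sample first, or switching to the more memory-efficient text2vec package, are options worth considering.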
