How to parse large (>25GB) activity log files and rank the extracted information? Which technologies are best to use?
Hello colleagues.
I need to parse large log files (>25GB), rank the data in a certain way, and present a UI to the end user for analyzing the ranked data.
I have never solved tasks like this and don't know what would work best here (Hadoop, Elasticsearch, MongoDB). I work in the Java ecosystem.
I'm asking the advice of more experienced colleagues!
www.datacenterknowledge.com/archives/2012/03/08/th...
As long as you don't have a constant stream of such logs and don't need to store and process 10+ years of history, you don't have big data.
"I don't know what is better to use for this" - if you don't know which database to use, use Postgres.
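To make the Postgres route concrete, here is a minimal sketch of a JDBC batch loader. The connection string, table name, columns, and the space-delimited log format are all assumptions for illustration; for a 25 GB file, PostgreSQL's COPY command would also be noticeably faster than INSERTs:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LogToPostgres {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; assumes a table created beforehand, e.g.:
        //   CREATE TABLE activity_log (user_id text, action text);
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/logs", "user", "password");
             BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {

            conn.setAutoCommit(false);
            PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO activity_log (user_id, action) VALUES (?, ?)");

            String line;
            int batched = 0;
            while ((line = reader.readLine()) != null) {
                // Hypothetical format: "<user_id> <action> ..." - adapt to your logs.
                String[] parts = line.split(" ", 3);
                if (parts.length < 2) continue; // skip malformed lines
                insert.setString(1, parts[0]);
                insert.setString(2, parts[1]);
                insert.addBatch();
                if (++batched % 10_000 == 0) { // flush in chunks to bound memory
                    insert.executeBatch();
                    conn.commit();
                }
            }
            insert.executeBatch();
            conn.commit();
            // The "ranking" then becomes plain SQL, e.g.:
            //   SELECT user_id, COUNT(*) AS hits FROM activity_log
            //   GROUP BY user_id ORDER BY hits DESC LIMIT 100;
        }
    }
}
```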
Here is the answer to your question. The main idea:
log file -> parser -> Logstash -> Elasticsearch -> Kibana
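The "parser" stage can be a plain Java pass that turns each raw line into one JSON object per line (NDJSON), which a Logstash file input can then pick up. A minimal sketch; the field layout is an assumption, and a real JSON library (Jackson, Gson) is preferable in practice:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LogParser {
    public static void main(String[] args) throws Exception {
        // args[0] = raw log file, args[1] = NDJSON output consumed by Logstash.
        // Streaming line by line keeps memory flat even on a 25 GB input.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Hypothetical format: "<timestamp> <user> <action...>"; adapt to yours.
                String[] p = line.split(" ", 3);
                if (p.length < 3) continue; // skip malformed lines
                // One JSON object per line is trivial for Logstash to ingest.
                out.write(String.format("{\"ts\":\"%s\",\"user\":\"%s\",\"action\":\"%s\"}%n",
                        escape(p[0]), escape(p[1]), escape(p[2])));
            }
        }
    }

    // Minimal escaping of backslashes and quotes; a JSON library handles this properly.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```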
Yes, that's right: read -> process.
But most of the classical algorithms we normally use for data processing (for example, sorting) belong to the class of "offline" algorithms: you have to provide all the data at once to get an answer, which is sometimes simply not possible.
Look into the class of online algorithms and streaming data processing, for example here: www.cs.dartmouth.edu/~ac/Teach/CS85-Fall09/Notes/l...
Or try a streaming framework like Spark.
Though for processing logs, of course, it is often easier and faster to write your own algorithm than to cobble together a Spark setup; see the sketch below.
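For instance, a hand-rolled one-pass ranking in plain Java: a bounded min-heap keeps only the current top k lines, so a 25 GB file streams through in O(k) memory instead of being sorted as a whole. This assumes Java 16+ (records) and, purely for illustration, that the score is the last whitespace-delimited field of each line:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.PriorityQueue;

public class TopK {
    // One ranked record: the score we rank by plus the original line.
    record Scored(long score, String line) {}

    public static void main(String[] args) throws Exception {
        int k = 100;
        // Min-heap of the k best lines seen so far: one pass, O(k) memory.
        PriorityQueue<Scored> heap =
                new PriorityQueue<>(Comparator.comparingLong(Scored::score));

        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Hypothetical format: last field is a numeric score (e.g. response time).
                String[] p = line.split(" ");
                long score;
                try {
                    score = Long.parseLong(p[p.length - 1]);
                } catch (NumberFormatException e) {
                    continue; // skip malformed lines
                }
                heap.offer(new Scored(score, line));
                if (heap.size() > k) heap.poll(); // evict the current minimum
            }
        }
        // The heap now holds the k highest-scoring lines of the whole file.
        heap.stream()
            .sorted(Comparator.comparingLong(Scored::score).reversed())
            .forEach(s -> System.out.println(s.score() + "\t" + s.line()));
    }
}
```

The same single-pass, bounded-memory idea extends to other rankings (per-user counters, sliding windows), which is exactly the "online" style the linked notes describe.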