One computer is not enough for real work; you need a cluster. But to learn how to work with big data, you do not need big data itself.
The very concept of 'big data' implies that there is SO MUCH data that conventional approaches and tools no longer cope.
For example (all numbers are made up, just to show the scale of the problem): suppose you need to process your web server's logs, and your script churns through a day's worth of traffic in half an hour on your home computer. Now try to process the logs of some Avito or Yandex: even if you load all your home computers, phones, routers, and the computers of your friends, relatives, and classmates, your script still won't keep up, because new logs arrive more than an order of magnitude faster than you can process them.
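To make the "shovel the logs on one machine" scenario concrete, here is a minimal sketch of such a script. The file name, log format (combined access log), and the `top_ips` helper are all my assumptions for illustration; the point is that a single sequential pass like this is fine for a day of home-server traffic, but can never keep up at Avito/Yandex scale.

```python
from collections import Counter

def top_ips(log_path, n=10):
    """Count requests per client IP in an access log (combined format assumed).

    A single-threaded pass like this works for modest traffic on one
    machine; at big-data scale the log grows faster than it can be read.
    """
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            # In the combined log format the first whitespace-separated
            # field is the client IP address.
            ip = line.split(" ", 1)[0]
            counts[ip] += 1
    return counts.most_common(n)
```

Usage: `top_ips("access.log", 5)` returns the five busiest client IPs with their request counts.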
This is big data.
That is, tasks in this area force you either to look for non-standard approaches to the solution, or to change the algorithm so that it improves processing efficiency by orders of magnitude (not code optimization, but a change of approach), or to use a really large cluster of machines, which is expensive.
Big data itself is not needed for studying and experimenting, although samples from it are useful for testing algorithms.
For studying, even a light laptop is enough: the schemes and approaches are the same.
Take the same Spark: you can run it locally, even on a laptop.
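A minimal sketch of running Spark locally, as the answer suggests. This is a local-mode configuration fragment, not a cluster setup: it assumes `pyspark` is installed (`pip install pyspark`, which also needs a local JVM), and `access.log` is a hypothetical sample file for practice.

```python
from pyspark.sql import SparkSession

# master("local[*]") runs Spark entirely inside this one process,
# using all CPU cores as workers -- the exact same DataFrame API
# you would later use on a real cluster.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("laptop-bigdata-practice")
    .getOrCreate()
)

# Any small sample file works for learning the approach.
df = spark.read.text("access.log")
print(df.count())

spark.stop()
```

The design point is that local mode and cluster mode share the same API, so everything learned on a laptop transfers directly when a cluster eventually appears.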