What node hardware configuration should I choose for a Hadoop cluster?
Until now, Vertica + Tableau handled our analytics, but the data has grown past 1 TB, there are many problems with load and with processing streaming data, and we need room to expand. A Hadoop cluster should handle all of this well. We will start with 3-8 TB of data. Spark Streaming will process the data flow from the site, with real-time visualization of the main product indicators (payback, activity, etc.).

There is very little information on the internet. Everything I managed to find on English-language resources I tried to digest and write up for future readers in this article: bigdata-intips.blogspot.com/2015/10/hadoopwith-spa.... But it covers only the basic concepts of cluster hardware configuration, and the recommended figures vary greatly from source to source. The hardware has to be bought with about six months of headroom, and I don't want to spend the money and then hit a bottleneck in CPU, network, or memory. If you have experience administering a Hadoop cluster, please tell me the main specs for the NameNode, DataNode, and other necessary servers in a production environment with the best price/performance ratio. Thanks for the help!
As always, it comes down to budget.
People say Hadoop is cheap, but that is not entirely true.
The Name/Management Node needs powerful hardware (4-8 cores, 48+ GB of RAM, RAID 5 or 6 with a hot spare).
The Data/Worker nodes can be simpler, but need a good disk subsystem and more RAM (4-6 cores, 48+ GB of RAM, RAID 1 for the system disk, JBOD of about 10 TB for data).
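To get a rough feel for how many Data/Worker nodes those numbers imply, here is a minimal back-of-the-envelope sketch in Python. It assumes the HDFS default replication factor of 3 and about 10 TB of JBOD per node. The 25% allowance for temporary/intermediate data and the 75% maximum disk fill are my own illustrative assumptions, not figures from the answer above, and the function name is made up for this example.

```python
import math


def estimate_data_nodes(raw_tb, disk_per_node_tb=10.0, replication=3,
                        temp_overhead=0.25, max_fill=0.75):
    """Rough number of DataNodes needed to store raw_tb of data.

    replication   -- HDFS replication factor (3 is the HDFS default)
    temp_overhead -- ASSUMED extra space for shuffle/intermediate data
    max_fill      -- ASSUMED fraction of each node's disks we allow to fill
    """
    if raw_tb <= 0 or disk_per_node_tb <= 0:
        raise ValueError("sizes must be positive")
    # Total space consumed once every block is replicated, plus scratch space.
    required_tb = raw_tb * replication * (1 + temp_overhead)
    # Capacity we are actually willing to use on one node.
    usable_per_node_tb = disk_per_node_tb * max_fill
    nodes = math.ceil(required_tb / usable_per_node_tb)
    # Need at least `replication` nodes so each replica can live on its own node.
    return max(nodes, replication)


if __name__ == "__main__":
    for raw in (3, 8):  # the 3-8 TB starting range from the question
        print(f"{raw} TB raw -> {estimate_data_nodes(raw)} data nodes")
```

Under these assumptions the starting 3-8 TB fits on 3 to 4 data nodes by storage alone. CPU and RAM for Spark Streaming are not modeled here and may well push the count higher, so treat the result as a lower bound and rerun it with the volume you expect in six months.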