big data
nickolas_php, 2015-10-15 15:58:34

What hardware node configuration to choose for a Hadoop cluster?

Until now, Vertica + Tableau handled our analytics, but the data has already grown past 1 TB, and we have run into many problems with load and with streaming data processing; the system also needs to be expanded. A Hadoop cluster should handle all of this well. We will start with 3-8 TB of data. Spark Streaming will process the data flow from the site, with real-time visualization of the main product metrics (payback, activity, etc.).

There is very little information on the internet. Everything I managed to find in English-language sources I tried to digest and write up for future generations in this article: bigdata-intips.blogspot.com/2015/10/hadoopwith-spa.... But it covers only the basic concepts of cluster hardware configuration, and the recommended figures vary widely from source to source. The hardware needs to be bought with about half a year of headroom, and I don't want to spend the money and then hit a bottleneck in CPU, network, or memory.

If you have experience administering a Hadoop cluster, please tell me the main hardware specs for the NameNode, DataNode, and other necessary servers in a production environment, with the best price/performance ratio. Thanks for the help!


1 answer(s)
UNIm95, 2015-10-22
@nickolas_php

As always, the question is the budget.
People say Hadoop is cheap, but that is not entirely true.
The Name/Management node needs powerful hardware (4-8 cores, 48+ GB of RAM, RAID 5 or 6 with a hot spare).
The Data/Worker nodes can be simpler, but they need a good disk subsystem and more RAM (4-6 cores, 48+ GB of RAM, RAID 1 for the system disks, JBOD of about 10 TB for data).
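
To turn these figures into a rough node count, here is a minimal sizing sketch. It is a back-of-the-envelope estimate, not a benchmark. The assumptions are mine: HDFS keeps 3 replicas of every block (the default `dfs.replication`), each worker has about 10 TB of data disk, and 25% of that is kept free for shuffle/temporary data and growth. The function name `data_nodes_needed` and the headroom value are hypothetical, so adjust them to your own budget and growth rate.

```python
import math


def data_nodes_needed(logical_tb, replication=3, headroom=0.25, disk_tb_per_node=10.0):
    """Estimate how many DataNodes are needed to store `logical_tb` of data.

    replication      -- HDFS replicas per block (3 is the HDFS default)
    headroom         -- fraction of each node's disk kept free (assumed value)
    disk_tb_per_node -- data disk per worker in TB (assumed value)
    """
    # Every logical terabyte occupies `replication` terabytes of raw disk.
    raw_tb = logical_tb * replication
    # Only part of each node's disk is available for HDFS blocks.
    usable_per_node = disk_tb_per_node * (1.0 - headroom)
    # Never go below `replication` nodes, otherwise replicas cannot
    # all be placed on different machines.
    return max(replication, math.ceil(raw_tb / usable_per_node))


if __name__ == "__main__":
    for tb in (3, 8):
        print(f"{tb} TB of data -> {data_nodes_needed(tb)} DataNodes")
```

Under these assumptions, the 3-8 TB starting range from the question fits on 3 to 4 workers. The half-year margin would push that higher, depending on how fast the data grows. CPU and RAM for Spark Streaming have to be sized separately, since this estimate covers disk capacity only.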
