Computing with Apache Spark: one powerful server, or a cluster of the same total power?
Hello. I couldn't find this by googling, and I want to understand it.
Apache Spark computing calls for a lot of RAM, a compute cluster, and so on.
But there is one point I don't quite understand.
With MapReduce-style computation, which will be faster: a single server, or a cluster of, say, 5 nodes whose combined parameters equal that server's?
Roughly how does the compute performance of a single server compare with that of a cluster with the same total parameters?
For example, one server with 10 cores and 256 GB of RAM
versus 5 cluster nodes with 2 cores and 51 GB of RAM each.
Thank you!
It depends on your calculations. In the general case the cluster will be faster thanks to parallel processing of data blocks (computations are more often bottlenecked not by CPU and memory, but by disks and the network).
If all the data fits on the disks of one server, then, since nothing has to be transferred over the network, that option may turn out to be more performant.
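To make the two setups from the question concrete, here is a minimal sketch of how each might be expressed; the master URL, app names, and memory figures are made-up placeholders, and in practice you would create only one of these sessions per JVM:

```scala
import org.apache.spark.sql.SparkSession

// Single big server: local mode, 10 threads in one JVM.
// Shuffle data stays on local disk; nothing crosses a network.
val single = SparkSession.builder()
  .appName("one-big-box")
  .master("local[10]")
  .getOrCreate()

// Five small nodes: a standalone cluster (hypothetical master host),
// 2 cores per executor. Shuffle data now travels over the network.
val clustered = SparkSession.builder()
  .appName("five-small-boxes")
  .master("spark://master-host:7077")
  .config("spark.executor.cores", "2")
  .config("spark.executor.memory", "40g") // leave headroom out of each node's 51 GB
  .getOrCreate()
```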
The more memory, the better, but it isn't critical: in Spark, any large job can and should be split into a series of small tasks that each run in memory, as in the sketch below.
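A sketch of that splitting; the input path and partition counts are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// local[*] just to keep the demo runnable on one machine.
val spark = SparkSession.builder()
  .appName("partitioning-demo")
  .master("local[*]")
  .getOrCreate()

// Hypothetical dataset; the path is made up.
val df = spark.read.parquet("/data/events")

// More partitions => smaller tasks, each of whose working set fits in memory.
val small = df.repartition(200)

// The same knob for shuffles produced by joins and aggregations.
spark.conf.set("spark.sql.shuffle.partitions", "200")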
As for the difference between a cluster and a single powerful server: the single server will be faster because "synchronization" between "workers" is cheaper there.
But with well-designed jobs the synchronization cost is tiny and can overlap with the execution of the work itself, so overall it will hardly affect performance.
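One common illustration of keeping that synchronization (i.e. the shuffle) small; the data and numbers here are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-demo")
  .master("local[4]")
  .getOrCreate()
val sc = spark.sparkContext

val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, 1L))

// groupByKey ships every record to the reducer: maximal "synchronization".
val slow = pairs.groupByKey().mapValues(_.sum)

// reduceByKey combines locally on each worker first, so only a handful of
// partial sums per partition cross the network instead of a million records.
val fast = pairs.reduceByKey(_ + _)

println(fast.collect().take(5).mkString(", "))
```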
In any case, scaling your service by beefing up a single server is a dead end, so I wouldn't even bother with one very, very powerful and super-expensive machine.