Dmitry Logvinenko, 2018-02-20 11:14:22
Java

How do I parallelize a Java application's computations across multiple machines?

Given:
A computational (not web!) application on plain Java SE 8 that consumes about 500 GB of RAM and a good share of an Intel Xeon E5-2 *** while running, and has something like this, God forgive me, structure:

[Figure: "Structure" before]

That is, one mighty JAR is launched on a Linux server, pulls a data set from the database, runs some arithmetic on it (some of which is not parallelized at all, and some of which parallelizes very well and is already split across threads), and sends the result back to the database.
Of course, this setup has some problems:
  1. Weak fault tolerance (whatever fails, we start over)
  2. Zero scalability (the "throw more memory / processors at it" options will soon stop working)
  3. Monitoring only via logs or, during debugging, via a running VisualVM
  4. Control only through the JAR's command-line options and pkill

I would like to wrap it in some kind of application server to get control and load balancing, and at the same time spread the parallel stage of the calculations across n machines (where n > 1). In my hazy imagination, the new abstract application structure should look like this:
[Figure: "Structure" after]

where the countless Slaves are separate computers. The Master distributes data to them (an open question: perhaps the slaves can pull it themselves), distributes the load (a machine that has finished its part is given something else), handles fault tolerance (if one of the hosts steps out for a smoke, its task is handed to one that is still working), aggregates the already calculated results, and writes them to the database.
But! A quick Google search showed that typical Java application servers such as WildFly, GlassFish, WebSphere and WebLogic exist precisely to serve web applications, while number crunchers call for monsters like Hadoop and Ignite. Is that right, or not?
What would you use in this case?
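
To make the Master/Slave scheme above more concrete, here is a minimal single-JVM sketch of it, not a distributed implementation. Threads stand in for slave machines, and the class `MasterSketch`, the method `run`, the sum-of-squares `compute` step and the `failOnce` set that simulates a crashed node are all hypothetical names invented for illustration. Idle workers pull the next chunk themselves, a failed chunk goes back into the queue, and the master aggregates the partial results.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class MasterSketch {
    // Hypothetical unit of work: sum of squares of one chunk of data.
    static long compute(int[] chunk) {
        long sum = 0;
        for (int x : chunk) sum += (long) x * x;
        return sum;
    }

    // Threads stand in for slave machines (assumption of this sketch).
    // failOnce: chunk ids whose first attempt fails, simulating a crashed node.
    static long run(List<int[]> chunks, int workers, Set<Integer> failOnce)
            throws InterruptedException {
        BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
        for (int i = 0; i < chunks.size(); i++) pending.add(i);
        Set<Integer> alreadyFailed = ConcurrentHashMap.newKeySet();
        ConcurrentMap<Integer, Long> results = new ConcurrentHashMap<>();
        CountDownLatch remaining = new CountDownLatch(chunks.size());

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Integer id = pending.take(); // an idle worker pulls the next chunk
                        try {
                            if (failOnce.contains(id) && alreadyFailed.add(id)) {
                                throw new IllegalStateException("simulated node failure");
                            }
                            results.put(id, compute(chunks.get(id)));
                            remaining.countDown();
                        } catch (RuntimeException e) {
                            pending.add(id); // put the chunk back so any free worker retries it
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // master asked us to stop
                }
            });
        }

        remaining.await();   // every chunk has a result
        pool.shutdownNow();  // interrupts workers blocked in take()

        long total = 0;      // aggregation step done by the master
        for (long partial : results.values()) total += partial;
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        List<int[]> chunks = Arrays.asList(new int[]{1, 2}, new int[]{3}, new int[]{4, 5});
        // Chunk 1 fails on its first attempt and is retried.
        System.out.println(run(chunks, 2, Collections.singleton(1))); // prints 55
    }
}
```

On real machines the queue, the failure detection (heartbeats or timeouts rather than a caught exception) and the result transfer would have to go over the network, which is exactly the part that frameworks take over.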


3 answers
Sergey Gornostaev, 2018-02-20
@sergey-gornostaev

I would just use Spark or Ignite so as not to reinvent the wheel.

Dmitry Alexandrov, 2018-02-20
@jamakasi666

Admittedly, I have not run into this in practice, but I have read a lot about solving such problems, so here are a couple of thoughts:
1) If everything is in the database, why not start parallelizing from there? Say, in a database (possibly a separate one) you record who has taken which data: the nodes connect to the database, take a batch of data and mark it as being in progress, so node 2 will not take data that node 1 is already working on. This is one of the simple, brute-force solutions.
2) Everything beyond that is already implemented in libraries: GridGain, plain RMI, Apache Ignite, Apache River.
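
The marking scheme from point 1 can be sketched as follows. This is only an illustration under stated assumptions: a `ConcurrentHashMap` stands in for the database table, and the class `ClaimSketch` and its methods are hypothetical. With a real database the same atomic step would typically be a conditional update such as `UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL` (table and column names made up here), where a node treats the task as its own only if exactly one row was changed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ClaimSketch {
    // In-memory stand-in for a hypothetical task table: task id -> node that claimed it.
    private final ConcurrentMap<Integer, String> owner = new ConcurrentHashMap<>();
    private final int taskCount;

    ClaimSketch(int taskCount) { this.taskCount = taskCount; }

    // Atomically marks the first unclaimed task as taken by the given node.
    // Returns the task id, or -1 when every task already has an owner.
    int claimNext(String node) {
        for (int id = 0; id < taskCount; id++) {
            // putIfAbsent succeeds for exactly one caller per id.
            if (owner.putIfAbsent(id, node) == null) return id;
        }
        return -1;
    }

    String ownerOf(int id) { return owner.get(id); }

    public static void main(String[] args) throws Exception {
        ClaimSketch table = new ClaimSketch(100);
        ExecutorService nodes = Executors.newFixedThreadPool(4); // threads play the nodes
        List<Future<List<Integer>>> taken = new ArrayList<>();
        for (int n = 0; n < 4; n++) {
            String name = "node-" + n;
            Callable<List<Integer>> job = () -> {
                List<Integer> mine = new ArrayList<>();
                for (int id; (id = table.claimNext(name)) != -1; ) mine.add(id);
                return mine;
            };
            taken.add(nodes.submit(job));
        }
        Set<Integer> distinct = new HashSet<>();
        int claims = 0;
        for (Future<List<Integer>> f : taken) {
            distinct.addAll(f.get());
            claims += f.get().size();
        }
        nodes.shutdown();
        // No task is handed out twice, so both numbers are 100.
        System.out.println(claims + " claims, " + distinct.size() + " distinct tasks");
    }
}
```

The sketch leaves out what happens when a node dies after claiming a task; a real table would also need something like a claim timestamp so that stale claims can be released.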

Denis Kostousov, 2018-03-01
@sandello

Alternatively, use Akka. It is available for both Scala and Java, and "connecting multiple machines" with it is quite easy. The parallelization itself, though, means moving the application to a different concept (asynchronous messaging instead of direct calls), and that is where you will have to rack your brains.
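
To illustrate what "asynchronous messaging instead of direct calls" means, here is a minimal sketch in plain Java. It deliberately does not use the Akka API: the class `MailboxSketch`, its `SquareRequest` message and the `send` method are hypothetical names, and a single thread with a queue plays the role of an actor with a mailbox. The caller does not invoke the computation; it drops a message into the mailbox and gets a future that is completed later.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class MailboxSketch {
    // A message carries the input plus a future through which the answer comes back.
    static final class SquareRequest {
        final long value;
        final CompletableFuture<Long> reply = new CompletableFuture<>();
        SquareRequest(long value) { this.value = value; }
    }

    private final BlockingQueue<SquareRequest> mailbox = new LinkedBlockingQueue<>();
    private final Thread loop;

    MailboxSketch() {
        loop = new Thread(() -> {
            try {
                while (true) {
                    SquareRequest msg = mailbox.take(); // handle one message at a time
                    msg.reply.complete(msg.value * msg.value);
                }
            } catch (InterruptedException e) {
                // stop() was called: leave the loop
            }
        });
        loop.setDaemon(true);
        loop.start();
    }

    // Returns immediately; the result arrives later through the future.
    CompletableFuture<Long> send(long value) {
        SquareRequest msg = new SquareRequest(value);
        mailbox.add(msg);
        return msg.reply;
    }

    void stop() { loop.interrupt(); }

    public static void main(String[] args) throws Exception {
        MailboxSketch worker = new MailboxSketch();
        CompletableFuture<Long> a = worker.send(6);
        CompletableFuture<Long> b = worker.send(7);
        // Combine the two replies once both have arrived: 36 + 49.
        System.out.println(a.thenCombine(b, Long::sum).get()); // prints 85
        worker.stop();
    }
}
```

The hard part mentioned above is visible even here: every place that used to call a method and use its return value now has to work with a reply that shows up at some later point.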
