C++ / C#
Nikita, 2016-02-25 12:15:09

MPI. How to interrupt the execution of all processes?

Suppose an error occurs in one of the running processes that makes further execution of the whole parallel program impossible, so all processes have to be terminated cleanly.
I know that the MPI_Abort function exists for exactly this purpose, but it is not very convenient...
Ideally, I would like to build a proper emergency exit directly into the program's algorithm, with something like the following construct for handling an emergency:

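try
{
  ...
}
catch (const std::exception &err)
{
  errCode = 2;

  // Notify every other process that the run must stop
  int msg = 0;
  for (int i = 0; i < procCount; i++)
  {
    if (i == procRank)
      continue;

    // Blocking send: may not return until rank i posts a matching receive
    MPI_Send(&msg, 1, MPI_INT, i, MSG_WORK, MPI_COMM_WORLD);
  }

  std::cerr << "Error : " << err.what() << '\n';
}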

Then, in the other processes, the run would be interrupted when the corresponding message arrives. The problem is that a process cannot terminate before the other processes have received its "emergency" messages. This can lead to a deadlock: for example, if several processes fail at once, each of them sends messages to everyone using the code above, but none of them receives the messages sent to it.
I'm quite sure I'm not the first to ask this question. There must be some established solution...

motral, 2016-02-25

Change the architecture of the solution: make a separate monitoring process.
