C++ / C#
Pavel, 2016-10-03 00:30:47

How to process files in C++ in parallel?

Good day!
We were given a test task: the program receives a folder of files as input. It must read every text file in the folder (each file contains a single integer) and sum the values. As each file is read, print its name and contents to stdout; at the end, print the total.
The files must be read in parallel, and after reading a file the thread sleeps for 1 second.
The problem is that I have not worked with parallel computing in C++ before. This is what I managed to write:

#include "boost/filesystem.hpp" ///For reading directory
#include "boost/lexical_cast.hpp" ///For converting and checking data in file
#include <iostream> ///Cout
#include <fstream> ///Open file
#include <atomic> ///For total sum
#include <thread> ///Parallel working


using namespace std;
using namespace boost::filesystem;


atomic_int TotalSum;
/*
* Func for reading and checking file content.Return readed value.
*/
void ReadFile(path InputFileWithPath)
{
  using boost::lexical_cast;
  using boost::bad_lexical_cast;
  int Answer = 0;
  std::ifstream InputFile(InputFileWithPath.string());
  string tmpString;
  if(InputFile.is_open())
  {
    while(!InputFile.eof())
    {
      getline(InputFile, tmpString);
    }
    
  }
  try {
    Answer=lexical_cast<int>(tmpString);
    cout << InputFileWithPath.filename() << ": " << Answer << endl;
    TotalSum += Answer;
    this_thread::sleep_for(chrono::seconds(1));
  }
  catch (const bad_lexical_cast &) {
    ///Do nothing
  }
}

int main(int argc, char *argv[])
{
  TotalSum = 0;

  path InputPath(argv[1]);

  directory_iterator EndIterator;

  for(directory_iterator FileIterator(InputPath);FileIterator!=EndIterator;++FileIterator)
  {
    if(is_regular_file(FileIterator->path()))
    {
      thread ReadFileThread(ReadFile, move(FileIterator->path()));
      ReadFileThread.detach();
    }
  }
  cout << "Final sum: " << TotalSum << endl;

  return 0;
}

But judging by the output, I did something wrong. If I join the thread, all the reading happens as if in sequential mode. With detach, as in the code above, putting the thread to sleep for 1 second makes no sense. How can the files be processed in parallel?

2 answer(s)
nirvimel, 2016-10-03
@rusbaron

Create a vector<thread>.
First, in one loop, spawn all the threads and store them in the vector.
Then, in a second loop, go through the vector and call join on each thread.
P.S. In real code, for performance it is better to check is_regular_file inside the thread as well. And it makes no sense to read every line of the file in a loop just to get the value of the last line (perhaps the first would be better). What if a multi-gigabyte file comes along?
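
The two loops described above could look roughly like the sketch below. It is not the asker's code: it takes a ready-made list of paths instead of walking a directory with boost::filesystem, and the names ReadOneFile and SumFilesInParallel, as well as the mutex around cout, are assumptions made for illustration. It also assumes the integer is the first token in each file.

```cpp
#include <atomic>
#include <chrono>
#include <fstream>
#include <functional>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical worker: reads the first integer from one file, prints it,
// adds it to the shared total, then sleeps for 1 second as the task requires.
void ReadOneFile(const std::string& path,
                 std::atomic<long long>& total,
                 std::mutex& out_mutex)
{
  std::ifstream in(path);
  long long value = 0;
  if (!(in >> value))
    return;  // skip files that cannot be opened or do not start with a number
  {
    // Assumption: serialize output so lines from different threads do not mix.
    std::lock_guard<std::mutex> lock(out_mutex);
    std::cout << path << ": " << value << '\n';
  }
  total += value;
  std::this_thread::sleep_for(std::chrono::seconds(1));
}

// Hypothetical driver: the first loop spawns the threads, the second joins them.
long long SumFilesInParallel(const std::vector<std::string>& paths)
{
  std::atomic<long long> total{0};
  std::mutex out_mutex;
  std::vector<std::thread> workers;
  workers.reserve(paths.size());

  for (const auto& p : paths)
    workers.emplace_back(ReadOneFile, std::cref(p),
                         std::ref(total), std::ref(out_mutex));

  for (auto& w : workers)
    w.join();  // all threads are already running, so the sleeps overlap

  return total.load();
}
```

Because every thread is started before the first join, the 1-second sleeps run concurrently, and the total is read only after all workers have finished. One thread per file is fine for a test task; for a folder with thousands of files, a fixed-size pool would probably be a better fit.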

mayton2019, 2020-06-18
@mayton2019

This appears to be a learning task. Parallelism has little practical value here.
A typical disk subsystem on a home laptop consists of a single HDD/SSD, and it does not parallelize. That is, it is such a stingy device that at any one moment it can serve the read or write of only one file system block (or sector, or cluster, it doesn't matter). So parallelism really gains nothing here. However, if you have a RAID array or a storage area network, such a setup can serve these requests in parallel.
What else is wrong with the solution:

while(!InputFile.eof())
    {
      getline(InputFile, tmpString);
    }

It is unclear what is happening here. Is this seeking to the end of the file? Why so expensive? Why do we have to fetch every line when we only need the last one? Or should it have been the first one? In short, it is unclear.
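
If the number is expected on the first line (an assumption, since the task statement does not say which line holds it), the loop can be replaced with a single getline. A minimal sketch, using std::stoi instead of boost::lexical_cast; the name ReadFirstInt is hypothetical:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses the integer on the first line of a file.
// Returns false if the file cannot be read or the line is not a valid int.
bool ReadFirstInt(const std::string& path, int& value)
{
  std::ifstream in(path);
  std::string line;
  if (!std::getline(in, line))
    return false;  // file missing or empty

  try {
    std::size_t pos = 0;
    const int parsed = std::stoi(line, &pos);
    // Reject trailing characters other than whitespace (e.g. "12abc").
    if (line.find_first_not_of(" \t\r", pos) != std::string::npos)
      return false;
    value = parsed;
    return true;
  } catch (const std::exception&) {
    return false;  // std::invalid_argument or std::out_of_range from stoi
  }
}
```

This reads only one line regardless of file size, so a multi-gigabyte file costs no more than a small one.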
If the author were solving map-reduce-style problems that work with large files, parallelism would make sense. There, reading a block of the file is interleaved with computation.
Here the computation is tiny. Most of the CPU time will be wasted on starting and stopping threads and on the final join of the threads (which, by the way, I don't see).
