Multithreading algorithm?

C++ / C#

csar, 2017-06-25 21:02:12

I have a program that traverses all disks and, when it finds files of certain types, computes a hash of each file. I want to make it multithreaded to speed it up. What is the right way to do that?
Should I create a separate thread for each file found? That is, in pseudocode:

if(Filefound)
   new_thread(gethash(file))

That is not an option because of the CPU load.
I am interested in the algorithm, in any language, even pseudocode.
Thank you.

1 answer(s)

nirvimel, 2017-06-25
@csar

No need to create a separate thread for each file. Create two thread pools.
Pool #1 (computation): the number of threads equals the number of CPU cores. Blocks of data are placed into its input queue, and the pool threads compute hashes over them.
Pool #2 (synchronous disk reads): the number of threads equals the number of cores multiplied by some constant (in the sources of various libraries I have seen values from 2 to 10). File names are placed into its input queue; the pool threads open the files, read them, and send the blocks they read to the input queue of pool #1.
Note: memory consumption is controlled by capping the length of the input queue of pool #1. In practice, pool #1 ends up throttling pool #2, which is normally underloaded.
A separate thread traverses the directory tree and sends the names of the files it finds to the input queue of pool #2. The length of this queue can also be capped, though less strictly (I would set it to a few hundred).
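For illustration, the traversal step could look roughly like the C++17 sketch below. The name walk_tree, the extension filter, and the callback are assumptions made for this example; in the scheme described here the callback would push each path into the bounded file-name queue of pool #2.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <set>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: walks `root` recursively and calls `on_file` for every
// regular file whose extension is in `wanted` (e.g. {".jpg", ".pdf"}).
// Returns the number of matching files.
std::size_t walk_tree(const fs::path& root,
                      const std::set<std::string>& wanted,
                      const std::function<void(const fs::path&)>& on_file) {
    assert(on_file != nullptr);
    std::size_t matches = 0;
    std::error_code ec;
    // Skip directories we are not allowed to read instead of throwing.
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        std::error_code entry_ec;
        if (it->is_regular_file(entry_ec) && !entry_ec &&
            wanted.count(it->path().extension().string()) != 0) {
            on_file(it->path());
            ++matches;
        }
        it.increment(ec);  // non-throwing advance; an error ends the loop
    }
    return matches;
}
```

If on_file blocks on a full queue, the walker simply waits there, which is the intended way to keep it from running far ahead of the readers.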
PS: All length-limited queues must, of course, be blocking (not lock-free), since they are what regulates the load (otherwise all threads will run at 100%).
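A minimal sketch of such a blocking, length-limited queue, assuming C++17 and only the standard library. BoundedQueue and sum_with_pool are hypothetical names invented for this example, and the summing workload is just a placeholder for hashing.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Blocking queue with a hard capacity limit (illustrative, not a library class).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full. Returns false if the queue was closed.
    bool push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push_back(std::move(item));
        not_empty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty. Returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

    // Wakes all waiting threads; consumers finish the remaining items, then stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<T> items_;
    bool closed_ = false;
};

// Example: one producer feeds `workers` consumer threads through a queue of at
// most `capacity` items; the producer sleeps inside push() whenever they lag.
long long sum_with_pool(int n, std::size_t capacity, unsigned workers) {
    BoundedQueue<int> queue(capacity);
    std::mutex total_mutex;
    long long total = 0;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            while (auto item = queue.pop()) {
                std::lock_guard<std::mutex> lock(total_mutex);
                total += *item;
            }
        });
    }
    for (int i = 1; i <= n; ++i) queue.push(i);
    queue.close();
    for (auto& t : pool) t.join();
    return total;
}
```

The point of the design is that a thread waiting in push() or pop() sleeps on a condition variable rather than spinning, so a full queue slows the producer down without burning CPU.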
The data blocks fed to pool #1 should not be too small (I would use 64 kilobytes, for example).
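As a rough single-threaded sketch of the block-wise part: the file is read in fixed-size blocks and each block is fed to an incremental hash. FNV-1a is used here only as a short stand-in for a real digest such as MD5 or SHA-256, and Fnv1a64 and hash_file are names made up for the example. One caveat worth keeping in mind: an ordinary file digest has to consume the blocks of a given file in order, so the blocks of one file should not be hashed independently by different threads unless the scheme is designed for that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Incremental FNV-1a (64-bit); a placeholder for a real cryptographic digest.
struct Fnv1a64 {
    std::uint64_t state = 14695981039346656037ULL;  // FNV offset basis
    void update(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ULL;  // FNV prime
        }
    }
};

// Reads `path` in blocks of `block_size` bytes and feeds them to the hash in
// file order. Returns false if the file cannot be opened or a read error occurs.
bool hash_file(const std::string& path, std::size_t block_size, std::uint64_t& out) {
    assert(block_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<char> block(block_size);
    Fnv1a64 hash;
    while (in) {
        in.read(block.data(), static_cast<std::streamsize>(block.size()));
        // gcount() is the number of bytes actually read (short on the last block).
        hash.update(block.data(), static_cast<std::size_t>(in.gcount()));
    }
    if (in.bad()) return false;
    out = hash.state;
    return true;
}
```

A call such as hash_file(path, 64 * 1024, digest) uses the 64-kilobyte block size suggested above; the result does not depend on the block size, only the number of read calls does.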