bash
Viktor Taran, 2021-08-11 16:01:28

How to copy files in multiple threads?

The task is to copy 54 million files; 95% of the disk is already occupied. I wrote a script that compressed each folder, copied it to a remote server, and unpacked it there. But the problem is that there is no room left even for one folder's archive, and some of the subdirectories are in the same situation, so the loops ended up full of special cases, and so on.

Copying in a single thread would take 320 hours. Since the problem here is not the size of the files but their sheer number, the idea is to copy in, say, 100 threads and bring that down to about 3 hours, which is quite realistic: the IO of the NVMe disks is not a bottleneck at all. In theory, something with find (with maxdepth 3) and xargs should work.

Can someone tell me the best way to do this, ideally with an example? All of this is running in production 24/7, and afterwards I will need to copy the differences separately, so ideally the whole procedure should be repeatable.
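A minimal sketch of the find + xargs idea mentioned in the question, assuming the data lives under /data and the target is user@backup.example.com (both placeholders): one rsync per 3rd-level directory, up to 100 at a time.

cd /data                       # placeholder source root
find . -mindepth 3 -maxdepth 3 -type d -print0 |
  xargs -0 -P100 -I{} rsync -aR {} user@backup.example.com:/data/

rsync's -R (--relative) recreates each directory's path on the target instead of dumping everything into one folder; files that sit above level 3 would still need one extra pass.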


4 answer(s)
Saboteur, 2021-08-11
@shambler81

I wrote a script that compressed each folder, copied it to a remote server, and unpacked it there

Then compress straight to the remote server instead:
tar cvfz - mydirectory | ssh user@remote_server "cd target_directory; tar xvfz -"

Or, even simpler, run several instances of rsync in the background via xargs or parallel, as in the sketch below.
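One way to read that suggestion, as a sketch (the /data source and the host/target paths are placeholders): one rsync per top-level entry, at most 10 running at once.

printf '%s\0' /data/* |
  xargs -0 -P10 -I{} rsync -a {} user@remote_server:/target_directory/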

Armenian Radio, 2021-08-11
@gbg

Something like this, using GNU Parallel:
find . -type f -print0 | parallel -0 -j10 cp {} destdir
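One caveat: plain cp {} destdir flattens the tree and will collide on duplicate file names. A variant that recreates each file's relative path under destdir (GNU cp's --parents flag; /data and destdir are placeholders):

cd /data
find . -type f -print0 | parallel -0 -j10 cp --parents {} destdir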

Vladimir, 2021-08-11
@MechanID

I do not quite understand how the lack of free space on the source gets in the way here.
What to do depends on the directory structure and on how the data is distributed across it: for example, build a list of the directories at the 2nd, 3rd, or Nth level and run a separate rsync for each of them, as in the sketch below.
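A sketch of that approach, assuming the tree lives under /data and a placeholder target host; -R (--relative) recreates each directory's full path on the remote side, and the inner loop keeps at most 20 rsyncs running (wait -n needs bash 4.3+):

find /data -mindepth 2 -maxdepth 2 -type d > dirs.txt
while read -r d; do
  rsync -aR "$d" user@remote:/backup/ &
  # throttle: pause whenever 20 background jobs are already running
  while [ "$(jobs -rp | wc -l)" -ge 20 ]; do wait -n; done
done < dirs.txt
wait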

hint000, 2021-08-11
@hint000

Just a direction, not a full example.
Generate a text file (about 54 million lines) listing all the files.
Split it into 100 files of roughly 540 thousand lines each; the chunks will not be exactly equal in length, but here that does not matter: split -n l/100 filename
Then, in a loop over the 100 chunks, start background copy processes, each reading its list of files from one chunk; a sketch follows.
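A sketch of that direction, assuming the files live under /data and a placeholder remote host; rsync's --files-from reads the paths (relative to the "." source argument) from each chunk:

cd /data
find . -type f > filelist.txt        # the ~54-million-line list
split -n l/100 filelist.txt chunk.   # 100 chunks, split on line boundaries
for f in chunk.*; do
  rsync -a --files-from="$f" . user@remote:/data/ &
done
wait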
