PHP
Vertexis, 2021-04-03 11:38:05

Asynchronous work with ZipArchive (::addFromString) loses files: how can I fix it?

Setup:
OS: Debian 9
PHP: v7.4
There is a directory with ~1 million {unixtime}.json files, each up to 1 MB in size.
I rename the files and change their contents, and then I try to pack them all into a single archive using the addFromString method.
The problem is that when I use the ZipArchive methods asynchronously, I do get an archive at the output, but it contains only the last file processed in each asynchronous batch. To be precise, N / t files end up in the archive, where N is the total number of files to be added and t is the number of processing threads.

Below is a simplified version of the code ($dir, $newFileName and $stringData are already defined at this point):

$za = new \ZipArchive();
// Open the archive, creating it if it does not exist yet
if ($za->open($zipFile, \ZipArchive::CREATE) !== TRUE) {
    throw new \Exception('Cannot create a zip file');
}
$arFile = $dir . '/' . $newFileName . '.json';
$result = $za->addFromString($arFile, $stringData);
// Inspect the entry while the archive is still open
$error = $za->getStatusString();
$arIndex = $za->locateName($arFile);
$arInfo = $za->statName($arFile);
// The changes are written to disk here
$closeResult = $za->close();
var_dump($error, $arIndex, $arInfo, $closeResult);

Output:
4 => string(8) "No error"
int(0)
array(8) {
  ["name"]=>
  string(23) "/orders/01_01_2021.json"
  ["index"]=>
  int(0)
  ["crc"]=>
  int(0)
  ["size"]=>
  int(1006)
  ["mtime"]=>
  int(1617438268)
  ["comp_size"]=>
  int(1006)
  ["comp_method"]=>
  int(0)
  ["encryption_method"]=>
  int(0)
}
bool(true)


As you can see, the file is added successfully, gets its own index, and exists until the archive is closed.
Another point: when files are added asynchronously, the files added concurrently by the different threads all get the same index (0 in my case), which apparently causes them to be "lost". Perhaps the indexes are assigned incorrectly, but I have no way to influence that. How can this be overcome?
P.S. I would rather not write intermediate files. Whether or not the archive already exists beforehand makes no difference.
Thanks for the help!


1 answer(s)
s5656, 2021-04-08

Let's turn to the documentation and work out at what point the changes are actually written to the final file:

ZipArchive::close — Close opened or created archive and save changes. This method is automatically called at the end of the script.

That is, the changes are written either when ZipArchive::close is called explicitly or at the end of the script (where ZipArchive::close is called automatically).
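A quick way to see this for yourself (a minimal sketch, assuming the zip extension is loaded; the temporary file name is an arbitrary example): a newly created archive does not exist on disk after addFromString() and only appears once close() runs.

```php
<?php
// Arbitrary example path for a brand-new archive.
$zipFile = sys_get_temp_dir() . '/close-demo-' . getmypid() . '.zip';
@unlink($zipFile); // start from a clean state

$za = new ZipArchive();
if ($za->open($zipFile, ZipArchive::CREATE) !== true) {
    throw new RuntimeException('Cannot create a zip file');
}
$za->addFromString('orders/example.json', '{"id": 1}');

// The entry exists in memory, but nothing has been written to disk yet.
clearstatcache();
$existsBeforeClose = file_exists($zipFile);

$za->close(); // the archive is written here

clearstatcache();
$existsAfterClose = file_exists($zipFile);

var_dump($existsBeforeClose, $existsAfterClose);
```

On a typical setup this should print bool(false) followed by bool(true).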
ZipArchive itself is synchronous.
With a single thread, you create and edit one archive through one ZipArchive object, and when you call close, everything is written to the file on disk.
If you work in several threads, you create several ZipArchive objects that know nothing about each other, and when you call close, each of them saves its own version of the archive: whichever one closes last wins and overwrites the others.
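The overwrite can be reproduced without any threads. In this sketch two ZipArchive objects in one process stand in for two threads (an illustrative assumption, not the asker's actual code): both open the same new archive before either closes it, each adds one file, and then they close one after the other.

```php
<?php
// Arbitrary example path; two objects simulate two independent threads.
$zipFile = sys_get_temp_dir() . '/race-demo-' . getmypid() . '.zip';
@unlink($zipFile);

$a = new ZipArchive();
$b = new ZipArchive();
// Both open before either closes, so both start from the same empty state.
if ($a->open($zipFile, ZipArchive::CREATE) !== true
    || $b->open($zipFile, ZipArchive::CREATE) !== true) {
    throw new RuntimeException('Cannot create a zip file');
}

$a->addFromString('a.json', '{"from": "a"}');
$b->addFromString('b.json', '{"from": "b"}');

// Each object numbers its own entries, so both get index 0.
$indexA = $a->locateName('a.json');
$indexB = $b->locateName('b.json');

$a->close(); // writes an archive containing only a.json
$b->close(); // replaces it with an archive containing only b.json

// Read back what actually ended up on disk.
$check = new ZipArchive();
$check->open($zipFile);
$names = [];
for ($i = 0; $i < $check->numFiles; $i++) {
    $names[] = $check->getNameIndex($i);
}
$check->close();

var_dump($indexA, $indexB, $names);
```

Both entries get index 0, and after both close() calls the archive should contain only b.json, which matches the N / t files described in the question.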
You need either to somehow share a single ZipArchive instance between your threads, or to write your own (or find a ready-made) library for working with zip archives asynchronously.
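A minimal sketch of the "single instance" option, assuming one process acts as the only writer; buildArchive() and $batchSize are hypothetical names, and delivering the data from the workers to this process (a queue, a pipe, etc.) is not shown:

```php
<?php
/**
 * Hypothetical helper: the only owner of the ZipArchive for $zipFile.
 * $entries is any iterable of "name inside the archive" => contents.
 * Returns the number of entries in the finished archive.
 */
function buildArchive(string $zipFile, iterable $entries, int $batchSize = 1000): int
{
    $za = new ZipArchive();
    $open = function () use ($za, $zipFile): void {
        if ($za->open($zipFile, ZipArchive::CREATE) !== true) {
            throw new RuntimeException('Cannot open ' . $zipFile);
        }
    };
    $close = function () use ($za, $zipFile): void {
        if ($za->close() !== true) {
            throw new RuntimeException('Cannot write ' . $zipFile);
        }
    };

    $open();
    $pending = 0;
    foreach ($entries as $name => $data) {
        if ($za->addFromString($name, $data) === false) {
            throw new RuntimeException("Cannot add $name: " . $za->getStatusString());
        }
        // addFromString() holds the string in memory until close(),
        // so flush to disk every $batchSize entries.
        if (++$pending >= $batchSize) {
            $close();
            $open();
            $pending = 0;
        }
    }
    $count = $za->numFiles;
    $close();
    return $count;
}
```

Because only this function ever opens the archive, no other object can overwrite its result. The entries can come from a generator yielding, for example, 'orders/01_01_2021.json' => $stringData for each renamed file. Each close() rewrites the whole archive file, so a larger $batchSize trades memory for fewer rewrites; with ~1 million files of up to 1 MB, some batching is likely needed.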
A simpler option is to process the files asynchronously (in a temporary directory, perhaps) and then archive them all in one pass with the same zip (the command-line utility).
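A sketch of the temporary-directory variant, assuming Info-ZIP zip is on PATH and that all workers have already finished writing into $tmpDir; zipDirectoryWithCli() is a hypothetical helper name, and $zipFile should be an absolute path because zip runs with $tmpDir as its working directory:

```php
<?php
/**
 * Hypothetical helper: pack every *.json file in $tmpDir into $zipFile
 * with the zip CLI, feeding the file names on stdin (zip's -@ option).
 */
function zipDirectoryWithCli(string $tmpDir, string $zipFile): void
{
    $cmd = 'zip -q ' . escapeshellarg($zipFile) . ' -@';
    // Run zip inside $tmpDir so the entries get short relative names.
    $proc = proc_open($cmd, [0 => ['pipe', 'r']], $pipes, $tmpDir);
    if (!is_resource($proc)) {
        throw new RuntimeException('Cannot start zip');
    }
    foreach (new FilesystemIterator($tmpDir) as $file) {
        if ($file->isFile() && $file->getExtension() === 'json') {
            fwrite($pipes[0], $file->getFilename() . "\n");
        }
    }
    fclose($pipes[0]); // EOF tells zip that the name list is complete
    $exitCode = proc_close($proc);
    if ($exitCode !== 0) {
        throw new RuntimeException("zip exited with code $exitCode");
    }
}
```

Passing the names through stdin rather than a *.json shell glob matters here: a glob over ~1 million files would most likely exceed the system's argument-length limit.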
