linux

Grigory Vasilkov, 2017-02-10 14:05:14

Cron, wget and biting chunks off a file: why does the same script sometimes fail?

Hello, I've run into a strange situation. I wrote a script that works correctly in 100% of cases when its URL is requested directly.
If you run it from the server console via wget, everything also works fine.
But if you schedule it in cron via ISP-Manager, it sometimes works as it should, sometimes fails with errors, sometimes doesn't run at all, and sometimes does half the work now and the rest later, breaking the program's whole logic.
In short, the script works like this:
1. Read the price-list file into an array with file() and save a copy of the file.
2. Take 100 lines that qualify as products, plus however many other lines sit between them.
3. Process that hundred, writing each line to MySQL one by one.
4. Save the remaining data back to the original file.
5. On the next run the file is no longer copied, only read.
Since each invocation is tied to a specific price list, the script processes the whole file in one go by redirecting to itself with header("Location: ") followed by die(), passing the step size and file name along so it can't accidentally parse the wrong file.
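Steps 2 and 4 above can be sketched in plain shell to make the "bite a chunk off the file" idea concrete (the file name, sample data, and chunk size here are made up for the illustration; the real script does this in PHP):

```shell
#!/bin/sh
set -e
# Create a small sample "price list" for the demonstration.
printf '%s\n' line1 line2 line3 line4 line5 > pricelist.txt

CHUNK=2                                      # the real script uses 100
head -n "$CHUNK" pricelist.txt > chunk.txt   # lines to process this run
tail -n +"$((CHUNK + 1))" pricelist.txt > rest.tmp
mv rest.tmp pricelist.txt                    # remainder awaits the next run

wc -l < chunk.txt      # prints 2
wc -l < pricelist.txt  # prints 3
```

Each run consumes the head of the file and leaves the tail in place, so the file shrinks toward empty as the redirect chain progresses.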
If you run the script through the browser, it works perfectly in 100% of cases: the file gnaws away at itself and the data is uploaded.
If you run it via the shell with wget --max-redirect=1000 "url?params", everything works fine too.
But as soon as it is hung on cron, things go wrong.
There is nothing unusual in the shell logs.
The script's own log, even for the broken runs, looks normal: the report is there, ending with the "script finished" line.
But the cron log catches a 504 Timeout from the nginx server, and that is probably where the mischief starts.
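The 504 means nginx gave up waiting for the PHP backend even though PHP kept running. One way to take nginx out of the equation while debugging is to raise the read timeouts for the import location; a hedged sketch (the location pattern and timeout values are illustrative, not taken from the actual server config):

```
location ~ \.php$ {
    # Give long-running imports time to finish before nginx returns 504.
    fastcgi_read_timeout 600s;
    proxy_read_timeout   600s;
}
```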
That is, the browser also catches the 504, but the script keeps running.
Under cron, however, it looks as if wget doesn't wait for the script to finish: it follows the redirect and starts a second thread of execution, forgetting that the first one is still in progress.
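If the overlapping-runs theory is right, the standard guard is flock(1): a second invocation simply fails to take the lock and exits instead of starting a parallel run. A sketch with hypothetical paths, schedule, and URL:

```shell
#!/bin/sh
# flock runs the command only if no other process holds the lock file;
# with -n it gives up immediately instead of waiting.
flock -n /tmp/import.lock echo "got the lock"

# The same idea on a cron line (schedule and URL are hypothetical):
# */5 * * * * flock -n /tmp/import.lock wget -q --max-redirect=1000 -O /dev/null "http://site/import.php?step=100"
```

Even if the 504/redirect behavior is never fully explained, the lock guarantees that only one chain of executions touches the price-list file at a time.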
As a result I observe situations like these:
1. The file was copied and everything loaded, but the original never changes. I suspected permissions and set 666; same result.
2. The file was copied, a piece was gnawed off and uploaded, and then everything stalled.
3. The file was copied and everything seemed to work, but a while later one of the pieces of the original reappears in the folder, as if the run finished first and then completed its work asynchronously.
...and similar miracles.
Can anyone suggest how to debug this, and why these errors are even possible?
I haven't yet tried running the script from the shell with the php command and arguments, only via wget. The curious thing is that a direct launch always works, while under cron some files load and others don't: the exact same code works in one folder and in the next one produces one of the errors above.
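For the record, calling the interpreter directly from cron bypasses nginx (and its 504) entirely; parameters then arrive in $argv instead of $_GET. A hypothetical crontab line (all paths and the schedule are made up):

```
# m h dom mon dow   command
*/10 * * * * /usr/bin/php /var/www/site/import.php step=100 file=pricelist.csv >> /var/log/import.log 2>&1
```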
The most annoying part is that a dozen other cron jobs built on the same principle run nearby and work perfectly, because they don't "bite off the file" but simply remember the line number and category IDs in the database. That approach creates a pile of coupling and makes the script impossible to modify: theirs is 10 thousand lines of code, mine is 900.


1 answer
Grigory Vasilkov, 2017-02-17
@gzhegow

I'll write down the probable answer, though I'm not marking it as the solution yet:
1. The code had a register_shutdown_function() callback that wrote logs when the program finished.
2. After the redirect, the static method Import::end() was called, which appended "Thanks, script done" to the log and called exit().
3. Despite that, I had to dig around in the Debian console, play with the top/free commands, study the logs, and see that everything seemed to be working...
4. The only real clue was the nagging "MySQL server has gone away" error, but why would it appear?
On the one hand, I catch that error often on Bitrix; their code is lousy enough that it needs a server ten times more powerful than any other web application would.
On the other hand, the script itself works through a redirect, and even though the transaction is closed and we disconnect from the server, something was still going wrong.
When I listed the processes with top and checked RAM with free, it turned out there was no free memory; worse, after the MySQL server went down it wouldn't even start again: not enough memory.
I increased the VM's memory and everything started working correctly. Then the thought came: what if... I ran top and saw a horror: 32 Apache processes were running for me alone, as if it never terminated them at all.
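The checks described above (free RAM, runaway Apache workers) come down to a couple of commands, safe to run on any Linux box:

```shell
#!/bin/sh
free -m                             # how much RAM is actually left, in MB
ps aux --sort=-%mem | head -n 6     # biggest memory consumers first
ps ax | grep -c '[a]pache' || true  # rough count of running apache processes
```

The bracket trick in '[a]pache' keeps the grep process itself out of its own match.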
And yet the exit() call is there, so the fault must be happening somewhere else.
I'll run a couple of tests without the main loader code, on cron, to try to catch the bug and make sure this really is the cause. If it is, I'll mark this as the solution.
