PHP
Spider84, 2013-07-30 12:42:18

PHP script with a very long execution time: how do I write this correctly?

Greetings to all Khabrovites

There is a script that parses an XML file, then writes the data to the database and downloads images from the URLs listed in that file. The script uses SimpleXML. I should note right away that I didn't write it; I'm only bringing it to the state we need by adding the missing pieces.
The problem is that the file we are parsing is very large, about 8,000-8,500 records, and for each record the script downloads 3 to 5 images. As a result, it gets through roughly 6,500 records and then quietly stops. I tried running it on a hosting plan where the maximum execution time can be raised; that helps, but not completely: it parses somewhere around 7,500-7,800 records and stops. Are there any other limits that need to be raised?

Please share approaches to writing scripts that process large amounts of data like this. Running it through cron is not an option, since it is an extension for a CMS.
I would be grateful for any thoughts and ideas


10 answers
Igogo2012, 2013-07-30
@Igogo2012

I have built such scripts as PHP console commands.
Everything worked according to this principle (a sketch follows the list):
1) Start the command; it records in the database (in a dedicated status table) that it is running.
2) The frontend launches an AJAX request that periodically checks that table for the command's status by its identifier.
3) If the command ends with an error, it writes its "error" status and an error message to the table; the AJAX request sees this and reports it to the web interface.
4) If the command ends successfully, the same as in point 3.
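A minimal sketch of that pattern, assuming PDO and a hypothetical import_jobs table (the table and column names are illustrative, not from the answer):

<?php
// CLI import command: it records its own status in a job table so that
// the frontend can poll that table via AJAX while the import runs.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->prepare('INSERT INTO import_jobs (status, started_at) VALUES (?, NOW())')
    ->execute(['running']);
$jobId = $pdo->lastInsertId();

try {
    // ... parse the XML, write records to the database, download images ...
    $pdo->prepare('UPDATE import_jobs SET status = ?, finished_at = NOW() WHERE id = ?')
        ->execute(['done', $jobId]);
} catch (Exception $e) {
    $pdo->prepare('UPDATE import_jobs SET status = ?, message = ? WHERE id = ?')
        ->execute(['error', $e->getMessage(), $jobId]);
}

The AJAX endpoint then only needs to SELECT the status and message for that job id and render the result in the web interface.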

MaxUp, 2013-07-30
@MaxUp

How is it started? Automatically, or by a person?
If the latter, use AJAX plus the register_shutdown_function() function to catch the moment the task is interrupted, return a flag saying the import is not finished together with the number of the last processed record, and repeat the request automatically until the task is completed.
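A minimal sketch of that idea; the offset parameter and the JSON response shape are assumptions, not something from the answer:

<?php
// If the script is killed (e.g. by the time limit), the shutdown handler
// reports how far the import got so the AJAX caller can resume from there.
$finished      = false;
$lastProcessed = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

register_shutdown_function(function () use (&$finished, &$lastProcessed) {
    if (!$finished) {
        echo json_encode(['done' => false, 'last' => $lastProcessed]);
    }
});

// ... import loop: increment $lastProcessed after each record ...

$finished = true;
echo json_encode(['done' => true]);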

CrazySquirrel, 2013-07-30
@CrazySquirrel

Of course, I understand that giving such advice when the software is already written is not the best idea, but I would still suggest considering parallelizing the process, for example with Gearman; this will increase the throughput of your script.
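A rough sketch of what that could look like with the pecl gearman extension; the job name, the workload format, and the $records list of parsed records are assumptions:

<?php
// Producer: queue every parsed record as a background job.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
foreach ($records as $record) {
    $client->doBackground('import_record', json_encode($record));
}

// Worker (separate process; start several of them to run jobs in parallel):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('import_record', function (GearmanJob $job) {
    $record = json_decode($job->workload(), true);
    // ... write the record to the database and download its images ...
});
while ($worker->work());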

begemot_nn, 2013-07-30
@begemot_nn

If the script is launched from the browser (whether via AJAX or simply by following a link), don't forget to set
ignore_user_abort(true);

rozhik, 2013-07-30
@rozhik

A SAX parser will be much more memory efficient, and most likely much faster as well.
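A comparable streaming approach in PHP is XMLReader, which walks the file node by node instead of loading the whole document, so memory use stays roughly constant. A minimal sketch, assuming each record is an <item> element (the tag and file names are assumptions):

<?php
$reader = new XMLReader();
$reader->open('feed.xml');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
        // Only one record is materialised in memory at a time.
        $item = new SimpleXMLElement($reader->readOuterXml());
        // ... insert $item into the database, queue its images ...
    }
}
$reader->close();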

Iskander Giniyatullin, 2013-07-31
@rednaxi

You can write the script so that it parses, say, 100 records, saves the current position, and then restarts, resuming from the saved position.
That solves the problem with the execution time and everything else.
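A minimal sketch of that approach, assuming the position is kept in a small state file and the script re-triggers itself with a redirect (the helpers load_records() and import_record() are hypothetical):

<?php
// Process a fixed chunk per request, remember the position, then restart
// so every run gets a fresh execution-time budget.
$chunk  = 100;
$state  = __DIR__ . '/import_position.txt';
$offset = is_file($state) ? (int) file_get_contents($state) : 0;

$records = load_records('feed.xml');              // hypothetical helper
$slice   = array_slice($records, $offset, $chunk);

foreach ($slice as $record) {
    import_record($record);                       // hypothetical helper
}

if ($offset + $chunk < count($records)) {
    file_put_contents($state, $offset + $chunk);
    header('Location: ' . $_SERVER['PHP_SELF']);  // run again from the new offset
    exit;
}

unlink($state);                                   // import finished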

lubezniy, 2013-07-30
@lubezniy

Here is another option, admittedly perverse and crutch-like, but in some cases almost indispensable (basic knowledge of PHP and HTML/JavaScript is required):
1. Write a script that parses the XML and builds an HTML table of image URLs.
2. Write a script that takes two parameters: the value of a row from the table in point 1 and the number (index) of that row. In its PHP part, the script downloads the image and writes what is needed to the database; in the onload handler on body (JavaScript) it redirects to itself, but with the value and index of the next row of the table. If the index equals the number of rows, it alerts that the import is over (see the sketch after this list).
3. Write a simple HTML page with two frames: the first frame loads the script from point 1, the second loads the script from point 2 with the starting values. Open this page in the browser, after which you can go to sleep. It is desirable to keep the computer on a reliable Internet connection and a UPS so that the session is not interrupted.
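A rough sketch of the script from point 2; the parameter name, the URL list file, and the target directory are assumptions:

<?php
// Handles one row per page load, then redirects to itself with the next
// index from the body onload handler, so the browser drives the loop.
$index = isset($_GET['i']) ? (int) $_GET['i'] : 0;
$rows  = file('image_urls.txt', FILE_IGNORE_NEW_LINES);  // table from step 1

if ($index < count($rows)) {
    // PHP part: download the image and write what is needed to the database.
    copy($rows[$index], __DIR__ . '/images/' . basename($rows[$index]));
    $next = $index + 1;
    echo "<body onload=\"location.href='?i={$next}'\">row {$index} done</body>";
} else {
    echo "<body onload=\"alert('Import finished')\"></body>";
}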

mihavxc, 2013-07-30
@mihavxc

My PHP scripts that run in the background via cron sometimes work for a week at a time (talking to a third-party API, downloading images, writing to the database).
I get by with a single
set_time_limit(0);
If that does not help, look in the web server logs: your database connection may be dropping because the session is too long. The error will look something like
MySQL server has gone away
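A minimal sketch of guarding against that, assuming mysqli; the credentials, table, $records list, and download_images() helper are illustrative:

<?php
// Long-running import: lift the time limit and re-open the connection
// if the server dropped it during a slow stretch (e.g. image downloads).
set_time_limit(0);

$db = new mysqli('localhost', 'user', 'pass', 'app');

function ensure_connection(mysqli &$db)
{
    // ping() returns false once the server has gone away; reconnect then.
    if (!@$db->ping()) {
        $db = new mysqli('localhost', 'user', 'pass', 'app');
    }
}

foreach ($records as $record) {
    download_images($record);                            // hypothetical, possibly slow
    ensure_connection($db);
    $title = $db->real_escape_string($record['title']);  // assumed field
    $db->query("INSERT INTO items (title) VALUES ('$title')");
}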

Alexey Sundukov, 2013-07-30
@alekciy

It is absolutely not necessary to download the images with the same script. I usually just save the URLs to a file, which is then fed to wget (the -i flag). It also happens that the surrounding system (the CMS) stores files in some directory structure of its own; for such a case you can pre-fetch the images into the local file system with wget. For an image's path in the FS to match its URL, use the -x option. Here is a typical download command:
wget -x -b --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" --referer="http://examplke.com/" -i img_url_list.txt
It downloads very quickly, and it is easy to parallelize if you split the file into pieces and run it from several servers. I loaded 200,000 images (about 8 GB in total) from two servers in roughly an hour.

UTD, 2014-07-18
@UTD

I have used it like this:
set_time_limit(0);
ini_set('max_execution_time', 0);
like this:
set_time_limit(0);
like this:
set_time_limit(9000);
After some time it still returns 504 Gateway Time-out.
How can I get around this so that the script still completes its work regardless of the time limit and then returns the page without this error?
Or is there some way to break the script's work into batches, say 15k rows written to the database at a time, so that the timeout counter is reset before it runs out?
