A
PHP
alex stephen, 2014-02-06 18:10:46

There is a huge CSV file with data (over 100k lines). How can all of it be loaded into MySQL from PHP, without SSH access?

Hello. The task: I have a huge CSV file with data (over 100k lines).
I need to load all of it into a MySQL database from PHP, without SSH access... The time limit is 30 seconds.
I tried reading the file 1000 lines at a time (reopening it each time); of course it doesn't fit into 30 seconds.
How do gurus do such things?


10 answer(s)
S
Sergey, 2014-02-06
@berezuev

Read about LOAD DATA INFILE ( dev.mysql.com/doc/refman/5.1/en/load-data.html ).
For example, it was used to load 40 million 4 KB rows in 40 minutes (and the bottleneck was the PHP script that generated the data). In your situation it should be much faster, I think.
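
A minimal sketch of how this might be issued from PHP through PDO; the DSN, credentials, table name imported_rows and column list are assumptions, and the MySQL server and driver must allow LOCAL INFILE:

<?php
// Sketch: bulk-load a CSV with LOAD DATA LOCAL INFILE through PDO.
// DSN, credentials, table and column names are placeholders.
$pdo = new PDO(
    'mysql:host=localhost;dbname=test;charset=utf8',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]   // required for LOCAL INFILE
);

// Nowdoc, so the \n escape reaches MySQL untouched.
$sql = <<<'SQL'
LOAD DATA LOCAL INFILE %s
INTO TABLE imported_rows
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col_a, col_b, col_c)
SQL;

// Quote the path ourselves: LOAD DATA does not take bound parameters.
$pdo->exec(sprintf($sql, $pdo->quote('/path/to/huge.csv')));

The whole import then happens inside one SQL statement on the MySQL side; the PHP script only issues it and waits.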

L
lizergil, 2014-02-06
@lizergil

The algorithm in your case would be as follows:
1. Drop all indexes from the table the data will be written into.
2. Open the file (fopen).
3. Read m lines (fgets), until the end of the file.
4. Compile them into a single batch query: INSERT INTO ... VALUES (%row1%), (%row2%), ..., (%rowm%);
5. Execute the query.
6. Go to step 3.
7. At end of file: close the file and rebuild the dropped indexes.
If steps 3 and 4 are performed together (building the query while reading), you can save memory.
Regarding the time limit: the complexity of the algorithm is O(n), i.e. it grows linearly with the number of lines in the file. If that is still not fast enough, it can be sped up either by optimization (using low-level utilities for inserting the data, though that data has to be prepared in advance) or by more capable hardware (client, network, server).
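
A rough PHP sketch of steps 2-6 above, assuming a comma-separated file and a placeholder table my_table(col_a, col_b, col_c):

<?php
// Sketch of the batched multi-row INSERT loop described above.
// Table, columns, file path and batch size are placeholders; each CSV
// line is assumed to contain exactly three fields.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$batchSize = 1000;
$rows = [];
$fh = fopen('/path/to/huge.csv', 'r');

while (($fields = fgetcsv($fh)) !== false) {
    $rows[] = $fields;
    if (count($rows) >= $batchSize) {
        insertBatch($pdo, $rows);
        $rows = [];
    }
}
if ($rows) {
    insertBatch($pdo, $rows);   // the remaining tail
}
fclose($fh);

function insertBatch(PDO $pdo, array $rows): void
{
    // One multi-row statement per batch: INSERT ... VALUES (?,?,?),(?,?,?),...
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?,?)'));
    $stmt = $pdo->prepare("INSERT INTO my_table (col_a, col_b, col_c) VALUES $placeholders");
    $stmt->execute(array_merge(...$rows));
}

Batching keeps the number of round trips to MySQL at roughly n/m instead of n, which is where most of the time is saved.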

V
Vladimir Merk, 2014-02-06
@VladimirMerk

I had to parse files with a huge number of e-mail addresses under similar conditions. I used AJAX as an intermediate layer: one PHP script handed the data to the client, which then sent it in batches to another script that inserted it all into the database.
I can send you this script by mail, although it is rather clumsy since it was done in a hurry. To use it, it is better to split the file into several parts and start the parsing in several browser windows to make it faster. You can adapt it to your needs.

S
svd71, 2014-02-06
@svd71

I tried reading the file 1000 lines at a time (reopening it each time); of course it doesn't fit into 30 seconds...

Such things are done through AJAX. The file is opened on the server, and the number of lines is reported to the client's browser. The client then issues short tasks via AJAX: insert lines 1 through 10 into the database. Even with a large file, such short operations don't take long. After executing each little piece, the server reports the result of the operation via AJAX: inserted successfully, an error on line NNN, or something else.
The downside is that the client has to keep the browser session open the whole time. As soon as it closes, nothing more is sent to the server and the work stops.
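
A minimal sketch of the server-side part of such a scheme: the browser keeps calling this script with ?offset=N via AJAX, and each call inserts the next small chunk and reports where to continue (file path, table, columns and chunk size are assumptions):

<?php
// import_chunk.php -- sketch of an endpoint polled by the browser via AJAX.
// Each request inserts $chunk lines starting at line ?offset=N.
// Paths, table and column names are placeholders.
$chunk  = 500;                                   // lines handled per request
$offset = max(0, (int) ($_GET['offset'] ?? 0));  // line number to start from

$pdo  = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO my_table (col_a, col_b, col_c) VALUES (?, ?, ?)');

$file = new SplFileObject('/path/to/huge.csv', 'r');
$file->setFlags(SplFileObject::READ_CSV);
$file->seek($offset);                            // jump to the requested line

$done = 0;
while (!$file->eof() && $done < $chunk) {
    $row = $file->current();
    if (is_array($row) && count($row) >= 3) {
        $stmt->execute(array_slice($row, 0, 3));
    }
    $file->next();
    $done++;
}

// Tell the client where to continue, or that the import is finished.
header('Content-Type: application/json');
echo json_encode(['nextOffset' => $offset + $done, 'finished' => $file->eof()]);

The client-side loop is then just an AJAX call that reads nextOffset from the response and issues the next request until finished is true.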

L
lubezniy, 2014-02-06
@lubezniy

I usually split the file into parts on the client side and copy them to the server via FTP. Then a special script with a GET parameter holding a counter reads the first part and redirects to the second, the second to the third, and so on until everything has been loaded.
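
A sketch of that redirect chain, assuming the parts were uploaded as part_1.csv, part_2.csv, ... and a placeholder table:

<?php
// import_parts.php?part=1 -- sketch of the redirect chain described above.
// File naming, paths, table and column names are assumptions.
$part = max(1, (int) ($_GET['part'] ?? 1));
$path = sprintf('/path/to/parts/part_%d.csv', $part);

if (!is_file($path)) {
    exit('Done: no more parts.');   // ran past the last uploaded part
}

$pdo  = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO my_table (col_a, col_b, col_c) VALUES (?, ?, ?)');

$fh = fopen($path, 'r');
while (($row = fgetcsv($fh)) !== false) {
    $stmt->execute($row);
}
fclose($fh);

// Hand off to the next part; each request stays within the 30-second limit.
header('Location: import_parts.php?part=' . ($part + 1));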

V
vapmaster, 2014-02-06
@vapmaster

Don't read the entire file at once, but in chunks of N bytes. For example, using cURL.
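
If the file is reachable over HTTP, one way to pull it down piece by piece is a byte-range request with cURL; the URL and the range here are assumptions, and the server has to honour Range requests:

<?php
// Sketch: fetch only the first 1 MB of a remote CSV with a cURL range request.
// URL and byte range are placeholders.
$ch = curl_init('http://example.com/huge.csv');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RANGE, '0-1048575');   // bytes 0..1048575 only
$chunk = curl_exec($ch);
curl_close($ch);

// The next request would ask for '1048576-2097151', and so on,
// so no single pass ever holds the whole file in memory.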

A
Alexander Zelenin, 2014-02-06
@zelenin

And what's wrong with opening the file just once? Not enough memory, or what?

E
easyman, 2014-02-06
@easyman

And what prevents the PHP script from reading the file not from the beginning, recording which line (which byte offset in the stream) it stopped at?
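
A sketch of that idea: remember the byte offset between runs and continue from it on the next invocation (the offset file, paths, table and columns are assumptions):

<?php
// Sketch: resume reading the CSV from the byte offset saved by the previous run.
// 'import.offset', the file path, table and column names are placeholders.
$offsetFile = __DIR__ . '/import.offset';
$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO my_table (col_a, col_b, col_c) VALUES (?, ?, ?)');

$fh = fopen('/path/to/huge.csv', 'r');
fseek($fh, $offset);                       // continue where the last run stopped

$deadline = time() + 25;                   // stay safely under the 30-second limit
while (time() < $deadline && ($row = fgetcsv($fh)) !== false) {
    $stmt->execute($row);
}

// Remember how far we got for the next run.
file_put_contents($offsetFile, (string) ftell($fh));
fclose($fh);

Since the offset is saved only after a complete fgetcsv() call, every saved position lands on a line boundary, so no row is ever split between runs.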

F
Fayozzhon Berdiev, 2015-12-28
@CybernatiC

Dear comrades, I have the same mess :-D Help me out!

E
Egor Kazantsev, 2015-12-29
@saintbyte

100k rows: carefully split the file into pieces in Excel (Excel handles up to 999,999 rows), then carefully import the pieces into the database through phpMyAdmin.
