jorshjorsh95, 2018-07-04 17:44:56
PHP

How to do a heavy import of 800k items from Excel?

Good evening. I need to build an import and have run into a problem.
We have:
1. Photos
2. An Excel file with 800k products
3. An empty SQL table
Everything is fine, everything is simple:
1. connect any library for working with Excel from PHP
2. parse the information from the document
3. upload the photos (file name = article number)
4. fill in the SQL table.
But maybe I underestimate the speed and capabilities of the most ordinary hosting (I don't know the hardware specs, but let's take the classic plan for about 200 rubles: 2 GB of memory, an HDD). I have the feeling that when I try to do all this, the server will collapse and refuse to repeat it 800,000 times. Is that true, and how do I find a way out of this?
One option is to break the whole process into parts, say 50k products at a time, but the question is how to implement that.
I have only one idea in my head: take a slice of 50k records with an offset, and pass the offset via a GET parameter, for example:
site.ru/import?step=2
I suspect these slices with a GET parameter are clumsy, hacked-together nonsense on my part; can it be done properly somehow?
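Below is a minimal sketch of the ?step=N idea described in the question. The file name, table, columns, delimiter and chunk size are all illustrative assumptions, not a recommendation.

<?php
// Minimal sketch of the ?step=N chunked import described above.
// The file name, table, columns, delimiter and chunk size are
// illustrative assumptions, not a recommended final design.
$chunkSize = 50000;
$step      = max(0, (int)($_GET['step'] ?? 0));
$offset    = $step * $chunkSize;

$pdo  = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO products (article, name, price) VALUES (?, ?, ?)');

$fh        = fopen('products.csv', 'r');
$rowNumber = 0;
$imported  = 0;

while (($row = fgetcsv($fh, 0, ';')) !== false) {
    $rowNumber++;
    if ($rowNumber <= $offset) {
        continue;            // rows already handled by previous steps
    }
    $stmt->execute([$row[0], $row[1], $row[2]]);
    if (++$imported >= $chunkSize) {
        break;               // this step is done; request ?step=N+1 next
    }
}
fclose($fh);

echo "step $step done, imported $imported rows\n";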

17 answers

Alexander, 2018-07-04
@jorshjorsh95

1. Do you actually have an Excel file, or is it a CSV file that you open on your desktop with Excel?
2. If it really is an Excel file, is there a lot piled up in it, such as formatting and formulas, or is it just bare tables?
3. If it is just bare tables, can you produce a CSV file?
If the data is in CSV format, you can load everything with MySQL itself and not use PHP or its libraries for processing at all. The result will be many times faster than looping through the rows in PHP and then feeding them to MySQL.
When I once faced the task of loading a product file with several million items into the database, the optimal solution was LOAD DATA.

A piece of my old MySQL code, for clarity
-- Load the proper ("kosher") file
LOAD DATA LOCAL INFILE '/srv/cms_cpa/files/adimport_items.csv' INTO TABLE adimport_tmp CHARACTER SET utf8 FIELDS TERMINATED BY '|' ENCLOSED BY "'" LINES TERMINATED BY '\n' IGNORE 1 LINES (id_adimport,article,available,currencyId,delivery,description,id,name,oldprice,param,picture,price,url,vendor,advcampaign_id,advcampaign_name);

-- Load only the needed fields (!!!)
LOAD DATA LOCAL INFILE '/srv/cms_cpa/files/adimport_items.csv' INTO TABLE adimport_tmp CHARACTER SET utf8 FIELDS TERMINATED BY '|' ENCLOSED BY "'" LINES TERMINATED BY '\n' IGNORE 1 LINES (id_adimport,@ISBN,@adult,@age,article,@attrs,@author,available,@barcode,@binding,@brand,@categoryId,@country_of_origin,currencyId,delivery,description,@downloadable,@format,@gender,id,@local_delivery_cost,@manufacturer_warranty,@market_category,@model,@modified_time,name,oldprice,@orderingTime,@page_extent,param,@performed_by,@pickup,picture,price,@publisher,@sales_notes,@series,@store,@syns,@topseller,@type,@typePrefix,url,vendor,@vendorCode,@weight,@year,advcampaign_id,advcampaign_name,@deeplink);

-- All fields
LOAD DATA LOCAL INFILE '/srv/cms_cpa/files/adimport_items.csv' INTO TABLE adimport_tmp CHARACTER SET utf8 FIELDS TERMINATED BY '|' ENCLOSED BY "'" LINES TERMINATED BY '\n' IGNORE 1 LINES (id_adimport,ISBN,adult,age,article,attrs,author,available,barcode,binding,brand,categoryId,country_of_origin,currencyId,delivery,description,downloadable,format,gender,id,local_delivery_cost,manufacturer_warranty,market_category,model,modified_time,name,oldprice,orderingTime,page_extent,param,performed_by,pickup,picture,price,publisher,sales_notes,series,store,syns,topseller,type,typePrefix,url,vendor,vendorCode,weight,year,advcampaign_id,advcampaign_name,deeplink);
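
Not part of the answer above, but if you want to run the same LOAD DATA LOCAL INFILE from PHP instead of the mysql console, a minimal PDO sketch could look like this (DSN and credentials are placeholders, and the server must allow local_infile):

<?php
// Minimal sketch (not the answerer's code): running the same
// LOAD DATA LOCAL INFILE through PDO. DSN and credentials are
// placeholders, and the MySQL server must allow local_infile.
$pdo = new PDO(
    'mysql:host=localhost;dbname=shop;charset=utf8',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // enable LOCAL on the client side
);

$sql = "LOAD DATA LOCAL INFILE '/srv/cms_cpa/files/adimport_items.csv'
        INTO TABLE adimport_tmp
        CHARACTER SET utf8
        FIELDS TERMINATED BY '|' ENCLOSED BY \"'\"
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES";

$rows = $pdo->exec($sql); // returns the number of loaded rows on success
echo "Imported: $rows\n";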

Maxim Timofeev, 2018-07-04
@webinar

1. xls is a resource hog; re-save it as csv
2. check the file size against your limits if you send it via POST
3. it should probably be split into parts rather than processed in one go
4. it is probably worth putting the processing tasks on cron (a sketch of that is below)
5. you can, and really should, set a longer timeout
If there is a web interface for this task, I would entrust the splitting to the client entirely: send the data in small parts via AJAX, get a response, draw the completion percentage, and send the next part.
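
A possible sketch of points 3-4 (splitting plus cron), assuming a CSV source and a small state file that remembers how far the previous run got; every path, the table and the chunk size here are made up:

<?php
// Sketch of points 3-4 above: a cron job that processes one chunk per run
// and remembers its position in a state file. Paths, table name and the
// chunk size are assumptions for the example.
$csvFile   = '/var/import/products.csv';
$stateFile = '/var/import/products.offset'; // byte position reached so far
$chunkSize = 50000;

$pdo  = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO products (article, name, price) VALUES (?, ?, ?)');

$offset = is_file($stateFile) ? (int)file_get_contents($stateFile) : 0;

$fh = fopen($csvFile, 'r');
fseek($fh, $offset);                         // resume where the last run stopped

$done = 0;
while ($done < $chunkSize && ($row = fgetcsv($fh, 0, ';')) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]);
    $done++;
}

file_put_contents($stateFile, (string)ftell($fh)); // remember the new position
fclose($fh);

echo date('c') . " processed $done rows\n";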

Max, 2018-07-04
@MaxDukov

Save the xls as csv, then LOAD DATA INFILE in the mysql console. I have loaded files with tens of millions of lines this way; it flies.

Mykola Ivashchuk, 2018-07-04
@mykolaim

I imported 1.5 million lines into the database; it all went pretty fast.
1 - use a console application, why torture the web server.
2 - I advise you to look at https://github.com/box/spout (a reading sketch is below)
3 - upload the photos to the host as an archive, unpack them there, and in the script just substitute the correct path.
And no magic is needed.
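
For illustration, a streaming read with box/spout might look roughly like this (this assumes the 3.x reader API; the file name and what you do with each row are up to you):

<?php
// Sketch of streaming-reading an xlsx with box/spout (3.x reader API).
// The file name and what is done with each row are assumptions.
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('products.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();   // e.g. [article, name, price, ...]
        // validate / batch-insert $cells here
    }
}

$reader->close();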

Alexey P, 2018-07-04
@lynxp9

Methods:
1. Check what limits your server places on POST requests (in my opinion you can easily fit within them) and just insert everything with one request, using a BATCH INSERT as Dmitry Bogdanov wrote.
2. Compose an SQL query with the imported data and write it to a file, copy it to the server via scp, and execute it.

Antonio Solo, 2018-07-05
@solotony

The optimal option in terms of performance is LOAD DATA INFILE, but the minus is that no validation is performed, and neither are updates.
If validation is needed, you have to parse the file yourself. To speed up the SQL, do batch inserts:
insert ignore ..... values (),(),() ...
and if you need to update:
insert ignore ..... values (),(),() ... on duplicate key update
If the script gets cut off by the limits, create a task and process it from cron; 800K is not that much.
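
A sketch of what such a batched insert can look like from PHP (the table, columns, batch size and the sample rows are assumptions; the INSERT IGNORE variant from the answer works the same way):

<?php
// Sketch of a multi-row INSERT ... ON DUPLICATE KEY UPDATE built in PHP.
// Table, columns, batch size and the sample rows are assumptions.
function flushBatch(PDO $pdo, array $rows): void
{
    if ($rows === []) {
        return;
    }
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?, ?)'));
    $sql = "INSERT INTO products (article, name, price) VALUES $placeholders
            ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)";
    $params = [];
    foreach ($rows as $r) {
        array_push($params, $r[0], $r[1], $r[2]);
    }
    $pdo->prepare($sql)->execute($params);
}

$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');

// $source stands for any iterable of [article, name, price] rows,
// e.g. whatever the CSV/xlsx reader yields.
$source = [['A-001', 'First product', 10.50], ['A-002', 'Second product', 20.00]];

$batch = [];
foreach ($source as $row) {
    $batch[] = $row;
    if (count($batch) >= 1000) {   // 1000 rows per statement keeps the query size sane
        flushBatch($pdo, $batch);
        $batch = [];
    }
}
flushBatch($pdo, $batch);          // remaining rows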

AlexSer, 2018-07-05
@AlexSer

Install dbForge, have it load the data from the XLS, and write the queries against the table.

Dmitry, 2018-07-04
@php10

do BATCH INSERT via PHP CLI

Alexander Zubarev, 2018-07-04
@zualex

It seems to me that it is better to split one large file into several files and not worry about it.

Petr Vasiliev, 2018-07-05
@danial72

Convert the xls to csv and use a pipe/streaming approach: read the file line by line without loading it into memory in full.
https://m.habr.com/post/345024/
describes exactly what you need.
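
In PHP terms that line-by-line reading is basically fgetcsv on an open handle; a small sketch (the file name, delimiter and column layout are assumptions):

<?php
// Sketch of reading a CSV line by line without loading it into memory.
// The file name, delimiter and column layout are assumptions.
$fh = fopen('products.csv', 'r');
if ($fh === false) {
    exit("cannot open file\n");
}

$header = fgetcsv($fh, 0, ';');           // first line: column names

while (($row = fgetcsv($fh, 0, ';')) !== false) {
    $item = array_combine($header, $row); // e.g. ['article' => ..., 'name' => ...]
    // validate and insert $item here (ideally in batches)
}

fclose($fh);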

pingo, 2018-07-04
@pingo

>> connect any library
That is exactly where the question lies. I tried everything, from unset() on every iteration to all the other tricks, until I changed the library.
I had ~400k lines with 16 fields each; I don't remember what I switched to, I just did composer require for another office library and adjusted my class, but on the second or third attempt everything ran quickly.

Alexander Taratin, 2018-07-04
@Taraflex

https://github.com/box/spout supports streaming reads and writes.
Turn off indexing in the database before inserting, and insert in batches, not one record at a time.
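
A hedged sketch of the "turn off indexing" part (the table name is made up; note that ALTER TABLE ... DISABLE KEYS only affects non-unique indexes on MyISAM tables, so for InnoDB the usual approach is to relax the session checks instead):

<?php
// Hedged sketch of relaxing index/constraint work around a bulk load.
// The table name is made up. ALTER TABLE ... DISABLE KEYS only affects
// non-unique indexes on MyISAM tables; for InnoDB the usual approach is
// to relax the session checks shown below.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass');

$pdo->exec('SET unique_checks = 0');
$pdo->exec('SET foreign_key_checks = 0');
$pdo->exec('ALTER TABLE products DISABLE KEYS'); // MyISAM only

// ... run the batched INSERTs / LOAD DATA here ...

$pdo->exec('ALTER TABLE products ENABLE KEYS');
$pdo->exec('SET foreign_key_checks = 1');
$pdo->exec('SET unique_checks = 1');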

WTERH, 2018-07-04
@Expany

What about doing it in passes: a loop over a fixed number of iterations, with a pause, no?
For example, 100 iterations and a 1 s pause, no?

Andrew, 2018-07-05
@iCoderXXI

Use a script to convert the XLS into SQL in batches of 50k lines, and push them into the database from the console.

ivanovnickolay, 2018-07-12
@ivanovnickolay

https://github.com/box/spout is a great solution for reading xlsx Excel files. I use it myself to load data with pre-validation. In practice it uses no more than about 12 MB of memory. Compared to PHPExcel it is quite a fast solution.

Alex-1917, 2018-07-12
@alex-1917

Everything is fine, everything is simple:
1. connect any lib for working php with excel
2. parse information from the document
3. upload a photo (file name = article)
4. fill in the sql table.

)))))
1. Install www.mysqlfront.de
2. Feed it your Excel file
3. Start selling your 800k goods
P.S. There is little information about the photos, so there you are somewhat on your own )))
Why is it not enough? Well, if only because sticking 800k photos into one folder is a dead end; you should keep it to roughly 1-3k files per folder. Take any file manager and scatter them into folders; the simplest scheme is to use the first three characters of the file name as the folder name (a sketch is below). Collisions are possible here, since there will be repetitions... or maybe not...
Although you have already done half the work (photo name = article number).
This sorting should be done before uploading; then pack everything into an archive, upload it via FTP and unpack it on the hosting... Although 800k * 50 KB = 40 GB, and that is with only one photo per product! What 2 GB are you talking about? )))
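
A sketch of the "first three characters = folder name" idea (both directory paths are placeholders):

<?php
// Sketch of scattering photos into subfolders named after the first
// three characters of the file name. Both directory paths are placeholders.
$src = '/var/import/photos';
$dst = '/var/www/site/images/products';

foreach (new DirectoryIterator($src) as $file) {
    if (!$file->isFile()) {
        continue;
    }
    $name   = $file->getFilename();                // e.g. "ABC12345.jpg"
    $bucket = substr($name, 0, 3);                 // "ABC"
    $target = "$dst/$bucket";

    if (!is_dir($target)) {
        mkdir($target, 0755, true);
    }
    rename($file->getPathname(), "$target/$name"); // same-named files would collide here
}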

Maxim Koreev, 2018-07-12
@MAXiDROME

Save it as xlsx, rename it to zip and unzip it. Find the xml inside and do whatever you want with it.
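
For what it's worth, a sketch of that idea (the sheet path is the usual location inside an xlsx, but string cell values normally live in xl/sharedStrings.xml, so real parsing takes more work than this):

<?php
// Sketch: an xlsx is just a zip; unpack it and stream the sheet XML.
// Note that string cell values usually live in xl/sharedStrings.xml,
// so real parsing takes more work than shown here.
$zip = new ZipArchive();
if ($zip->open('products.xlsx') !== true) {
    exit("cannot open xlsx\n");
}
$zip->extractTo('/tmp/products_xlsx');   // target directory is a placeholder
$zip->close();

// The first worksheet normally lives here:
$sheet = '/tmp/products_xlsx/xl/worksheets/sheet1.xml';

$reader = new XMLReader();
$reader->open($sheet);
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // handle one <row> element at a time without loading the whole file
    }
}
$reader->close();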
