How to properly sync products with CSV?

Outsider V., 2019-12-29 15:00:25

Magento

This is a case study I was given at a job interview, and which I failed because I had never dealt with anything like it.
There is a CSV file with several tens of thousands of products, and a Magento 2 store with roughly the same number of products in its database. The task is to do the following as efficiently as possible, without bringing the server down (by running out of memory) and without breaking the database:
1. Update the attributes of products whose SKUs are already in the database.
2. Add the products that are not there yet.
3. Delete from the Magento database the products whose SKUs are not in the CSV.
The question: what would the overall algorithm of the code that does this look like? (It would most likely run from cron.)
Are there perhaps ready-made solutions (algorithms, approaches, patterns) that I don't know about?
Every algorithm I proposed drew an objection: either PHP would eat up too much memory, or MySQL would not cope with such a huge query.

3 answer(s)

Lev Zabudkin, 2019-12-29
@zabudkin

Whoever told you about memory: throw that advice out.
1. Do everything by hand, or rather with queries.
2. That's all.

Anton Shamanov, 2019-12-29
@SilenceOfWinter

Most likely they wanted something like this: blog.nagaychenko.com/2010/04/29/%D0%BA%D0%B0%D0%BA...

Dmitry Sviridov, 2019-12-29
@dimuska139

1 and 2. Since the SKU is, as I understand it, unique by definition, the column carries a UNIQUE index. So you can read the CSV file line by line and use ON DUPLICATE KEY UPDATE. That updates the attributes of existing products and adds the missing ones in a single pass.
3. To remove products whose SKUs are not in the CSV, read the product table in blocks (for example, 100 rows at a time) and check whether each SKU is present in the CSV file. If it is not, add the row's id to an array. Then delete all those rows by id using the SQL IN operator. If the array holds a lot of ids, it makes sense to split them into blocks too, so that no single IN list gets too long. This can be optimized: during steps 1 and 2, store the SKUs read from the CSV file in a PHP array (even at 100k lines this will not eat up much memory). Then there is no need to search the CSV file at all; it is enough to check whether each SKU taken from the database is in that array.
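Steps 1 to 3 can be sketched roughly as follows. This is only an illustration, and it rests on several assumptions. It is written in Python rather than PHP. It uses an in-memory SQLite database, whose `ON CONFLICT ... DO UPDATE` clause stands in for MySQL's `ON DUPLICATE KEY UPDATE`. The flat `products` table with `sku`, `name` and `price` columns is hypothetical (Magento's real catalog spreads attributes over several tables), and the function names and batch sizes are made up for the example.

```python
import csv
import io
import sqlite3

# SQLite upsert (needs SQLite 3.24+); in MySQL this would be
# INSERT ... ON DUPLICATE KEY UPDATE. Table and columns are hypothetical.
UPSERT = """
INSERT INTO products (sku, name, price) VALUES (?, ?, ?)
ON CONFLICT(sku) DO UPDATE SET name = excluded.name, price = excluded.price
"""

def upsert_from_csv(conn, csv_file, batch_size=500):
    """Stream the CSV row by row, upsert in batches, return the SKUs seen."""
    seen = set()
    batch = []
    for row in csv.DictReader(csv_file):
        seen.add(row["sku"])
        batch.append((row["sku"], row["name"], float(row["price"])))
        if len(batch) >= batch_size:
            conn.executemany(UPSERT, batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the last partial batch
        conn.executemany(UPSERT, batch)
        conn.commit()
    return seen

def delete_missing(conn, seen, block_size=100):
    """Walk the table in blocks ordered by id; delete rows whose SKU is not in `seen`."""
    last_id = 0
    deleted = 0
    while True:
        rows = conn.execute(
            "SELECT id, sku FROM products WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, block_size),
        ).fetchall()
        if not rows:
            break
        last_id = rows[-1][0]
        stale = [pid for pid, sku in rows if sku not in seen]
        if stale:
            # The IN list never exceeds block_size entries.
            placeholders = ",".join("?" * len(stale))
            conn.execute(f"DELETE FROM products WHERE id IN ({placeholders})", stale)
            conn.commit()
            deleted += len(stale)
    return deleted

# Tiny demo: A1 is updated, C3 is added, B2 is removed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, sku TEXT UNIQUE NOT NULL, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (sku, name, price) VALUES (?, ?, ?)",
    [("A1", "Old name", 1.0), ("B2", "Gone", 2.0)],
)
feed = io.StringIO("sku,name,price\nA1,New name,1.5\nC3,Added,3.0\n")
seen = upsert_from_csv(conn, feed)
removed = delete_missing(conn, seen)
result = conn.execute("SELECT sku, name, price FROM products ORDER BY sku").fetchall()
```

Only one batch of rows and the set of SKUs are held in memory at any time, and each DELETE touches at most `block_size` ids, which is the point of reading both sides in blocks.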
P.S. about 3. It may be even more efficient to add a column to the product table and, in the same SQL query used for steps 1 and 2, write the current date and time into it for every processed record (overwriting the old value). After that, a single query can delete everything that is stale: if a row still carries an old date and time, no product with that SKU was in the CSV file, so it can be deleted.
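The timestamp variant might look something like this. Again this is a sketch under assumptions, not Magento code. It is Python with an in-memory SQLite database, where `ON CONFLICT ... DO UPDATE` stands in for MySQL's `ON DUPLICATE KEY UPDATE`. The `products` table, the `synced_at` column and the `sync` function are hypothetical names. One timestamp is taken at the start of the run and written to every processed row, so that "old" simply means "older than this run".

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# SQLite upsert (needs SQLite 3.24+); in MySQL this would be
# INSERT ... ON DUPLICATE KEY UPDATE. `synced_at` is a hypothetical column.
UPSERT = """
INSERT INTO products (sku, name, synced_at) VALUES (?, ?, ?)
ON CONFLICT(sku) DO UPDATE SET name = excluded.name, synced_at = excluded.synced_at
"""

def sync(conn, csv_file, run_started, batch_size=500):
    """Upsert every CSV row stamped with run_started, then drop rows not stamped."""
    batch = []
    for row in csv.DictReader(csv_file):
        batch.append((row["sku"], row["name"], run_started))
        if len(batch) >= batch_size:
            conn.executemany(UPSERT, batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the last partial batch
        conn.executemany(UPSERT, batch)
        conn.commit()
    # Rows untouched by this run still carry an older (or no) timestamp.
    cur = conn.execute(
        "DELETE FROM products WHERE synced_at IS NULL OR synced_at < ?",
        (run_started,),
    )
    conn.commit()
    return cur.rowcount

# Tiny demo: A1 is updated, C3 is added, B2 keeps its old stamp and is deleted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, sku TEXT UNIQUE NOT NULL, name TEXT, synced_at TEXT)"
)
old_stamp = "2019-12-28T00:00:00+00:00"
conn.executemany(
    "INSERT INTO products (sku, name, synced_at) VALUES (?, ?, ?)",
    [("A1", "Old name", old_stamp), ("B2", "Gone", old_stamp)],
)
# ISO 8601 UTC strings in one fixed format compare correctly as text.
run_started = datetime.now(timezone.utc).isoformat(timespec="seconds")
feed = io.StringIO("sku,name\nA1,New name\nC3,Added\n")
removed = sync(conn, feed, run_started)
remaining = conn.execute("SELECT sku, name FROM products ORDER BY sku").fetchall()
```

With this approach no list of SKUs has to be kept in memory at all. The price is an extra column and one potentially large DELETE at the end, which could itself be split into blocks if it turns out to be too heavy.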