symfony
Igor Katkov, 2015-09-01 09:21:57

Why does the script slow down?

Good afternoon!
I'm working on an XML parser that extracts products and adds them to the database. The problem I've run into is that the bigger the document (the more products), the slower the data is inserted: 2k products are added in 20 minutes, but 16k take 20 (!) hours. The same picture shows up even if the XML document is first converted into an array. What could the problem be, and what should I study to fix this slowdown?
P.S. Sample document


4 answers
Sergey, 2015-09-01
@iKatkovJS

Optimize the work with the database. I'm 99% sure that:
- products are inserted one at a time
- before each insert you check for the existence of categories and the like with queries to the database, and either you have no indexes on those tables, or MySQL (do you use MySQL?) is configured with the defaults, so a lot of the reads end up hitting the disk
Although even then, 20 hours for 16k items is somehow very long...
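For the category checks specifically, one common fix is to load all categories into memory once and test against a PHP array instead of querying per product. A minimal sketch of the idea; the connection details and the `categories`, `products`, and `sku` names are illustrative assumptions, not from the question:

```php
<?php
// Sketch: cache category ids in memory once, instead of one SELECT per product.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'password');

$categories = [];
foreach ($pdo->query('SELECT id, name FROM categories') as $row) {
    $categories[$row['name']] = $row['id'];
}

// Inside the product loop the check becomes a cheap array access:
// $categoryId = isset($categories[$name]) ? $categories[$name] : null;

// And for any per-product lookups you do keep (e.g. duplicate checks by SKU),
// make sure the looked-up column is indexed, e.g.:
// ALTER TABLE products ADD INDEX idx_sku (sku);
```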

Vlad Pasechnik, 2015-09-01
@jumper423

It sounds like you're doing this with separate INSERTs; if so, that's what needs optimizing. Compose batched INSERTs that add, say, a thousand records at a time.
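A rough sketch of such a batched INSERT with PDO; the `products(name, price)` table, the connection details, and the batch size of 1000 are assumptions for illustration:

```php
<?php
// Sketch: one multi-row INSERT per 1000 products instead of 1000 single INSERTs.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'password');

function insertBatch(PDO $pdo, array $rows)
{
    // Build "(?, ?), (?, ?), ..." to match the number of rows in this batch.
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO products (name, price) VALUES $placeholders");

    // Flatten [[name, price], ...] into one flat parameter list.
    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    $stmt->execute($params);
}

// $products stands in for the items parsed from the XML feed.
$products = [
    ['name' => 'Example product', 'price' => 9.99],
    // ...
];

$batch = [];
foreach ($products as $p) {
    $batch[] = [$p['name'], $p['price']];
    if (count($batch) >= 1000) {
        insertBatch($pdo, $batch);
        $batch = [];
    }
}
if ($batch) {
    insertBatch($pdo, $batch); // flush the remainder
}
```

Wrapping each batch in a transaction (`$pdo->beginTransaction()` / `$pdo->commit()`) typically speeds this up further on InnoDB.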

Oleg Shevelev, 2015-09-01
@mantyr

The speed of inserts into the database can be checked separately by looking at SHOW PROCESSLIST;
However, judging by the description, your problem is not inserting into the database but processing a large XML file. jaxel rightly notes that loading the whole XML into memory is bad practice, and that is probably what you're doing.
Armenian Radio correctly points out that you need monitoring and profiling tools. Try installing and mastering xhprof; it will give you a general understanding of what runs and how, and let you answer precisely where the time is being spent, without guesswork or conjecture.
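A minimal sketch of what using xhprof looks like, assuming the PECL extension is installed; the output path is illustrative:

```php
<?php
// Sketch: profile the import and dump the raw data for the xhprof viewer.
xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);

// ... run the parsing/import code here ...

$profile = xhprof_disable();
// The xhprof_html UI bundled with the extension can render a report
// (call counts, wall time, CPU, memory) from this raw data.
file_put_contents('/tmp/import.xhprof', serialize($profile));
```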
Google the topic "xml streaming parsing": with it you can parse files of almost any size with minimal resource consumption, within the limits of what is possible in PHP, of course.
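In PHP the built-in XMLReader pull parser does exactly this. A minimal sketch, assuming the feed looks like `<catalog><product>...</product>...</catalog>`; the file name and element names are assumptions:

```php
<?php
// Sketch: stream the file, holding only one <product> in memory at a time.
$reader = new XMLReader();
$reader->open('products.xml');

// Scan forward to the first <product> element.
while ($reader->read() && $reader->name !== 'product') {
    // keep scanning
}

while ($reader->name === 'product') {
    // Materialize just this one element for convenient access.
    $product = simplexml_load_string($reader->readOuterXml());
    // ... add $product to the current insert batch here ...

    $reader->next('product'); // jump to the next <product>, skipping its subtree
}

$reader->close();
```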
For comparison, a case of parsing a book catalog from ozon.ru in Go:
- a 3.9 GB XML file
- 2.3 million books (each book has a title, authors, description, link to the cover, list of languages, price, and list of categories, plus a separate list of all categories for the books)
- 10 minutes of streaming parsing, including saving to the database
- without first saving the file to disk
- with auto-resume after network interruptions

Armenian Radio, 2015-09-01
@gbg

Run atop and see what is slow and which resource is being exhausted: disk, network, or CPU?
