SQL
burguy83, 2019-03-21 12:04:37

How to speed up inserts into an SQL database?

We have an XML file with 1.5 million records; it is about 3 GB. I parse it with DOMDocument and insert the records into the database in a loop.
The code looks roughly like this:

foreach ($leiLEIobj as $leiLEIs)
{
    foreach ($leiLEIs->childNodes as $row)
    {
        $LEI = $row->nodeValue;
        $arResult[$LEI][$row->nodeName] = $LEI;
        //$LEIs[$row->nodeName] = $row->nodeValue;
        // ... lots and lots of other foreach loops here ...
    }

    $qwery = "INSERT INTO `leis` ($fields) VALUES ($values)";
    // the query is executed here, once per record
}

It turns out I write to the database on every iteration (read a record, write a record). In total there are 1.5 million arrays with about 30 fields each. The table starts out empty, and at first the rows go in quite quickly, about 1,300 per second, but as it approaches 100,000 records the rate drops sharply to around 30 inserts per second, and eventually down to 1. The server is decent, yet filling the database this way takes several days. The table is MyISAM, with a single indexed field, the id primary key.
I need ways to speed up the inserts. Some things I have already tried did not help.
For example, LOAD DATA INFILE does not work for me: the CSV comes out misaligned, with values shifted into other columns, because each record in the XML has a different number of fields.
I have heard about batch inserts, but I don't know how to implement them. Any thoughts? Thank you.


4 answer(s)
Alexander Kuznetsov, 2019-03-21
@DarkRaven

Your best bet is a batch insert, or else convert the XML into the corresponding SQL (with the same multi-row inserts) and execute it via the mysql command-line client, something like mysql -u user -ppassword dbname < file.sql.
Note the -p and the password: if my memory serves, there is no space between them.
For the SQL variant, format it like this:

INSERT INTO tbl_name
    (a,b,c)
VALUES
    (1,2,3),
    (4,5,6),
    (7,8,9);

Only not three rows at a time but 100-500 each; experiment to find the best batch size.
Sample code for forming the batches, based on your example:
<?php
$batchSize = 1000;
$counter = 0;
$valuesBatch = array();

foreach ($leiLEIobj as $leiLEIs)
{
    foreach ($leiLEIs->childNodes as $row)
    {
        $LEI = $row->nodeValue;
        $arResult[$LEI][$row->nodeName] = $LEI;
        //$LEIs[$row->nodeName] = $row->nodeValue;
    }

    $valuesBatch[] = "($values)";
    $counter++;

    if ($counter == $batchSize)
    {
        $qwery = strtr(
            "INSERT INTO `leis` ($fields) VALUES :text",
            array(
                ':text' => implode(",\r\n", $valuesBatch)
            )
        );

        // Execute the query here, or append it to an accumulator variable
        $counter = 0;
        $valuesBatch = array();
    }
}

// Flush the final, incomplete batch that is left over after the loop
if ($counter > 0)
{
    $qwery = strtr(
        "INSERT INTO `leis` ($fields) VALUES :text",
        array(':text' => implode(",\r\n", $valuesBatch))
    );
    // Execute the final query
}

P.S. The CSV can also be generated from the XML as needed, filling in default values for the fields that are missing from a given XML record, and then loaded with something like this:
LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5...);
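
To illustrate the P.S. above: a minimal sketch of writing such a fixed-column CSV from the parsed records. The column names, the empty-string default, and the use of $arResult are assumptions based on the question, not the actual schema.

<?php
// Hypothetical fixed column order matching the `leis` table; replace with the real ~30 columns.
$columns = array('lei', 'name', 'country', 'status');

$fh = fopen('abc.csv', 'w');
fputcsv($fh, $columns); // header row, skipped later by IGNORE 1 LINES

foreach ($arResult as $record) {
    $row = array();
    foreach ($columns as $col) {
        // Use a default for fields absent from this XML record,
        // so every line has the same number of columns and nothing shifts.
        $row[] = isset($record[$col]) ? $record[$col] : '';
    }
    fputcsv($fh, $row);
}
fclose($fh);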

res2001, 2019-03-21
@res2001

Specify which DBMS you are using.
To speed up the insert you can disable or drop the indexes on the target table and re-enable or rebuild them after the load is finished. Rebuilding an index once after the insert is faster than updating it for every inserted row.
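For the MyISAM table mentioned in the question this can be done with ALTER TABLE ... DISABLE KEYS / ENABLE KEYS; note that DISABLE KEYS only affects non-unique indexes, so a primary key would have to be dropped and re-added instead. A minimal sketch, with placeholder connection details:

<?php
// Sketch for MySQL/MyISAM, which the question appears to use; credentials are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Stop maintaining non-unique indexes while loading.
$db->query('ALTER TABLE `leis` DISABLE KEYS');

// ... run all the INSERTs here ...

// Rebuild the indexes once, after the load is complete.
$db->query('ALTER TABLE `leis` ENABLE KEYS');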
In addition, DBMSs usually support bulk insert operations (batch insert); for this you may have to convert the file into a format the DBMS understands. MySQL does not seem to have this, in which case only the option proposed by Dmitry remains.

zhaar, 2019-03-21
@zhaar

In MSSQL, BULK INSERT works faster than a regular INSERT INTO ... VALUES (), so maybe it makes sense to convert the XML into a plain text file and import that into the database instead?
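
For reference, a minimal sketch of what such an import could look like from PHP, assuming MSSQL with the pdo_sqlsrv driver; the server, database, table and file path are placeholders:

<?php
// Sketch only: assumes a CSV already generated from the XML.
// BULK INSERT reads the file on the SQL Server machine, so the path is as seen by the server.
$pdo = new PDO('sqlsrv:Server=localhost;Database=mydb', 'user', 'password');

$pdo->exec(
    "BULK INSERT abc
     FROM 'C:\\import\\abc.csv'
     WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)"
);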

ponaehal, 2019-03-22
@ponaehal

I'm puzzled by the gradual drop in performance.
My unconfirmed feeling is that the reason lies in the mechanism your database uses to support transactions.
You are trying to insert a huge number of records in one transaction, and the machinery that supports transaction rollback (in case a ROLLBACK is issued or the session crashes) requires significant database resources. Up to roughly 100,000 rows those resources are sufficient, and after that some kind of thrashing begins (the details depend on the database).
What I would advise to start with:
1. Using the database administration tools, look at resource consumption, to understand where the time is going.
2. If my guess is confirmed, a batch insert alone is unlikely to help (although it can be implemented differently in different databases).
3. Try issuing a COMMIT every 50,000 rows, as in the sketch below. That takes the load off the rollback-support machinery.
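
A minimal sketch of point 3, assuming a PDO connection to MySQL and a transactional engine such as InnoDB; the connection details and column list are placeholders, not the real schema:

<?php
// Sketch only: commit every $commitEvery rows instead of holding one huge transaction.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$commitEvery = 50000;
$i = 0;
$stmt = $pdo->prepare('INSERT INTO `leis` (`lei`, `name`, `country`) VALUES (?, ?, ?)');

$pdo->beginTransaction();
foreach ($arResult as $record) {
    $stmt->execute(array($record['lei'], $record['name'], $record['country']));

    if (++$i % $commitEvery === 0) {
        $pdo->commit();           // release the rollback log accumulated so far
        $pdo->beginTransaction(); // start the next batch
    }
}
$pdo->commit(); // commit the final, partial batch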
