S
S
Spooky 20202020-12-22 00:12:39
Python
Spooky 2020, 2020-12-22 00:12:39

Why does the Python MySQL Connector DB driver work an order of magnitude faster than in other languages?

Tested various technologies to automate data insertion into the database, under the following conditions:
- MyISAM storage system
- MySQL 8/MariaDB 10.4 DBMS
- Insertion of 33,000 rows in 5 fields
- C#/PHP/Node/Python code
- both prepared and normal direct insertion of static values
​​- type query

INSERT INTO tablename (fieldA, fieldB, fieldC, fieldD, fieldE) VALUES (a, b, c, d, e)

Table
CREATE TABLE `zip_code` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `zip` CHAR(5) NOT NULL,
    `city` VARCHAR(50) NOT NULL,
    `stid` CHAR(2) NOT NULL,
    `state` VARCHAR(50) NOT NULL,
    PRIMARY KEY (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM
;

The result for C#/PHP/Node is 14-15 (fourteen - fifteen) seconds.
The result for Python is 1.5 (one and a half) seconds!

Why such a difference in insertion speed?
In python, I used cursor.executemany() passing an array with 1000 lines.
In PHP (PDO), I looked at all the options, but nothing came up for more optimization.
C#/Noda - I used it there as indicated in the docs, I did not find any additional options.

What kind of special mechanisms work in Python MySQL Connector (from Oracle) that allows you to
do the rest?
Which is a bit of a shame - after all, I believed in PHP and compiled C# so much.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
G
galaxy, 2020-12-22
@spooky_2020

Offhand: cursor.executemany() connects queries into one big INSERT. Other drivers themselves do not seem to be able to do this.
I think if you manually assemble a multi-row INSERT, then something like this will come out:

INSERT INTO tablename (fieldA, fieldB, fieldC, fieldD, fieldE)
VALUES (a, b, c, d, e),
(a1, b1, c1, d1, e1),
(a2, b2, c2, d2, e2),
...

I
index0h, 2020-12-22
@index0h

1k lines are usually inserted by separate inserts when performance is of importance somewhere in the distant and bright future, or from lack of experience.
If the issue of performance is still important - either we perform a multiple insert in one request, or we wrap it in a transaction.
Author, run your own test, but with a transaction around your thousand inserts, the results will be different.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question