MySQL
maxism, 2012-02-02 18:16:46

Adding large amounts of data to MySQL from Node.JS

Hello!

Task:
A bunch of large text files (from 1.5 to 5 GB), which store the necessary information, need to be parsed in Node.JS and uploaded to the MySQL database. I solved the parsing task using fs.createReadStream() and the node-lazy module, but I can’t cope with the task of adding parsed data to the database. By executing in a loop through all the lines of the insert file in the database, Node.JS starts to get fat to unimaginable sizes and eventually, being in 0.8 GB of RAM, falls out of memory.

I can't figure out what the problem is and am gradually coming to the conclusion that it's me, since a trivial loop of identical static inserts doesn't bloat Node at all.

As the MySQL connector I use the standard node-mysql. This is running on Windows. Here is the code:

var lazy = require("lazy"),
    fs = require("fs"),
    mysql = require('mysql');

var client = mysql.createClient({
  user: 'root',
  password: 'root',
  database: 'db'
});

// Read the dump line by line; each line holds one tab-separated row.
new lazy(fs.createReadStream('./dump1.txt')).lines.forEach(function (line) {
  var a = line.toString().split('\t'); // node-lazy emits Buffers, hence toString()
  client.query('INSERT INTO `table` (`a`, `b`, `c`) VALUES (?, ?, ?)', [a[0], a[1], a[2]]);
});


4 answers
phasma, 2012-02-02
@phasma

MySQL has LOAD DATA INFILE.
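To make that concrete, here is a minimal sketch of invoking it through the asker's node-mysql client. It assumes the dump is tab-separated, the table columns line up with the file, and the path is readable by the MySQL server itself, since LOAD DATA without LOCAL runs on the server side:

var mysql = require('mysql');

var client = mysql.createClient({
  user: 'root',
  password: 'root',
  database: 'db'
});

// One statement: MySQL parses the file itself, no per-row round trips.
// Note: the path is resolved on the MySQL server host, not by Node.
client.query(
  "LOAD DATA INFILE './dump1.txt' " +
  "INTO TABLE `table` " +
  "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' " +
  "(`a`, `b`, `c`)",
  function (err) {
    if (err) throw err;
    client.end();
  }
);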

zapimir, 2012-02-02
@zapimir

Check it in parts: for example, comment out the MySQL query. Does the script run then? If it does, write the queries to a file instead of executing them.
Add many rows at a time:
INSERT INTO `table` (`a`, `b`, `c`) VALUES (1, 1, 1), (2, 2, 2), … (n, n, n);
Disable the indexes before you start adding. A sketch of the batching idea follows below.
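For what it's worth, a rough sketch of the multi-row INSERT idea reusing the asker's node-lazy setup and client; the batch size of 1000 is an arbitrary number to tune, and disabling indexes would be an ALTER TABLE `table` DISABLE KEYS / ENABLE KEYS pair around the whole load (effective for non-unique MyISAM indexes):

var BATCH = 1000; // arbitrary; tune for your rows
var rows = [];

// Send everything accumulated so far as one multi-row INSERT.
function flush() {
  if (!rows.length) return;
  var placeholders = rows.map(function () { return '(?, ?, ?)'; }).join(', ');
  var values = [];
  rows.forEach(function (r) { values.push(r[0], r[1], r[2]); });
  client.query('INSERT INTO `table` (`a`, `b`, `c`) VALUES ' + placeholders, values);
  rows = [];
}

var stream = fs.createReadStream('./dump1.txt');
new lazy(stream).lines.forEach(function (line) {
  rows.push(line.toString().split('\t'));
  if (rows.length >= BATCH) flush();
});
stream.on('end', flush); // don't lose the final partial batch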

balloon, 2012-05-12
@balloon

Apparently lazy is guilty of a memory leak. Try writing your own stream.on('data', …) handler that splits blocks of text into lines and processes them immediately. An example can be seen here: github.com/j03m/node-csv2xml/blob/master/lineByline.js . That way you get control over memory usage.
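Roughly the pattern the linked lineByline.js follows, as a sketch assuming the dump is newline-delimited UTF-8: buffer incoming chunks, split on newlines, and carry the trailing partial line over to the next chunk.

var fs = require('fs');

var stream = fs.createReadStream('./dump1.txt', { encoding: 'utf8' });
var tail = '';

stream.on('data', function (chunk) {
  var parts = (tail + chunk).split('\n');
  tail = parts.pop(); // the last piece may be an incomplete line
  parts.forEach(handleLine);
});

stream.on('end', function () {
  if (tail) handleLine(tail); // flush the final line
});

function handleLine(line) {
  var a = line.split('\t');
  // hand a[0], a[1], a[2] to the database here
}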

Tenkoff, 2013-05-08
@Tenkoff

You produce data for the inserts faster than Node can push it out, so everything that can't be sent in time piles up in the buffer.
P.S. It's not about the libs; no lib can cope with a task framed like that. Change your approach to processing the data.
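One concrete way to act on that advice is backpressure: pause the read stream until MySQL acknowledges each query, so unsent data never accumulates. A sketch, simplified to one tab-separated row per chunk; in practice it would be combined with a line splitter like the one in the previous answer:

var stream = fs.createReadStream('./dump1.txt', { encoding: 'utf8' });

stream.on('data', function (chunk) {
  stream.pause(); // stop reading until MySQL has taken this row
  var a = chunk.split('\t'); // simplified: treats each chunk as one row
  client.query(
    'INSERT INTO `table` (`a`, `b`, `c`) VALUES (?, ?, ?)',
    [a[0], a[1], a[2]],
    function (err) {
      if (err) throw err;
      stream.resume(); // read more only after the query completed
    }
  );
});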
