PHP
BarneyGumble, 2021-05-20 11:56:55

How to import a large JSON file (18 GB) into MySQL?

There is an 18 GB JSON file. It is essentially a list of objects, one per line, and is not valid JSON as a whole.

Looks like this:

{"address": "Gabriele-Tergit-Promenade 19, Berlin", "amenity_groups": [{"amenities": ...}
{"address": "Passeig De Gracia, 68, Barcelona", "amenity_groups": [{"amenities": ...}
...
{"address": "Great Cumberland Place, London", "amenity_groups": [{"amenities": ...}


I need to drive this whole thing into MySQL. What are the best ways to do this?

Writing a PHP script comes to mind: it would read the JSON file, take the JSON of each hotel line by line, decode it into an array, parse it, and insert the needed data into the database. The problem is that the source file is 18 GB, and my script dies on memory usage right at the start of reading the file, no matter what limit I set (I have 4 GB of RAM on the VDS).

What other ideas and options do you have?


4 answers
Flying, 2021-05-20
@BarneyGumble

You don't need to load the whole file into memory; there are streams for that:

<?php
$fp = fopen('big-file.json', 'r');
if (!is_resource($fp)) {
    throw new \RuntimeException('Failed to open file for reading');
}
while (!feof($fp)) {
    $line = fgets($fp, 32768);  // Set the limit for a single record here
    if ($line === false || trim($line) === '') {
        // Skip blank lines (and the failed read at EOF)
        continue;
    }
    $data = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    // ... write to the database
}
fclose($fp);
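
For the "write to the database" step, a minimal sketch with a PDO prepared statement could look like the following. The hotels table, its address and raw_json columns, and the connection credentials are assumptions for illustration, not something from the answer above:

<?php
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8mb4',
    'user',
    'password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);
// Prepare once, execute per line, so the SQL is parsed only a single time
$stmt = $pdo->prepare('INSERT INTO hotels (address, raw_json) VALUES (:address, :raw_json)');

$fp = fopen('big-file.json', 'r');
while (($line = fgets($fp, 32768)) !== false) {
    if (trim($line) === '') {
        continue;  // skip blank lines
    }
    $data = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    $stmt->execute([
        ':address'  => $data['address'] ?? null,
        ':raw_json' => $line,  // keep the original JSON too, in case other fields are needed later
    ]);
}
fclose($fp);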

Never Ever, 2021-05-20
@Target1

One of the first solutions that comes to mind is to use SplFileObject.
Open the file with it and write to the database in batches of 500/1000 rows.

$file = './file.json';
$spl = new SplFileObject($file);
$spl->seek(177777);      // jump straight to an arbitrary line (0-based)
echo $spl->current();    // print that line without loading the rest of the file
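
To sketch what that batching could look like with SplFileObject (the flags and the batch size of 1000 are illustrative assumptions; the actual database write is left as a stub):

<?php
$spl = new SplFileObject('./file.json');
// Iterate line by line; the whole 18 GB file is never held in memory
$spl->setFlags(SplFileObject::READ_AHEAD | SplFileObject::DROP_NEW_LINE | SplFileObject::SKIP_EMPTY);

$batch = [];
foreach ($spl as $line) {
    $batch[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    if (count($batch) >= 1000) {
        // ... write the 1000 decoded rows to the database in one multi-row INSERT
        $batch = [];
    }
}
// ... write whatever is left in $batch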

Romses Panagiotis, 2021-05-20
@romesses

This is a JSONL format file.
JSON is a native type in MySQL 5.7+, and each line can be inserted into a JSON column as is, without prior decoding in PHP. You can also load it into a temporary table first and then extract the specific fields from there.
That is, it is enough to read the lines and, using yield, send them into a buffer/collector. When it reaches a given size, say 1000 lines, execute INSERT INTO ... VALUES (...), flushing the contents of the buffer.
https://www.w3schools.com/sql/sql_insert.asp
https://www.w3schools.com/sql/sql_insert_into_sele...
Or import the file using the MySQL Shell utility.
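
A rough sketch of that idea: a generator feeds a buffer that is flushed into a staging table with a JSON column, and the fields are pulled out afterwards with JSON_EXTRACT. Table and column names, the buffer size, and the credentials are all invented for illustration:

<?php
// Generator: yields one JSONL line at a time, so memory use stays flat
function readLines(string $path): \Generator
{
    $fp = fopen($path, 'r');
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if ($line !== '') {
            yield $line;
        }
    }
    fclose($fp);
}

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
// Staging table with a native JSON column (MySQL 5.7+); each raw line goes in as is
$pdo->exec('CREATE TEMPORARY TABLE hotels_staging (doc JSON NOT NULL)');

$flush = function (array $rows) use ($pdo): void {
    if ($rows === []) {
        return;
    }
    $placeholders = implode(',', array_fill(0, count($rows), '(?)'));
    $pdo->prepare("INSERT INTO hotels_staging (doc) VALUES $placeholders")->execute($rows);
};

$buffer = [];
foreach (readLines('big-file.json') as $line) {
    $buffer[] = $line;
    if (count($buffer) >= 1000) {   // flush every 1000 lines
        $flush($buffer);
        $buffer = [];
    }
}
$flush($buffer);                    // flush the tail

// Extract specific fields on the MySQL side afterwards
$pdo->exec(
    "INSERT INTO hotels (address)
     SELECT JSON_UNQUOTE(JSON_EXTRACT(doc, '$.address')) FROM hotels_staging"
);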

Akina, 2021-05-20
@Akina

I need to drive this whole thing into MySQL. What are the best ways to do this?

The best way is to put this file where MySQL can reach it. Import it with LOAD DATA INFILE into a temporary table. Parse it with a simple query into the working tables (as I understand it, although the file as a whole is not valid JSON, each individual line of it is valid JSON). And then drop the temporary table.
That's three queries in total. If you really want to, you can of course run them from PHP. But I would push them into a stored procedure (especially if importing updated data is going to be a regular task) and call it - then it comes down to a single query: CALL proc_name;. A sketch of running the queries from PHP is below.
You can even fit it into one query if you use LOAD DATA INFILE with preprocessing. Then per-line JSON validity does not matter - as long as the data format does not drift from one line to the next.
And dragging 18 gigs off the disk into PHP, and then from PHP into MySQL - well, that's just not serious.
PS. Loading with LOAD DATA INFILE is very undemanding of RAM. It doesn't matter how big the source file is.
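
Roughly what those queries might look like if fired from PHP. The staging and working table names, columns, and file path are made up; whether plain LOAD DATA INFILE works or the LOCAL variant is needed depends on secure_file_priv and where the file sits:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');

// Staging table: one JSON document per row
$pdo->exec('CREATE TEMPORARY TABLE hotels_staging (doc JSON NOT NULL)');

// 1. Load every line of the file straight into the staging table.
//    ESCAPED BY '' keeps the backslashes inside the JSON intact;
//    this assumes the JSON lines contain no literal tab characters.
$pdo->exec(<<<'SQL'
LOAD DATA INFILE '/var/lib/mysql-files/big-file.json'
INTO TABLE hotels_staging
FIELDS TERMINATED BY '\t' ESCAPED BY ''
LINES TERMINATED BY '\n'
(doc)
SQL);

// 2. Parse the JSON on the MySQL side into the working table
$pdo->exec(<<<'SQL'
INSERT INTO hotels (address)
SELECT JSON_UNQUOTE(JSON_EXTRACT(doc, '$.address'))
FROM hotels_staging
SQL);

// 3. Get rid of the staging table
$pdo->exec('DROP TEMPORARY TABLE hotels_staging');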
