PHP
Evgeny Rudchenko, 2019-10-05 23:15:58

How do I iterate over a large JSON file and update all the related records in MySQL with less than a 2-second lag behind the file?

Hello. There is a parser script (parse.php) that saves its parsing result to a file on the server in JSON format. The parser refreshes the data every 1.5-3 seconds. Cron runs another PHP script (work.php) every 2 seconds, which fetches the current JSON, decodes it with json_decode and loops through the resulting array.
The array structure is like this:

events: {
    1: {
        id, data, title, etc.
    },
    2: {
        id, data, title, etc.
    }
}

On each iteration, work.php sends the event ID to the getEvent.php script via fsockopen.
getEvent.php calls ignore_user_abort(true);
getEvent.php then fetches the current JSON again, decodes it, looks up the event with the ID that work.php sent, processes that data and updates it in the MySQL database. The data in MySQL must not lag behind the data in the JSON file by more than 2 seconds.
So what is the actual question?) These manipulations consume 3 GB of RAM and load the 4 x Xeon E5 (2099.998 MHz) CPU to 90%! I need some alternative way to do all this.
The RAM gets eaten because work.php reads the JSON every 2 seconds, and the same JSON is read again in each of the 600-800 open getEvent.php instances. On top of that, this approach leaves me with about 700 active processes on the server. Can anyone tell me how to simplify my code in terms of load, or suggest another way to walk through such an array and update the data in MySQL so that it does not lag behind the JSON file by more than 2 seconds?
Help me out, my head is about to explode)) I looked at daemons, but that again means a pile of RAM and a pile of processes on the server.
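
For reference, here is a minimal sketch of the loop described above; the file path, host and endpoint are placeholders, not the actual code:

<?php
// work.php - rough sketch of the current setup (path and host are assumptions)
$json = file_get_contents('/var/www/data/events.json'); // whole file read into memory
$data = json_decode($json, true);                       // and decoded in full

foreach ($data['events'] as $event) {
    $id = (int) $event['id'];

    // Fire-and-forget HTTP request; getEvent.php will re-read and re-decode
    // the same JSON file just to find this one event.
    $fp = fsockopen('127.0.0.1', 80, $errno, $errstr, 1);
    if ($fp === false) {
        continue;
    }
    $request  = "GET /getEvent.php?id={$id} HTTP/1.1\r\n";
    $request .= "Host: localhost\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($fp, $request);
    fclose($fp); // the response is never read; getEvent.php relies on ignore_user_abort(true)
}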

4 answers
Grigory Vasilkov, 2019-10-06
@Space88

First of all, if you can change the parser, get rid of JSON.
A JSON file is read into memory in its entirety; you cannot "read it line by line" with an iterator that walks through the file one line at a time.
It looks like you have already implemented a queue, just a home-grown one. Maybe use a real queue (set up Redis, it ships with simple queue primitives, or look at what other queue systems exist, there are a dozen of them in my opinion), so that tasks land in a list and a second script works through that list, started on a schedule or whenever the queue hands it a task, instead of sitting in memory in a while (true) loop waiting to be told "work"...
Yandex solves its analytics problem of who clicked on what exactly with queues, relaxing the "lag" from "2 seconds" to "who cares, we'll get to it when the CPU frees up", and yes, tasks are then processed sequentially rather than in parallel.
You could also write your script in Node, where asynchrony can help, or do it with
new class extends \Threaded by enabling the pthreads.so/.dll extension.
The idea is to have several threads inside one script, each unaware of the others, doing your work.
But keep in mind that SQL writes and updates are still queued internally, so there is a speed limit in any case.
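
As an illustration of the queue idea, a minimal producer/consumer sketch assuming the phpredis extension and PDO; the queue name, table layout and connection settings are placeholders, not anything from the question:

<?php
// producer side - push each event into a Redis list as soon as it is parsed (sketch)
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Here the events are read from the existing file as a stand-in; in practice
// parse.php would push each event directly, with no intermediate file.
$events = json_decode(file_get_contents('/var/www/data/events.json'), true)['events'] ?? [];
foreach ($events as $event) {
    $redis->lPush('events_queue', json_encode($event));
}

// consumer side - one long-running worker instead of hundreds of processes
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$pdo  = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO events (id, title, data) VALUES (:id, :title, :data)
     ON DUPLICATE KEY UPDATE title = VALUES(title), data = VALUES(data)'
);

while (true) {
    $item = $redis->brPop(['events_queue'], 5);   // blocks up to 5 s, no busy-waiting
    if (empty($item)) {
        continue;                                 // timeout, nothing queued yet
    }
    $event = json_decode($item[1], true);         // brPop returns [key, value]
    $stmt->execute([
        ':id'    => $event['id'],
        ':title' => $event['title'],
        ':data'  => json_encode($event['data'] ?? null),
    ]);
}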

Alexander Madzhugin, 2019-10-06
@Suntechnic

The correct solution is for the parser to write to the database directly.
If that is not possible, you should try to eliminate duplicated work and double processing as much as possible:

On each iteration, work.php sends the event ID to the getEvent.php script via fsockopen.
... getEvent.php then fetches the current JSON again, decodes it, looks up the event with the ID that work.php sent, processes that data and updates it in the MySQL database

What the hell is this? Why can't work.php send the data to getEvent right away? Why does getEvent re-read the JSON and search for the ID in it all over again?
And then you complain that it eats RAM and CPU.
And how many gigabytes is your JSON, that a single process cannot walk through it and push it into MySQL within 2 seconds?
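
In the spirit of this answer, a rough sketch of what work.php could look like if it decoded the JSON once and wrote straight to MySQL, with no fsockopen and no getEvent.php; the file path, table and column names are assumptions:

<?php
// work.php (simplified) - decode once, upsert every event in one pass (sketch)
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$data = json_decode(file_get_contents('/var/www/data/events.json'), true);

$stmt = $pdo->prepare(
    'INSERT INTO events (id, title, data) VALUES (:id, :title, :data)
     ON DUPLICATE KEY UPDATE title = VALUES(title), data = VALUES(data)'
);

$pdo->beginTransaction();
foreach ($data['events'] as $event) {
    $stmt->execute([
        ':id'    => $event['id'],
        ':title' => $event['title'],
        ':data'  => json_encode($event),   // keep the full payload if the schema needs it
    ]);
}
$pdo->commit();   // one process, one read of the file, one transaction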

mayton2019, 2019-10-05
@mayton2019

With a setup like this, there is really nothing to be done; it simply is not designed for operations like that. At the very least, the data has to be pulled into the database once, and from then on it should always be stored and processed there.

Andrey, 2019-10-06
@VladimirAndreev

1. Why use cron at all? Implement both the parser and the worker as long-running scripts.
2. Why a file at all, when there is RabbitMQ?
3. Maybe it makes sense to add a third worker that does nothing but write to the database, but "wholesale", in batches? (a sketch follows below)
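
To illustrate point 3, a minimal sketch of a "wholesale" writer: it collects decoded events into a buffer and flushes them to MySQL as one multi-row INSERT ... ON DUPLICATE KEY UPDATE instead of hundreds of single-row writes. The table layout, batch size and event source are assumptions:

<?php
// bulk_writer.php - flush events in batches, one multi-row upsert per batch (sketch)

function flushBatch(PDO $pdo, array $events): void
{
    if ($events === []) {
        return;
    }
    $placeholders = [];
    $params = [];
    foreach ($events as $event) {
        $placeholders[] = '(?, ?, ?)';
        $params[] = $event['id'];
        $params[] = $event['title'];
        $params[] = json_encode($event['data'] ?? null);
    }
    $sql = 'INSERT INTO events (id, title, data) VALUES ' . implode(', ', $placeholders)
         . ' ON DUPLICATE KEY UPDATE title = VALUES(title), data = VALUES(data)';
    $pdo->prepare($sql)->execute($params);   // one round trip for the whole batch
}

$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass');

// The events come from the file here for illustration; a RabbitMQ consumer could
// feed the same buffer instead.
$decodedEvents = json_decode(file_get_contents('/var/www/data/events.json'), true)['events'] ?? [];

$buffer = [];
foreach ($decodedEvents as $event) {
    $buffer[] = $event;
    if (count($buffer) >= 500) {     // flush every 500 rows, for example
        flushBatch($pdo, $buffer);
        $buffer = [];
    }
}
flushBatch($pdo, $buffer);           // flush the remainder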
