linux
muhasa, 2020-01-09 14:22:27

Will php-fpm + nginx help with our parsing load problem?

People, we can’t figure out where the bottleneck is and why our parsing setup falls over.
There is a third-party site that hosts the results of sports and eSports events. We take the results from it. But not via cron, because cron can fire at most once a minute, so we use a self-written daemon.
The code looks roughly like this:

$parser_result = json_decode(file_get_contents($url), true);

$urls = [];
foreach ($parser_result['Value'] as $game) {
    $cdate_unixtime = $game['S'];
    $gid = intval($game['I']);
    $urls[] = $url . "&game_id=" . $gid;
}

// array of curl handles
$multiCurl = [];
// data to be returned
$result = [];
// multi handle
$mh = curl_multi_init();
foreach ($urls as $i => $url) {
  // URL from which data will be fetched
  $multiCurl[$i] = curl_init();
  curl_setopt($multiCurl[$i], CURLOPT_URL,$url);
  curl_setopt($multiCurl[$i], CURLOPT_HEADER,0);
  curl_setopt($multiCurl[$i], CURLOPT_RETURNTRANSFER,1);
  curl_multi_add_handle($mh, $multiCurl[$i]);
}

// run all the handles; this loop keeps calling curl_multi_exec() until every request has finished
$index = null;
do {
  curl_multi_exec($mh, $index);
} while ($index > 0);
// get content and remove handles
foreach($multiCurl as $k => $ch) {
  $req = curl_multi_getcontent($ch);
  $parser_kf = json_decode($req, true);
  
  // here we extract the data and write it to the database, then below we remove the connection handle; that code is omitted
  
  curl_multi_remove_handle($mh, $ch);
}
// close the multi handle
curl_multi_close($mh);

We run this code like this
#!/bin/bash
while true;
do 
/opt/php72/bin/php parser.php;
sleep 5;
done

and we launch it via nohup:
nohup ./daemon.sh >/dev/null 2>&1 &
We have 5 such parsers, and 5 daemons as well. That is, 5 processes hang in the background on Linux around the clock. At some point the RAM (2 GB on an OpenVZ VPS) filled up, the OOM killer kicked in and killed our parser. The site itself stayed up.
We will probably have even more parsers and daemons. On OpenVZ it is not possible to drop the RAM caches, unlike under KVM virtualization (we tried a bunch of ways; if you know how, tell us). The RAM cache is constantly filling up.
Now the most important thing
Do I understand correctly that there can be several reasons and solutions?
1) We are currently running Apache. I have heard that nginx handles processes better, but I have not dug into which class of tasks that applies to. I have also heard about the better performance of php-fpm. Will the nginx + php-fpm bundle help us with this issue? Right now we have a shitload of php and httpd processes hanging in the background, and maybe that is why things run so badly.
2) Maybe the code is written clumsily and we failed to optimize something somewhere? Or is there a problem with this curl_multi_init?.. We have to poll from 4 to 100 addresses simultaneously (more precisely, a single address where only the game parameter in the URL changes); sequential polling does not work for us. (See the sketch after this list.)
3) Is it possible to solve the problem by telling Linux that these processes should not be killed? And how can this be done, if possible?
4) Maybe the nohup option is not the most effective? How would you launch such a daemon?..
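For question 2, to make it concrete: below is a sketch of what we understand to be the usual curl_multi pattern with curl_multi_select(), so the process sleeps on the sockets instead of spinning. We have not tested it, and the timeout values are just guesses.

// sketch only: assumes the same $urls array built above
$mh = curl_multi_init();
$handles = [];
foreach ($urls as $i => $u) {
  $handles[$i] = curl_init($u);
  curl_setopt($handles[$i], CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($handles[$i], CURLOPT_TIMEOUT, 10); // guessed timeout so one slow request can't hang the daemon
  curl_multi_add_handle($mh, $handles[$i]);
}

$running = null;
do {
  curl_multi_exec($mh, $running);
  if ($running > 0) {
    // wait (up to 1 second) for activity on any transfer instead of looping flat out
    curl_multi_select($mh, 1.0);
  }
} while ($running > 0);

foreach ($handles as $i => $ch) {
  $data = json_decode(curl_multi_getcontent($ch), true);
  // ... store $data in the database, as in the original code ...
  curl_multi_remove_handle($mh, $ch);
  curl_close($ch);
  unset($handles[$i], $data); // free the buffers before the next iteration
}
curl_multi_close($mh);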
If a competent admin who knows how to profile such things and find the bottlenecks is reading this topic, we are also open to a paid solution to the problem.


5 answer(s)
Sanes, 2020-01-09
@Sanes

What does the web server have to do with it? You are parsing via php-cli.

Victor Taran, 2020-01-09
@shambler81

No, it won't solve the problem. php-fpm will give you about 20% more speed, but it won't solve the problem.
But if you are on PHP 5.6, switching to 7.2 will give you a real 100% speed boost and a real reduction in load.
As an option to reduce the load, run the parsers sequentially (a sketch below).
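Something like this instead of five separate background loops (a sketch; the script names are just an example):

#!/bin/bash
while true; do
  for p in parser1.php parser2.php parser3.php parser4.php parser5.php; do
    /opt/php72/bin/php "$p"
  done
  sleep 5
done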

Eugene, 2020-01-09
@Nc_Soft

And why run it through nohup? Use something adequate like supervisord.
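A minimal sketch of a supervisord program section for this, say in /etc/supervisor/conf.d/parser.conf (the paths and the program name are placeholders, adjust to your layout):

[program:parser]
; supervisord replaces nohup: it starts the same while-true wrapper and restarts it if it dies
command=/path/to/daemon.sh
directory=/path/to
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
stdout_logfile=/var/log/parser.out.log
stderr_logfile=/var/log/parser.err.log

One such section per parser, then supervisorctl reread and supervisorctl update to pick them up.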

Alexander Filippenko, 2020-01-09
@alexfilus

nginx has nothing to do with it at all.
It looks like memory is leaking somewhere. Without debugging I can only advise you to either add more unset() calls, or rewrite the parser so that when the process crashes it restarts and picks up where it left off (for example, by implementing some kind of queue). A sketch of the unset part is below.
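Roughly what I mean by "more unset", against the loop from the question (a sketch, same variable names as in your code):

foreach ($multiCurl as $k => $ch) {
  $req = curl_multi_getcontent($ch);
  $parser_kf = json_decode($req, true);

  // ... write the rows to the database ...

  curl_multi_remove_handle($mh, $ch);
  curl_close($ch);                          // also free the easy handle itself
  unset($multiCurl[$k], $req, $parser_kf);  // drop the response buffers right away instead of holding all of them until the end
}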

shtirmuz, 2020-01-09
@shtirmuz

php-fpm will help free up the scarce memory. I can help you adapt the engine to run on pure nginx without any problems. And it would not hurt to take 4 GB, with some headroom.
