How to diagnose hanging php-fpm processes?

linux

Sergey Sokolov, 2019-10-12 19:53:06

The VPS runs Ubuntu 18, nginx, MySQL, Redis, and php7.2-fpm serving a Laravel web application. It has been running fine for a long time.
Suddenly today the php-fpm processes went into state "D" (uninterruptible sleep, usually I/O) and cannot be killed with kill -9.
It is not clear what the options are: either wait for who knows what, or reboot.
The first time, I rebooted the server with sudo systemctl reboot.
The second time the reboot did not go through within a few minutes, so I had to power-cycle the server from the hosting panel.
This situation has already required a reboot three times today. It never happened before, and now it keeps coming back.
In a similar question on SO, the cause turned out to be cache-update code running in parallel on all php-fpm instances.
I found nothing suspicious or unusual in the logs before each freeze. The application is actively used, with several requests per second, but everything looks as it always does.
I looked at the nginx, php-fpm, and Laravel application logs.
As its workers got stuck, php-fpm launched new ones until it hit the limit:

[12-Oct-2019 14:26:07] WARNING: [pool www] server reached pm.max_children setting (8), consider raising it

nginx, either before the problem or as a result of it, began reporting timeouts:

[error] 1053#1053: *13891 upstream timed out (110: Connection timed out) while reading response header from upstream

dmesg reports something puzzling about jbd2/sda-8, and immediately after that about php-fpm as well:

[  484.254707] INFO: task jbd2/sda-8:1540 blocked for more than 120 seconds.
[  484.262192]       Not tainted 4.15.0-65-generic #74-Ubuntu
[  484.272558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.280122] jbd2/sda-8      D    0  1540      2 0x80000000
...
[  484.280256] INFO: task php-fpm7.2:1584 blocked for more than 120 seconds.
[  484.286958]       Not tainted 4.15.0-65-generic #74-Ubuntu
[  484.292249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.305238] php-fpm7.2      D    0  1584    858 0x00000000
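
To see at a glance which tasks the kernel reported as blocked, and in what order, the hung-task lines can be pulled out of the dmesg output. Below is a minimal sketch, assuming the message format shown above; the function name blocked_tasks and the sample list are illustrative, not part of any tool.

```python
import re

# Matches kernel hung-task warnings of the form shown in the dmesg excerpt:
# [  484.280256] INFO: task php-fpm7.2:1584 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"^\[?\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def blocked_tasks(lines):
    """Return (timestamp, command, pid) for each hung-task warning, in order."""
    found = []
    for line in lines:
        m = HUNG_TASK.match(line.strip())
        if m:
            found.append((float(m["ts"]), m["comm"], int(m["pid"])))
    return found


# Sample input copied from the excerpt above; real input would be the
# lines of the dmesg output saved to a file or captured some other way.
sample = [
    "[  484.254707] INFO: task jbd2/sda-8:1540 blocked for more than 120 seconds.",
    "[  484.262192]       Not tainted 4.15.0-65-generic #74-Ubuntu",
    "[  484.280256] INFO: task php-fpm7.2:1584 blocked for more than 120 seconds.",
]

for ts, comm, pid in blocked_tasks(sample):
    print(f"{ts:12.6f}  pid {pid:<6} {comm}")
```

If the journaling thread for a device appears before the application processes, as it does here, that suggests the application is waiting on the disk rather than the other way around.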

The VPS (droplet) is on DigitalOcean, and a block storage Volume is attached to it, apparently as /dev/sda. Is the first dmesg message about that volume? Is the attached volume what causes the stall? How can this be caught?
What should I watch, and how do I find the cause of the situation?
Upd. Technical support replied that the problem was in the physical hardware of the server hosting the instance. They fixed it and the problem went away. The droplet was also moved to different physical hardware just in case. The question is resolved. The very good answer from Roman Mirr helped figure it out, thanks!

1 answer(s)

Roman Mirilaczvili, 2019-10-12
@sergiks

jbd2 is the kernel journaling subsystem that ext4 uses.
It looks like high I/O activity, or I/O that is not completing.
To learn more, you need a history of events. atop can record process and resource usage over time, so you can replay the history later and find the cause of the problem.
https://haydenjames.io/use-atop-linux-server-perfo...
https://haydenjames.io/linux-server-performance-di...
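
While a freeze is in progress, it can also help to list which processes are in state D and what they are waiting on. Below is a rough sketch that reads the standard Linux /proc/<pid>/stat and /proc/<pid>/wchan files; the helper names parse_stat, read_text, and d_state_processes are made up for this example, and wchan may be empty or unreadable depending on kernel settings and privileges.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def read_text(path):
    """Return the file contents, or None if the process has gone away."""
    try:
        with open(path, errors="replace") as f:
            return f.read()
    except OSError:
        return None


def d_state_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for processes in uninterruptible sleep."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            # wchan names the kernel function the task is sleeping in.
            wchan = read_text(os.path.join(proc, entry, "wchan"))
            yield pid, comm, (wchan or "?").strip()


if os.path.isdir("/proc"):
    for pid, comm, wchan in d_state_processes():
        print(f"{pid:>7}  {comm:<20} {wchan}")
```

Running this every few seconds during an incident and saving the output gives a crude timeline of which processes got stuck first; atop records much the same information continuously.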