linux
Hint, 2019-11-21 11:22:45

Why do processes on CentOS sometimes freeze for 100-300ms?

The server has 2x Intel Xeon E5-2650v4 processors, 128 GB of RAM, 4x480GB SSD disks, and CentOS 7.3. It runs MySQL, php-fpm, Redis, and nginx.
php-fpm status (morning):

Processes active: 6, idle: 144, Requests: 724543849, slow: 0, Traffic: 753req/sec

top:
top - 11:09:19 up 447 days,  4:47,  1 user,  load average: 9.66, 8.10, 7.90
Tasks: 651 total,   8 running, 643 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.9 us,  4.0 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 13145292+total,  2053168 free, 38556552 used, 90843208 buff/cache
KiB Swap:  4194300 total,  3122176 free,  1072124 used. 88618272 avail Mem

Everything works more or less normally: database queries are optimized (rarely longer than 1 ms), and PHP scripts execute in 10-30 ms on average.
The problem is that sometimes there is a lag of 200-300 ms (roughly 1 HTTP request out of 10). Through experiments I found that the lag appears at arbitrary points in otherwise trivial code (for example, while iterating over one small array to build another). Running PHP under strace, I managed to capture the following:
1574322420.597333 fstat(3, {st_mode=S_IFREG|0644, st_size=76666, ...}) = 0 <0.000010>
1574322420.597358 mmap(NULL, 76666, PROT_READ, MAP_SHARED, 3, 0) = 0x7fbe917c2000 <0.000015>
1574322420.598163 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbe7aa00000 <0.000014>
1574322420.894121 munmap(0x7fbe917c2000, 76666) = 0 <0.000024>
1574322420.894191 close(3)              = 0 <0.000011>

In this trace there is a delay of 296 ms after the second mmap, although the mmap call itself is fast (14 microseconds). Sometimes the delay occurs after munmap instead. So in about 90% of cases the script runs in roughly 50 ms, and in about 10% of cases a one-off lag of 300 ms is added (350 ms in total).
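The gap described above can be found mechanically. Below is a minimal Python sketch, assuming the trace was captured with per-call timestamps and durations in the format shown above (as produced by strace's `-ttt` and `-T` options). The function name `find_gaps` and the 100 ms threshold are illustrative choices, not part of the original investigation. It reports time spent between system calls, that is, time not accounted for by any call's own duration.

```python
import re

# Matches lines such as:
#   1574322420.597333 fstat(3, {...}) = 0 <0.000010>
# group 1 = start timestamp, group 2 = syscall name, group 3 = call duration
LINE_RE = re.compile(r"^(\d+\.\d+)\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def find_gaps(lines, threshold=0.1):
    """Return (gap_seconds, previous_syscall, next_syscall) for each
    gap between consecutive syscalls that is at least `threshold` seconds."""
    gaps = []
    prev = None  # (end time of previous call, its name)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not complete syscall records
        start, name, dur = float(m.group(1)), m.group(2), float(m.group(3))
        if prev is not None:
            gap = start - prev[0]
            if gap >= threshold:
                gaps.append((gap, prev[1], name))
        prev = (start + dur, name)
    return gaps


# The trace excerpt from the question.
SAMPLE = """\
1574322420.597333 fstat(3, {st_mode=S_IFREG|0644, st_size=76666, ...}) = 0 <0.000010>
1574322420.597358 mmap(NULL, 76666, PROT_READ, MAP_SHARED, 3, 0) = 0x7fbe917c2000 <0.000015>
1574322420.598163 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbe7aa00000 <0.000014>
1574322420.894121 munmap(0x7fbe917c2000, 76666) = 0 <0.000024>
1574322420.894191 close(3)              = 0 <0.000011>
""".splitlines()

for gap, before, after in find_gaps(SAMPLE):
    print(f"{gap * 1000:.0f} ms between {before} and {after}")
```

On the excerpt this prints a single gap of 296 ms between mmap and munmap. The gap lies between two calls, so the time was spent in user space or waiting to run, not inside a system call that strace could attribute it to.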
Question: is this normal behavior for a busy server (although it does not seem heavily loaded), or is there a problem? How do I get to the bottom of it, and how can it be fixed?
P.S. There were suggestions to start by optimizing the application. Again, the application has nothing to do with it. While the script is running, a single long lag of 200-300 ms occurs in a place with no external calls that could block. The lag was first noticed while php-fpm was processing HTTP requests. For the experiments, the script was run directly from the console via php, with the execution time of each line measured. Normally the script runs in a few ms (1-5). In about one run out of 10 the time rises to 200-300 ms, and the extra delay appears at an arbitrary point. For example, take a loop of 100 iterations where each iteration appends a new element to an array (not something fetched from the database, just a string containing the iteration number). Usually the whole loop finishes in microseconds, but sometimes the script freezes for 200-300 ms on one of the iterations. The script was run under strace to find out which system call the delay occurs on; the result is shown above (the mmap and munmap trace). When run under the time command, user and sys time are minimal (1-5 ms), and the entire additional delay shows up in real.
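The real versus user+sys observation can be reproduced inside a script. Below is a minimal Python sketch, assuming a stand-in workload similar to the loop described above. The names `timed_run`, `build_array` and `find_stalls`, and the 100 ms threshold, are hypothetical. It compares wall-clock time with the CPU time consumed by the process. A large difference means the process was stalled off-CPU rather than computing.

```python
import time


def timed_run(work):
    """Run work() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + sys CPU time of this process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def build_array(n=100):
    # Stand-in for the loop from the question: each element is just
    # the iteration number as a string, with no external calls.
    return [str(i) for i in range(n)]


def find_stalls(work, runs=1000, threshold=0.1):
    """Return (run_index, wall, cpu) for every run whose off-CPU time
    (wall minus cpu) is at least `threshold` seconds."""
    stalls = []
    for i in range(runs):
        wall, cpu = timed_run(work)
        if wall - cpu >= threshold:
            stalls.append((i, wall, cpu))
    return stalls


if __name__ == "__main__":
    for i, wall, cpu in find_stalls(build_array):
        print(f"run {i}: wall {wall * 1000:.1f} ms, cpu {cpu * 1000:.1f} ms")
```

On a healthy machine this is expected to print nothing. Runs that do get reported would match the pattern in the question: CPU time stays in the low milliseconds while wall time jumps by hundreds.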

3 answer(s)
Hint, 2019-12-03
@Hint

Thank you all for your participation. I decided to simply restart the server as a test (uptime was more than 450 days), and the lags disappeared.

Vitaly Karasik, 2019-11-21
@vitaly_il1

Not exactly an answer, more of a tip - I'd try instrumenting the code (New Relic, AppDynamics, ...) and look at profiling results.
strace is certainly a good thing, but too low-level and old-school :-)

Roman Mirilaczvili, 2019-11-23
@2ord

I agree with Vitaly Karasik about profiling the PHP application. Start by finding the bottlenecks in the application, then try to optimize them with the load on the system in mind.
