PHP
Stas_Telnov, 2013-11-12 11:03:51

Help me find the cause of problems with Apache

Greetings. I have a project that I inherited, or rather, I volunteered and got into its back end.
The project has a fairly powerful dedicated server: Xeon E3-1240 3.7GHz / 16 GB RAM / 2x2000 GB SATA / IPMI with IP-KVM
The project is PHP + MySQL.
But there are constant problems with Apache: it crashes or slows down. The server has Monit installed.
There are not many users: about a hundred who are more or less active. At peak about 20 people are online (though some may have several windows open), and each user generates constant Ajax requests (I know I really ought to use a Comet server, but I have not gotten around to it yet).

Periodically Apache goes down and Yandex.Metrica reports that the site is unavailable. This usually lasts no more than a few seconds; then Apache recovers and works normally, albeit slowly, and then it can go down again. Strictly speaking, Apache does not actually crash, it just becomes very slow. Restarting Apache does not help: within a minute the situation repeats itself.

Monit sends 2 types of notification in such cases:

Connection failed Service apache
Date: Tue, 12 Nov 2013 11:00:52
Action: alert
Host: MYHOST
Description: failed protocol test [HTTP] at INET[MYPROJECT:80] via TCP - HTTP: Error receiving data - Resource temporarily unavailable

Your faithful employee,
Monit

or

Resource limit matched Service apache
Date: Tue, 12 Nov 2013 01:56:42
Action: alert
Host: MYHOST
Description: loadavg(5min) of 44.5 matches resource limit [loadavg(5min)>20.0]

Your faithful employee,
Monit

The second type of message is extremely rare and arrives only during peak user hours, and it is more or less clear what happened there.

The first message is more of a mystery to me, especially since at those moments Apache showed no abnormal CPU or memory load; on the contrary, its resource consumption dropped close to zero.

Apache settings:
Timeout 500

KeepAlive On
MaxKeepAliveRequests 500
KeepAliveTimeout 15

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      50
    MaxClients          450
    MaxRequestsPerChild   100
</IfModule>

<IfModule mpm_worker_module>
    StartServers          5
    MinSpareThreads      5
    MaxSpareThreads      75 
    ThreadLimit          100
    ThreadsPerChild      35
    MaxClients          450
    MaxRequestsPerChild   100
</IfModule>

<IfModule mpm_event_module>
    StartServers          5
    MinSpareThreads      5
    MaxSpareThreads      75 
    ThreadLimit          100
    ThreadsPerChild      35
    MaxClients          450
    MaxRequestsPerChild   100
</IfModule>


I do not see anything in error.log for these intervals. There were some problems at 11.02, and here is what the log contains around that time:

[Mon Nov 11 10:34:06 2013] [notice] suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Mon Nov 11 10:34:06 2013] [warn] RSA server certificate CommonName (CN) `MYHOST' does NOT match server name!?
[Mon Nov 11 10:34:06 2013] [notice] Apache/2.2.22 (Debian) PHP/5.4.4-14+deb7u5 mod_ssl/2.2.22 OpenSSL/1.0.1e configured -- resuming normal operations
[Mon Nov 11 11:25:26 2013] [notice] caught SIGTERM, shutting down
[Mon Nov 11 11:25:26 2013] [warn] RSA server certificate CommonName (CN) `MYHOST' does NOT match server name!?
[Mon Nov 11 11:25:26 2013] [notice] suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)


Here are the monit settings for monitoring apache:
check process apache with pidfile /var/run/apache2.pid
   start program = "/etc/init.d/apache2 start"
   stop program  = "/etc/init.d/apache2 stop"
   if failed host MYPROJECT port 80 protocol http then alert
   if cpu > 60% for 2 cycles then alert
   if cpu > 85% for 5 cycles then restart
   if totalmem > 2048 MB for 5 cycles then alert
   if children > 250 then alert
   if loadavg(5min) greater than 20 for 10 cycles then alert
   if 3 restarts within 5 cycles then timeout


Can you tell me what I am doing wrong? Which configuration values should I change, and what should I analyze?
Please, if possible, explain it as you would to a complete beginner. I had essentially no administration skills, and no experience with Unix, before I started working on this project.

Thank you.

3 answer(s)
merlin-vrn, 2013-11-12
@Stas_Telnov

Could this be log rotation kicking in at that moment? It looks like Apache was restarted.
In general, I would also look at dmesg (for segfaults and the like) and at the system-wide logs: messages, the cron log, and so on, to see what actually happened on the system.
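
To make the dmesg check less of a manual chore, here is a minimal Python sketch that filters kernel log lines for the usual segfault and OOM-killer markers. The function name `suspicious_lines` and the default process name `apache2` are made up for illustration, and the three substrings are only the ones the Linux kernel typically prints; a crash may well be logged in some other form.

```python
import re

# Substrings the Linux kernel typically logs for crashes and OOM kills.
# This list is an assumption and is not exhaustive.
PATTERNS = re.compile(r"segfault at|invoked oom-killer|Out of memory", re.IGNORECASE)


def suspicious_lines(lines, process="apache2"):
    """Return log lines that mention a segfault or the OOM killer.

    Lines that name `process` are listed first; the original order is
    otherwise preserved because sorted() is stable.
    """
    hits = [ln.rstrip("\n") for ln in lines if PATTERNS.search(ln)]
    return sorted(hits, key=lambda ln: process not in ln)
```

It accepts any iterable of lines, for example an opened log file such as /var/log/kern.log (the usual location on Debian, but check your own system) or the saved output of dmesg. An empty result does not prove that nothing crashed, only that none of these markers were logged.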

phpdude, 2013-11-13
@phpdude

You need to watch the load yourself. Maybe you are hitting some system limits: since there is a lot of Ajax, some requests may be blocking the network, using the server suboptimally, or locking the database (a common case with the MyISAM storage engine in MySQL) and preventing subsequent requests from running. Apache can go down in such situations; I have seen it happen.

I would: 1. Monitor the load and draw conclusions about possible resource blocking. The first alert looks like new connections being refused because a connection limit (system-level or Apache-level) was hit, possibly as a consequence of a database lock. 2. Switch to nginx + php-fpm in any case: the combination is faster, uses fewer resources, and can be tuned more finely. But that is just my opinion.
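
To show how a database lock could turn into a connection limit, here is a rough Python sketch based on Little's law (average busy workers = request rate × time each request holds a worker). The traffic figures are assumptions: 20 users with 3 windows each, one Ajax request per second per window, 0.25 s per normal request and 10 s per request stuck behind a locked table. Only the limit of 450 comes from the MaxClients value in the question. The function names are made up for illustration.

```python
import math


def workers_needed(requests_per_sec, seconds_per_request):
    """Little's law: average busy workers = arrival rate * hold time per request."""
    return math.ceil(requests_per_sec * seconds_per_request)


def is_saturated(requests_per_sec, seconds_per_request, max_clients=450):
    """True when the estimated number of busy workers reaches MaxClients."""
    return workers_needed(requests_per_sec, seconds_per_request) >= max_clients


# Assumed traffic: 20 users x 3 windows, each polling once per second.
RATE = 20 * 3 * 1.0

normal = workers_needed(RATE, 0.25)   # 15 workers when requests finish quickly
stalled = workers_needed(RATE, 10.0)  # 600 workers when every request waits 10 s

print(normal, is_saturated(RATE, 0.25))   # 15 False
print(stalled, is_saturated(RATE, 10.0))  # 600 True
```

In the stalled case the workers are mostly waiting on the database rather than computing, so low CPU usage from Apache during an outage would not contradict this picture. If the prefork MPM is in use, an idle keep-alive connection also holds a worker until KeepAliveTimeout expires, which adds to the hold time. Treat the numbers as an illustration and measure the real request rate and duration before drawing conclusions.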

lubezniy, 2013-11-12
@lubezniy

Is the RAID array (if used) OK?
