Is it possible to start a process on OOM in Linux?

linux

Yaroslav, 2019-10-02 20:54:29

Let's say a server is overloaded by something (monitoring shows the load average is off the charts, but we know nothing more than that). We can't look around over SSH because we can't log in: either the connection is closed immediately, or the TCP connection is established but no SSH banner arrives even after half an hour.
It's a very frustrating situation, because the server is basically working and even responds to simple HTTP requests. If SSH worked, everything could be sorted out and repaired (log in, ps, kill, kill, kill). But it doesn't.
Well, one fix for every problem: the hosting provider reboots the server. But some questions remain.
Question 1: what exactly prevents the SSH login? No free memory (my main theory)? Or is the CPU that starved? (But I waited half an hour and still didn't even get a banner from SSH.)
Question 2 (for admins): is there a way to avoid this problem in the future? In theory, if SSH reserved an extra ten megabytes on the server up front for a root login and used them when logging in, that would help a lot. Maybe there is some trick for this?
Question 3 (for programmers; relevant if there is no good answer to question 2): is this possible on Linux in principle? If our program (our SSH daemon or getty) launches another program (a shell) and needs some memory for that, can we claim that memory in advance and, at launch time, somehow indicate that it may be used for the shell, so that the process is guaranteed to start? Maybe (as a perverse trick) launch bash right away when the server starts and only connect the user to that bash at login?

6 answer(s)

sim3x, 2019-10-02
@xenon

1. Yes, memory. If the process is alive but takes a long time to respond, the problem is most often disk I/O, and then CPU.
Or the network, or a sneaky DDoS.
But your case may be different.
2. https://www.google.com.ua/search?q=oom+killer+excl...
https://backdrift.org/oom-killer-how-to-create-oom...
3. cgroups
https://superuser.com/questions/1026708/is-there-a...
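
For point 2, a minimal sketch of excluding a process from the OOM killer, assuming a Linux kernel that exposes /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 exempts the process). The function names and the proc_root parameter are illustrative and do not come from the linked articles; lowering the value normally requires root (CAP_SYS_RESOURCE).

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # process is never selected by the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # process is strongly preferred as a victim


def get_oom_score_adj(pid, proc_root="/proc"):
    """Return the current OOM score adjustment of a process."""
    return int((Path(proc_root) / str(pid) / "oom_score_adj").read_text())


def set_oom_score_adj(pid, score, proc_root="/proc"):
    """Write a new OOM score adjustment for a process.

    proc_root is a hypothetical parameter added so the function can be
    pointed at a fake directory tree instead of the real /proc.
    """
    if not OOM_SCORE_ADJ_MIN <= score <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"score must be in [-1000, 1000], got {score}")
    (Path(proc_root) / str(pid) / "oom_score_adj").write_text(f"{score}\n")
```

Usage would look like set_oom_score_adj(sshd_pid, -1000), run as root, where sshd_pid is the PID of the listening sshd. This only protects the process from being killed; it does not help if the machine is merely thrashing.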

Dmitry Aleksandrov, 2019-10-02
@jamakasi666

It would be more correct to find the source of the problem first (network, CPU, RAM, or I/O overload) and work from there. And to do it really properly: track the problem down, find its cause, and eliminate it. Most likely it comes down to badly written configs.

Alexander Madzhugin, 2019-10-02
@Suntechnic

2. Run sshd with a nice value of -19?
By default it seems to run at priority 20 (that is, nice 0).
man screen
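
A small sketch of checking and changing a process's nice value from Python, using os.getpriority and os.setpriority from the standard library. The helper names are illustrative. Keep in mind that niceness mainly influences CPU scheduling, so it is unlikely to help if the real bottleneck is memory or disk I/O.

```python
import os


def current_nice(pid=0):
    """Return the nice value of a process (pid 0 means the caller)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def try_set_nice(value, pid=0):
    """Try to set the nice value; return False if the kernel refuses.

    Lowering the nice value (raising priority) usually needs root.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, value)
    except PermissionError:
        return False
    return True
```

For sshd this would be called as root with the daemon's PID, for example try_set_nice(-19, pid=sshd_pid), where sshd_pid is a placeholder.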

sirota, 2019-10-03
@sirota

In theory, KVM access would save the situation. As a last resort, if it's a virtual machine, use the virtual machine manager. The proper way is to set up monitoring with logs and then dig through them to find the cause.

Valentine, 2019-10-10
@ProFfeSsoRr

Firstly, the OOM killer itself is configurable: you can set per-process priorities for it.
Secondly, a high load average that slows everything down can have very different causes. And in general it's odd to think of OOM right away: the OOM killer would have killed something and everything would be fine. If it didn't, it looks more like the disk. Or something else.
Thirdly (although really this is where you should start): set up monitoring. At the very least install netdata, that's the quickest option. Better still, of course, is a separate Prometheus server with Grafana, with node exporter and exporters for your specific applications on the problem server. In other words, in the general case the task is solved by monitoring; logs, likewise, are shipped to another server. And set up alerts in the monitoring so you have time to react to problems when they have only just begun.
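
To tell "out of memory" apart from "something else", it helps to record load average and memory headroom together. Below is a rough sketch that reads /proc/meminfo and os.getloadavg(); it assumes a Linux kernel new enough to report MemAvailable (3.14 or later), and the function names are made up for illustration. It is not a replacement for netdata or Prometheus, just the kind of data those tools collect.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip lines that are not "Name: value [unit]"
        info[name.strip()] = int(rest.split()[0])
    return info


def available_fraction(info):
    """Fraction of RAM the kernel estimates is available to new processes."""
    return info["MemAvailable"] / info["MemTotal"]


def snapshot(meminfo_path="/proc/meminfo"):
    """Collect the 1-minute load average and memory headroom in one record."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    load1, _load5, _load15 = os.getloadavg()
    return {"load1": load1, "mem_available_fraction": available_fraction(info)}
```

A high load1 with plenty of available memory points away from OOM and toward CPU or disk I/O.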

mayton2019, 2020-01-05
@mayton2019

I completely agree with the previous posters about virtualization.
As for the situation that has already happened: most likely, getting into bash won't give you anything, because any command you run starts a process, and you'll hit the same situation over and over again as with bash itself. That is, by some miracle you get in, but you can't actually do anything.
You need to collect all the logs and post-mortem snapshots of the application's memory. Or the applications'. Most likely there is just one, and it is the source of the problem. That application should be moved into Docker with memory limits and run there.
The memory dumps need to be analyzed to understand what is flooding memory. The application should come with some guarantees or requirements for normal operation. That is, if it needs 8G, give it exactly 8 and no more.
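
The idea of a hard memory ceiling can be tried without Docker. The sketch below is not the Docker setup described above: it swaps in a plain setrlimit(RLIMIT_AS) from Python's standard library as a simpler stand-in. RLIMIT_AS caps virtual address space rather than resident memory, so it is cruder than a container or cgroup limit. The function names and the 1 GiB figure are illustrative assumptions.

```python
import resource
import subprocess
import sys


def run_with_memory_limit(cmd, limit_bytes):
    """Run cmd with its virtual address space capped; return its exit code."""

    def apply_limit():
        # Runs in the child between fork() and exec().
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(limit_bytes, hard)  # cannot exceed the existing hard limit
        # Set soft and hard together so the child cannot raise the cap again.
        resource.setrlimit(resource.RLIMIT_AS, (soft, soft))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode


def demo():
    """Start a child that tries to allocate 2 GiB under a 1 GiB cap."""
    hog = [sys.executable, "-c", "bytearray(2 * 1024**3)"]
    return run_with_memory_limit(hog, 1024**3)
```

With the cap in place the greedy child should fail its own allocation and exit with an error, instead of dragging the whole server down with it.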