Unexpected termination of a process started via nohup?

FreeBSD

Snowindy, 2013-07-22 00:18:34

Situation:
There is a web server whose process can terminate unexpectedly (an error, OutOfMemory, whatever).
To bring the server back up automatically after a crash, I wrote a wrapper script, run-loop.sh:

#!/bin/sh
# Restart the server whenever it exits, pausing 60 seconds between attempts.
while :
do
    echo "infinite loop iteration..."
    ./start-server.sh
    echo "infinite loop iteration ended, sleeping for some time"
    sleep 60
done

I start it like this from an SSH session of a dedicated non-root user:
nohup ./run-loop.sh > logs/server-out.log &
The wrapper script works fine in tests. I checked: if you kill the server process with kill -9, it comes back up within a minute.
However, from time to time (the interval varies, usually after 1-7 days of normal operation) I find that none of the user's processes are left: neither the wrapper process with the loop nor the server process.
There is nothing useful in the server logs, as if the server had been shut down normally, and nothing in server-out.log either.
How can I find out who killed the user's processes? Can the server terminate with some special exit code that kills every script in the chain? Where should I look for logs and clues?
uptime shows that the physical server was not rebooted.
Operating system: FreeBSD 9.x.
UPDATE:
I removed the wrapper and implemented a service check via cron; it works.

3 answer(s)

Yaroslav, 2013-07-22
@Snowwindy

The mention of OutOfMemory is a little confusing. Is it just an example, or does it actually happen sometimes? On Linux, when memory runs out, things get bad, and to keep the system alive the kernel kills "someone", a more or less arbitrary process. So if FreeBSD has a similar mechanism, perhaps the shell script itself is being killed for the same reason? An OOM kill should probably show up in the system logs.
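A minimal sketch of how one might look for such a kill in the logs. It assumes the system log is /var/log/messages and that the kernel line contains the phrase "was killed" or "out of swap"; both the path and the wording may differ on your system, and the function name find_oom_kills is made up for this example.

```shell
#!/bin/sh
# Sketch: print log lines suggesting that the kernel killed a process
# because memory ran out.
# Assumptions: default log path and the matched phrases (see above).
find_oom_kills() {
    logfile="${1:-/var/log/messages}"
    # -i: ignore case; each -e adds one pattern to search for.
    # grep exits with status 1 when nothing matches.
    grep -i -e 'was killed' -e 'out of swap' "$logfile"
}
```

Usage: find_oom_kills /var/log/messages. If the output names the wrapper's shell or the server process around the time everything disappeared, memory pressure is a likely cause.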
Alternatively, if the goal is not to get to the bottom of it but simply to solve the problem so that it works, you can try doing without the wrapper. Run a simple "check-respawn" script from cron that checks whether the server is down (via a lock file, an open port, or even just ps) and, if so, starts it again. That way the script cannot die for some unknown reason. Unless cron itself dies from OOM, but if cron dies, you will at least know for certain that the problem is not buggy code.
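A rough sketch of such a check-respawn script, using a PID file as the check. All paths, the PIDFILE and START_CMD variables, and the function names are placeholders invented for this example; it also assumes start-server.sh stays in the foreground for as long as the server runs, as the wrapper loop in the question implies.

```shell
#!/bin/sh
# check-respawn.sh (hypothetical): restart the server if it is not running.
# Assumed placeholder paths; override them via the environment.
PIDFILE="${PIDFILE:-/home/appuser/server.pid}"
START_CMD="${START_CMD:-/home/appuser/start-server.sh}"

# Return 0 if the PID stored in file $1 belongs to a live process.
server_is_running() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    # kill -0 delivers no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

check_and_respawn() {
    if server_is_running "$PIDFILE"; then
        return 0
    fi
    echo "$(date): server not running, starting it"
    # Start in the background so the cron job itself finishes quickly,
    # then remember the PID for the next check.
    nohup "$START_CMD" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# Only act when invoked as "check-respawn.sh run".
if [ "${1:-}" = "run" ]; then
    check_and_respawn
fi
```

A crontab entry for the same user could then look like this (again with assumed paths): * * * * * /home/appuser/check-respawn.sh run >> /home/appuser/logs/check-respawn.log 2>&1. A PID file can go stale if the PID is reused by another process, so checking the listening port instead may be more robust.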

jj_killer, 2013-07-22
@jj_killer

Another option: github.com/visionmedia/mon

Sergey Petrikov, 2013-07-22
@RicoX

You can use monit.
Try sending SIGTERM to run-loop.sh manually: how does it react? In your version there is a loop that restarts the process, but nothing supervises the loop itself. Don't reinvent the wheel; this was all solved long ago (see monit above).
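If you still want to see how the wrapper reacts to signals, here is a sketch of run-loop.sh extended to log catchable signals and the server's exit status. The log path and the names log, install_traps, run_once and main_loop are invented for this example; it is a diagnostic aid under those assumptions, not a replacement for a real supervisor.

```shell
#!/bin/sh
# run-loop.sh variant (sketch) that records why it or the server stopped.
LOG="${LOG:-logs/run-loop.log}"   # assumed log location

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$$] $*" >> "$LOG"
}

# Log catchable signals before exiting. SIGKILL (kill -9) cannot be
# trapped, so a wrapper that vanishes without any log line was probably
# killed that way.
install_traps() {
    for sig in HUP INT TERM; do
        trap "log 'received SIG$sig, exiting'; exit 1" "$sig"
    done
    trap 'log "wrapper exiting"' EXIT
}

# Run the given command once and log how it ended.
# A status above 128 conventionally means "killed by signal (status - 128)".
run_once() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        log "server killed by signal $((status - 128))"
    else
        log "server exited with status $status"
    fi
    return "$status"
}

main_loop() {
    install_traps
    while :; do
        run_once ./start-server.sh
        sleep 60
    done
}

# Only start looping when invoked as "run-loop.sh run".
if [ "${1:-}" = "run" ]; then
    main_loop
fi
```

It could be started as before, for example: nohup ./run-loop.sh run > logs/server-out.log 2>&1 &. Two caveats: sh runs a trap only after the current foreground command returns, so a signal's log line appears once start-server.sh has exited; and a signal that is already ignored when the script starts (as SIGHUP is under nohup) cannot be trapped by the script.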