How to properly fork a process in flask?
Hello.
The task is to make an API endpoint that launches a script. The script, in turn, can run for a very long time, 90 seconds or more. The API, as you might guess, should respond instantly.
Nginx will act as the web server, proxying requests to the Flask application.
As we found out, Flask is not asynchronous at all.
So the idea immediately arose to fork and run the task in the background.
Here is what I threw together for testing:
import os
from time import sleep

def start():
    pid1 = os.fork()
    if pid1 == 0:
        # child: detach into its own session and do the slow work
        os.setsid()
        x = 30
        while x > 0:
            sleep(1)
            with open('/tmp/sleep.log', 'a') as fd:
                pid = str(os.getpid())
                print "child pid = ", pid
                fd.write(pid + ' PID \n')
            x -= 1
        os._exit(0)  # exit the child without running cleanup handlers
    else:
        # parent: just spin so the process stays alive
        print os.getpid()
        while True:
            pass

start()
@app.route('/api/start', methods=['POST'])
def start():
    d1 = "DONE\n"
    pid1 = os.fork()
    if pid1 == 0:
        # child: detach and run the long task
        os.setsid()
        closer()  # here I close all file descriptors inherited from the parent
        x = 30
        while x > 0:
            sleep(1)
            with open('/tmp/sleep.log', 'a') as fd:
                pid = str(os.getpid())
                print "child pid = ", pid
                fd.write(pid + ' PID \n')
            x -= 1
        os._exit(0)
    else:
        # parent: respond to the client immediately
        print os.getpid()
        return d1
username 6158 0.0 0.4 106528 26192 ? S 19:34 0:00 python -u /home/username/VCS/username/seek/lui/tcpdumper/dumper_api.py
username 6165 0.3 0.4 182876 26816 ? Sl 19:34 0:05 /usr/bin/python /home/username/VCS/username/seek/lui/tcpdumper/dumper_api.py
username 6262 0.0 0.0 0 0 ? Zs 19:34 0:00 [python] <defunct>
I managed to figure out the problem. Maybe it will help someone.
The thing is that the parent process waits for the child to finish (for its exit code).
I thought calling os._exit() was enough (you can see what actually happens by running the process under strace).
It turned out that there are nuances.
The nuance is this: you need to install a signal handler in the parent process. You can do this with the signal library.
The first argument is an integer constant (the signal number), and the second is the action to take for that signal.
After a child process terminates, it is left in a zombie state (which is normal), and that is exactly what happened in my case. Once the handler is installed, reaping starts working and the OS clears the process table on its own.
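A minimal sketch of what is described above. The answer does not show the exact handler that was used, so both common options are given here as assumptions:

import signal
import os

# Option 1: tell the kernel we don't care about the children's exit codes;
# finished children are then reaped automatically and never stay <defunct>.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

# Option 2: reap children explicitly in a handler.
def reap_children(signum, frame):
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except OSError:       # no children left to wait for
            break
        if pid == 0:          # children exist, but none has exited yet
            break

signal.signal(signal.SIGCHLD, reap_children)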
By the way, Stack Overflow suggested a pretty good solution.
The idea is to run a daemon that watches a task queue. The web server then responds with a 202 code, tasks are handled as they arrive, and the status of a task can always be checked at a separate URL.
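For illustration, a rough sketch of that idea using a background worker thread inside the Flask process instead of a separate daemon. The names tasks and worker and the /api/status URL are made up for the example, and sleep(90) stands in for the real script:

import uuid
import threading
from Queue import Queue    # Python 2; in Python 3 this is `from queue import Queue`
from time import sleep

from flask import Flask, jsonify

app = Flask(__name__)
tasks = {}                 # task_id -> status; fine for a sketch, use a real broker in production
q = Queue()

def worker():
    while True:
        task_id = q.get()
        tasks[task_id] = 'running'
        sleep(90)          # the real long-running script would be called here
        tasks[task_id] = 'done'

t = threading.Thread(target=worker)
t.daemon = True            # don't keep the interpreter alive because of the worker
t.start()

@app.route('/api/start', methods=['POST'])
def start():
    task_id = str(uuid.uuid4())
    tasks[task_id] = 'queued'
    q.put(task_id)
    # respond immediately; the client polls the status URL below
    return jsonify({'task_id': task_id}), 202

@app.route('/api/status/<task_id>')
def status(task_id):
    return jsonify({'status': tasks.get(task_id, 'unknown')})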
For starters, there are saner ways to manage processes in Python: check out the multiprocessing module, as in the sketch below.
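A minimal sketch of that suggestion, assuming the standard-library multiprocessing module is meant; long_job and sleep(90) are placeholders for the real script:

from multiprocessing import Process
from time import sleep

def long_job():
    sleep(90)              # the real 90-second script would run here

@app.route('/api/start', methods=['POST'])
def start():
    p = Process(target=long_job)
    p.daemon = True        # the child won't outlive the Flask process
    p.start()              # returns immediately; the job runs in a separate process
    return "DONE\n"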
Honestly, it's impossible to guess :) It is very unclear why such logic is needed: first answer the client with 'ok', and only then finish the operation. That is logically incorrect and confusing 99% of the time. Ideally, the API should accept the request, perform the action (whether it takes less than a second or the full 90 seconds), and then respond to the client with success or failure.
The main thing here is to configure nginx correctly (at least worker_connections and the timeouts; the official documentation may have something more suitable), so that the service keeps working normally when there are many slow requests.