Python
DeBass, 2013-12-27 00:53:02

Architectural questions when designing an API for an asynchronous service

It so happens that I have to develop an "asynchronous API" on my own. Let me explain what I mean by the term; the general architecture is as follows:
There is a listener that accepts HTTP requests to the API; based on the request, a message is generated and sent to the AMQP broker. Workers take messages from the queue (AMQP) and perform the actions specified in the message. Tasks can take a long time to complete. The result of the work is XML or JSON data (in some very rare cases PDF/RTF). The user makes a request and receives a task ID; then, at some interval, the user checks the status of this task (e.g., with a GET request to host/api/task/123-1213-12121-121212).
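
A minimal sketch of the listener half of this scheme, assuming a local RabbitMQ, a queue named 'tasks', and uuid4 task IDs (all names and the URL layout are illustrative, not a finished design):

```python
import json
import uuid

import pika
from bottle import Bottle, request

app = Bottle()

@app.post('/api/task')
def create_task():
    task_id = str(uuid.uuid4())
    message = {'task_id': task_id, 'params': request.json}

    # One connection per request keeps the sketch short; a real
    # listener would reuse the connection and channel.
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='tasks', durable=True)
    channel.basic_publish(
        exchange='',
        routing_key='tasks',
        body=json.dumps(message),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
    connection.close()
    return {'task_id': task_id}  # bottle serializes dicts to JSON

app.run(host='localhost', port=8080)
```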
The question itself: how should the result of the work be stored and returned?
I see several solutions:
1) The result of the work is placed in a database table with, say, three fields: taskID, result, time (time stores when the result was written, so the record can be deleted after, e.g., 48 hours); see the sketch after this list.
2) The result of the work is placed in another queue on the AMQP broker. But how do you give it back? How do you identify a specific message in the queue without iterating over all of them?
3) Another option that I haven't thought of yet.
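
For option 1, a minimal sketch of such a table and its 48-hour cleanup, via psycopg2 (table and column names are assumptions):

```python
import psycopg2

conn = psycopg2.connect(dbname='api', user='api')  # illustrative credentials
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS task_results (
            task_id varchar(50) PRIMARY KEY,
            result  text,
            created timestamp NOT NULL DEFAULT now()
        )
    """)
    # Run this periodically (cron or a background thread in the listener)
    # to drop results older than 48 hours.
    cur.execute("DELETE FROM task_results WHERE created < now() - interval '48 hours'")
```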
I will be very glad for advice on absolutely any of these points. If the specific technologies matter, here they are:
1) Listener - python + bottle + pika
2) Workers - python + pika + a lot of things =)
3) Database - PostgreSQL
I really hope for your help, because I have no one to consult with. This project is for self-education.


5 answers
DeBass, 2013-12-27
@DeBass

Thanks for the answers!
Currently implemented like this:
A request comes to the listener; after all the checks (token, JSON schema, parameter validation), the tasks table (task_id varchar(50), status varchar(10), result text, timest timestamp) gets a row with the task ID and the status "new". After the message is handed to the queue, the status changes to "queue". As soon as a worker picks the message up, it looks at the task ID and changes the status to "inWork"; after execution the status changes to "complete" and the result is written. The timest field is updated on every status change. The user checks the task status with a GET request: if the status is != complete, the current status and timestamp are returned; if the status is complete, the result field is returned and the status changes to "was_returned" (a background script purges "was_returned" records older than 48 hours and "complete" records older than a week). How do you like this workflow?
Push options are very interesting, I will definitely try them later.
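
A sketch of the polling endpoint for this workflow, assuming psycopg2 and the tasks table above (connection details are illustrative):

```python
import psycopg2
from bottle import Bottle

app = Bottle()
conn = psycopg2.connect(dbname='api', user='api')  # illustrative credentials

@app.get('/api/task/<task_id>')
def task_status(task_id):
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT status, result, timest FROM tasks WHERE task_id = %s",
            (task_id,),
        )
        row = cur.fetchone()
        if row is None:
            return {'error': 'unknown task'}
        status, result, timest = row
        if status != 'complete':
            return {'status': status, 'timestamp': str(timest)}
        # Hand over the result and mark the row so the background
        # cleanup script can expire it after 48 hours.
        cur.execute(
            "UPDATE tasks SET status = 'was_returned', timest = now() "
            "WHERE task_id = %s",
            (task_id,),
        )
        return {'status': 'complete', 'result': result}
```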

zarincheg, 2013-12-27
@zarincheg

If the user works through a browser, push notifications can be implemented with WebSockets or long polling. Then either transmit the data directly, or use the notification to trigger a data request. The results themselves are stored in the database as usual.
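
A minimal long-polling sketch of such an endpoint in bottle; check_status is an illustrative stub for the database lookup, and the timings are arbitrary:

```python
import time

from bottle import Bottle

app = Bottle()

def check_status(task_id):
    # Illustrative stub: in the real service this reads the status
    # from the tasks table in PostgreSQL.
    return 'inWork'

@app.get('/api/task/<task_id>/wait')
def wait_for_task(task_id):
    # Hold the request open for up to ~30 s, re-checking each second;
    # the client simply re-issues the request after a timeout.
    deadline = time.time() + 30
    while time.time() < deadline:
        status = check_status(task_id)
        if status == 'complete':
            break
        time.sleep(1)
    return {'status': status}
```

Note that bottle's default single-threaded development server would hold only one such request at a time; keeping many connections open needs a threaded or asynchronous server, which is why the other answers suggest WebSockets or a separate proxy at scale.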

sasha, 2013-12-27
@madmages

If you don't need to support old browsers, a WebSocket will do nicely; with old browsers, still long polling (if keeping a lot of connections open becomes a strain, put a Node.js proxy alongside, it needs few resources to hold connections). If neither of those fits, then your first option is the way to go.

maxaon, 2013-12-27
@maxaon

If the message broker supports task statuses, use them. It is also desirable to report the task status to the user.
Pass status updates, as mentioned earlier, via WebSocket or long polling.
The result is written to the database and returned by a separate request.
If the broker does not support state persistence, duplicate the status (pending, progress, success, fail) in the database from the handler script.
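
A sketch of the handler-script side of this, assuming pika 1.x and a tasks table like the one in the accepted answer; do_work is an illustrative placeholder for the actual task logic:

```python
import json

import pika
import psycopg2

conn = psycopg2.connect(dbname='api', user='api')  # illustrative credentials

def set_status(task_id, status, result=None):
    # Duplicate the task state into the database, as suggested above.
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE tasks SET status = %s, result = %s, timest = now() "
            "WHERE task_id = %s",
            (status, result, task_id),
        )

def do_work(task):
    # Illustrative placeholder for the actual task logic.
    return json.dumps({'ok': True})

def on_message(channel, method, properties, body):
    task = json.loads(body)
    set_status(task['task_id'], 'progress')
    try:
        set_status(task['task_id'], 'success', do_work(task))
    except Exception as exc:
        set_status(task['task_id'], 'fail', str(exc))
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='tasks', durable=True)
channel.basic_consume(queue='tasks', on_message_callback=on_message)
channel.start_consuming()
```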

skomoroh, 2014-01-03
@skomoroh

An API on Django + Tastypie works well.
The API has 3 methods:
1. submit a task; the response returns the task ID
2. check the task status by ID: in progress, error, done
3. fetch the result by ID
Listener:
for each task it takes an ID from Redis (an increment),
immediately puts id + task into the queue as-is,
writes the "in progress" status for this ID into Redis,
answers all status requests, and the check before fetching a result, from the status in Redis,
if the status is "done", takes the result from the database (or from a file).
First-stage worker:
takes tasks from the queue,
validates the data; if it is bad (or the user's balance is insufficient, etc.), writes the "error" status to Redis with an explanation,
splits the task into micro-tasks and throws them into a new queue,
writes the task/micro-task relations (task_id, sub_task_id, status, result) to Postgres or Redis.
Second-stage workers:
take a micro-task from the queue,
execute it,
throw id + result into the results queue.
Third-stage worker:
takes micro-task results in batches and groups them by task in Postgres or Redis,
once all micro-tasks are ready, writes the task result to Postgres or to a file,
changes the task status to done, or to error if any micro-task failed.
The result's storage time is set via the Redis TTL.
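
A sketch of the Redis bookkeeping described above, assuming redis-py; key names and the 48-hour TTL are illustrative:

```python
import redis

r = redis.Redis()

def new_task():
    # An atomic counter hands out task IDs, as described above.
    task_id = r.incr('task:counter')
    r.set('task:%d:status' % task_id, 'in progress')
    return task_id

def finish_task(task_id, result):
    # Store the result with a TTL so it expires on its own after 48 h.
    r.setex('task:%d:result' % task_id, 48 * 3600, result)
    r.set('task:%d:status' % task_id, 'done')

def get_status(task_id):
    status = r.get('task:%d:status' % task_id)
    return status.decode() if status else None
```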
