PHP
Alex Mirgorodskiy, 2018-06-07 10:48:23

How to quickly process a large amount of data?

Hello colleagues, I have the following task.
I am writing an API. Once a day it will receive a JSON array of users from a client's user base. I need to compare these users against my own database by some field (not yet decided: id or phone number) and return a JSON array with a status showing whether each of these users is subscribed in my database or not. The problem is that once a day, all at the same moment, 20 thousand records or more can arrive. How can I go through them quickly without bringing the server down? Has anyone dealt with this? Can you suggest the best way to handle it?


4 answer(s)
Dmitry Kuznetsov, 2018-06-07
@dima9595

In my opinion, you need to:
First: process the records in parts rather than all at once, saving the data in a cache as you go.
Second: use queues.
P.S. If there are other options, I would be glad to hear them too.
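
A minimal sketch of the "process in parts" idea, assuming the lookup against the database can be done one batch at a time. The function name `checkInBatches`, the `$lookup` callable, and the batch size are hypothetical; in a real API the callable would probably run a single `WHERE id IN (...)` query per batch, and here an in-memory array stands in for the database.

```php
<?php
/**
 * Checks identifiers in fixed-size batches so that no single lookup
 * has to handle the whole payload at once.
 *
 * $lookup is a hypothetical callable: it receives one batch of identifiers
 * and returns the subset of them that exists in the local database.
 */
function checkInBatches(array $ids, callable $lookup, int $batchSize = 1000): array
{
    $result = [];
    foreach (array_chunk($ids, $batchSize) as $batch) {
        // Flip the found values into keys for constant-time membership checks.
        $found = array_flip($lookup($batch));
        foreach ($batch as $id) {
            $result[$id] = isset($found[$id]);
        }
    }
    return $result;
}

// Usage with an in-memory stand-in for the database.
$subscribed = [2, 3, 5];
$lookup = fn(array $batch): array => array_values(array_intersect($batch, $subscribed));

$statuses = checkInBatches([1, 2, 3, 4, 5], $lookup, 2);
echo json_encode($statuses), PHP_EOL;
```

With 20 thousand records and a batch size of 1000 this comes to 20 lookups instead of 20 thousand single-row queries; the right batch size depends on the database and is something to measure rather than assume.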

Ivan, 2018-06-07
@LiguidCool

1) Break the task into small ones and combine the results.
2) Set up a replica and run the lookups against it.

Sergey Sokolov, 2018-06-07
@sergiks

Make the task asynchronous.

  1. The client sends a large JSON with the user data, the comparison field, (authorization keys), the UUID of the task, and the callback address to which your API will send the response later;
  2. The data is accepted and saved, a task is created and placed in the task queue, and the client's request is answered with "OK, accepted, we'll process it and get back to you." After all, if there is more than one client, a dozen of them might each want to send a million of their users at midnight, 10 million in total. “Get in line, you scoundrels, get in line!” “There are many of you, but only one of me!” :)
  3. One or more “workers” (servers, processes) each pick up one task from the queue, execute it as quickly as possible, prepare the response, and send it to the callback address. If sending fails because something went wrong on the client side, a subtask is placed in the queue just to resend the response after N seconds, no more than M times.
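
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory `SplQueue` stands in for a real persistent queue, the names `acceptTask` and `runWorkerOnce` are hypothetical, the `$compare` and `$send` callables are stand-ins for the database comparison and the HTTP call to the callback address, and the "after N seconds" delay is left out.

```php
<?php
// A hypothetical in-memory queue; a real deployment would use a persistent broker.
$queue = new SplQueue();

/** Step 2: store the task and acknowledge right away, without doing the comparison. */
function acceptTask(SplQueue $queue, array $request): array
{
    $queue->enqueue(['type' => 'compare', 'attempts' => 0] + $request);
    return ['status' => 'accepted', 'uuid' => $request['uuid']];
}

/**
 * Step 3: take one task, build the response if needed, and try to deliver it.
 * $compare and $send are hypothetical callables supplied by the application.
 */
function runWorkerOnce(SplQueue $queue, callable $compare, callable $send, int $maxAttempts = 3): void
{
    if ($queue->isEmpty()) {
        return;
    }
    $task = $queue->dequeue();

    if ($task['type'] === 'compare') {
        $task['response'] = $compare($task['field'], $task['users']);
        $task['type'] = 'resend'; // from now on only the delivery is retried
        unset($task['users']);
    }

    $task['attempts']++;
    $ok = $send($task['callback'], ['uuid' => $task['uuid'], 'result' => $task['response']]);

    if (!$ok && $task['attempts'] < $maxAttempts) {
        // The delay of N seconds is omitted; a real queue would schedule it.
        $queue->enqueue($task);
    }
}

// Usage with stand-ins: the first delivery attempt fails, the second succeeds.
$known = [2, 3, 5];
$compare = function (string $field, array $users) use ($known): array {
    $out = [];
    foreach ($users as $user) {
        $out[] = [$field => $user[$field], 'subscribed' => in_array($user[$field], $known, true)];
    }
    return $out;
};
$delivered = [];
$calls = 0;
$send = function (string $url, array $payload) use (&$delivered, &$calls): bool {
    $calls++;
    if ($calls === 1) {
        return false; // simulate a failure on the client side
    }
    $delivered[] = [$url, $payload];
    return true;
};

$ack = acceptTask($queue, [
    'uuid'     => 'task-1',
    'callback' => 'https://client.example/callback',
    'field'    => 'id',
    'users'    => [['id' => 1], ['id' => 2]],
]);
runWorkerOnce($queue, $compare, $send); // delivery fails, a resend task is queued
runWorkerOnce($queue, $compare, $send); // the resend succeeds, the queue is empty
```

Keeping the computed response inside the resend task means a failed delivery does not trigger the comparison again, which matches the idea of a subtask that only resends.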

Vitaly, 2018-06-07
@rim89

"field (not yet known, id or phone number)"

If the id is a unique index, IMHO that will be cheaper.
Without going into details, one option:
the data and a callback URL arrive;
the exchange is implemented through long polling (if the server cannot process the request in time under the classic request-response scheme); once the response array is built, it is sent to the callback URL;
on the database side (if SQL), one option is to create a temporary table from the received data and LEFT JOIN the existing subscriptions to it: if the user is subscribed, the column will be 1, otherwise NULL.
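
A small sketch of the temporary table plus LEFT JOIN option, assuming the comparison field is a numeric id. In-memory SQLite (via the `pdo_sqlite` extension) stands in for the real database so the example is self-contained, and the table and column names `subscriptions`, `incoming_users`, and `user_id` are hypothetical.

```php
<?php
// In-memory SQLite stands in for the real database (assumes pdo_sqlite is enabled).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical existing table of subscribed users.
$pdo->exec('CREATE TABLE subscriptions (user_id INTEGER PRIMARY KEY)');
$pdo->exec('INSERT INTO subscriptions (user_id) VALUES (2), (3), (5)');

// Identifiers received in the JSON payload.
$incoming = [1, 2, 3, 4, 5];

// Load the payload into a temporary table inside one transaction.
$pdo->exec('CREATE TEMPORARY TABLE incoming_users (user_id INTEGER PRIMARY KEY)');
$pdo->beginTransaction();
$insert = $pdo->prepare('INSERT INTO incoming_users (user_id) VALUES (?)');
foreach ($incoming as $id) {
    $insert->execute([$id]);
}
$pdo->commit();

// One query answers the whole payload: 1 if subscribed, NULL if not.
$rows = $pdo->query(
    'SELECT i.user_id,
            CASE WHEN s.user_id IS NULL THEN NULL ELSE 1 END AS subscribed
       FROM incoming_users AS i
       LEFT JOIN subscriptions AS s ON s.user_id = i.user_id
      ORDER BY i.user_id'
)->fetchAll(PDO::FETCH_ASSOC);

// Turn the rows into an id => bool map for the JSON response.
$statuses = [];
foreach ($rows as $row) {
    $statuses[(int) $row['user_id']] = $row['subscribed'] !== null;
}
echo json_encode($statuses), PHP_EOL;
```

The join does the matching inside the database in a single pass, so PHP only has to insert the incoming rows and read back the result.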
