Is this architecture for collecting online-user statistics with Prometheus correct?

Dasha Tsiklauri, 2020-10-16 22:49:19

Software design

Colleagues, I have a question for you and would welcome ideas, opinions, and comments in the form of answers.

My idea: there is a social network with a lot of groups (five thousand or more). Each group has two metrics: the number of subscribers and the number of subscribers currently online. I need to poll the social network's API (1 group = 1 request = 1 response with both metrics) to read these two metrics and write them to my own database, so that I have statistics on the numbers over a long period (1 month at most).
API polling period = 5 minutes
Number of requests to the API per 5 minutes = 5000+
A client of my product gets the data for the last day/week/month by group name (group_name) in the form of a graph

Implementation points (as I understand them):

there is a database with the list of all groups (MongoDB)
there is a worker that makes requests to the API and puts the data in a queue
there is a consumer that takes the API data from the queue and checks whether it is newer than the data already stored (this handles the case where a later request gets its response before an earlier one); if it is newer, the consumer updates the stored value. The update is planned to be done in Redis, so Redis holds a dictionary of the latest data: key = group_name, value = a pair of metrics (number of users, number online)
Prometheus stores two metrics, users_amount{group=} and users_online{group=}
groups to monitor can be added at runtime
we will omit the implementation details of the API / web interface
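
To make the consumer and Prometheus points above concrete, here is a minimal sketch, not a definitive implementation. It assumes samples are ordered by the time the request was issued (the name requested_at is hypothetical), uses a plain dict as a stand-in for the Redis dictionary, and renders the two gauges by hand in the Prometheus text exposition format instead of using a client library.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the Redis dictionary:
# group_name -> (requested_at, users, online)
Latest = Dict[str, Tuple[float, int, int]]


def apply_sample(latest: Latest, group: str, requested_at: float,
                 users: int, online: int) -> bool:
    """Store the sample only if it is newer than the one already held.

    Returns True when the stored value changed, False for a stale sample.
    """
    current = latest.get(group)
    if current is not None and current[0] >= requested_at:
        return False  # a later request was already answered and stored
    latest[group] = (requested_at, users, online)
    return True


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(latest: Latest) -> str:
    """Render both gauges, one line per group, as a single scrape response."""
    lines = []
    for metric, index in (("users_amount", 1), ("users_online", 2)):
        lines.append(f"# TYPE {metric} gauge")
        for group in sorted(latest):
            value = latest[group][index]
            lines.append(f'{metric}{{group="{escape_label(group)}"}} {value}')
    return "\n".join(lines) + "\n"
```

In this sketch one scrape of the endpoint returns both metrics for every group in a single response body, and a stale response is simply dropped before it can overwrite a newer one.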

Open questions:

Is Prometheus OK for such a task?
Are thousands of label values in Prometheus OK, given that they come from independent requests? Or is it better to create two metrics for each group, of the form users_amount_, users_online_?
If a pull model is configured in Prometheus for the two metrics, then every 5 minutes it will hit me with two requests, one per metric, and I return a batch of data for thousands of labels. Do I understand this correctly? Or does the load need to be spread somehow? In my opinion, two requests will be better for traffic than one per label.
It is desirable to spread the outgoing requests as evenly as possible across the five minutes.
5k req / 5 min = 15-20 req/sec
What is the best way to do this with room for scaling the workers? As I understand it, every five minutes five thousand items have to be queued. Is it right, then, to implement throttling inside the worker? If the workers can't cope, the queue grows, and we react by adding another worker.
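
One way to picture the even spreading asked about above, as a rough sketch under stated assumptions: instead of queueing all five thousand items at once and throttling inside the worker, the producer gives each group its own offset inside the 5-minute window and enqueues it at that moment. The function names, the enqueue callback, and the injectable clock are all hypothetical; this is one option, not necessarily the better one.

```python
import time
from typing import Callable, Iterable, List, Tuple

PERIOD_SECONDS = 300.0  # the 5-minute polling period from the question


def spread_offsets(groups: List[str],
                   period: float = PERIOD_SECONDS) -> List[Tuple[float, str]]:
    """Assign each group an evenly spaced start offset inside one period."""
    if not groups:
        return []
    step = period / len(groups)  # 300 s / 5000 groups = 0.06 s between requests
    return [(i * step, group) for i, group in enumerate(groups)]


def run_cycle(schedule: Iterable[Tuple[float, str]],
              enqueue: Callable[[str], None],
              now: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Enqueue each group at its offset, sleeping in between."""
    start = now()
    for offset, group in schedule:
        delay = start + offset - now()
        if delay > 0:
            sleep(delay)
        enqueue(group)
```

With 5000 groups this gives one request every 0.06 s, about 16.7 req/sec, which matches the 15-20 req/sec estimate. The workers then only need to keep up with a steady trickle, and a growing queue remains the signal to add another worker.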

1 answer(s)

Dimonchik, 2020-10-17
@dasha_programmist

Is Prometheus OK for such a task?

If we are talking about time series, then it's OK.
Conceptually, I'm confused about what you have there, because an "if newer, then update" step slipped in, and time series are series precisely so that you append to them, not update them. But build it more or less as planned and you will see for yourself.
Questions about bottlenecks and RPS are, alas, unsolvable in theory. The same Redis / Mongo handle inserts from a continuous incoming stream just fine until they have to dump it to disk... hehe.
So everything you come up with in theory gets checked on test data, your own test data; the bottlenecks are identified and the config is adjusted.
And of course, for memory that leaks over time there is, alas, only experience, forums, and the sleepless nights of admins.