Answer the question
In order to leave comments, you need to log in
Intelligence puzzle, how to solve it?
Given:
Million tags
Million users
Each user can select any number of tags, creating a unique group from them and subscribe to this group
Task:
When content is created, for example, with n tags, you need to find all groups that can be combined from these n tags in an amount of 2x or more up to n respectively, and select all users who have the same groups.
What approaches, sorting algorithms and database type should be used to optimize calculations?
Answer the question
In order to leave comments, you need to log in
Write a solution "on the forehead", do load testing, use it to determine the bottleneck, optimize it. And so repeat a bunch of times.
When saving the user's settings, we hash the group of tags to which he subscribed and save the hash. When adding content with a group of tags, we also calculate the hash and look for all subscribers with the same hash.
update:
I think I got it.
Create a table of subscriptions (subscribers) of the form: user, tag
Next, make a selection from this table, something like this:
SELECT user, count(tag) as tagcounter
FROM subscribers
WHERE tag IN (%taglist)
GROUP BY user
HAVING tagcounter>1
Million tags
Each user can subscribe to any group of tags
Million users
There are so many excellent analyzes of such architectures on YouTube, you would know. In short, it will only be a separate subscription service. It listens to events from the system via the Event Bus and generates lists and notifications for itself. In general, queues and multiple subscribers are involved for parallel processing.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question