A
A
Alexander2020-07-21 06:12:44
Database
Alexander, 2020-07-21 06:12:44

Intelligence puzzle, how to solve it?

Given:
Million tags
Million users
Each user can select any number of tags, creating a unique group from them and subscribe to this group

Task:
When content is created, for example, with n tags, you need to find all groups that can be combined from these n tags in an amount of 2x or more up to n respectively, and select all users who have the same groups.

What approaches, sorting algorithms and database type should be used to optimize calculations?

Answer the question

In order to leave comments, you need to log in

4 answer(s)
D
Dr. Bacon, 2020-07-21
@bacon

Write a solution "on the forehead", do load testing, use it to determine the bottleneck, optimize it. And so repeat a bunch of times.

S
Sergey Vodakov, 2020-07-21
@WaterSmith

When saving the user's settings, we hash the group of tags to which he subscribed and save the hash. When adding content with a group of tags, we also calculate the hash and look for all subscribers with the same hash.
update:
I think I got it.
Create a table of subscriptions (subscribers) of the form: user, tag
Next, make a selection from this table, something like this:

SELECT user, count(tag) as tagcounter
FROM subscribers 
WHERE tag IN (%taglist)
GROUP BY user
HAVING tagcounter>1

where %taglist is a list of article tags

D
Dimonchik, 2020-07-21
@dimonchik2013

Million tags
Each user can subscribe to any group of tags

IRL won't do that.
Million users

and this may not be, if not bots

I
Ivan Shumov, 2020-07-21
@inoise

There are so many excellent analyzes of such architectures on YouTube, you would know. In short, it will only be a separate subscription service. It listens to events from the system via the Event Bus and generates lists and notifications for itself. In general, queues and multiple subscribers are involved for parallel processing.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question