How to properly move analytics data to a separate storage?
There is a MySQL database that holds data on music tracks and analytics about them.
There is a lot of analytics data, so at some point it was decided to move the analytics table to ClickHouse.
This helped a lot: queries became an order of magnitude faster.
But this created another problem: querying across the two tables is now very awkward. For example, I need to take all tracks created in January (a table in MySQL) and pick the 30 most-listened of them (from ClickHouse). To run such a query, I have to either select the track ids in MySQL and then substitute them into the ClickHouse query, or store a duplicate of the track table in ClickHouse. Both options are terrible.
Overall, moving analytics to ClickHouse (CH) is great, but what do you do about inconveniences like this? How do you work with multiple stores that hold linked data?
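For reference, the first option from the question (select ids in MySQL, then substitute them into a ClickHouse query) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the table and column names (`listens`, `track_id`), the helper functions, and the sample data are hypothetical, and both databases are replaced by in-memory lists so the example runs on its own.

```python
from collections import Counter
from datetime import date


def tracks_created_between(tracks, start, end):
    """Step 1 (MySQL side, simulated): ids of tracks created in [start, end)."""
    return [track_id for track_id, created_at in tracks if start <= created_at < end]


def build_top_listened_query(track_ids, limit=30):
    """Step 2: substitute the ids into a ClickHouse query.

    Table and column names (listens, track_id) are hypothetical.
    Converting every id with int() keeps the inlined IN list safe from injection.
    """
    if not track_ids:
        # Nothing matched in MySQL, so there is no point querying ClickHouse.
        raise ValueError("no track ids to look up")
    id_list = ", ".join(str(int(i)) for i in track_ids)
    return (
        "SELECT track_id, count() AS plays FROM listens "
        f"WHERE track_id IN ({id_list}) "
        f"GROUP BY track_id ORDER BY plays DESC LIMIT {int(limit)}"
    )


def top_listened(listen_events, track_ids, limit=30):
    """In-memory equivalent of what the ClickHouse query would return."""
    wanted = set(track_ids)
    counts = Counter(t for t in listen_events if t in wanted)
    return counts.most_common(limit)
```

With tracks created on 2024-01-05, 2024-01-20, 2024-02-01 and 2024-01-31 (ids 1 to 4), `tracks_created_between(..., date(2024, 1, 1), date(2024, 2, 1))` yields `[1, 2, 4]`, and the generated query then contains `IN (1, 2, 4)`. The query text grows with the number of ids, which is part of why this approach feels clumsy.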
Well, the problem is a conceptual misunderstanding of what analytics is and what a data warehouse or data lake is.
First, let's define how analytics differs from metrics, aggregates, and reports.
"store a duplicate of the track table in ClickHouse"
Recommender systems usually don't have to serve real-time data, so I propose a different way of working with the data:
a background process receives metrics from the service API and temporarily stores them in MySQL until there are enough for a batch insert into ClickHouse. Another process periodically queries ClickHouse and stores the resulting recommendations in MySQL. This way, every request from the service API can be handled by querying MySQL alone.
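A rough sketch of the two background processes described above, in Python. It is illustrative only: both stores are replaced by in-memory stand-ins, `MetricBuffer` and `refresh_recommendations` are hypothetical names, and the "recommendations" are simply the most-played tracks, as a placeholder for whatever logic would actually run against ClickHouse.

```python
from collections import Counter


class MetricBuffer:
    """Stand-in for the MySQL staging table: holds raw events until a batch is full."""

    def __init__(self, batch_size, send_batch):
        self.batch_size = batch_size
        self.send_batch = send_batch  # callable that would do a batch INSERT into ClickHouse
        self.pending = []

    def add(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand over everything collected so far as one batch, then start a new one.
        if self.pending:
            batch, self.pending = self.pending, []
            self.send_batch(batch)


def refresh_recommendations(analytics_rows, limit):
    """Periodic job: aggregate rows from the analytics store (ClickHouse) and
    return rows ready to be written into a MySQL table the API reads from.
    Ranking by play count is only a placeholder for real recommendation logic."""
    counts = Counter(row["track_id"] for row in analytics_rows)
    return [
        {"rank": rank, "track_id": track_id, "plays": plays}
        for rank, (track_id, plays) in enumerate(counts.most_common(limit), start=1)
    ]
```

For example, `MetricBuffer(batch_size=3, send_batch=rows.extend)` forwards events to `rows` three at a time, and a scheduler would call `refresh_recommendations` on a timer. The API then reads only the precomputed table, so it never has to join across the two databases at request time.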