What is the best way to sync 20 databases?
There are about 20 databases (sql, pg) with customer order (sales) data. Every second a new order arrives in each database. The data has to be collected on a 21st server to calculate total sales. The plan is a cron job that transfers directly from database to database; the algorithm is clear: take a timestamp, check, update. BUT! Data can be lost during the transfer, so it is necessary to verify that everything has been downloaded and send an alert if something is missing. The question: can I organize such a check through an intermediate file? Will that give any speed/reliability? How do I build this architecture properly?
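A minimal sketch of the check described above, assuming the central server can already read a source table (here called source_orders, e.g. through postgres_fdw or dblink), that orders carry an updated_at timestamp, and that a sync_state table tracks the high-water mark per source; all names are illustrative:

-- incremental copy: pull rows newer than the last synced high-water mark
INSERT INTO central_orders (source_db, order_id, amount, updated_at)
SELECT 'db1', order_id, amount, updated_at
FROM source_orders
WHERE updated_at > (SELECT last_ts FROM sync_state WHERE source_db = 'db1');

-- verification: row counts for the same window must match on both sides
SELECT
  (SELECT count(*) FROM source_orders
   WHERE updated_at > (SELECT last_ts FROM sync_state WHERE source_db = 'db1')) AS src_rows,
  (SELECT count(*) FROM central_orders
   WHERE source_db = 'db1'
     AND updated_at > (SELECT last_ts FROM sync_state WHERE source_db = 'db1')) AS dst_rows;

-- advance sync_state.last_ts only when src_rows = dst_rows; otherwise send an alert

In practice you would also pin an upper bound on the window (updated_at <= a fixed cut-off) so that rows arriving during the check don't skew the comparison.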
I would set up logical replication of the required table(s). Postgres can't replicate into a table with a different name, so you end up with something like:
DB1 (schema1.table) -> DB21 (schema1.table)
DB2 (schema2.table) -> DB21 (schema2.table)
DB3 (schema3.table) -> DB21 (schema3.table)
...
Then combine the data from the schemaN.table tables in DB21 however you need. It may be possible (honestly, I haven't checked) to make the schemaN.table tables partitions of one common table, for example, or to build a materialized view over them.
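A rough sketch of that setup, assuming PostgreSQL 10+ logical replication; the table, schema, publication, and connection names are illustrative, and on the subscriber the table must already exist with the same schema-qualified name and a compatible definition:

-- on DB1 (publisher)
CREATE PUBLICATION pub_orders FOR TABLE schema1.orders;

-- on DB21 (subscriber)
CREATE SCHEMA schema1;
CREATE TABLE schema1.orders (order_id bigint PRIMARY KEY, amount numeric, created_at timestamptz);
CREATE SUBSCRIPTION sub_db1
  CONNECTION 'host=db1.example.com dbname=sales user=replicator password=secret'
  PUBLICATION pub_orders;

-- repeat per source database, then combine on DB21, e.g. with a view
CREATE VIEW all_orders AS
SELECT 'db1' AS source_db, * FROM schema1.orders
UNION ALL
SELECT 'db2' AS source_db, * FROM schema2.orders;
-- ...and so on for the remaining schemas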
You can also attach the DB1-20 tables through a foreign data wrapper and pull data from them (here your task is to write the script/query so that nothing gets lost and nothing slows down).
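For example, a minimal postgres_fdw sketch on DB21; the server, user mapping, and table definitions are illustrative:

-- on DB21
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER db1_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'db1.example.com', dbname 'sales', port '5432');

CREATE USER MAPPING FOR current_user SERVER db1_srv
  OPTIONS (user 'reader', password 'secret');

CREATE FOREIGN TABLE db1_orders (
  order_id   bigint,
  amount     numeric,
  updated_at timestamptz
) SERVER db1_srv OPTIONS (schema_name 'public', table_name 'orders');

-- the cron job then does the incremental pull, keyed by a timestamp high-water mark
INSERT INTO central_orders (source_db, order_id, amount, updated_at)
SELECT 'db1', order_id, amount, updated_at
FROM db1_orders
WHERE updated_at > (SELECT last_ts FROM sync_state WHERE source_db = 'db1');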
This, of course, is all provided that your databases can connect to each other.
The architecture for this kind of data verification is, in essence, already built into the products of the leading DBMS vendors; SQL Server is a good example to follow.
Starting roughly with the 2008/2012 versions, it can verify each data page of the storage file against a checksum written into the page when it was saved. If the recomputed checksum and the stored value differ, the page is declared corrupted, and a mechanism kicks in that loads a copy of the same page from a machine where that page is still "healthy" (automatic page repair in mirroring / availability groups).
How to split the data in your table into blocks/pages, which function to use for the checksum, and how the databases should talk to each other are the parameters you get to choose. If this approach interests you, write me and we can put together a complete solution.
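A sketch of the same idea at the application level in PostgreSQL: split the table into blocks by key range and compare per-block hashes between a source database and DB21 (the table name, key column, and block size of 10,000 ids are illustrative):

-- run the same query on the source DB and on DB21, then compare block_hash values;
-- any block whose hash differs is re-transferred and an alert is sent
SELECT order_id / 10000                                     AS block_no,
       count(*)                                             AS row_cnt,
       md5(string_agg(md5(o::text), '' ORDER BY order_id))  AS block_hash
FROM orders o
GROUP BY order_id / 10000
ORDER BY block_no;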
There is also the option of using Kafka. On each of the 20 databases you install a plugin that publishes all data changes to Kafka; in Kafka, KSQL combines the per-database streams into one, and the combined resulting stream is written to the target database.
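A hedged ksqlDB sketch of the merging step, assuming the change streams already land in topics named db1.public.orders, db2.public.orders, and so on (for example via a CDC connector such as Debezium); the topic names, column list, and stream names are all illustrative:

-- declare a stream over each per-database change topic
CREATE STREAM orders_db1 (order_id BIGINT, amount DOUBLE, updated_at BIGINT)
  WITH (KAFKA_TOPIC = 'db1.public.orders', VALUE_FORMAT = 'JSON');

CREATE STREAM orders_db2 (order_id BIGINT, amount DOUBLE, updated_at BIGINT)
  WITH (KAFKA_TOPIC = 'db2.public.orders', VALUE_FORMAT = 'JSON');

-- combined stream: create it from one source, then insert the others into it
CREATE STREAM orders_all AS
  SELECT 'db1' AS source_db, order_id, amount, updated_at FROM orders_db1 EMIT CHANGES;

INSERT INTO orders_all
  SELECT 'db2' AS source_db, order_id, amount, updated_at FROM orders_db2 EMIT CHANGES;

-- ...repeat the INSERT INTO for db3..db20; a sink connector (e.g. JDBC) then
-- writes the orders_all topic into the 21st (target) database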