PostgreSQL
Andrey Kuzin, 2021-01-08 09:20:13

What is the best way to sync 20 databases?

There are about 20 databases (MS SQL, PostgreSQL) with data on customer orders (sales). Every second a new order appears in each database. The information needs to be collected on a 21st server to calculate total sales. The plan is a cron job that transfers data directly from database to database; the algorithm is clear: take a timestamp, check, update. BUT! Data can be lost during the transfer, so I need to verify that everything has been downloaded and send an alert if something is missing. The question is: can I arrange this check through an intermediate file? Would that give speed/reliability? How should the architecture be built properly?
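For reference, a minimal sketch of that timestamp-based pull as it might run from cron on the 21st server; src.orders and dst.orders are assumed names (src.orders would be the source table made visible on the central server, e.g. via a foreign data wrapper):

-- Incremental pull by timestamp; rerunning after a failure is safe thanks to ON CONFLICT.
INSERT INTO dst.orders (order_id, customer_id, amount, created_at)
SELECT o.order_id, o.customer_id, o.amount, o.created_at
FROM src.orders AS o
WHERE o.created_at > (SELECT coalesce(max(created_at), '-infinity') FROM dst.orders)
ON CONFLICT (order_id) DO NOTHING;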


3 answers
galaxy, 2021-01-08
@galaxy

I would set up logical replication of the required table(s). Note that Postgres can't replicate into a table with a different name, so you end up with something like:
DB1 (schema1.table) -> DB21 (schema1.table)
DB2 (schema2.table) -> DB21 (schema2.table)
DB3 (schema3.table) -> DB21 (schema3.table)
...
Then combine the data from the schemaN.table tables in DB21 however you need. It might even be possible (honestly, I haven't checked) to make the schemaN.table tables partitions of one common table, for example, or to build a materialized view over them.
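A minimal sketch of that setup, assuming PostgreSQL 10+ logical replication; the table definition, names and connection string are placeholders:

-- On each source (here DB1): publish the orders table.
CREATE PUBLICATION sales_pub FOR TABLE schema1.orders;

-- On DB21: the table must exist under the same schema-qualified name.
CREATE SCHEMA IF NOT EXISTS schema1;
CREATE TABLE IF NOT EXISTS schema1.orders (
    order_id    bigint PRIMARY KEY,
    customer_id bigint,
    amount      numeric,
    created_at  timestamptz
);
CREATE SUBSCRIPTION sales_sub_db1
    CONNECTION 'host=db1.example.com dbname=sales user=repl'
    PUBLICATION sales_pub;

-- Combine the per-source tables for reporting (repeat for the remaining schemas):
CREATE VIEW public.all_sales AS
SELECT 1 AS source_db, * FROM schema1.orders
UNION ALL
SELECT 2 AS source_db, * FROM schema2.orders;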
You can also attach the tables in DB1-20 through a foreign data wrapper and pull the data from them (here the task is to write the script/query so that nothing is lost and nothing slows down).
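A sketch of the foreign data wrapper variant with postgres_fdw (server name, credentials and schema are placeholders); the incremental query from the question would then run against the imported foreign tables:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER db1_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db1.example.com', dbname 'sales');
CREATE USER MAPPING FOR CURRENT_USER SERVER db1_srv
    OPTIONS (user 'reader', password 'secret');
-- Expose DB1's tables locally under the db1 schema:
CREATE SCHEMA IF NOT EXISTS db1;
IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER db1_srv INTO db1;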
This, of course, is all provided that your databases can connect to each other.

Miron, 2021-01-08
@Miron11

The data synchronization architecture is, in fact, already built into the products of the leading DBMS vendors. SQL Server is a good example to follow.
Starting with version 2008 or 2012 it can verify each page of the data file against a checksum (a sum of several truncated values over 8 fields). If the stored and recomputed values differ, the page is declared corrupted, and a mechanism kicks in that loads a copy of the same page from a machine where that page is still "healthy".
How to split the table data into blocks/pages, which function to use for the checksum, and how the databases should communicate are the parameters left for you to choose. If this approach interests you, write and we can put together a complete solution.
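Translated to PostgreSQL terms, a rough sketch of such a per-block check might look as follows (block size, table and column names are arbitrary assumptions); run the same query on the source and on the 21st server and compare the rows to find blocks that need to be re-transferred:

-- Checksum of each block of 10000 orders; compare (block_no, row_count, block_hash) on both sides.
SELECT order_id / 10000                               AS block_no,
       count(*)                                       AS row_count,
       md5(string_agg(order_id::text || ':' || amount::text, ','
                      ORDER BY order_id))             AS block_hash
FROM orders
GROUP BY order_id / 10000
ORDER BY block_no;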

Slava Rozhnev, 2021-01-08
@rozhnev

There is the option of using Kafka. On each of the 20 databases a plugin is installed that publishes all data changes to Kafka; there, using KSQL, the streams are combined into one, and the resulting combined stream is written into the target database.
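A sketch of the KSQL side, assuming the CDC plugin (for example Debezium) writes each database's changes to its own topic; topic names, field names and formats here are assumptions:

-- One stream per source topic produced by the CDC plugin:
CREATE STREAM orders_db1 (order_id BIGINT, amount DOUBLE, created_at VARCHAR)
    WITH (KAFKA_TOPIC='db1.public.orders', VALUE_FORMAT='JSON');
CREATE STREAM orders_db2 (order_id BIGINT, amount DOUBLE, created_at VARCHAR)
    WITH (KAFKA_TOPIC='db2.public.orders', VALUE_FORMAT='JSON');

-- Merge them into a single stream (repeat the INSERT for the remaining databases):
CREATE STREAM all_orders AS
    SELECT 1 AS source_db, order_id, amount, created_at FROM orders_db1;
INSERT INTO all_orders
    SELECT 2 AS source_db, order_id, amount, created_at FROM orders_db2;
-- A sink connector (e.g. a JDBC sink) then writes the all_orders topic into the target database.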
