How to correctly synchronize data between client applications?

L

Leonid Rozhnov2018-04-05 19:11:17

Database

Leonid Rozhnov, 2018-04-05 19:11:17

There is a server application, for performance and non-attachment to the presence of the Internet, clients have local copies with the amount of data necessary for the client to work. Clients work with these servers, all changes are written to local databases, after which, once every 5 minutes, they are synchronized through the RabbitMQ queue to the main server and receive revision changes from it.
Of the important details - one client can have several local servers at once, the data between them is synchronized through the main server. The revision of changes is implemented at the application level, when opening a transaction, we get a new revision for this transaction from the database, mark all changed entities with it, and save it in the database.
Some time ago, we noticed that the approach leads to data loss during the exchange, if the amount of data coming in synchronization packets varies greatly (several thousand changes come from the local server A, which has sat without the Internet, and a few from server B).
At the same time, since we start importing from server A earlier - its changes are marked with a revision that is smaller than the data from server B, but since the import takes longer and the commit occurs much later - the revision issued to server B has already gone to the local servers and they believe that already know server A's changes, even though they are only being committed to the database.
We solved the problem in a simple way, by introducing pessimistic large-block locks, leading to the fact that only one thread can edit intersecting data.
And it seems like we lived with it until clients from 50-60 local servers began to appear, data from which we do not have time to import. Yes, now we have accelerated them in some way, added caching and temporarily solved the problem, and there is still room for improvement in the current solution (switch to more optimistic locks, for example). But, they began to wonder, perhaps our approach is fundamentally wrong and we are making crutches when there is a more or less standard solution for this problem?
In general, we found a solution at the database level, which would seem to suit us and allow us not to completely rewrite our data exchange mechanism ( https://docs.microsoft.com/en-us/sql/relational-da...But the problem is that in the next year we plan to switch from MS SQL (yes, because of the cost of licenses), and we would not want to get hung up on its specifics. For now, we are planning to switch to PostgreSQL, but in general we are ready to think about other alternatives if they would allow us to solve this (and others naturally) problem easier. Perhaps you can recommend articles/books on this topic, or has someone encountered such problems and can describe in which direction we should think?

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

D

Dmitry Entelis, 2018-04-05
@DmitriyEntelis

From the description, it turns out that versioning goes at the level of the entire database, and not specific objects?
If it's not a secret, what kind of data?
About "thousands of pieces" - is this a metaphor, or is the flow of records really small?
And how do you resolve (plan to resolve) overwriting data in this architecture, even without taking into account technical problems?
Or is the data such that clients cannot overwrite it?
Then I would still look in the direction of introducing versioning for specific objects, although this is tempting to "rewrite the whole thing"

X

xmoonlight, 2018-04-06
@xmoonlight

If you have synchronization on the client, then you should distinguish between working with the database and modifying the database:
1. Working is SELECT.
2. Modification is everything else.
When modifying: read the logical "difference" between the current and modified local database and store in a pending modifying query stack (to be run on the central database after submission).
Example: A record was added and immediately deleted - the "difference" of meaningful data is zero. There are no changes in modifying queries.
After the appearance of the network (or by timeout) - send all modifying requests to the central database server for their subsequent launch there.
Before modifying the next data, check immediately for conflicts between all clients (incoming stacks). In case of conflict - take the stack of the latest (new timestamp) client with the modification of this data.
That is, you need to build a system for parsing synchronization requests from clients on the server with the search for intersections and building the correct queue for their execution in the correct order.