Database
Vitaliy Orlov, 2018-12-24 23:40:10

How to split a transaction into microservices while maintaining data consistency?

Hello everyone and Merry Christmas!
During an interview, I was asked the following question:


The database had several tables into which data was written inside a single transaction — say, an insert.
The project grew and was split into microservices. Now each table sits behind its own microservice, in a separate database.
How do you split the transaction into requests to the microservices while maintaining data consistency?

I had never encountered such a task, which of course I admitted, and suggested using:
- flags indicating that the data has been saved
- a flag indicating that the data may be used
- a timestamp for cleaning up data on a schedule
Example:
transaction
-----------
id
service_1_saved - "data saved" flag for each service
service_2_saved
service_3_saved
service_1_transaction_complete - "transaction completed" flag for each service
service_2_transaction_complete
service_3_transaction_complete
complete - overall transaction completion flag
fail_at - timestamp (for scheduled cleanup)

service_1(2,3)
-----------
id
data
transaction_id
transaction_complete

How it works:

Step 1) We send the data to the services and set service_1.transaction_complete = 0; while this flag is 0, the data cannot be used. Each service then responds that the data has been saved, and we set transaction.service_1_saved.
Step 2) Once all services have processed and saved the data (i.e. transaction.service_(1,2,3)_saved are all set), we consider the transaction successful and set the completion flag on the services: service_(1,2,3).transaction_complete = 1. As each responds, we set transaction.service_(1,2,3)_transaction_complete = 1.
Step 3) Once transaction.service_(1,2,3)_transaction_complete = 1 for all services, we finish the transaction by setting transaction.complete = 1.
On failure, we clean up the data via the join transaction.id = transaction_id.
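The three steps above can be sketched as a small in-memory simulation (the class and function names here are illustrative, not part of any framework):

```python
# Minimal sketch of the flag-based protocol from the question.
# Real services would be separate processes with their own databases.

class Service:
    def __init__(self, name):
        self.name = name
        self.rows = {}          # transaction_id -> stored row

    def save(self, tx_id, data):
        # Step 1: store the data with transaction_complete = 0 (not usable yet)
        self.rows[tx_id] = {"data": data, "transaction_complete": 0}
        return True             # "data saved" acknowledgement

    def complete(self, tx_id):
        # Step 2: mark the row as usable
        self.rows[tx_id]["transaction_complete"] = 1
        return True

    def cleanup(self, tx_id):
        # Failure path: delete the row by transaction_id
        self.rows.pop(tx_id, None)


def run_transaction(tx_id, services, data):
    saved = [s.save(tx_id, data) for s in services]
    if not all(saved):                     # some service failed to save
        for s in services:
            s.cleanup(tx_id)
        return False
    completed = [s.complete(tx_id) for s in services]
    return all(completed)                  # Step 3: transaction.complete = 1


services = [Service("service_1"), Service("service_2"), Service("service_3")]
ok = run_transaction("tx-42", services, {"value": 1})
print(ok)  # True: every service saved the row and marked it usable
```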

So I ended up with a tightly coupled transaction of not-very-high reliability :)
Question: how is this done properly?


5 answer(s)
⚡ Kotobotov ⚡, 2018-12-25
@orlov0562

What you described is called a two-phase commit; it used to be used very often.
Nowadays a similar but slightly different approach is more common. It likewise reserves certain resources (for example, money on an account and goods in a warehouse), then checks the intermediate status of the operation, and then carries out and confirms the operation. The difference is that nothing is overwritten: every request is continuously appended to a log (which is also the message queue), and any rollback of an operation happens by appending new request entries to that log.
---
There are a lot of subtleties. For example, you mentioned time stamps: timestamps are added when you need to control the order of intermediate steps (usually that is not so important, so a timestamp is not always added). What is added is a unique operation ID: if a request fails (for example, after a long wait for a response), it may be resent, and that unique ID keeps the same operation from being applied twice.
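The deduplication by a unique operation ID mentioned above can be sketched as follows (all names are illustrative, not from any specific library):

```python
# Hedged sketch: a service remembers which operation ids it has already
# applied, so a resent request does not double-apply the same operation.

processed = set()

def apply_operation(op_id, amount, balance):
    """Credit `amount` once per unique operation id."""
    if op_id in processed:      # duplicate delivery: ignore it
        return balance
    processed.add(op_id)
    return balance + amount

balance = 100
balance = apply_operation("op-1", 50, balance)
balance = apply_operation("op-1", 50, balance)  # resent request, deduplicated
print(balance)  # 150, not 200
```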
---
There are also subtleties in how the microservices are divided: perhaps it is simply the same service duplicated, with each instance handling requests from a different user segment, so no coordination of operations between these microservices is needed at all.
---
In my opinion, these are just trick questions with no single right answer. Schemes are chosen for the specific project and its tasks; unless you have built a payment system like Yandex.Money, discussing this in the abstract is pointless.
That is not a dig at you: very few people really do this. I am sure the people who asked you this do not understand it deeply themselves, and ask such things just to wear you down.

index0h, 2018-12-25
@index0h

If you are facing this problem, there is a very high probability that the split into microservices was done incorrectly and it is worth going back to the monolith.
As for distributed transactions: at a minimum, you can retry each request N times, and otherwise roll back on each of the services.
As an option, you can use something like Kafka to store the message history, so that unprocessed transactions can be restored later.
You also need to consider why each service might roll a transaction back: for example, no money on the account makes a payment transaction impossible.
There is no single correct option; it all depends on the project.
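The "retry N times, otherwise roll back on each service" idea can be sketched like this (`FlakyService`, `call_with_retry`, and `distributed_write` are hypothetical stand-ins, not a real client library):

```python
# Sketch: retry each service call up to `attempts` times; if one service
# still fails, roll back the services that had already succeeded.

class FlakyService:
    def __init__(self, name, failures=0):
        self.name = name
        self.failures = failures   # how many times save() fails before working
        self.saved = False

    def save(self):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError(f"{self.name}: temporary failure")
        self.saved = True

    def rollback(self):
        self.saved = False


def call_with_retry(fn, attempts=3):
    """Repeat a request up to `attempts` times before giving up."""
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except RuntimeError as err:
            last_err = err
    raise last_err


def distributed_write(services, attempts=3):
    """Try to save on every service; on final failure, compensate the rest."""
    done = []
    try:
        for svc in services:
            call_with_retry(svc.save, attempts)
            done.append(svc)
    except RuntimeError:
        for svc in done:           # roll back the services that succeeded
            svc.rollback()
        return False
    return True


ok = distributed_write([FlakyService("a", failures=2), FlakyService("b")])
print(ok)  # True: "a" succeeds on its third attempt, "b" on its first
```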

Sergey Rogozhkin, 2019-01-05
@thecoder

The most primitive way that survives event forking is to put all microservices on one data bus (message queue), require each to support cancelling an operation it has performed, and require each to tolerate duplicate calls. At the start of a multi-phase operation, an operation identifier (idempotency key) is created and passed through all services; it is used both to prevent duplication and to roll operations back. This key can even be threaded through the payment system.
Now imagine the magic. You create an order that fans out into many parallel, sequential, and very intricately branched tasks (reserving stock at the warehouse, sending notifications, debiting funds, etc.), and somewhere in their depths something breaks. Since the services are isolated and know almost nothing about each other, everyone involved must be told to return things "as they were" through the common channel. The broken microservice, knowing the operation key, publishes "operation (id) failed, no details" to the bus. Then every microservice: 1) rolls back the operation by id if it has already performed it; 2) stops responding to that id if requests have not yet arrived.
Result: the system returns to its original state in its entirety.
You pay for all of this: every operation needs tests, plus the double work of adding reverse operations, sometimes with non-trivial logic (for example, around sending messages). If something goes wrong, you need very high-quality logs; you cannot step through this with a debugger.
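A minimal sketch of the bus-plus-compensation scheme described above, assuming a plain list stands in for the message queue and every class and field name is illustrative:

```python
# Sketch: microservices on a shared bus. A "failed" message for an operation
# id makes every service roll back that operation (if done) and ignore it
# from then on (if its request has not arrived yet).

class Microservice:
    def __init__(self, name):
        self.name = name
        self.done = set()       # operation ids already applied
        self.blocked = set()    # failed ids to ignore from now on

    def handle(self, message):
        op_id, kind = message["op_id"], message["kind"]
        if kind == "do":
            if op_id in self.blocked:
                return                      # stop responding to a failed id
            self.done.add(op_id)
        elif kind == "failed":
            self.done.discard(op_id)        # roll back if already performed
            self.blocked.add(op_id)


services = [Microservice("warehouse"), Microservice("billing"),
            Microservice("mail")]

def publish(message):
    """Broadcast a message to every service on the bus."""
    for svc in services:
        svc.handle(message)

publish({"op_id": "order-7", "kind": "do"})
# billing discovers a problem and broadcasts the failure, no details needed:
publish({"op_id": "order-7", "kind": "failed"})

print(all("order-7" not in s.done for s in services))  # True: state rolled back
```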

Artemy, 2018-12-25
@MetaAbstract

A distributed transaction coordinator; and generally speaking, a blockchain could probably be used too :)

khevse, 2019-01-17
@khevse

There is no single answer to this problem, because in addition to maintaining data consistency there may be extra requirements. For example:
- the table is too large, so sharding (aka partitioning) is applied;
- replication with one or more leader nodes is used;
- linearizability is required;
- etc., individually or in combination.
I can only recommend this book (I am finishing it myself now):
https://www.piter.com/product/vysokonagruzhennye-p...
It describes in great detail how to work with systems that handle large amounts of data.
I found the book via this question:
A book on distributed fault-tolerant systems?
