How to implement data/virtual shard migration?

SQL

quest2017, 2017-05-18 18:29:20

Has anyone done online migration of data / shards on top of SQL in a production (commercial) system? Please share your ideas!
As I understand it, you need a directory (or a function) that says which server holds a given key (a record with a given identifier). The migration process then has to lock the keys exclusively on server "A" (SELECT ... FOR UPDATE), atomically hide them in a hidden table, and copy them non-atomically into a hidden table on server "B". There they are made visible atomically, and the directory entry for those keys is switched from server "A" to server "B". The hidden table is needed, in theory, because moving keys between servers is not atomic: if we move many keys at once (a virtual shard) rather than a single key, server "B" may at some point hold only part of them.
What if a client had already gone to server "A" when the migration began and did not find the data there? Should it just keep polling the directory and then server "A" until the directory is updated and the client is sent to the new server "B"? Is there some kind of notification I could use here instead of polling, and how would I implement it?
How do you build a directory that maps a key to a server, or a key to a virtual shard and a virtual shard to a server? With what algorithm and what tools?
In particular, I read an account of a team that moved away from virtual shards to mapping keys directly to servers. Where should such a mapping be stored?
Thanks in advance for ideas and answers!

1 answer(s)

rPman, 2017-05-18

Generate identifiers so that they are guaranteed to fall into groups. For example, let every server's sequence advance with a step of N (the maximum number of servers) and give each server a different starting value (from 1 to N). Let's call that starting value the "index modulus". This groups the data, so you can move it group by group, according to the index modulus each identifier belongs to. You get the modulus by taking the identifier modulo N; if N is a power of two, a bit mask is enough.
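To make the identifier scheme concrete, here is a minimal sketch in Python. The names `N`, `id_sequence`, `group_of` and `group_of_pow2` are made up for illustration, and N = 8 is an arbitrary choice. In a real setup the sequences would live in the SQL servers themselves (a sequence or auto-increment column with a start value and step); this only shows the arithmetic.

```python
from itertools import count

N = 8  # assumed maximum number of servers; a power of two so the mask variant applies


def id_sequence(start, step=N):
    """Ids handed out by one server: start, start + N, start + 2N, ..."""
    return count(start, step)


def group_of(record_id, n=N):
    """Group of an id: its remainder modulo N.

    Note that the sequence starting at N itself lands in group 0.
    """
    return record_id % n


def group_of_pow2(record_id, n=N):
    """Same result as group_of, valid only when n is a power of two."""
    return record_id & (n - 1)


seq = id_sequence(3)                        # the server whose start value is 3
first_ids = [next(seq) for _ in range(3)]   # 3, 11, 19
groups = {group_of(i) for i in first_ids}   # every id falls into group 3
```

Because the group is a pure function of the id, any node can work out which group a record belongs to without a lookup; only the group-to-server mapping has to be stored.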
You only need to store which index modulus lives on which physical server, and move the data a whole group at a time, naturally within a transaction: open a transaction on both servers, copy the data, commit on the destination, record in the table that the data has moved, delete the data from the source, and commit on the source.
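The order of steps above could look roughly like this. It is only a sketch under loud assumptions: two in-memory SQLite databases stand in for the two servers, a plain dict stands in for the placement table, and `make_server`, `move_group`, the `items` table and N = 4 are all invented for the example. The two commits are separate, so this is not a distributed transaction; if something fails between them, the source still holds a stale copy that has to be cleaned up later.

```python
import sqlite3

N = 4  # assumed number of groups (index moduli)


def make_server():
    # isolation_level=None: no implicit transactions, BEGIN/COMMIT are issued by hand
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def move_group(src, dst, directory, group, dst_name):
    """Move every row with id % N == group from src to dst; return the row count."""
    src.execute("BEGIN IMMEDIATE")   # open a transaction on both servers
    dst.execute("BEGIN IMMEDIATE")
    try:
        rows = src.execute(
            "SELECT id, payload FROM items WHERE id % ? = ?", (N, group)
        ).fetchall()
        dst.executemany("INSERT INTO items (id, payload) VALUES (?, ?)", rows)
        dst.execute("COMMIT")        # the data is now durable on the destination
        directory[group] = dst_name  # record that the group has moved
        src.execute("DELETE FROM items WHERE id % ? = ?", (N, group))
        src.execute("COMMIT")        # drop the old copy
    except Exception:
        for conn in (src, dst):
            if conn.in_transaction:
                conn.execute("ROLLBACK")
        raise
    return len(rows)


server_a, server_b = make_server(), make_server()
directory = {g: "A" for g in range(N)}  # group -> server name
server_a.executemany(
    "INSERT INTO items (id, payload) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 9)],
)
moved = move_group(server_a, server_b, directory, group=1, dst_name="B")
```

Here ids 1 and 5 belong to group 1, so two rows move to "B" and the directory entry for group 1 changes while the other groups stay on "A".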
To keep new data from appearing on the server during the transfer, add a mechanism that stops that node from creating new records in the group being moved (a kind of read-only mode) and that logs which records were modified, by id (a last-modified date in each table, or the standard low-level log of the SQL server). In other words, the table that holds the placement of groups on servers must also contain this flag. And yes, replicate this table between the servers with the SQL server's standard tools.
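A possible shape for that placement table and the write guard, again as a hedged sketch on SQLite. The table and column names (`placement`, `read_only`, `modified_at`), the `GroupReadOnly` exception and the `write_item` helper are assumptions for illustration, not something any particular server provides. The check and the write are two separate statements here; on a real server they would need to run in one transaction or be enforced by a trigger.

```python
import sqlite3

N = 4  # assumed number of groups (index moduli)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE placement (
    grp       INTEGER PRIMARY KEY,          -- index modulus
    server    TEXT    NOT NULL,             -- where the group currently lives
    read_only INTEGER NOT NULL DEFAULT 0    -- 1 while the group is being moved
);
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    payload     TEXT,
    modified_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- last change, for catch-up
);
""")
conn.executemany(
    "INSERT INTO placement (grp, server) VALUES (?, 'A')", [(g,) for g in range(N)]
)
conn.commit()


class GroupReadOnly(Exception):
    """Raised when a write hits a group that is frozen for migration."""


def write_item(conn, item_id, payload):
    grp = item_id % N
    (read_only,) = conn.execute(
        "SELECT read_only FROM placement WHERE grp = ?", (grp,)
    ).fetchone()
    if read_only:
        raise GroupReadOnly(f"group {grp} is being migrated")
    # REPLACE re-inserts the row, so modified_at gets a fresh default timestamp
    conn.execute(
        "INSERT OR REPLACE INTO items (id, payload) VALUES (?, ?)", (item_id, payload)
    )
    conn.commit()
```

Setting `read_only = 1` for a group before the move starts and clearing it after the directory is updated gives writers a clear signal to retry against the new server.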
By the way, I clearly remember that Oracle let you set up data replication by condition. I won't be surprised if other databases have it too. In that case the SQL server's internal mechanism takes care of transferring the data between servers, which is an order of magnitude more efficient and reliable than home-grown tools.