K
K
Konstantin2021-05-02 13:29:31
Database migration
Konstantin, 2021-05-02 13:29:31

What can and what can't a migration contain?

Greetings! Recently there was a question about data processing in the application. Share your experience, what do you do in this case.

1. For example, we need to add some data in an existing table, let's say a column, and based on complex calculations (which cannot be simply described with a simple SQL query), put the data there. For example, let's imagine that you need to put down with the help of an external service - whether the email address of the user that is specified in his profile is available.

2. We need to start processing all the images stored on the site, and resize them in a new resolution (the database is not involved at all in this case, suppose).

Usually I would do this by creating a migration script, throwing the necessary services into it, during deployment the script would work, there would be a necessary entry in the migration table and the goal would be achieved.
But I met a misunderstanding of colleagues who are sure that the sole responsibility of migrations is to make changes to the database schema, and not to the data itself, and even more so not to start some kind of image processing that is not directly related to the database.
During the discussion, we came to the conclusion that it is necessary to make a CLI command that would run once after the deployment of the data, make changes and cut it out during the next deployment so that it does not start again, or provide for a mechanism as in migrations - writing information to the table about that the command has already started.
It looks to me like we've reinvented the migration mechanism, only calling it non-migrations.

I have no purpose to defend my point of view, but rather to understand the issue. How is it customary to do it in your project.
Development is carried out in php, on one of the frameworks. But the question is relevant for any stack.
Perhaps there are articles or books that describe the best practices from your point of view, you can share the title / link.
Thank you.

Answer the question

In order to leave comments, you need to log in

5 answer(s)
P
Philipp, 2021-05-03
@kot999

Normal migration implies the following characteristics:

  • idempotency - when run, a series of migrations leads to the same result
  • reversibility - any migration step can be rolled back with application version rollback
  • integrity - migration transfers the application from a working state to a working state (bugs do not count)

A natural use of migrations is CI/CD. Those. in most cases, minimal application downtime is expected during deployment, which is why migrations mostly change data structures.
But this does not mean at all that the data cannot be changed. It often happens that some system directories need to be updated or user data needs to be changed. For example, one field was used for the first and last name, and now it is divided into two, as it should be. Data migration in this case is a natural and necessary process that can take a significant amount of time. Therefore, the planned process of rolling out the application with its announcement to users is used. Scheduled downtime is standard practice for large, complex applications that can afford it.
If your application cannot afford to stop serving users, then for such cases, auxiliary code is written that handles cases of non-migrated data and in general all sorts of crutches, if only it worked (the main source of crap).
Is image resizing part of the migration? Yes and no. It depends on a number of factors - whether the application will break if there is no picture or if there is an old picture, is this a critical function or not. An example of a critical function is Instagram.
If the application design is done correctly and the function is secondary, then the migration occurs only at the database level, and the resize itself occurs at the application level (script during rollout). An example is the new size of the userpic, no one will die if it is bigger or smaller, it may not look very good for a while, but eventually it will become normal. If a new column is added for a new size, then data from another column is copied into it during migration so as not to break the functionality.
So, my answers to your questions (as I would do):
  1. Would add migrations with adding/removing 2 new columns. The first is for email, the second is email_verified. After deployment, I would run the mail verification script (oh, this process is slow and unreliable). In a good way, on the first login, I would ask you to check and verify email by sending a code to it. I think you already understood that we are falling into the gray area of ​​​​that machine verification is inappropriate here. But let's say we used machine verification, we checked everything, everything needed to be done. In the next version, we are removing the email_verified column.
  2. As a rule, such things do not change the data in the database, which means there is no need for migrations, a script is enough. But, if you are ok with application downtime, then you can completely add this script to the migration, and also remember to roll back this migration (delete new sizes, for example).

As we all see, migration can be considered both in the context of the DBMS and in the context of the application as a whole.
A common practice is to consider migration in the context of databases. Everything else is scripts and crutches in the application itself.
There is no universal recipe. Everything is tied to the business logic and implementation of the application code. If it is impossible to separate complex data processing from the deployment process, then this logic should be built into the migration code. An example of such processing is getting some kind of system identifier that is used when the application is running, for example, rotating API keys when changing the authentication system.

V
Vitaly Karasik, 2021-05-02
@vitaly_il1

As far as I understand, the migration should include SQLs that are data independent.
Therefore, "processing all the pictures stored on the site, and resize in a new resolution" is 100% not a migration.
It can be a one-time deployment script.

A
Adamos, 2021-05-02
@Adamos

An important property of migrations is their reversibility, the ability to roll back changes and get the same state of the site as it was before.
With an additional column and a seed script that will calculate its value for existing records, everything is fine. Rolling back the migration will simply delete it along with all these values.
Then you might finish something and apply the same migration again. Migration is not a one-time script.
But the digestion of pictures you horseradish roll back. So it has nothing to do with migrations, it's just a one-time script.

R
Roman Mirilaczvili, 2021-05-02
@2ord

Data migrations to DBMS and image scaling are different tasks. Therefore, each of them has its own specificity.
1. In a framework DSL migration scenario, I usually described adding a column, then
requesting the type . When receiving a list, compose the list of addresses into an HTTP request to a third-party service and then update the column (there are some peculiarities here). 2. This does not look like a transition from one state to another, but like a provisioning scenario when deploying a web application. We don't write migrations to compile assets (webpack, etc.), do we?SELECT id, email FROM users LIMIT 100 OFFSET :x
That is, in the case of images, we do not change the state of the application. It is what it was, and remains so after running the scaling script. We deployed the application once, getting the desired image dimensions and that's it, that's enough. When a new version is released, the update should no longer be about re-running the script - just migrate the data.

S
Saboteur, 2021-05-02
@saboteur_kiev

Migration is
1) changing the format of data / tables
2) moving to another host / version, shard, when the data is moving somewhere.
In both cases, depending on the amount of data and the criticality of the application, you can either put the application at the time of migration, or provide for workarounds in the application itself.
And just resize images in the database, you can simply do it live, without affecting the application. Unless if something concerns the layout, you can do a check in the application in advance.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question