vitalets, 2012-04-12 14:44:26
MySQL

Reducing the size of the database

Hello everyone,

I want to get your advice on the following task:
there is a MySQL database where day-to-day work mostly touches recently added records (the last 1-2 months). After that, the data essentially just sits in the database, taking up space and slowing things down.
But it can't be deleted, because occasionally (rarely) reports have to be built for a long period, for example a year or more.
The question is: how best to optimize all this?

Option 1 :
Create a copy of the database to serve as an archive and, once a week, move data older than 2 months from the main database into it, using an SQL script with statements like:

INSERT INTO archive.document SELECT * FROM prod.document WHERE create_date <= DATE_SUB(NOW(), INTERVAL 2 MONTH);
DELETE FROM prod.document WHERE create_date <= DATE_SUB(NOW(), INTERVAL 2 MONTH);

But how, in that case, do I write queries that span a long period?
Use complicated scripts with UNION, or assemble a combined database from the two parts? Neither seems very convenient...

Option 2 :
Again, create a copy of the database to serve as an archive and set up replication to it. Then, by some trick, delete the ancient rows from the main database in a way that is not replicated. Is that even possible in principle?
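For what it's worth, MySQL does allow something like this: a session with the SUPER privilege can switch off binary logging for its own statements via the `sql_log_bin` variable, so a cleanup run on the master is not sent to the replica. A minimal sketch (the `prod.document` table and the 2-month cutoff are taken from the question; test carefully, since the master and replica will diverge by design):

```sql
-- Run on the master, in the session that performs the cleanup.
-- Requires the SUPER privilege; statements in this session are
-- not written to the binary log and therefore not replicated.
SET SESSION sql_log_bin = 0;

DELETE FROM prod.document
WHERE create_date <= DATE_SUB(NOW(), INTERVAL 2 MONTH);

-- Re-enable binary logging for the rest of the session.
SET SESSION sql_log_bin = 1;
```

The replica then keeps the full history while the master stays small; long-period reports would be pointed at the replica.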

Maybe there are some other options?
Thank you!


7 answer(s)
Dunadan, 2012-04-12
@Dunadan

What stops you from aggregating the statistics?
For example, say you count views of a certain page.
Each hit is one row in the database with the visitor's IP address.
For a summary table, a row of the form page_id, page_view_count, week_id is enough, where
page_id is the page id,
page_view_count is the number of views that week,
week_id is the ordinal number of the week within the year.
For reports in the "week / month / six months / year" range, that will do. You can count the number of hosts the same way.
As a result, a whole year of statistics for one page takes all of 52 rows.
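The roll-up described above might look like this in SQL (the raw `hits` table and its columns are hypothetical, invented here for illustration; the summary columns are the ones named in the answer):

```sql
-- Summary table: one row per page per week.
CREATE TABLE page_views_weekly (
  page_id         INT UNSIGNED     NOT NULL,
  week_id         TINYINT UNSIGNED NOT NULL,  -- week number within the year
  page_view_count INT UNSIGNED     NOT NULL,
  PRIMARY KEY (page_id, week_id)
);

-- Roll raw hits up into the summary table; after this the raw
-- rows (one per hit, with IP address) can be deleted.
INSERT INTO page_views_weekly (page_id, week_id, page_view_count)
SELECT page_id, WEEK(hit_time), COUNT(*)
FROM hits
GROUP BY page_id, WEEK(hit_time);
```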

ztxn, 2012-04-12
@ztxn

Do I understand correctly that by "db" and "base" you actually mean "table"?
If so, take a look at partitioning.
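A sketch of what partitioning could look like here, assuming the `document` table and `create_date` column from the question (in MySQL the partitioning key must be part of every unique key of the table, so the primary key usually has to include `create_date`):

```sql
-- Split the table into yearly partitions by creation date.
ALTER TABLE document
PARTITION BY RANGE (TO_DAYS(create_date)) (
  PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
  PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
  PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Queries that filter on `create_date` then touch only the relevant partitions, so recent-data queries stop paying for the old rows, while a yearly report still runs against the single table.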

vitalets, 2012-04-12
@vitalets

To Dunadan:
I got the idea. The problem is that I can't know in advance what conditions the next report will be requested with.
To ztxn:
No, the database really is a database. The reporting data lives in the main 4-5 tables.
I've looked at partitioning. As far as I understand, it won't reduce the size of the database dumps, right?

zuborg, 2012-04-12
@zuborg

You can put an INSERT trigger on the main table that copies each row into the archive table. DELETEs it should simply ignore; for UPDATE/REPLACE, decide for yourself.
Run the day-to-day queries against the main table, and reports against the archive.
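A minimal sketch of such a trigger, using the `prod.document` / `archive.document` names from the question (the column list is hypothetical; MySQL triggers cannot say `NEW.*`, so every column has to be named explicitly):

```sql
DELIMITER //

-- Copy every newly inserted row into the archive table.
-- DELETEs on prod.document are deliberately not mirrored,
-- so the archive keeps the full history.
CREATE TRIGGER document_to_archive
AFTER INSERT ON prod.document
FOR EACH ROW
BEGIN
  INSERT INTO archive.document (id, create_date, body)
  VALUES (NEW.id, NEW.create_date, NEW.body);
END//

DELIMITER ;
```

With this in place, the weekly job from Option 1 shrinks to just the DELETE on `prod.document`, since the copy has already happened at insert time.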

Vampiro, 2012-04-12
@Vampiro

Um... maybe something like:

CREATE OR REPLACE VIEW forFastReports AS SELECT * FROM mySlowTable WHERE added_at < DATE_SUB(NOW(), INTERVAL 1 DAY);

Vladimir, 2012-04-12
@de1vin

I'm not a formally educated person, I didn't finish university; I learned mostly from whatever books were available, so don't be too hard on me :)
There is a term "data denormalization". For cases like this I coined my own "table denormalization".
For example, we have a news site where news items cannot be physically deleted, i.e. deleted news must still be reachable from the admin panel, but for the client side it is "gone". I made a duplicate of the table and, on deletion, moved the record there with all its fields. That offloaded the working table, but put load on the application whenever it had to query a deleted record. And since such requests are usually thousands of times rarer, I settled on this approach.
In other words, where necessary I used UNION together with caching in my solutions.
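The UNION over the live table and its "deleted" twin might look like this (the `news` / `news_deleted` table and column names are hypothetical, invented for illustration):

```sql
-- Admin panel: look a record up regardless of whether it has
-- been "deleted" (i.e. moved to the duplicate table).
SELECT id, title, published_at FROM news         WHERE id = 123
UNION ALL
SELECT id, title, published_at FROM news_deleted WHERE id = 123;
```

UNION ALL is enough here because an id can only live in one of the two tables at a time, so there are no duplicates to filter out.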

admin4eg, 2012-04-13
@admin4eg

I have a similar problem: a WordPress blog whose database already weighs around 700-800 MB.
I also thought about moving old content into other databases. The idea was to key off the year: if someone requests a post with the URL /2008/, we read from a database named site_2008.
But we never got to a practical implementation; instead we moved the whole database to another server with plenty of RAM and resources.
