MySQL
intelligentpotato, 2015-02-27 20:27:38

Where and how should I store a large database?

Hello Toaster!
I have no experience administering large projects; unfortunately, the extent of my skills so far is setting up a LAMP stack by following guides from the Internet.
It so happens that my project's MySQL database has grown to 1.1 TB. That terabyte is occupied by a single MyISAM table with 340 million records. Apache and MySQL currently live on a Kimsufi server (i5-3570S, 16 GB RAM), but it can barely keep up. The average load is about 400 simultaneous connections, mostly INSERTs. Under overload the database crashes, since MyISAM locks whole tables. My sloppy code plays no small part in this either. One of the recent crashes, which corrupted 93% of the indexes, finally pushed me to act.
I think the project has outgrown a single server and something needs to be done to keep growing, but I can't figure out what. I considered Google Cloud SQL and Amazon RDS, but they turn out to be too expensive. I would like to stay within a monthly budget of $250-300 so that the project at least breaks even. I suppose it makes sense to set up sharding. What is the optimal size of a single shard, and which server characteristics does it depend on? Or does it make sense to switch to another DBMS altogether?


6 answers
Alexey Yakhnenko, 2015-02-27
@ayahnenko

Have you tried optimizing the database a little first? Say, splitting it into a couple of new tables?
With your current approach, no amount of hardware will ever be enough.
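One hedged reading of "a couple of new tables" is a vertical split: keep the small, hot columns in one table and move the bulky payload into another, so that inserts and index maintenance touch far less data. The names below (events_meta, events_payload, owner_id) are assumptions for illustration, not the asker's real schema:

```sql
-- Hot table: small rows, cheap inserts and index updates.
CREATE TABLE events_meta (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    owner_id   INT NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_owner_created (owner_id, created_at)
) ENGINE=InnoDB;

-- Bulky payload, fetched by primary key only when actually needed.
CREATE TABLE events_payload (
    id      BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- same id as events_meta
    payload MEDIUMTEXT NOT NULL
) ENGINE=InnoDB;
```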

lega, 2015-02-28
@lega

Still, there isn't enough information about what you keep in that JSON and how you use the data.
Here are some tips:
1) Read up on partitioning. You may be able to drop your ownerID column entirely and split the data into tables owner1, owner2, ...; that saves space on indexes and data, makes it easier to spread the database across servers later (sharding), and it will also run faster.
2) Compress the JSON; that alone can shrink the data by a factor of 2 to 10.
3) Move old data into an archive: for example, once a month has passed, build the final reports, caches, etc. that a client might still request, and ship the raw data itself off to the archive.
4) Try another DBMS. With PostgreSQL you can store compressed JSON and still build indexes on it, which optimizes away your varchars. NoSQL/MongoDB has its pluses too: one record takes one block of memory rather than several, as in SQL databases, and the write speed is higher.
Also, following the partitioning idea, you can pack data into chunks. For example, if you select data by day and owner, then at the end of each day you can pack that day's rows into chunks of (date, ownerID, archived_json); see the sketch after this list. This way the indexes can shrink by a factor of 100, the data by 10-20x, and the speed of fetching the data can grow up to 50x (I had a similar project).
With these tips you can "turn" 1 TB into, say, 10 GB; how much you actually gain depends on the data and how it is used.
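A minimal MySQL sketch of the chunking idea from the last tip, assuming a source table events(owner_id, created_at, payload); all table and column names here are invented for illustration:

```sql
-- Hypothetical day/owner chunk table: one compressed row per owner per day
-- instead of thousands of individually indexed rows.
CREATE TABLE event_chunks (
    day           DATE       NOT NULL,
    owner_id      INT        NOT NULL,
    archived_json MEDIUMBLOB NOT NULL,  -- COMPRESS()-ed JSON lines
    PRIMARY KEY (day, owner_id)         -- the only index needed
) ENGINE=InnoDB;

-- GROUP_CONCAT output is truncated at group_concat_max_len (1 KB by
-- default), so raise it before packing.
SET SESSION group_concat_max_len = 64 * 1024 * 1024;

-- Pack yesterday's rows into chunks.
INSERT INTO event_chunks (day, owner_id, archived_json)
SELECT DATE(created_at),
       owner_id,
       COMPRESS(GROUP_CONCAT(payload SEPARATOR '\n'))
FROM events
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY DATE(created_at), owner_id;

-- Reading a day's data for one owner back:
SELECT UNCOMPRESS(archived_json)
FROM event_chunks
WHERE day = CURDATE() - INTERVAL 1 DAY AND owner_id = 42;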

Saboteur, 2015-02-27
@saboteur_kiev

For starters, why can't you split the data into multiple tables?
If you mostly just store old data, reading it only occasionally, while the main activity is inserting and working with the latest data, you should have thought long ago about how to separate the two.
Set up monitoring and find out what exactly is loaded the most: disk, memory, or network?
Or maybe just set up replication and split the queries between two servers?
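If you go the replication route, the classic MySQL master-slave setup is only a few statements, assuming server-id and log-bin are already configured in my.cnf on the master; the hosts, credentials, and binlog coordinates below are placeholders:

```sql
-- On the master: create a replication account.
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
SHOW MASTER STATUS;   -- note File and Position for the slave

-- On the slave: point it at the master and start replicating.
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',  -- values from SHOW MASTER STATUS
    MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should be Yes
```

Note that since the load here is mostly INSERTs, a replica mainly offloads reads; all the writes still land on the master.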

Shaks, 2015-02-27
@shaks

Take a look at mysql-slow.log, find the heavy queries, and tune them; there are most definitely knobs there that can be tightened.
Log all queries for a while. Analyze each of them for CORRECT use of indexes. Drop the redundant indexes (you will probably free up 20-30 percent of the space). In short, optimize. Only when there is nothing left to optimize is it worth thinking about shards.
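A short sketch of that workflow; the 1-second threshold, the log path, and the events table in the EXPLAIN are illustrative:

```sql
-- Enable the slow query log at runtime (MySQL 5.1+).
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;   -- log everything slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';

-- For each heavy query found in the log (mysqldumpslow helps aggregate
-- them), check whether it actually uses an index:
EXPLAIN SELECT * FROM events
WHERE owner_id = 42 AND created_at > '2015-02-01';

-- List the table's indexes and drop redundant ones, e.g. a single-column
-- index made obsolete by a composite one:
SHOW INDEX FROM events;
-- ALTER TABLE events DROP INDEX idx_owner;  -- example only
```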

Mark Rosenthal, 2015-02-27
@font

The cool kids use NoSQL

Ivan, 2015-03-05
@sait4seo

You can also look at MySQL forks such as MariaDB.
And use PARTITION BY on the tables, as others wrote above.
Plus caching with Memcached or Redis, or moving individual tables or fields to Mongo.
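For the PARTITION BY suggestion, here is a minimal sketch of native MySQL/MariaDB range partitioning by month; the events table and created_at column are assumptions, and note that MySQL requires the partitioning column to be part of every unique key on the table:

```sql
ALTER TABLE events
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p201501 VALUES LESS THAN (TO_DAYS('2015-02-01')),
    PARTITION p201502 VALUES LESS THAN (TO_DAYS('2015-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Old months can later be archived and then dropped almost instantly:
-- ALTER TABLE events DROP PARTITION p201501;
```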
