A
Andrey, 2017-08-11 10:16:08

Huge MySQL database - what to learn?

I need to build a workable database of huge size: one table will hold more than 240 million records, and another several times more. A server is available. I have only basic knowledge of databases.
What would you recommend learning for a quick start?
Is MySQL the right choice for such a project?

5 answer(s)
I
Ilya, 2017-08-11
@rpsv

Basically, everything depends not on the size of the database, but on what you need this data for, what structure it has and how you will process it.
First, you need to decide whether a relational database is needed or not.
Then choose a DBMS depending on your tasks.

M
Maxim Kudryavtsev, 2017-08-11
@kumaxim

First, design the database schema. Do you know third normal form? Bring your data to it, for starters.
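Purely as an illustration of the idea (the tables and column types here are hypothetical, not taken from the question): instead of repeating a service's name and domain in every row of a huge table, a normalized layout stores them once and references them by key.

    -- Denormalized: service name/domain repeated in millions of rows
    --   status(account_id, service_name, service_domain, value)

    -- Normalized: service attributes live once, rows reference them by id
    CREATE TABLE services (
      id     SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name   VARCHAR(255) NOT NULL,
      domain VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE status (
      account_id INT UNSIGNED      NOT NULL,
      service_id SMALLINT UNSIGNED NOT NULL,
      value      TINYINT           NOT NULL,
      PRIMARY KEY (account_id, service_id),
      FOREIGN KEY (service_id) REFERENCES services(id)
    ) ENGINE=InnoDB;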
Then look into partitioning (the MySQL documentation covers it). In short, it lets you spread a table across more than one hard disk and speeds things up when a table has a lot of rows.
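A minimal sketch of what range partitioning looks like in MySQL, assuming a hypothetical table keyed by an auto-increment id; the ranges and partition boundaries are arbitrary placeholders.

    CREATE TABLE events (
      id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      account_id INT UNSIGNED    NOT NULL,
      created_at DATETIME        NOT NULL,
      PRIMARY KEY (id)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (id) (
      PARTITION p0   VALUES LESS THAN (100000000),
      PARTITION p1   VALUES LESS THAN (200000000),
      PARTITION p2   VALUES LESS THAN (300000000),
      PARTITION pmax VALUES LESS THAN MAXVALUE
    );
    -- Queries that filter on id only touch the relevant partitions (partition pruning),
    -- and individual partitions can be placed on different disks via DATA DIRECTORY.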
Then work out which operations you run most often - reads or writes. Based on that, choose a storage engine (MyISAM or InnoDB) and tune the DBMS settings, including the caches.
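For example, to see which engine the existing tables use and to give InnoDB more cache (assuming MySQL 5.7+, where the buffer pool can be resized at runtime; the schema name and the 4 GB figure are placeholders):

    -- Which storage engine each table uses
    SELECT TABLE_NAME, ENGINE
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb';

    -- InnoDB's main cache; the value is in bytes (here: 4 GB)
    SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;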
In general, your question is very vague, so it is hard to give a definite answer right now. You write about 240 million records. Maybe that sounds like a lot to you; five years ago it would have scared me too. But understand that the raw number of records does not matter much by itself. What matters is their structure and composition, how often they are accessed, the nature of those accesses, their size, and so on. Get the number out of your head and focus on the implementation, and solve problems as they come up. It makes sense to optimize something once it is already working: optimize what you can actually see and measure, not abstract row counts in tables that do not exist yet.

A
Andrey, 2017-08-11
@andrshpa

I am looking at the NoSQL side, in particular column stores, namely Apache Cassandra.
But I'm not sure that would be the right choice.
What puts me off is that the usual ad-hoc selections are not really possible, and also that this DBMS is designed more for constantly appending data than for working with existing records.
I'm somewhat torn. On the one hand, NoSQL is very attractive in some respects, but I don't see how to run the selections I need on it. The filter can be something like: select all accounts with a + (1) on certain services (with an additional filter on the services themselves - by hard_level and domain), plus a date filter will also have to be bolted onto this; I overlooked it earlier, but it doesn't change the essence.
I have little idea how to do that there; on a relational database it is at least clearer in principle.
A more detailed description of the task (duplicated from my comment above):
"The database contains tables
accounts - in it "columns" - ID (auto-increment), email (varchar 255), password (varchar 255), login (varchar 255), and comment (TEXT) For comments, I understand that I need to put them in a separate table because there will be
few of them against the general background and this will increase performance.services
- ID(autoincrement), name (varchar 255), hard_level(tinyint), domain (varchar 255)
status - ID(auto-increment), account_id(int 11), service_id(int 11), value(tinyint)
I think it's clear how it all works:
Accounts are added to the accounts
table, services practically do not change in the services table.
In the status table, when information appears, the link account_id is entered - the corresponding account ID from acccounts, service_id - the corresponding service ID, and the value 0/1. For one account, there can be a number of records from several to a full number of services (160+)
Now the database has 18 million test records in accounts, based on which I tried to fill in the status. there, huge ID values ​​​​began to be obtained, everything lags *. On a real number of accounts of 250 million+ - the ID values ​​in the status table will become simply huge - roughly speaking that around 42500000000 is a 12-digit number, which seems to me sooooo not great.
Also, JOIN will be used to obtain the result with such a table architecture, which is also bad for performance.
The filter can be something like: select all accounts with +(1) on certain services (at the same time, there is an additional filter for services - by hard-level and domain), + you will also need to attach a filter by date to this, I lost sight of it, but in general, this does not change the essence.
*computer i7 3770, 8gb RAM, win7, Apache and mySQL installed separately (didn't set it up yet). the server is about the same capacity, you can install any OS and do whatever you want. "
"

X
xmoonlight, 2017-08-11
@xmoonlight

Learn how to test the performance of a database schema and how to profile the performance of queries against that database.
MySQL is more than enough.
Read ONLY the official documentation!
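As a concrete starting point for the profiling advice above (a minimal sketch; the tables and the query are just placeholders):

    -- Show how MySQL plans to execute a query: which indexes it uses, how many rows it scans
    EXPLAIN
    SELECT a.id
    FROM accounts AS a
    JOIN status   AS s ON s.account_id = a.id
    WHERE s.value = 1;

    -- Simple per-statement timing (deprecated but still available; performance_schema
    -- is the modern route)
    SET profiling = 1;
    SELECT COUNT(*) FROM status WHERE value = 1;
    SHOW PROFILES;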

A
asd111, 2017-09-02
@asd111

https://www.percona.com/blog/2006/06/09/why-mysql-...
