V
V
vc4vc2017-10-03 03:06:35
Database
vc4vc, 2017-10-03 03:06:35

Database to store big data?

I am considering one of my Internet sites, the essence is this,
In total, a table of about 40-60 million rows will be assembled gradually (about 10 rows per second). And then it will also need to be updated or new data added to it.
There will be approximately the following loads on the database:
- search - is there a record in the database with the specified name, if so, we update the data there. Those. before adding a record, it will check if it is in the database and add / update data. The function is very necessary and massive, so I would not want it to run longer than 0.1 seconds.
- search in the database with the specified parameters (for example, so that such and such a parameter is greater than the specified value and similar conditions) and this function will be used quite often (in total, 100 times a day, and subsequently it will only increase many times depending on the number of site users). Here such a speed is not so critical, but it should also be relatively fast (say 5-20 seconds maximum)
Data will be stored in a simple form: id line, name, link to the site, date, date, numeric and text fields (up to things).
The database itself is very simple in structure, and can be stored in the 1st database in the 1st table. No relationships with other tables are expected. Or it can only store, say, categories and their number in a separate table. And in the main write only the number of the category.
The data will not be updated all at once, but gradually and constantly (several additions / changes per second or less).
The data in the table will be relevant only up to about 6 months, those who are older will no longer be needed and can be deleted.
The main writing language is presumably php.
The first part is for the user, which shows information on the parameters, the second part works constantly and gradually adds / updates information in the database.

What database to choose? Is MySQL suitable for these tasks?

Answer the question

In order to leave comments, you need to log in

5 answer(s)
E
Eugene Wolf, 2017-10-03
@Wolfnsex

What database to choose? Is MySQL suitable for these tasks?
If you have the proper experience with it, the skills of proper database design and a complete understanding of why to do "exactly like this" and "why not otherwise?", I think it's quite suitable. In general, databases are usually evaluated not by the number of records in the 1st (one) table, but by the total amount of data (in giga/petabytes) and some other parameters.
- search - is there a record in the database with the specified name, if so, we update the data there. Those. before adding a record (and let me remind you - at first it will be 5-40 million and will constantly increase) it will check if it is in the database and add / update data.
There are indexes for this, in all databases known to me. Presumably - a standard B-tree index, it works in almost the same way in all databases.
The database will be loaded like this:
You will have loads on the hardware and not on the base, if it withstands - then from the point of view of the database - logical problems for storing 40 million. records - I do not see.
I want to know in what ways it is possible to organize the structure of storing large information?
"Big Information" or Big Data? 40 million records - this is not necessarily a large volume. For example, an index on a numeric (INT) field for 40 million. records will take only a few megabytes. To store "large information" - you can take, for example, PostgreSQL, there is a ready-made mechanism, TOAST , designed specifically for this, or design the MySQL database in such a way that the necessary data would be separate from any "information garbage" ("tails" ), this will reduce the size of a separate table on disk and, as a result, increase the speed of working with it.

T
ThunderCat, 2017-10-03
@ThunderCat

Every day or a little less often, you will need to update data for 5-40 million records,
does it have to be done in one go?
I want to know in what ways it is possible to organize the structure of storing large information?
read "normal forms db".
What database to choose? Is MySQL suitable for these tasks?
Muscle or postges, here you already need to look at the hardware / software bundle, because your task is either very non-standard, or you are designing something incorrectly, you have a "super-secret task", respectively, a very finger-poking answer.
UPD:
The data will be stored in a simple form: line id,
I hope this is a typo, in the sense - id type integer?
Or it can only store, say, categories and their number in a separate table. And in the main write only the number of the category.
read about normalization, no, really, it's IMPORTANT.
Numeric data ALWAYS work faster than mixed data (alphanumeric) with equal length (in "characters"), I hope this is obvious. Accordingly, the selection where categoryid = 55 will work faster than where category = 'somecategoryname'. Other than that, I don't see much of a problem.

X
xmoonlight, 2017-10-03
@xmoonlight

Every day or a little less often it will be necessary to update data for 5-40 million records, and this figure will constantly grow, every month approximately 5-10% of records will be added to the table
This is a scrapping service (perhaps a search engine or some kind of social network analytics).
I'd be thinking about how you would get data before updating for 5-40 million records per day on a single server: that's ~500 requests per second just for getting data!
You do not want to shove all this on one server ?!
Limits on the number of lines .
The MyISAM storage engine supports 2^32 rows per table, but you can build MySQL with the --with-big-tables option to make it support up to 2^64 rows per table.

�
âš¡ Kotobotov âš¡, 2017-10-03
@angrySCV

I use mongo to store statistics on queries in search engines
name -> hashed index.
there are more than 2 billion records in the database - searching and updating takes less than 1 millisecond.
over 10 times faster than your requirements while storing 50 times more data.

V
Vyacheslav Uspensky, 2017-10-03
@Kwisatz

Everything above is correct except for one. There is no reason at all to use MySQL. IMHO take PostgreSQL and hire a good DBA

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question