PHP
Goodver, 2012-06-25 17:31:34

How to work with large databases for web projects?

I have a website with several large databases, each holding 10-20 million records, on a PHP + MySQL stack.
Right now, a SELECT against even one of these databases can take up to several minutes.

I installed Sphinx; search itself works fine, but there is no live update, and I need things to work immediately when new information is entered or old information changes. That is, as on Habr, for example: when I create a new question or topic, I can go to its page right away, whereas Sphinx first has to rebuild its indexes.

How does a site like Habr, with millions of records, keep everything indexed so quickly?

Please suggest tools and methods for working with large databases. I need maximum SELECT speed (INSERT and UPDATE matter too, but they are secondary), and the database must stay live: as soon as information is entered, it should be usable. Perhaps I could use databases other than MySQL...

Also, is it possible to say, at least roughly, how the processing time of such queries depends on available RAM? Roughly speaking: 2 GB RAM, 2 minutes; 4 GB RAM, 1 minute. Even an approximate relationship would interest me.

10 answers
Alexander, 2012-06-25
@disc

Do you have indexes on your tables?
Sphinx is a search engine; with a short index update interval you will have reasonably fresh information in the index.
But it seems to me that you are doing something wrong:
why serve data from the Sphinx index at all, rather than directly from the database? If the problem is only speed, then the problem is with your queries. Analyze them with EXPLAIN and check that indexes exist and are actually used.
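
For illustration, a minimal sketch of such a check. The table and column names here (posts, user_id, created_at) are assumptions for the example, not taken from the question:

EXPLAIN SELECT id, title
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- In the output, type: ALL together with a large rows estimate means a
-- full table scan; the key column shows which index, if any, was chosen.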

edogs, 2012-06-25
@edogs

No indexes?
Add indexes.
Yours, Captain Obvious.
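
To illustrate the point, a minimal sketch assuming a hypothetical posts table that is filtered by user_id:

-- Speeds up queries like: SELECT ... FROM posts WHERE user_id = 42
CREATE INDEX idx_posts_user_id ON posts (user_id);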

Dmitry Sergeev, 2012-06-25
@JetMaster

Sphinx has real-time indexes:
sphinxsearch.com/docs/2.0.4/rt-indexes.html
It also has delta indexes:
sphinxsearch.com/docs/2.0.4/delta-updates.html
That is when you have the main index plus a small delta index that covers only recent rows, selected by an attribute, e.g. id > 99999. When the database changes, you reindex only the delta, and searches run against main + delta.
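
For illustration, a minimal SphinxQL sketch of the real-time variant. The index name rt_posts and its fields are assumptions; searchd serves SphinxQL over the MySQL protocol, by default on port 9306:

-- The row becomes searchable immediately, with no reindexing step:
INSERT INTO rt_posts (id, title, content)
VALUES (12345, 'New question', 'How to work with large databases?');

SELECT id FROM rt_posts WHERE MATCH('large databases');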

strib, 2012-06-25
@strib

Optimize the queries and the database.
Just writing a query is not enough; it is important to write it so that the operations execute in the optimal order:
dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
For example, aggregating data and then filtering is obviously slower than filtering first and then aggregating; avoid the IN operator where possible, and so on.
Partitions! Look into partitioning (see the sketch below).
Then watch the server's performance: the bottleneck may be the disk, so see what iostat shows while the query is running.
And no, you cannot tie query time to the amount of RAM. It depends on the indexes, the types of operations, the disk load; and if the engine is InnoDB, it also makes sense to tune its parameters.
In short: I had several such tables in MySQL 4.x, and everything worked fine when handled carefully. In 5.x it should work much better, since it has more facilities. The volumes you mention (10-20 million rows) were processed without trouble even on a laptop.
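
A minimal sketch of the partitioning idea, assuming a hypothetical events table with a created_at DATETIME column (note that in MySQL 5.x the partitioning column must be part of every unique key on the table):

ALTER TABLE events
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Queries that filter on created_at can then scan only the matching
-- partitions instead of the whole table (partition pruning).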

freeek, 2012-06-25
@freeek

For what it's worth, with PHP + MySQL I have run a far-from-trivial select on tables with more than a million records, and it took no more than two seconds, so I think disc is right.
Of course, you can look towards Oracle DB and, following Twitter's example, towards RoR. But first you need to optimize the queries, check the indexes, and so on.

mithraen, 2012-06-25
@mithraen

1. The mysqltuner utility will help tune MySQL itself.
2. Use EXPLAIN to see whether indexes are used when your queries execute.
3. Most likely your problem is that the indexes needed to speed up your specific queries have simply not been created (see the composite-index sketch below).
Anything more specific can only be suggested after seeing the database structure and the query itself.
I also recommend Martin Gruber's book Understanding SQL; it has a lot of valuable information on query optimization.
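
As an illustration of point 3, a sketch of a composite index shaped to one specific query; the posts table and its columns are assumptions for the example:

-- For: SELECT ... FROM posts
--      WHERE user_id = ? AND status = 'published'
--      ORDER BY created_at DESC
-- a single composite index can serve both the filter and the sort:
CREATE INDEX idx_posts_user_status_created
    ON posts (user_id, status, created_at);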

hOtRush, 2012-06-25
@hOtRush

In my opinion, MySQL is not the best choice for such volumes of data.
Out of interest, how much does your database weigh?

ToSHiC, 2012-06-25
@ToSHiC

Is the query simple, without joins? What does it look like? Do you have indexes on the fields mentioned in the query conditions?

Dmitry, 2012-06-25
@Neir0

> That is, roughly speaking, 2GB RAM - 2 minutes, 4GB RAM - 1 minute
Queries like that usually execute in less than 200 ms.

dkurilenko, 2012-06-26
@dkurilenko

Write an INSERT trigger that pushes the data straight into the external index (in this case, Sphinx).
An example of calling PHP code from a trigger: stackoverflow.com/questions/1467369/invoking-a-php-script-from-a-mysql-trigger
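
A minimal sketch of a softer variant of the same idea: instead of calling PHP from the trigger directly (the linked thread covers that approach, which typically needs an external UDF), the trigger writes the row id to a queue table, and a PHP worker polls the queue and pushes the rows into Sphinx. The posts and sphinx_queue table names here are assumptions:

DELIMITER $$
CREATE TRIGGER posts_after_insert
AFTER INSERT ON posts
FOR EACH ROW
BEGIN
    -- A PHP worker polls sphinx_queue and updates the Sphinx index.
    INSERT INTO sphinx_queue (post_id) VALUES (NEW.id);
END$$
DELIMITER ;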
