SELECT WHERE IN: Prompt for the best PHP interaction

gtAlexandr2012-01-19 11:45:26

PHP

gtAlexandr, 2012-01-19 11:45:26

SELECT WHERE IN: Prompt for the best PHP interaction - MySQL

Hello.

How to do it right, how to do it optimally?

Option 1:

Get the result of the first query:
SELECT `ID` FROM `TABLE_A`WHERE `FIELD_C` = '123';

Substitute the result of the first query into the second query:
SELECT * FROM `TABLE_B` WHERE `FIELD_A` IN ('result of the 1st query');

Total transactions: 2
The question arises: what if the result of the first query is 1k identifiers?

Option 2:

SELECT * FROM `TABLE_B` WHERE `FIELD_A` IN (SELECT `ID` FROM `TABLE_A`WHERE `FIELD_C` = '123');

Total transactions: 1
The question arises: is this query complex?

The meaning of my question is:in articles about the architecture of highly loaded systems (for example, such as twitter), it is recommended to use simple queries when interacting with the database, and to generate the result on computing servers (they are easier to scale).
So, what is better:
- 1 option with two simple queries to the database, but, perhaps, a second query that is difficult to parse, if the number of transmitted identifiers, for example, is a thousand?
- Option 2 with one request, but nested?

Thanks for answers!

Answer the question

In order to leave comments, you need to log in

8 answer(s)

shagguboy, 2012-01-19
@shagguboy

>in articles about the architecture of highly loaded systems (such as twitter)
unless you have loads like Twitter, don't do it. just make your life harder by writing your own joins in php

shagguboy, 2012-01-19
@shagguboy

personally, I am for inner joib because in (up to version 6 for sure) is optimized as depend subquery.

Vitaly Zheltyakov, 2012-01-19
@VitaZheltyakov

If the tables are of the same type (this applies to both fields and engines) and caching is not used (not effectively), then option 2 is better.
Otherwise, option 1.

Anatoly, 2012-01-19
@taliban

In general, if you have already started about the architect, then just do this:
tableB->getAll();
You shouldn't worry about how everything will work inside for the time being, the main thing is that this method (function) returns _always_ the same data. Then when (if?) It will slow down, or something will not suit you, you will redo this method as your heart desires.
So no matter what you do, the main thing is to highlight such contentious issues in one place, which _always_ returns the same data, no matter how you change it.

egorinsk, 2012-01-19
@egorinsk

If you choose the second option, it will not be possible to make a cheap optimization by spreading tables on different servers. That is, in this case, as the load increases, you will have to buy arrays of disks, multiprocessor cores, and other crap. And then wring the hands of the architect and rewrite the entire code.
And in the first case, we corny spread the tables across the servers and mess around for another year.
Also, with the first option, entities can be pulled partially from the cache with a multi-request. And the second option will be cached.
So do not listen to the joyners, then you will regret it.

png, 2012-01-19
@png

I am for option 2. if there is no option 3.
Option 1 may sometimes not work due to hitting the limits:
you can hit 2 limits
1. exceeding the maximum request length
2. exceeding the maximum number of parameters in IN. yes, there too there are restrictions
Restrictions are not only in a muscle, but also in , postgesse and other DB.
Option 2 is also slow if we have many records in the outer query.
Simple queries will work faster:
select * from table1,table2 where conditions
or query via
join
select * from table1 join table2 on condition where conditions
which of them will work faster see for yourself. The database settings are filed for the data and for the queries that will be executed on this data.

gro, 2012-01-19
@gro

>and if the result of the first query is 1k identifiers?
But what will be checked?

Sergey Beresnev, 2012-01-20
@sectus

The goal of these optimizations is to make the base do as little as possible, and the application that works with the base do as much as possible. After all, the database is difficult to scale, and the servers on which the application is running can be easily added.
If you have one server for the application and for the database, then such optimization is harmful, because the application will not do faster what the database will do in most cases. Therefore it is necessary to use inner join.
If you are interested for academic reasons or are ready to purchase new servers, then the first option is better.