Optimal implementation of searching and sorting objects with a fixed number of fields?

F

Fastto2012-02-15 12:41:15

MySQL

Fastto, 2012-02-15 12:41:15

There is an object with a fixed number of fields.
It is necessary to implement a search by a previously unknown number of fields and sorting by one of the fields.
Let's say there are fields field1, field2 and field3. At one point in time, you may need to search only by field1, at another - by field1 and field3, at the third by all together, and at the same time you need the ability to sort by one of these fields.
What is the solution? I don’t want to overload the table with a large number of indexes, going through all possible filter options (there are 15 fields in the real task)
Initial data: mySQL and PHP
As I see the solution:
create one common index
1) index idx1(field1, field2, field3)
and one index per each field that needs sorting:
2) index idx2(field2)
3) index idx3(field3)
Total sorting is available for each of the 3 fields, plus a full search
And the query should be formed so that all index components are always used.
Those. if you need to search only by field2 and field3, and field1 is for example foreign key (int, unsigned, null), i.e. cannot be 0, then where should be:
field1 > 0 AND field2 = 'searchValue2' AND field3 = 'searchValue3'
And so on, for all search options.
Is this bike the right choice?

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

S

ShouldNotSeeMe, 2012-02-15
@ShouldNotSeeMe

If there are many columns (millions-billions), then the usual approach with indexes and WHERE will not work. Not so long ago I faced a similar problem. The options I see are:
1. Sphinx. Designed for full-text search, but can also search only by numeric fields (with docinfo=extern). If there are many records in the database, then it consumes a lot of memory. Works fast.
2. Tables in memory. Similar to item 1.
3. EAV or storing each field in a separate table. A little slower if the user has set few parameters, and much slower if there are many.
4. OLAP-like methods. The largest requirements for memory and disk space (with a large number of options - prohibitive), but fast search by a large number of parameters.
For myself, I decided that the best option is Sphinx (docinfo=inline) if text search is needed, and if there is no such need, a combined version of paragraphs 3 and 4, i.e. OLAP-like indexing only works for the most common field variations.

A

AxisPod, 2012-02-15
@AxisPod

Sphinx and you don’t need to invent something, in addition there is SphinxQL, which greatly facilitates migration. We have more than a million records, quite voluminous, searches very quickly.

E

egorinsk, 2012-02-15
@egorinsk

Ha, doing a user search on another sub-social network or dating site? Well, well, on Mayskul with a bunch of indexes, your search will work exactly up to 10,000 users. In general, it would be worthwhile to first study which parameters are used more often for searching, and make indexes on them.
With an increase in the number of users, as an option for a crutch for the first time, you can try to fasten a sphinx or some kind of Lucene, but with an increase in load, this functionality will still have to be moved to a separate custom module. Ask Vkontakteers, for example, how they made a search for people (a separate module in C).