How to suppress duplicate query results?

PHP

Flanker381, 2014-10-31 12:43:48

The task is more algorithmic than applied.
There is a table searched with relevance-ranked full-text search. The result of each unique query is saved to a separate table (10 positions, query → results). The problem is that some queries differ only slightly: a different word ending (declension), one extra short word, and so on. Meanwhile 5, 10, 20 or more such similar queries all return the same result. The goal is to stop saving duplicates of similar queries to the results table. A unique index is not an option, because the content returned for a query changes over time.
In my opinion, the best approach is to measure how relevant a new query is to the previously saved queries and simply not save it if the check shows high relevance. The search is implemented with this query:
SELECT *, MATCH (`field`) AGAINST ('$search') AS relev FROM `table` ORDER BY relev DESC
What I don't know is how to get the relev value into a PHP variable, or how to apply a threshold of about 80% relevance.
I would be grateful for any ideas on this.
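MySQL's MATCH ... AGAINST relevance is an unnormalized score, not a percentage, so "about 80%" has no direct meaning for it. A minimal sketch of the idea from the question, written in Python rather than PHP and using a swapped-in similarity measure (the Jaccard index of the two queries' word sets) instead of MySQL relevance: a new query is saved only if no previously saved query is at least 80% similar. The function names and the 0.8 threshold are hypothetical.

```python
import re
from typing import List, Set

def words(query: str) -> Set[str]:
    """Lowercase the query and split it into a set of word tokens."""
    return set(re.findall(r"\w+", query.lower()))

def similarity(a: str, b: str) -> float:
    """Jaccard index of the two queries' word sets, in [0.0, 1.0].

    Swapped-in measure for illustration; this is NOT MySQL's relevance score.
    """
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def should_save(new_query: str, saved_queries: List[str], threshold: float = 0.8) -> bool:
    """Save only if no saved query reaches the (hypothetical) similarity threshold."""
    return all(similarity(new_query, q) < threshold for q in saved_queries)
```

For example, "cheap used cars in moscow" shares 4 of 5 distinct words with "cheap used cars moscow" (similarity 0.8), so it would not be saved. Plain word sets do not merge different word forms ("car" vs "cars"), so on its own this catches reordering and one extra short word but not declension.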

3 answer(s)

Dmitry Entelis, 2014-10-31
@DmitriyEntelis

1. MySQL cannot return search relevance as a percentage: MATCH ... AGAINST only yields an unnormalized relevance score.
2. I don't see the point of hard-caching query results.
3. Why not use a search engine designed for this, such as Sphinx?
The search will be better and most likely faster, and no cache will be needed.

Flanker381, 2014-10-31
@Flanker381

In general, this is needed for query analysis. Accordingly, I save only the queries whose result is unique.
In principle, I've figured out how to do this for static content: take, say, the titles of the first 10 records, concatenate them, compute a hash (the primary key is this hash, not an id), and save it together with the query. For each subsequent query, check whether that hash already exists in the database.
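The hash-based check described above can be sketched as follows. This is an illustration in Python, not the poster's code: the dict `store` is a hypothetical stand-in for the database table keyed by hash, and SHA-1 is an assumed choice of hash function.

```python
import hashlib
from typing import Dict, List

def result_hash(titles: List[str], top_n: int = 10) -> str:
    """Hash the titles of the first `top_n` results, in ranked order."""
    # Joining with a newline keeps ["ab", "c"] and ["a", "bc"] distinct
    # (as long as titles themselves contain no newlines).
    joined = "\n".join(titles[:top_n])
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def save_if_unique(query: str, titles: List[str], store: Dict[str, str]) -> bool:
    """Store the query under its result hash unless that hash already exists.

    `store` stands in for a table whose primary key is the hash (hypothetical).
    """
    h = result_hash(titles)
    if h in store:
        return False  # same top results already saved: duplicate query
    store[h] = query
    return True
```

With this, "used cars" and "used car" returning the same ten titles produce one stored row; any change in the top ten (even one new record) changes the hash, which is exactly the weakness described next.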
But the task is complicated by the fact that the content is dynamic: a full update cycle of the content in the database takes a month. Today we save one hash; tomorrow one new record appears in the results for that query and the hash is already different, while the results under yesterday's hash are exactly the same, so we get a duplicate again. The hash record could be UPDATEd on every new identical query, but then there is a window between a new similar (duplicate) query arriving and the old query's record being updated.
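One way around the exact-hash problem, not proposed in the thread and shown here only as a sketch, is to store the result IDs instead of a hash and treat two queries as duplicates when their top-10 ID sets overlap enough (Jaccard index). The function names and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List, Optional

def overlap(ids_a: List[int], ids_b: List[int]) -> float:
    """Jaccard index of two result-ID sets: shared IDs / all distinct IDs."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def find_duplicate(new_ids: List[int],
                   saved: Dict[str, List[int]],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the first saved query whose results overlap enough, else None."""
    for query, ids in saved.items():
        if overlap(new_ids, ids) >= threshold:
            return query
    return None
```

If yesterday's top 10 was IDs 1..10 and today one new record pushes out ID 10, the sets share 9 of 11 distinct IDs (about 0.82), so the query is still recognized as a duplicate. When a duplicate is found, the saved ID list can be refreshed at the same time, which keeps it tracking the content as it changes. The trade-off is that lookup compares against every saved query instead of a single indexed hash lookup.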
(Sorry about the typo in the title.)

Andrey, 2014-10-31
@andreyvlru

Index your content with Sphinx. When building the index, it stores words without their endings, so "car" and "cars" are the same word for it. If you search for such words, the results will be the same. As output, Sphinx gives you a set of record IDs from the database, which you can save however you like.
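To illustrate why stripping endings helps with the declension problem, here is a toy sketch that applies the same idea to the query text: each word is crudely stemmed, duplicates are dropped and the stems are sorted, so "used cars" and "Used car" map to the same key. The suffix list is a hypothetical stand-in; it is NOT Sphinx's morphology algorithm, and a real stemmer (or Sphinx itself) should be used for Russian or other inflected languages.

```python
import re

# Toy English suffix list: a hypothetical stand-in for real stemming,
# NOT the algorithm Sphinx uses.
SUFFIXES = ("ing", "es", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def query_key(query: str) -> str:
    """Canonical key: lowercase, stem each word, drop duplicates, sort."""
    stems = {crude_stem(w) for w in re.findall(r"\w+", query.lower())}
    return " ".join(sorted(stems))
```

Sorting the stems also makes the key ignore word order; whether that is desirable depends on how strictly "similar" queries should be merged.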