Sphinx
Nikolai Kokoulin, 2018-03-30 16:57:01

Sphinx: How to increase the weight of the result with consecutive words?

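The search query is:
как подключить доме ("how to connect a domai"; yes, without the letter "н" at the end)
The index contains these results:
как подключить домен к сайту (how to connect a domain to a site)
как подключить почту для домена (how to connect mail for a domain)
I want the result in which the query words appear consecutively to get a higher weight, while all results are still returned. Right now the weight of these results is the same, since all of them contain both "подключить" (connect) and "домен" (domain). The current query to Sphinx: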

SELECT *, WEIGHT() AS w
FROM answers
WHERE MATCH('(\"как подключить домен*\"^5)|(как^20|как*^10|*как^10|*как*^5)|(подключить^20|подключить*^10|*подключить^10|*подключить*^5)|(домен^20|домен*^10|*домен^10|*домен*^5)')
LIMIT 0,5 
OPTION ranker=wordcount, field_weights = (quest=10, keys=2, answer=1);

In this case the keys and answer fields are empty.
Current output:
{quest: "как подключить почту для домена", category: "2", w: "50"}
{quest: "как подключить домен к сайту", category: "2", w: "50"}
{quest: "Как подключить SSL-сертификат для домена", category: "8", w: "50"}

In general, I need to somehow take the distance between words into account, but I don't understand how.


1 answer(s)
ManticoreSearch, 2018-04-08
@Kokoulin

Hello. This should already work by default, since the default ranker takes word proximity into account:

mysql> select *, weight() from idx_min where match('Как подключить доме*');
+------+--------------------------------------------------------------------------+----------+
| id   | body                                                                     | weight() |
+------+--------------------------------------------------------------------------+----------+
|    1 | как подключить домен к сайту                                             |     3319 |
|    2 | как подключить почту для домена                                          |     2319 |
|    3 | Как подключить SSL-сертификат для домена                                 |     2319 |
+------+--------------------------------------------------------------------------+----------+
3 rows in set (0.01 sec)
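
The weights above look consistent with the default ranker, proximity_bm25, which the Sphinx documentation describes as equivalent to the expression below. As a hedged sketch (idx_min is the index from this answer; that your build produces identical numbers is an assumption), the same ranking can be spelled out explicitly:

```sql
-- Default proximity_bm25 ranker written as an explicit expression:
-- the phrase proximity part (lcs) is scaled by 1000 and BM25 is added on top.
SELECT *, WEIGHT()
FROM idx_min
WHERE MATCH('Как подключить доме*')
OPTION ranker=expr('sum(lcs*user_weight)*1000+bm25');
```

Read this way, 3319 would break down as 3*1000 + 319 and 2319 as 2*1000 + 319, although the answer itself does not state this split.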

In principle, though, what you are looking for is called LCS (Longest Common Subsequence), and with ranker=expr you can manually adjust the influence of this factor. For example:
mysql> select *, weight() from idx_min where match('Как подключить доме* подключить') option ranker=expr('sum(lcs)');
+------+--------------------------------------------------------------------------+----------+
| id   | body                                                                     | weight() |
+------+--------------------------------------------------------------------------+----------+
|    1 | как подключить домен к сайту                                             |        3 |
|    2 | как подключить почту для домена                                          |        2 |
|    3 | Как подключить SSL-сертификат для домена                                 |        2 |
+------+--------------------------------------------------------------------------+----------+
3 rows in set (0.00 sec)

If you change the word order in the query, every result gets a weight of 1, because the weight is computed from the single lcs factor, and that factor equals 1 for all of them:
mysql> select *, weight() from idx_min where match('Как доме* подключить') option ranker=expr('sum(lcs)');
+------+--------------------------------------------------------------------------+----------+
| id   | body                                                                     | weight() |
+------+--------------------------------------------------------------------------+----------+
|    1 | как подключить домен к сайту                                             |        1 |
|    2 | как подключить почту для домена                                          |        1 |
|    3 | Как подключить SSL-сертификат для домена                                 |        1 |
+------+--------------------------------------------------------------------------+----------+
3 rows in set (0.01 sec)
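
To raise or lower the influence of lcs relative to BM25, one option is to change its multiplier in the ranking expression. A minimal sketch against the index from the question, assuming prefix matching (доме*) is enabled there; the multiplier 10000 is an arbitrary illustrative value, not a recommendation:

```sql
-- Hypothetical tuning: give consecutive-word matches (lcs) ten times
-- more influence than the 1000 multiplier used by proximity_bm25.
SELECT *, WEIGHT() AS w
FROM answers
WHERE MATCH('как подключить доме*')
LIMIT 0,5
OPTION ranker=expr('sum(lcs*user_weight)*10000+bm25'),
       field_weights=(quest=10, keys=2, answer=1);
```

Because user_weight multiplies lcs per field, the quest field weight of 10 from the original query still applies here.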
