Alexander Karpov, 2019-07-12 12:18:20
Sphinx

Why might Sphinx wordforms not work on RealTime indexes?

I started small: I opened the official article about wordforms and created an index exactly as shown in its example:
https://sphinxsearch.com/blog/2014/12/04/how-to-us...
My config ended up like this:

indexer
{
        mem_limit               = 128M
}


searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /var/log/sphinx/searchd.log
        query_log               = /var/log/sphinx/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /var/run/sphinx/searchd.pid
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
        binlog_path             = /var/lib/sphinx/
}
source tsv_test
{
        type                            = tsvpipe
        tsvpipe_command                 = cat sample.tsv
        tsvpipe_field_string            = title

}

index tsv_test
{
        source          = tsv_test
        path            = /var/lib/sphinx/tsv_test
        wordforms = syns.txt
}

After that, I can successfully connect with the MySQL client:
mysql -h0 -P9306
and run this query:
select * from tsv_test where match('c2d');
+------+------------+
| id   | title      |
+------+------------+
|    1 | Core 2 Duo |
+------+------------+
1 row in set (0.00 sec)

Great, I thought, everything works, so I'll try to attach wordforms to a real-time index. I created a real-time index following the instructions at sphinxsearch.com/docs/current/rt-overview.html, and the config became:
indexer
{
        mem_limit               = 128M
}


searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /var/log/sphinx/searchd.log
        query_log               = /var/log/sphinx/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /var/run/sphinx/searchd.pid
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
        binlog_path             = /var/lib/sphinx/
}
source tsv_test
{
        type                            = tsvpipe
        tsvpipe_command                 = cat sample.tsv
        tsvpipe_field_string            = title

}

index tsv_test
{
        source          = tsv_test
        path            = /var/lib/sphinx/tsv_test
        wordforms = syns.txt
}

index rt
{
        type = rt
        path = /var/lib/sphinx/rt
        rt_field = title
        rt_field = content
        rt_attr_uint = gid
        wordforms = syns.txt
}

I connect with the MySQL client and insert a row:
INSERT INTO rt VALUES ( 1, 'Core 2 Duo', 'Core 2 Duo' , 5);

Checking:
mysql> SELECT * FROM rt WHERE MATCH('Core 2 Duo');
+------+------+
| id   | gid  |
+------+------+
|    1 |    5 |
+------+------+
1 row in set (0.00 sec)

I run a similar query, hoping that the synonym will work:
mysql> SELECT * FROM rt WHERE MATCH('c2d');
Empty set (0.00 sec)
And I get nothing. I'm sure the problem is not in syns.txt, but I'll include it below:
c2d > Core 2 Duo
e6600 > Core 2 Duo
core 2duo > Core 2 Duo

Can someone help me? I haven't found anything in the documentation yet, although rumor has it that everything should work.

2 answer(s)
Alexander Karpov, 2019-07-23
@Inkognitoss

This article helped me:
chakrygin.ru/2013/07/sphinx-search.html
Specifically, this part:
The wordforms option specifies the path to a custom wordforms file, which has two main purposes.
First, this file can be used to specify the correct normalized form of a word in cases where the stemmer gets it wrong. For example, if you need to specify that the word "girls" is a word form of "girl", then the following line can be added to the wordforms file:
Please note that when a stemmer is used, what follows the => sign must be the stem of the word, not the word itself, because the search is subsequently performed on the stem. Also note that if the index uses UTF-8 encoding, the wordforms file must be saved in the same encoding.
As a result, the words "girl" and "girls" will be reduced to the same stem and will be considered the same in the search.

ManticoreSearch, 2019-07-12
@ManticoreSearch

Most likely, you created the RT index first and only added wordforms to the config afterwards. In that case, you should now see something like this:

mysql> show index rt settings;
+---------------+-----------------------+
| Variable_name | Value                 |
+---------------+-----------------------+
| settings      | charset_type = utf-8
 |
+---------------+-----------------------+
1 row in set (0.00 sec)

If so, then this should help:
After that, the status should be like this:
mysql> show index rt settings;
+---------------+--------------------------------------------+
| Variable_name | Value                                      |
+---------------+--------------------------------------------+
| settings      | charset_type = utf-8
wordforms = syns.txt
 |
+---------------+--------------------------------------------+
1 row in set (0.00 sec)
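
The step between the two outputs above is not shown, so here is a sketch of what it could look like. It rests on an assumption, not on the answer's exact wording: Sphinx 2.2 and later, as well as Manticore Search, support ALTER RTINDEX ... RECONFIGURE, which makes an existing RT index pick up changed text-processing settings (wordforms among them) from the config. The new settings apply to rows inserted afterwards, so the row from the question is re-inserted here with REPLACE.

```sql
-- Assumption: Sphinx 2.2+ or Manticore Search; "rt" is the index from the question.
-- Re-read tokenization settings (including wordforms) from the config.
ALTER RTINDEX rt RECONFIGURE;

-- Check that the index now reports wordforms = syns.txt.
SHOW INDEX rt SETTINGS;

-- Rows indexed before the change keep their old tokenization, so re-insert them.
REPLACE INTO rt VALUES (1, 'Core 2 Duo', 'Core 2 Duo', 5);

-- Retry the synonym query from the question.
SELECT * FROM rt WHERE MATCH('c2d');
```

If the index holds nothing worth keeping, TRUNCATE RTINDEX rt WITH RECONFIGURE should be an alternative that empties the index and applies the new settings in one step; check that your version supports it before relying on it.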
