PHP
alexios, 2014-07-24 14:07:39

How to make Sphinx morphology work with UTF-8?

I want to use a real-time (RT) Sphinx index. Here is the config:

index rt_compare
{
  type			= rt
  rt_mem_limit		= 32M

  charset_type = utf-8
  path			= /var/lib/sphinx/rt_compare
  morphology 		=  stem_enru

  rt_field		= content
  rt_attr_string		= content
  rt_attr_uint		= gid

  min_word_len = 3
}
searchd
{
  listen			= 9312
  listen			= 9306:mysql41
  log			= /var/log/sphinx/searchd.log
  query_log		= /var/log/sphinx/query.log
  read_timeout		= 5
  max_children		= 30
  pid_file		= /var/run/sphinx/searchd.pid
  max_matches		= 1000
  seamless_rotate		= 1
  preopen_indexes		= 1
  unlink_old		= 1
  workers			= threads # for RT to work
  binlog_path		= /var/lib/sphinx/
  collation_server = utf8_general_ci
}

I run the query from PHP:
$pdo = new PDO("mysql:host=127.0.0.1;port=9306;charset=UTF8", "", "");
// Request UTF-8 on the connection
$r = $pdo->exec("SET NAMES 'utf8'");
$r = $pdo->exec("SET CHARACTER_SET_RESULTS=utf8");
// CALL KEYWORDS shows how the index tokenizes and normalizes the word
$r = $pdo->query("CALL KEYWORDS('машины', 'rt_compare')");
while ($row = $r->fetch()) {
  print_r($row);
}

Result:
[screenshot: 2bbea251819e4c6493fcb01fa0bbb0ea.png]
Observation: the data written to /var/log/sphinx/query.log is in windows-1251:
[screenshot: b7da0bce91b946cd92ed211ea6794961.png]
But why? Here are the locales:
[screenshot: 003c454b28bc48fbba9d3a0f12859258.png]
What I tried: morphology set to stem_enru and to lemmatize_ru; an explicit charset_table; min_stemming_len = 3; dict = keywords, all in various combinations. I also tried sending the query in windows-1251, which didn't work. System: CentOS 6.
How can I force Sphinx to normalize the words?


4 answer(s)
maddog670, 2016-04-21
@lavreno63

If you are moving to a new hosting provider, you need to create a new database for the site, create a user with a password for that database, and update the file that holds the database settings.

riot26, 2016-04-21
@riot26

Incorrect login, password, server, or database name. And yes, stop using deprecated functions.

alexios, 2014-07-25
@alexios

Everything worked once I installed the beta and manually set lemmatizer_base to the directory where I had previously placed the ru.pak dictionary.
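
For anyone reading later, that setup might look roughly like the sketch below. It is not the exact config from this thread: the dictionary path is a placeholder, and placing lemmatizer_base in a common section is an assumption based on the 2.2.x documentation, so check the docs for your version. The morphology value lemmatize_ru is one of the options mentioned in the question.

```
# Sketch only; the path below is a placeholder, not taken from the post.
common
{
  # Directory that contains ru.pak (assumed location)
  lemmatizer_base	= /usr/local/share/sphinx/dicts
}

index rt_compare
{
  type			= rt
  path			= /var/lib/sphinx/rt_compare
  # Use the Russian lemmatizer instead of the stemmer
  morphology		= lemmatize_ru

  rt_field		= content
  rt_attr_uint		= gid
}
```

After restarting searchd, the CALL KEYWORDS query from the question can be run again to see whether the normalized column now holds the base form.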

Igor, 2015-06-24
@mulat

Sphinx 2.2.9 release. In the source block, write:

sql_query_pre = SET NAMES utf8
sql_query_pre = SET CHARACTER SET utf8

In the index block:
morphology = stem_ru, stem_en
