How to properly connect to Sphinx in Python 3 and avoid encoding issues?
Hello!
I've run into a problem displaying data from Sphinx: Cyrillic text that comes from Sphinx shows up garbled in the console. Oddly enough, I couldn't solve it right away, and googling didn't help either. I assume there is a setting somewhere that causes the data to arrive in the "wrong" encoding.
So, here is what we have:
1. If we get the data (from which the index is built) directly from MySQL and print it to the console, it is displayed correctly.
2. If we connect to Sphinx from the console, the data is also displayed correctly.
3. If we connect via Python, fetch the data from Sphinx and print it to the console, we get mojibake (garbled characters).
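One way to narrow this down is to check whether the garbled string is simply UTF-8 bytes that were decoded with a single-byte charset. The sketch below assumes the wrong charset is Latin-1, which is only a guess and is not confirmed anywhere in this thread; the helper name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text, wrong_encoding="latin-1"):
    """Try to undo 'UTF-8 bytes decoded as wrong_encoding'.

    Returns the repaired string, or None if the text does not
    round-trip (meaning the garbling has some other cause).
    """
    try:
        # Re-encode with the (assumed) wrong charset to get the raw bytes back,
        # then decode those bytes as UTF-8.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    # Simulate the suspected failure: UTF-8 bytes read as Latin-1.
    garbled = "привет".encode("utf-8").decode("latin-1")
    print(repair_mojibake(garbled))
```

If the helper returns readable Cyrillic for the value printed from Sphinx, the bytes on the wire are fine and the decoding step on the client side is the place to look.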
The script with the simplest possible query to Sphinx that prints the garbled text:
# -*- coding: utf-8 -*-
import MySQLdb, MySQLdb.cursors
sphinx_db = MySQLdb.connect(host='127.0.0.1',port=9306,user='',passwd='',db='', charset='utf8', use_unicode = True, init_command='SET NAMES UTF8')
sphinx_cursor = sphinx_db.cursor(cursorclass=MySQLdb.cursors.DictCursor)
mysql_db = MySQLdb.connect(host='127.0.0.1' ,user='*****', passwd='*****', db='*****', charset='utf8')
mysql_cursor = mysql_db.cursor(cursorclass=MySQLdb.cursors.DictCursor)
def main():
    # Prints garbled characters
    sql = """SELECT * FROM documents LIMIT 1"""
    sphinx_cursor.execute(sql)
    sphinx_db.commit()
    data = sphinx_cursor.fetchone()
    print(data['text'])

    # Directly from MySQL, prints correctly
    sql = """SELECT * FROM documents LIMIT 1"""
    mysql_cursor.execute(sql)
    mysql_db.commit()
    data = mysql_cursor.fetchone()
    print(data['text'])

if __name__ == '__main__':
    main()
Sphinx config:
common
{
    lemmatizer_base = /home/www/sphinx_data/dict
}

source src_documents
{
    type = mysql
    sql_host = localhost
    sql_user = *****
    sql_pass = *****
    sql_db = *****
    sql_port = 3306
    sql_query = SELECT id, text FROM documents
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET CHARACTER SET utf8
    sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
    sql_field_string = text
}

index documents
{
    source = src_documents
    path = /home/www/sphinx_data/p1
    docinfo = extern
    charset_type = utf-8
    morphology = stem_en, stem_ru, soundex
    min_word_len = 3
    enable_star = 1
    min_infix_len = 3
    wordforms = /home/www/sphinx_data/dict/words.txt
    charset_table = 0..9, A..Z->a..z, _, a..z, \
        U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435, U+02D
}

searchd
{
    listen = 127.0.0.1:9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /home/www/sphinx_data/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
}
It turned out that the problem lies in how the database connector behaves: when connecting, you need to specify use_unicode=False. After that everything works fine, as long as you convert the data with .decode('utf8') when displaying it.
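A minimal sketch of that approach, assuming the mysqlclient (`MySQLdb`) driver on Python 3, where `use_unicode=False` makes text columns come back as `bytes`. The helper names `decode_row` and `fetch_first_document` are illustrative and not part of any library; the connection parameters and the query are copied from the question.

```python
def decode_row(row, encoding="utf8"):
    """Return a copy of a DictCursor row with bytes keys/values decoded to str."""
    def to_text(value):
        if isinstance(value, (bytes, bytearray)):
            return value.decode(encoding)
        return value

    return {to_text(key): to_text(value) for key, value in row.items()}


def fetch_first_document(host="127.0.0.1", port=9306):
    # Imported here so decode_row can be used without the driver installed.
    import MySQLdb
    import MySQLdb.cursors

    # use_unicode=False: the driver hands back raw bytes instead of
    # decoding them itself, so we control the decoding step.
    db = MySQLdb.connect(host=host, port=port, user="", passwd="", db="",
                         charset="utf8", use_unicode=False)
    try:
        cursor = db.cursor(cursorclass=MySQLdb.cursors.DictCursor)
        cursor.execute("SELECT * FROM documents LIMIT 1")
        row = cursor.fetchone()
    finally:
        db.close()
    return decode_row(row) if row is not None else None


if __name__ == "__main__":
    # decode_row can be tried without a server:
    print(decode_row({"id": 1, "text": "привет".encode("utf8")}))
```

Calling `fetch_first_document()` needs a running searchd listening on port 9306; `decode_row` works on its own and leaves non-bytes values such as integer ids untouched.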
Install the latest version; in theory, you can then remove everything related to utf8 from the config.