Answer the question
In order to leave comments, you need to log in
How to increase search speed in sphinx?
There is a MySQL database, 200 GB in size. 4 billion text records + ID (AUTO_INCREMENT).
If you make a simple LIKE '%%' query, then it takes about 10 hours.
I installed Sphinx, indexed it, got a 600 GB index (I had to buy another HDD). Now the request goes from 5 to 10 minutes . Still long. Now the database is divided into shards, each shard contains 32 million documents (for some reason, the limit is 4 GB per 1 index).
Tell me, who is in the subject, what can be tweaked to increase the search speed?
config file:
#!/usr/bin/php
<?php
$nPerShard = 32*1000*1000;//32M
$shardCount = 120;
for ($i=0; $i<$shardCount; ++$i) { $first = $i*$nPerShard; $last = $first+$nPerShard;
?>
source shard<?=$i?>
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = пароль
sql_db = база
sql_port = 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT ID, TEXT FROM таблица WHERE ID >= <?= $first ?> AND ID < <?= $last ?>
sql_field_string = TEXT
}
index idx_eng_keywords<?=$i?>
{
source = shard<?=$i?>
wordforms = /home/***/***/wordforms/en.txt
path = /media/***/c2dcfc1c-1656-4f82-b9fb-64a22058a278/eng-keywords-idx.<?=$i?>
#https://habrahabr.ru/post/147745/
expand_keywords = 1
index_exact_words = 1
}
<?php
}
?>
index index_main
{
type = distributed
<?php for ($i=0; $i<$shardCount; ++$i) { ?>
local = idx_eng_keywords<?=$i?>
<?php } ?>
}
indexer
{
mem_limit = 1024M
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = /home/bogdan/poisk/sph/var/log/searchd-k.log
query_log = /home/bogdan/poisk/sph/var/log/query-k.log
binlog_path = # disable logging
read_timeout = 5
max_children = 30
pid_file = /home/bogdan/poisk/sph/var/log/searchd-k.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads # for RT to work
binlog_path = /home/bogdan/poisk/sph/var/data
dist_threads = 8
}
SELECT * FROM index_main WHERE MATCH('test');
[email protected]:/home/bogdan/poisk/sph/bin# hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 4440 MB in 2.00 seconds = 2220.46 MB/sec
Timing buffered disk reads: 14 MB in 3.12 seconds = 4.49 MB/sec
[email protected]:/home/bogdan/poisk/sph/bin# hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 5266 MB in 2.00 seconds = 2633.95 MB/sec
Timing buffered disk reads: 8 MB in 3.45 seconds = 2.32 MB/sec
[email protected]:/home/bogdan/poisk/sph/bin#
Answer the question
In order to leave comments, you need to log in
If you make a simple query LIKE '%%Let's start with the fact that this is the most incorrect method of working with text. That is, it is used only when there is absolutely no other way.
This, of course, is not like an answer, but it is not a mockery either.
1. Yandex, Google - give out a quick search in a much larger array.
But they are unlikely to use the Sphinx.
2. I did not think that I would write such a thing about ES, but it
seems that in your case it will solve the problem if it is prepared correctly. For in any case, with such volumes +/- 1-2-5, the cars do not look like excess.
If it's not difficult, can you tell me why "ES is not suitable"?
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question