S
S
synapse_people2017-07-01 14:38:22
MySQL
synapse_people, 2017-07-01 14:38:22

How to increase search speed in sphinx?

There is a MySQL database, 200 GB in size. 4 billion text records + ID (AUTO_INCREMENT).
If you make a simple LIKE '%%' query, then it takes about 10 hours.
I installed Sphinx, indexed it, got a 600 GB index (I had to buy another HDD). Now the request goes from 5 to 10 minutes . Still long. Now the database is divided into shards, each shard contains 32 million documents (for some reason, the limit is 4 GB per 1 index).
Tell me, who is in the subject, what can be tweaked to increase the search speed?
config file:

#!/usr/bin/php
<?php
$nPerShard = 32*1000*1000;//32M
$shardCount = 120;
for ($i=0; $i<$shardCount; ++$i) { $first = $i*$nPerShard; $last = $first+$nPerShard; 
?>
source shard<?=$i?>
{
  type			= mysql

  sql_host		= localhost
  sql_user		= root
  sql_pass		= пароль
  sql_db			= база
  sql_port		= 3306

  sql_query_pre		= SET NAMES utf8

  sql_query		= SELECT ID, TEXT FROM таблица WHERE ID >= <?= $first ?> AND ID < <?= $last ?>

  sql_field_string	= TEXT
}

index idx_eng_keywords<?=$i?>
{
  source			= shard<?=$i?>

        wordforms		= /home/***/***/wordforms/en.txt
  path			= /media/***/c2dcfc1c-1656-4f82-b9fb-64a22058a278/eng-keywords-idx.<?=$i?>

#https://habrahabr.ru/post/147745/
  expand_keywords		= 1
  index_exact_words	= 1
}

<?php
}
?>


index index_main
{
  type           		= distributed

<?php for ($i=0; $i<$shardCount; ++$i) { ?>
  local			= idx_eng_keywords<?=$i?>

<?php } ?>

}


indexer
{
  mem_limit 		= 1024M
}


searchd
{
  listen			= 9312
  listen			= 9306:mysql41
  log			= /home/bogdan/poisk/sph/var/log/searchd-k.log
  query_log		= /home/bogdan/poisk/sph/var/log/query-k.log
  binlog_path		= # disable logging
  read_timeout		= 5
  max_children		= 30
  pid_file		= /home/bogdan/poisk/sph/var/log/searchd-k.pid
  seamless_rotate		= 1
  preopen_indexes		= 1
  unlink_old		= 1
  workers			= threads # for RT to work
  binlog_path		= /home/bogdan/poisk/sph/var/data
  dist_threads		= 8
}

* - word forms file weighs 1.8 Mb.
An example of a query that is produced (sphinxql):
SELECT * FROM index_main WHERE MATCH('test');
Ps I would like to receive ALL search results in 10 seconds or less, and not in 5 minutes, as it is now.
Machine information:
AMD processor 8 cores at 4 GHz, 8 GB memory, 3 hard drives (2 HDD: 320gb, 750gb; 1 SSD: 120gb),
Debian Linux 8.8.
===========
[email protected]:/home/bogdan/poisk/sph/bin# hdparm -Tt /dev/sdc

/dev/sdc:
 Timing cached reads:   4440 MB in  2.00 seconds = 2220.46 MB/sec
 Timing buffered disk reads:  14 MB in  3.12 seconds =   4.49 MB/sec
[email protected]:/home/bogdan/poisk/sph/bin# hdparm -Tt /dev/sdc

/dev/sdc:
 Timing cached reads:   5266 MB in  2.00 seconds = 2633.95 MB/sec
 Timing buffered disk reads:   8 MB in  3.45 seconds =   2.32 MB/sec
[email protected]:/home/bogdan/poisk/sph/bin#

UPD 2 Jul 2017:
Removed "sql_field_string = TEXT" from the config, the field in the selection (search result) disappeared, i.e. Only the ID gives out, but the index size has become 70 GB. Search time 1 second on HDD, 0.4 sec on SSD.
Can someone please explain why sql_field_string is needed??

Answer the question

In order to leave comments, you need to log in

4 answer(s)
F
Fortop, 2017-07-01
@synapse_people

  • We increase the memory to 256, and even more is better.
    All of these recipes can be used individually or together.
    Specific numbers with the number of machines, search agents, the amount of memory for indexes and cache need to be calculated depending on the specific solution.

T
ThunderCat, 2017-07-01
@ThunderCat

If you make a simple query LIKE '%%
Let's start with the fact that this is the most incorrect method of working with text. That is, it is used only when there is absolutely no other way.
First, what kind of texts are stored in the records? Length, field type.
I strongly suspect that fulltextserch can dramatically change the situation.
UPD: https://habrahabr.ru/post/25646/#comment_641326 which, as we can see, is several orders of magnitude faster.

P
Puma Thailand, 2017-07-01
@opium

The index file is too large, you obviously indexed too much

V
Vasily Nazarov, 2017-07-01
@vnaz

This, of course, is not like an answer, but it is not a mockery either.
1. Yandex, Google - give out a quick search in a much larger array.
But they are unlikely to use the Sphinx.
2. I did not think that I would write such a thing about ES, but it
seems that in your case it will solve the problem if it is prepared correctly. For in any case, with such volumes +/- 1-2-5, the cars do not look like excess.
If it's not difficult, can you tell me why "ES is not suitable"?

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question