elasticsearch
freecode, 2015-04-16 11:03:55

Elasticsearch - how to search?

I set up search with Elasticsearch: I added data to the index and configured the search, but it only finds exact matches of a word.
How do I search for substrings?
How do I search with morphology taken into account?
How do I search for different spellings, for example of a product model: find "MD-501 DS" by "MD501", "MD 501", or "MD501DS"?
The official documentation describes filters that look suitable, but it does not show how to pass them in the client's parameter array (PHP in my case).


3 answer(s)
Narek, 2015-04-16
@webtop

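Try this:

// Note: fuzzy_like_this was deprecated in Elasticsearch 1.6 and removed in 2.0.
$body['query']['filtered']['query']['fuzzy_like_this'] = array(
  'fields' => ['field_name'],      // fields to search in
  'like_text' => 'some text',      // text to find similar terms for
  'max_query_terms' => 1000000,    // upper limit on generated query terms
  'fuzziness' => 3,
  'prefix_length' => 3             // leading characters that must match exactly
);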

freecod, 2015-04-17
@freecod

Thank you, it works, but unfortunately it is case-sensitive and does not cover all the cases.
I guess I need to add filters and analyzers. Here is an example from the docs:

$params = array(
            'index' => 'reuters',
            'body' => array(
                'settings' => array(
                    'number_of_shards' => 1,
                    'number_of_replicas' => 0,
                    'analysis' => array(
                        'filter' => array(
                            'shingle' => array(
                                'type' => 'shingle'
                            )
                        ),
                        'char_filter' => array(
                            'pre_negs' => array(
                                'type' => 'pattern_replace',
                                'pattern' => '(\\w+)\\s+((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\b',
                                'replacement' => '~$1 $2'
                            ),
                            'post_negs' => array(
                                'type' => 'pattern_replace',
                                'pattern' => '\\b((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\s+(\\w+)',
                                'replacement' => '$1 ~$2'
                            )
                        ),
                        'analyzer' => array(
                            'reuters' => array(
                                'type' => 'custom',
                                'tokenizer' => 'standard',
                                'filter' => array('lowercase', 'stop', 'kstem', 'word_delimiter', 'snowball')
                            )
                        )
                    )
                ),
                'mappings' => array(
                    '_default_' => array(
                        'properties' => array(
                            'title' => array(
                                'type' => 'string',
                                'analyzer' => 'reuters',
                                'term_vector' => 'yes',
                                'copy_to' => 'combined'
                            ),
                            'body' => array(
                                'type' => 'string',
                                'analyzer' => 'reuters',
                                'term_vector' => 'yes',
                                'copy_to' => 'combined'
                            ),
                            'combined' => array(
                                'type' => 'string',
                                'analyzer' => 'reuters',
                                'term_vector' => 'yes'
                            ),
                            'topics' => array(
                                'type' => 'string',
                                'analyzer' => 'reuters',
                            ),
                            'places' => array(
                                'type' => 'string',
                                'index' => 'not_analyzed'
                            )
                        )
                    ),
                    'my_type' => array(
                        'properties' => array(
                            'my_field' => array(
                                'type' => 'string'
                            )
                        )
                    )
                )
            )
        );

But it is not clear how to use the created filter later in a search: all the examples are JSON requests to Elasticsearch's HTTP server.
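
For what it's worth, a JSON request body usually maps one-to-one onto a nested PHP array under the 'body' key, so the JSON examples can be rewritten as arrays. Below is a minimal sketch, assuming the official elasticsearch-php client and the 'reuters' index and analyzer from the example above. The helper name buildTitleSearch is made up for illustration. The 'analyzer' option is probably redundant here, since the analyzer mapped to the field is applied to the query text by default.

```php
<?php
// Hypothetical helper (not part of any library): builds the parameter array
// for a match query against the 'title' field of the 'reuters' index.
function buildTitleSearch(string $text): array
{
    return [
        'index' => 'reuters',
        'body'  => [
            'query' => [
                'match' => [
                    'title' => [
                        'query'    => $text,
                        // Optional: the field's mapped analyzer is used by default.
                        'analyzer' => 'reuters',
                    ],
                ],
            ],
        ],
    ];
}

$params = buildTitleSearch('MD-501 DS');

// With a configured client this array would be passed as:
//   $results = $client->search($params);
// Printing the body shows the JSON that would be sent to the HTTP server.
echo json_encode($params['body']), PHP_EOL;
```

Comparing the printed JSON with an example from the docs is a quick way to check that the array was built correctly.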

Mirocow, 2016-04-09
@mirocow

Case-insensitive search
If Elasticsearch is matching case-sensitively, you can change that by defining a lowercasing analyzer with the following settings in the elasticsearch.yml configuration file:

settings:
    index:
        analysis:
            analyzer:
                string_lowercase:
                    tokenizer: keyword
                    filter: lowercase

This stores the values in Elasticsearch in lowercase; convert the query string to lowercase as well.
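
The query-side half of this could look roughly like the sketch below. It assumes the field is mapped with the string_lowercase analyzer above and that the mbstring extension is available. The helper name and the 'products' / 'model' index and field names are placeholders, not anything from the original setup. A term query is used because it does not analyze its input, so the lowercasing has to happen in PHP.

```php
<?php
// Hypothetical helper: lowercases the user's input and wraps it in a term query.
// The index and field names passed in are placeholders.
function buildLowercaseTermSearch(string $index, string $field, string $text): array
{
    // mb_strtolower (mbstring extension) also lowercases non-ASCII letters.
    $needle = mb_strtolower($text, 'UTF-8');

    return [
        'index' => $index,
        'body'  => [
            'query' => [
                'term' => [$field => $needle],
            ],
        ],
    ];
}

$params = buildLowercaseTermSearch('products', 'model', 'MD-501 DS');

// Prints the lowercased value that will be compared against the index: md-501 ds
echo $params['body']['query']['term']['model'], PHP_EOL;
```

Because the keyword tokenizer keeps the whole value as a single token, this only matches the complete field value, not parts of it.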
