How do I organize search by product name, including search by part of a name?

verberden, 2021-02-19 07:11:32

PostgreSQL

Hi all!
There is a PostgreSQL database with a Node.js backend. Before I joined, someone tried to bolt search onto Solr, but it turned out to be complicated to set up and the relevance was not very good. I plugged in Elasticsearch instead, and it seems better, but it is still unsatisfactory. I would like the results to take the following points into account:

1. Search by part of a word. Entering "греч" (the beginning of "гречка", buckwheat) currently finds nothing, because plain full-text search only matches whole (analyzed) words.
2. Synonym search. Entering "гречка" should also find "греча", "гречневая", etc., taking word forms into account. There are, for example, products like "bread with buckwheat".
3. Ranking. If the searched word (or one of its forms) is closer to the beginning of the line (the product name), that result should rank higher. Right now the order looks mixed up, for example, when searching for the word "buckwheat"

I set up the index like this (the synonyms are just a test set) and run the query like this:

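// Client setup (the node URL is an assumption; adjust it to your cluster)
const { Client } = require('@elastic/elasticsearch');
const esClient = new Client({ node: 'http://localhost:9200' });

// INDEX: an analyzer named "default" is used for every text field that has no analyzer of its own
esClient.indices.create({
    index: 'products',
    body: {
        "settings": {
            "analysis": {
              "filter": {
                "ru_stop": {
                  "type": "stop",
                  "stopwords": "_russian_"
                },
                // needs the ru_RU Hunspell dictionary in <ES config dir>/hunspell/ru_RU
                "ru_stemmer": {
                  "type": "hunspell",
                  "locale": "ru_RU"
                },
                "synonym": {
                  "type": "synonym",
                  "lenient": true,
                  "synonyms": [ "гречка, гречневая", "греча => гречка"]
                }
              },
              "analyzer": {
                "default": {
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "ru_stop",
                    "synonym",
                    "ru_stemmer"
                  ]
                }
              }
            }
        }
    }
})

// QUERY: searchText is the user's raw input
esClient.search({
    index: "products",
    body: {
        size: 100,
        query: {
            match: {"name": searchText.trim()}
        }
    }
})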

Can you suggest how to improve the search? Maybe a different technology?

1 answer(s)

ayazer, 2021-02-19

In principle, there are several options here: Solr (built on Lucene), Elasticsearch (also built on Lucene), bare Lucene (implementing yourself everything that Elasticsearch/Solr add on top of it), and the built-in full-text search in Postgres (for when you need full-text search, but not badly enough to stand up Solr/Elasticsearch just for it).
Any of these options can do everything you need (fuzzy search, synonym search, result ranking and weighting).
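A minimal sketch of the fuzzy-search and weighting part in Elasticsearch, building only the request body so it runs without a cluster. `multi_match`, `fuzziness: "AUTO"` and the `field^boost` syntax are standard query DSL; the `description` field, the boost of 3 and the function name are hypothetical (the question only has `name`).

```javascript
// Builds an Elasticsearch request body for a fuzzy, field-weighted search.
// "description" is a hypothetical second field; only "name" exists in the question.
function buildFuzzyQuery(searchText, size = 100) {
  return {
    size,
    query: {
      multi_match: {
        query: searchText.trim(),
        // a match in "name" weighs three times as much as one in "description"
        fields: ["name^3", "description"],
        // AUTO: exact match for 1-2 chars, 1 edit for 3-5 chars, 2 edits for longer terms
        fuzziness: "AUTO",
      },
    },
  };
}

const body = buildFuzzyQuery("  гречка ");
console.log(JSON.stringify(body, null, 2));
```

With the client from the question, the body would be passed as `esClient.search({ index: "products", body: buildFuzzyQuery(searchText) })`. Fuzziness is aimed at typos; prefix matching for partial words is a separate feature.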

> Search by part of a word does not work: entering "греч" (the beginning of "гречка", buckwheat) currently finds nothing, because plain full-text search is used.

Double-check how you build the query. For Solr I would expect to see a search for "греч*" (the typed fragment followed by *), for Postgres a search for "греч:*".
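For the prefix search described above, here is a hedged sketch of how the query text could be built in the Node.js backend. `toPrefixTsquery` turns each word into a Postgres prefix term (`греч` becomes `греч:*`) for use with `to_tsquery`; `toPrefixEsQuery` uses Elasticsearch's `match_bool_prefix` query, which analyzes the text and treats the last term as a prefix. The function names and the `products` table with its `name` column in the SQL string are assumptions. The prefix still goes through the text-search configuration, so stemming can alter it.

```javascript
// Lowercases the input and keeps only runs of letters/digits, which also drops
// characters that are operators in tsquery syntax (& | ! : * ( ) quotes).
function toWords(searchText) {
  return searchText
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w.length > 0);
}

// PostgreSQL: every word becomes a prefix term, all terms AND-ed together,
// e.g. "Хлеб греч" -> "хлеб:* & греч:*"
function toPrefixTsquery(searchText) {
  return toWords(searchText)
    .map((w) => `${w}:*`)
    .join(" & ");
}

// Parameterized SQL for the tsquery above (table and column names are assumptions).
const PG_SQL =
  "SELECT id, name FROM products " +
  "WHERE to_tsvector('russian', name) @@ to_tsquery('russian', $1)";

// Elasticsearch: match_bool_prefix analyzes the text and turns the last term
// into a prefix query, so a half-typed last word can still match.
function toPrefixEsQuery(searchText, size = 100) {
  return {
    size,
    query: { match_bool_prefix: { name: searchText.trim() } },
  };
}

console.log(toPrefixTsquery("Хлеб греч"));
console.log(PG_SQL);
```

With node-postgres this would be something like `pool.query(PG_SQL, [toPrefixTsquery(text)])`, skipping the database call when the input contains no words.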

> Synonym search is needed: entering "гречка" should also find "греча", "гречневая", etc., taking word forms into account. There are, for example, products like "bread with buckwheat".

Synonym dictionaries can be configured in all of these tools.
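As one possible variant of the synonym setup in Elasticsearch (an assumption, not what the question or the answer prescribes), synonyms can be applied only at search time with the `synonym_graph` filter and a separate `search_analyzer`, so the indexed tokens do not depend on the synonym list. The sketch only builds the index-creation body; the analyzer and filter names are hypothetical, the synonym rules are the test rules from the question, and the typeless `mappings` assume Elasticsearch 7 or later. The `hunspell` filter still needs the `ru_RU` dictionary installed on every node.

```javascript
// Index settings + mapping where synonyms are expanded only at search time.
// Names ru_stop, ru_stemmer, ru_synonyms, ru_index, ru_search are hypothetical.
function buildProductIndexBody(synonyms) {
  return {
    settings: {
      analysis: {
        filter: {
          ru_stop: { type: "stop", stopwords: "_russian_" },
          // requires the ru_RU Hunspell dictionary in <ES config dir>/hunspell/ru_RU
          ru_stemmer: { type: "hunspell", locale: "ru_RU" },
          // synonym_graph is meant for search analyzers (handles multi-word synonyms)
          ru_synonyms: { type: "synonym_graph", lenient: true, synonyms },
        },
        analyzer: {
          // used when documents are indexed: no synonyms
          ru_index: {
            tokenizer: "standard",
            filter: ["lowercase", "ru_stop", "ru_stemmer"],
          },
          // used for the query text: synonyms expanded before stemming
          ru_search: {
            tokenizer: "standard",
            filter: ["lowercase", "ru_stop", "ru_synonyms", "ru_stemmer"],
          },
        },
      },
    },
    mappings: {
      properties: {
        name: { type: "text", analyzer: "ru_index", search_analyzer: "ru_search" },
      },
    },
  };
}

const indexBody = buildProductIndexBody(["гречка, гречневая", "греча => гречка"]);
console.log(JSON.stringify(indexBody, null, 2));
```

It would be passed as `esClient.indices.create({ index: "products", body: indexBody })`. Because the synonyms are not used at index time, changing them later means updating the index settings but not reindexing the documents.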

> Ranking: if the searched word (or one of its forms) is closer to the beginning of the product name, that result should rank higher. Right now the order looks mixed up, for example, when searching for the word "buckwheat"

Well, this is how an inverted index works: the default relevance scoring (TF-IDF, or BM25 in current Elasticsearch/Solr versions) looks at whether and how often a word occurs in a document, not at where in the field it occurs, even though the positions themselves are stored for phrase queries. In Postgres, for example, I don't see a universal way to get this behavior. Re-read the documentation on ranking/weights; I won't be surprised if Solr/Elasticsearch have something for this.
UPD: Well, yes, the comments point out that this can be solved; look up "TF-IDF" in the documentation.
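Two hedged sketches of position-aware ranking. The first builds an Elasticsearch query that keeps the original `match` for recall and adds `span_first` clauses, which give extra score when a term occurs within the first positions of `name`; the boost values and `end` limits are arbitrary assumptions. Because `span_term` is not analyzed, the terms must already be in their indexed (lowercased, stemmed) form, which could be obtained, for example, from the `_analyze` API. The second is a hypothetical application-side fallback for Postgres: re-sort an already-fetched page of rows by the position of the first name word that starts with one of the search words (no stemming, so it only approximates word-form matching). All function names are illustrative.

```javascript
// --- Elasticsearch: extra score for terms near the start of "name" ---
// indexedTerms must already be in indexed form (lowercased + stemmed),
// because span_term does not analyze its input.
function buildPositionBoostedQuery(searchText, indexedTerms, size = 100) {
  const should = [];
  for (const term of indexedTerms) {
    // term is the very first token (its span must end at or before position 1)
    should.push({ span_first: { match: { span_term: { name: term } }, end: 1, boost: 3 } });
    // term is within the first three token positions: smaller extra score
    should.push({ span_first: { match: { span_term: { name: term } }, end: 3, boost: 1 } });
  }
  return {
    size,
    query: {
      bool: {
        must: { match: { name: searchText.trim() } }, // same recall as the original query
        should, // optional clauses: they only add to the score
      },
    },
  };
}

// --- Hypothetical app-side fallback (e.g. for rows fetched from Postgres) ---
function splitWords(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Index of the first name word that starts with any search word, or Infinity.
function firstMatchPosition(name, queryWords) {
  const idx = splitWords(name).findIndex((w) => queryWords.some((q) => w.startsWith(q)));
  return idx === -1 ? Infinity : idx;
}

function rerankByPosition(rows, searchText) {
  const queryWords = splitWords(searchText);
  return rows
    .map((row, i) => ({ row, i, pos: firstMatchPosition(row.name, queryWords) }))
    // earlier match first; equal positions keep the original relevance order
    .sort((a, b) => a.pos - b.pos || a.i - b.i)
    .map((x) => x.row);
}
```

The Elasticsearch body goes to `esClient.search({ index: "products", body })`. The fallback only re-orders what the database already returned, so it makes sense on a bounded result set (for example, the first 100 rows).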