PostgreSQL
greefon, 2021-12-24 19:44:39

How to find partial matches of query string in DB for PostgreSQL full text search?

Task: when searching for a query such as "alpha beta gamma ... omega", return results with a partial match (i.e. only one word, or several words, from the query were found) and rank them by the number of words found and their proximity. More matches means a higher rank. Words closer to each other means a higher rank.


1 answer(s)
ayazer, 2021-12-25
@greefon

I.e., write your own pre-parser, which then passes the request on to the normal parser that reduces word forms to their base form.

In my experience, every use of full-text search (be it Elastic, Solr, or the one built into Postgres) ended up exactly there. Sooner or later you need to do more than "take the request from the user and pass it on". It starts with the trivial "boost certain keywords so that they carry more weight in the results", "this user must not see some results" or "strip everything that looks like a password", and ends with "this is actually a search over partially structured data, therefore for part of the requests, we can generally generate a different cl"
Then it needs to be ranked. We have OR, and I don't really understand how the weights will be distributed.
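
A minimal sketch of what such a pre-parser could look like, assuming it runs in the application before the query reaches Postgres. The function name build_or_tsquery and the rule "lowercase, keep only word characters, drop duplicates" are assumptions made for illustration, not something prescribed above.

```python
import re

# Hypothetical pre-parser: turns free-form user input into an OR-joined
# tsquery string, so that a document matching any single word is returned.
_WORD_RE = re.compile(r"\w+")


def build_or_tsquery(user_input: str) -> str:
    """Return e.g. 'alpha | beta | gamma' for the input 'alpha beta gamma'."""
    seen = set()
    terms = []
    for word in _WORD_RE.findall(user_input.lower()):
        if word not in seen:  # drop repeated words, keep the original order
            seen.add(word)
            terms.append(word)
    return " | ".join(terms)


print(build_or_tsquery("Alpha beta, gamma! beta"))  # alpha | beta | gamma
```

The resulting string would then be passed as a bound parameter to to_tsquery, never concatenated into the SQL text. An empty result (input with no words) needs separate handling before querying.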

If you need a LOT of specific settings here, it is better to look at something like Solr right away. Full-text search support in Postgres is fairly basic. It is enough for many tasks, but I suffered with it in my time and constantly ran into the wall of "you can't do it like that". Still, Postgres does rank results in some fashion, and it lets you adjust the weights of key parts manually. For example:
-- OR query: a row matches if it contains any of the words;
-- ts_rank_cd (cover density) ranks the matches
select i.json_flat_content, ts_rank_cd(i.json_flat_tsv, q) as r
from my_fulltext_index i,
     to_tsquery('simple', 'jzvmw | julva | qxqvh | name | value') q
where
  i.json_flat_tsv @@ q
order by r desc;

will return
[{"name": "qtmlx", "value": "jzvmw  vajwq julva  ipsmwtbhki  lhgzr"}, {"name": "fslto", "value": "viykw"}]	0.6
[{"name": "lhnhq", "value": "sxgxh!!daxrh guxux!!kfgtirmgig!!ivqwz"}, {"name": "qxqvh", "value": "qbeli"}]	0.5
[{"name": "cepja", "value": "mrfma"}, {"name": "gwjqa", "value": "csxaf"}]	0.4
[{"name": "val", "value": "TNhmT<KxERm"}]	0.2

And yes, for the query "cartoon | Norstein | Hedgehog <-> fog" you may need to manually give "cartoon" a smaller weight and "Norstein" a larger one. You also have to think about what to do when a typo turns the search term into "orNstein".
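
One possible way to express per-term weights, sketched under stated assumptions: the application builds a query that sums weight * ts_rank_cd for each term separately. This summing approach, the function name build_weighted_rank_query, and the psycopg-style %s placeholders are all assumptions for illustration. The table and column names are taken from the example above.

```python
from typing import Dict, List, Tuple


def build_weighted_rank_query(term_weights: Dict[str, float]) -> Tuple[str, List[object]]:
    """Build SQL text plus bound parameters for a manually weighted OR search."""
    if not term_weights:
        raise ValueError("at least one term is required")
    rank_parts: List[str] = []
    params: List[object] = []
    for term, weight in term_weights.items():
        # One rank expression per term, scaled by its hand-picked weight.
        rank_parts.append("%s * ts_rank_cd(i.json_flat_tsv, to_tsquery('simple', %s))")
        params.extend([weight, term])
    # A row qualifies if any term matches; parentheses keep phrase terms intact.
    match_query = " | ".join(f"({term})" for term in term_weights)
    sql = (
        "select i.json_flat_content, "
        + " + ".join(rank_parts)
        + " as r "
        "from my_fulltext_index i "
        "where i.json_flat_tsv @@ to_tsquery('simple', %s) "
        "order by r desc"
    )
    params.append(match_query)
    return sql, params


sql, params = build_weighted_rank_query(
    {"cartoon": 0.2, "norstein": 1.0, "hedgehog <-> fog": 0.6}
)
print(sql)
print(params)
```

The weights here are arbitrary illustrative numbers. The returned pair would be handed to the driver's execute call, so the terms never end up inside the SQL text itself.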
+ it is worth noting that Postgres (at least versions 9.5 and 10.?, which I worked with) did not handle n-grams well. More precisely, you had to install additional plugins for that and then wire everything together. I.e., the prefix query "slo:*" will find "slovo" (the Russian for "word"), but the substring "ovo" will not.
+ perhaps it will be necessary to work with bugs
+ I don't remember if postgres can work with synonyms. It might be important too
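
To illustrate the n-gram point above, here is a small plain-Python sketch of trigram matching. It is modeled on the padding convention of the pg_trgm extension (two spaces before the word, one after), which is an assumption here. The function names are hypothetical and the code is not the extension's actual implementation.

```python
def trigrams(word: str) -> set:
    """Split a word into 3-character pieces after padding it with spaces."""
    padded = "  " + word.lower() + " "  # assumed pg_trgm-like padding
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("slovo")))
print("ovo" in trigrams("slovo"))    # True: the substring is one of the trigrams
print(similarity("ovo", "slovo"))    # 0.25
print(similarity("ornstein", "norstein"))
```

A prefix query only looks at the beginning of a lexeme, while a trigram index sees every 3-character window. That is why the substring "ovo" and a misspelling like "orNstein" can still score against the intended word.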
In general, full-text search in Postgres is convenient for quick-and-dirty prototyping. But if you need serious full-text search, it is better to look at tools that are built for it.
