How do I organize text search across a library of books?
The essence of the problem: if the text of each book is stored in its entirety, then searching for a fragment means selecting the whole book and then cutting the fragment out of it. A phrase made up of high-frequency words can turn up in every book, and more than once. All of those books would then have to be selected for post-processing. Isn't that very resource-intensive?
And if that is not how it is done, then how? I'm not a professional, so there may be something I don't know or don't understand. Could those who know please explain how to properly organize a search for text fragments in a library of books?
The answer from Uwe_Boll is wrong (or, to put it bluntly, harmful).
The fact is that a search that is not an exact match cannot use a standard index.
------------
What I mean is that queries like %some word% can't take advantage of sorted order, and therefore can't use binary search the way normal indexes do. That makes your index effectively meaningless: an O(log N) lookup turns into an O(N) scan.
-------------
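To make the point about sorting concrete, here is a minimal sketch in Python. The `titles` list and both helper functions are made up for illustration; a sorted list stands in for an ordinary B-tree index. A prefix query can be answered with binary search because all matching keys are adjacent, while a substring query has to look at every key.

```python
import bisect

# Hypothetical sorted "index" over a column of short strings.
titles = sorted([
    "a tale of two cities",
    "moby dick",
    "the old man and the sea",
    "war and peace",
])

def prefix_search(sorted_keys, prefix):
    """Like 'prefix%': keys sharing a prefix are adjacent, so bisect works."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" is used as an upper bound; assumes keys do not contain it.
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

def substring_search(keys, fragment):
    """Like '%fragment%': matches can sit anywhere, so every key is checked."""
    return [key for key in keys if fragment in key]

print(prefix_search(titles, "the "))
print(substring_search(titles, "and"))
```

The second function is the O(N) case: sort order tells you nothing about where "and" occurs inside a string.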
You actually have only one option:
An inverted index weighted by tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
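As a rough illustration of the idea, not of how any particular engine implements it, here is a small Python sketch of an index that maps each word to the books containing it, with results ranked by a basic tf-idf score. The book texts, the `tokenize`, `build_index` and `search` names, and the exact weighting formula (raw term count times log(N / document frequency)) are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build {term: {doc_id: count of term in that doc}} from {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(index, n_docs, query):
    """Score docs by the sum of tf * idf over query terms; drop zero scores."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # A term found in every document gets idf = log(1) = 0.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(doc_id, score) for doc_id, score in ranked if score > 0]

# Made-up miniature "library" for the example.
books = {
    "moby": "Call me Ishmael. The whale is the white whale.",
    "tale": "It was the best of times, it was the worst of times.",
    "sea": "The old man fished alone; the sea was the sea.",
}
index = build_index(books)
results = search(index, len(books), "the white whale")
print(results)
```

Note what happens to "the": it occurs in every book, so its weight is zero and it selects nothing by itself. That is how this kind of index deals with the high-frequency words the question worries about, and only the postings for the rarer words are read rather than the full text of every book.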
For this you can use:
https://www.postgresql.org/docs/8.3/static/textsea...
https://www.sqlite.org/fts3.html
dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
https://www.elastic.co/
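For a quick feel of what using one of these looks like, here is a small sketch with SQLite's FTS3 module through Python's standard `sqlite3` module. It assumes your SQLite build was compiled with FTS3 enabled (most common builds are, but that is not guaranteed); the `books` table, its columns and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Full-text indexed virtual table; requires an SQLite build with FTS3.
conn.execute("CREATE VIRTUAL TABLE books USING fts3(title, body)")
conn.executemany(
    "INSERT INTO books (title, body) VALUES (?, ?)",
    [
        ("Moby Dick", "Call me Ishmael. The whale is the white whale."),
        ("A Tale of Two Cities", "It was the best of times, it was the worst of times."),
        ("The Old Man and the Sea", "The old man fished alone; the sea was the sea."),
    ],
)

# Double quotes inside the MATCH string make it a phrase query.
# snippet() returns a short fragment of the matching text.
rows = conn.execute(
    "SELECT title, snippet(books) FROM books WHERE books MATCH ?",
    ('"white whale"',),
).fetchall()

for title, fragment in rows:
    print(title, "->", fragment)
```

The `snippet()` call is the part that answers the original worry: the engine finds the matching books through its index and hands back a fragment, so you do not cut it out of the full text yourself.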