How do I organize text search in a library of books?

Egor Chernyshov, 2016-09-19 18:23:38

PHP

The essence of the problem: if the text of each book is stored as a whole, then searching for a fragment means selecting the entire book and cutting the fragment out of it afterwards. A phrase made up of high-frequency words can occur in every book, and more than once. Accordingly, all of those books would have to be selected for post-processing. Isn't that very resource-intensive?
And if this is not the way, then what is? I'm not a professional, so maybe there is something I don't know or don't understand. Those who know, please advise: how do I properly organize a search for text fragments in a library of books?


2 answer(s)

Eugene 222, 2016-09-19
@mik222

Uwe_Boll's answer is wrong (or even outright harmful).
The point is that a search that is not an exact match cannot use a standard index.
------------
What I mean is that patterns like %some word% start with a wildcard, so the sorted order of a normal index is of no help: the database cannot binary-search for them and has to scan every row instead. The index becomes meaningless, and an O(log N) lookup turns into an O(N) one.
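A minimal sketch of that difference, in Python rather than SQL, assuming a sorted list stands in for a regular index (the titles and the function names `prefix_lookup` and `substring_lookup` are made up for illustration): a prefix lookup can start from a binary search, while a substring lookup has to look at every entry.

```python
import bisect

# A sorted list stands in for a regular (B-tree-like) index. Hypothetical data.
titles = sorted([
    "anna karenina",
    "crime and punishment",
    "dead souls",
    "war and peace",
    "white nights",
])

def prefix_lookup(index, prefix):
    """Like LIKE 'prefix%': binary search finds the first candidate,
    then we walk forward only while entries still match."""
    i = bisect.bisect_left(index, prefix)
    matches = []
    while i < len(index) and index[i].startswith(prefix):
        matches.append(index[i])
        i += 1
    return matches

def substring_lookup(index, fragment):
    """Like LIKE '%fragment%': sort order does not help,
    so every entry has to be checked."""
    return [item for item in index if fragment in item]

print(prefix_lookup(titles, "w"))       # ['war and peace', 'white nights']
print(substring_lookup(titles, "and"))  # ['crime and punishment', 'war and peace']
```

The second function gives correct results, but its cost grows with the size of the whole collection, which is exactly the problem with full book texts.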
-------------
You actually have only one option:
An inverted index weighted by tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
For this you can use:
https://www.postgresql.org/docs/8.3/static/textsea...
https://www.sqlite.org/fts3.html
dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
https://www.elastic.co/
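
To show the idea behind these tools, here is a rough sketch of an inverted index with tf-idf ranking in plain Python. It is not how any of the engines above is implemented; the class `InvertedIndex`, the `tokenize` helper and the sample texts are all hypothetical. It matches whole words (every query word must be present) and does not check that the words form an exact phrase.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {doc_id: occurrences}
        self.doc_len = {}                  # doc_id -> number of words

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for word, count in Counter(tokens).items():
            self.postings[word][doc_id] = count

    def search(self, query):
        """Return ids of documents containing every query word,
        best tf-idf score first (ties broken by id)."""
        words = set(tokenize(query))
        if not words:
            return []
        # Only the postings of the query words are read, never the full texts.
        candidates = set.intersection(
            *(set(self.postings.get(w, {})) for w in words)
        )
        n_docs = len(self.doc_len)
        scores = {}
        for doc_id in candidates:
            score = 0.0
            for w in words:
                tf = self.postings[w][doc_id] / self.doc_len[doc_id]
                idf = math.log(n_docs / len(self.postings[w]))
                score += tf * idf
            scores[doc_id] = score
        return sorted(scores, key=lambda d: (-scores[d], d))

index = InvertedIndex()
index.add("book1", "the cat sat on the mat")
index.add("book2", "the dog chased the cat and the other cat")
index.add("book3", "the sun and the moon")

print(index.search("the cat"))  # ['book2', 'book1']
print(index.search("moon"))     # ['book3']
```

Note that "the" occurs in every document, so its idf is log(1) = 0 and it adds nothing to the score; that is how the weighting deals with high-frequency words. One possible design choice (an assumption here, not something stated in this thread) is to index pages or paragraphs as separate documents, so that a hit already points at the fragment instead of the whole book.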

Uwe_Boll, 2016-09-19
@Uwe_Boll

Put an index on the book title and another index on the ISBN.