becks, 2013-02-05 11:12:53
Sphinx

Which full-text search engine should I choose for a desktop (Qt) project?

I need to add full-text search over documents in various formats (doc, html, pdf, etc.) to a C++ (Qt) project. I have already read a number of articles comparing such systems (for example, habrahabr.ru/post/30594/), but I still can't decide which one suits me best. Candidates: Sphinx, Lucene (CLucene) and Xapian.

I installed Sphinx and figured out the basics fairly quickly. I ran it on a test database and everything works, but only for queries in English; with Russian there are still some encoding problems. I am pleased with the indexing and search speed, and with the ability to implement morphology through soundex rather than stemming alone; if I understand correctly, search accuracy for Russian should then be much higher than with stemming. By the way, if anyone has used soundex, please share your experience: did it go smoothly, and did accuracy improve? Is a delta index convenient when documents are added frequently? And if you use Sphinx, how do you extract text from PDF, Word/Excel and other formats?
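To make the delta-index question concrete, here is how I understand the main + delta idea, as a minimal in-memory sketch in plain C++. This is not Sphinx's API or configuration: the class `MainDeltaIndex` and all of its method names are hypothetical, and a real engine keeps its indexes on disk. New documents go into a small "delta" index, queries consult both indexes, and the delta is folded into the large "main" index from time to time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical toy model of a main + delta scheme (not the Sphinx API).
class MainDeltaIndex {
public:
    // New documents are indexed into the small delta only,
    // so frequent additions never touch the large main index.
    void addDocument(int docId, const std::string& text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            delta_[word].insert(docId);
        }
    }

    // A query has to look in both indexes and unite the results.
    std::set<int> search(const std::string& term) const {
        std::set<int> hits;
        collect(main_, term, hits);
        collect(delta_, term, hits);
        return hits;
    }

    // Periodic maintenance: fold the delta into main and start a fresh delta.
    void mergeDelta() {
        for (const auto& entry : delta_) {
            main_[entry.first].insert(entry.second.begin(), entry.second.end());
        }
        delta_.clear();
    }

    // Number of distinct terms currently waiting in the delta.
    std::size_t deltaTermCount() const { return delta_.size(); }

private:
    using Postings = std::map<std::string, std::set<int>>;

    static void collect(const Postings& index, const std::string& term,
                        std::set<int>& out) {
        auto it = index.find(term);
        if (it != index.end()) {
            out.insert(it->second.begin(), it->second.end());
        }
    }

    Postings main_;
    Postings delta_;
};
```

In this model the cost of adding a document depends only on the size of the delta, while the price is an extra lookup per query and a periodic merge. Whether that trade-off pays off with frequent additions in practice is exactly what I am asking about.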

Xapian is praised for its incremental index, which is updated transparently in parallel with searching. People write that it is the best choice for C++ applications that need a rich query language. I also really like its ability to process files in the most popular formats: TXT, HTML, PHP, PDF, PostScript, OpenOffice/StarOffice, OpenDocument, Microsoft Word/Excel/PowerPoint.

CLucene is the least likely candidate. It does not seem to support morphological search, and I did not find any major advantages over the other two systems. If I am wrong, please correct me.

I would also like extra features such as smart search, proximity (distance) search, synonym search, and so on.
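By "search with distance" I mean roughly the following, shown as a rough sketch in standard C++. It is a toy positional index, not the API of Sphinx, Xapian or CLucene; the class `PositionalIndex` and its methods are hypothetical names. It also assumes whitespace tokenization and ASCII-only lowercasing, so Russian text would need proper Unicode handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy positional index: term -> (document id -> word positions).
class PositionalIndex {
public:
    // Splits text on whitespace, lowercases ASCII letters and records
    // the position of every word in the document.
    void addDocument(int docId, const std::string& text) {
        std::istringstream in(text);
        std::string word;
        std::size_t pos = 0;
        while (in >> word) {
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::tolower(c));
                           });
            postings_[word][docId].push_back(pos++);
        }
    }

    // Returns ids of documents in which `a` and `b` (given in lowercase)
    // occur at most `maxDistance` words apart, in either order.
    std::vector<int> near(const std::string& a, const std::string& b,
                          std::size_t maxDistance) const {
        std::vector<int> result;
        auto itA = postings_.find(a);
        auto itB = postings_.find(b);
        if (itA == postings_.end() || itB == postings_.end()) {
            return result;
        }
        for (const auto& entry : itA->second) {
            auto docB = itB->second.find(entry.first);
            if (docB == itB->second.end()) {
                continue;  // `b` does not occur in this document
            }
            if (withinDistance(entry.second, docB->second, maxDistance)) {
                result.push_back(entry.first);
            }
        }
        return result;
    }

private:
    // Two-pointer walk over two ascending position lists: always advance
    // the pointer at the smaller position until a close enough pair is found.
    static bool withinDistance(const std::vector<std::size_t>& xs,
                               const std::vector<std::size_t>& ys,
                               std::size_t maxDistance) {
        std::size_t i = 0, j = 0;
        while (i < xs.size() && j < ys.size()) {
            std::size_t gap = xs[i] > ys[j] ? xs[i] - ys[j] : ys[j] - xs[i];
            if (gap <= maxDistance) {
                return true;
            }
            if (xs[i] < ys[j]) {
                ++i;
            } else {
                ++j;
            }
        }
        return false;
    }

    std::map<std::string, std::map<int, std::vector<std::size_t>>> postings_;
};
```

For example, after indexing "full text search engine" as document 1, `near("search", "engine", 1)` returns document 1, because the two words are adjacent. I would prefer an engine that offers this kind of operator in its query language out of the box.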

If you have experience with these engines, please tell me which one to settle on.
