Solr vs Elasticsearch for Russian Wikipedia search?
Good afternoon.
I am a master's student, and my research is in information retrieval. For my work I need a search engine that already implements the main algorithms, supports Russian morphology, and so on. The search itself will run over a dump of Russian Wikipedia articles. My supervisor suggested Solr. However, after reading its documentation and experimenting with it a little, I found it not very convenient to work with. Then I came across Elasticsearch. Since I have never worked with off-the-shelf search engines before, I have some questions and would be grateful for help with them:
1) Which of the two is the better choice for full-featured search over Russian texts?
2) I need to load and index the Wikipedia dump. Because Solr is so widely used, there are several ways to do this with it, but none of them looks very convenient. For Elasticsearch I found fewer solutions. However, I came across the CirrusSearch dumps (https://dumps.wikimedia.org/other/), which appear to be dumps of the article indexes in a format suitable for loading into Elasticsearch. Has anyone loaded Wikipedia this way?
3) At the moment I write everything in Python, so I need to work with the engine from Python. Which of the two is more convenient or easier to use from this language?
4) Extensibility. How easy is it to extend each engine, for example by adding my own ranking functions?
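To make question 2 concrete, here is a minimal sketch of how I imagine reading such a dump in Python. It assumes the file is gzip-compressed and follows the Elasticsearch bulk format, that is, alternating lines of action metadata and document JSON; I have not verified this against a real dump. The function names `iter_bulk_pairs` and `iter_dump` and the `title` and `text` fields in the sample are my own placeholders.

```python
import gzip
import json
from typing import Iterable, Iterator, Tuple


def iter_bulk_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (doc_id, source) from alternating action/source lines.

    Assumption: each document is stored as two consecutive JSON lines,
    {"index": {"_id": ...}} followed by the document body.
    """
    non_empty = (line for line in lines if line.strip())
    for action_line in non_empty:
        source_line = next(non_empty, None)
        if source_line is None:
            break  # trailing action line without a document body
        action = json.loads(action_line)
        yield action["index"]["_id"], json.loads(source_line)


def iter_dump(path: str) -> Iterator[Tuple[str, dict]]:
    """Stream documents from a gzip-compressed dump (hypothetical path)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from iter_bulk_pairs(fh)


# Tiny in-memory sample instead of a real dump file.
sample = [
    '{"index": {"_type": "page", "_id": "7"}}\n',
    '{"title": "Москва", "text": "Столица России"}\n',
    '{"index": {"_type": "page", "_id": "9"}}\n',
    '{"title": "Волга", "text": "Река в России"}\n',
]
pairs = list(iter_bulk_pairs(sample))
for doc_id, doc in pairs:
    print(doc_id, doc["title"])
```

If this assumption about the format holds, the resulting pairs could presumably be handed to the bulk API of a Python client in batches, but I have not tried that part.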
I am sure that the questions will seem trivial to those who know, but still I would like to get an answer to them. Thanks in advance.