How to organize the storage of a large number of text documents with the ability to search through them?
Greetings!
What I have:
- A rented server
- About 200 million documents stored as HTML markup, with a total size of over 6 TB
- Each document links to one or more other documents (HTML links in the text)
Task:
1. Set up storage for these documents with fast search both over the document text (taking morphology into account) and with filtering by other parameters (dates, categories, etc.).
2. In addition, it must be possible to list the documents that link to a selected document.
What should I choose for the implementation to avoid performance problems? Will MySQL + Elasticsearch handle it, or is it better to choose something else?
Thank you!
Answer:
I would try doing it on Elasticsearch (ES) alone, without pairing it with MySQL. That is, open each document, extract its content, and add it to the ES index. You can also parse the links out of each document and add them to the index as a separate field; then you do not need any database at all.
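A minimal sketch of that idea in Python, using only the standard library. Everything named here is an assumption for illustration: the field names (`url`, `content`, `links`, `date`, `category`), the helper names, and the `english` analyzer (pick the built-in language analyzer that matches your documents). The code only builds the JSON bodies; sending them to ES with a client of your choice is left out.

```python
from html.parser import HTMLParser


class DocParser(HTMLParser):
    """Collects visible text and href targets from one HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def build_index_doc(url, html, date, category):
    """Turn one HTML page into the JSON document to be indexed (hypothetical schema)."""
    parser = DocParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "content": " ".join(parser.text_parts),
        "links": sorted(set(parser.links)),  # outgoing links, deduplicated
        "date": date,
        "category": category,
    }


# Assumed index mapping: analyzed text for full-text search,
# keyword fields for exact-match filters and link lookups.
MAPPING = {
    "mappings": {
        "properties": {
            "url": {"type": "keyword"},
            "content": {"type": "text", "analyzer": "english"},
            "links": {"type": "keyword"},
            "date": {"type": "date"},
            "category": {"type": "keyword"},
        }
    }
}


def search_query(text, category=None, date_from=None):
    """Full-text match on content, narrowed by optional filters."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    if date_from is not None:
        filters.append({"range": {"date": {"gte": date_from}}})
    return {
        "query": {
            "bool": {"must": [{"match": {"content": text}}], "filter": filters}
        }
    }


def backlinks_query(target_url):
    """Documents whose links field contains target_url exactly."""
    return {"query": {"term": {"links": target_url}}}
```

For the backlink lookup to work, the values stored in `links` must be in the same form as the `url` field, so relative links would need to be resolved to absolute URLs before indexing (for example with `urllib.parse.urljoin`). Whether a single ES cluster on one rented server copes with 6+ TB depends on hardware and shard layout, which this sketch does not address.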