How can I index local HTML files into a database?
Good afternoon, please tell me how to solve this problem.
There is a folder on a network drive with many subfolders, each containing many HTML files. In other words, the disk holds many site mirrors.
The task is to let users not only browse the offline site mirrors but also search for text in them.
I.e. I need some kind of service with a search bar where the user enters a search word and the service returns, at a minimum, links to the matching HTML files so they can be opened.
The most naive approach I can see is to parse all the HTML files (only specific zones defined at the DOM level, plus a link to the file), then store that text in a database that has full-text search. After that, an ordinary web form would run a SELECT against the database with the user's filter and display links to the matching files (maybe even a snippet of the found text with highlighting).
Or maybe there is already some ready-made solution?
One: sphinxsearch.com/forum/view.html?id=3867
Two: https://github.com/Restream/reindexer
Three, as you suggest: load the text into the database after stripping the tags, for example, although I think there will still be false positives.
Four, five: there are also search engines with spiders, but those take a lot of machinery to set up.
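Option three can be sketched roughly as follows. This is only a minimal illustration, not a finished service: it assumes SQLite with the FTS5 extension compiled in (the thread does not name a database), and the table name `pages` and the helpers `strip_tags`, `build_index` and `search` are made up for the example. The tag stripping is deliberately crude, which is exactly where the false positives mentioned above can come from.

```python
import html
import re
import sqlite3


def strip_tags(markup: str) -> str:
    """Crude tag stripping: drop script/style bodies, then every tag."""
    markup = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)
    text = re.sub(r"(?s)<[^>]+>", " ", markup)
    text = html.unescape(text)  # turn &amp; and friends back into characters
    return re.sub(r"\s+", " ", text).strip()


def build_index(conn: sqlite3.Connection, pages) -> None:
    """Store (path, raw_html) pairs in an FTS5 table (hypothetical name: pages)."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(path UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO pages (path, body) VALUES (?, ?)",
        ((path, strip_tags(markup)) for path, markup in pages),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Return (path, highlighted snippet) rows for the user's search text."""
    # Quote the input so it is treated as a plain phrase, not FTS5 query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    sql = (
        "SELECT path, snippet(pages, 1, '<b>', '</b>', '...', 10) "
        "FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (phrase, limit)).fetchall()
```

A web form would then only need to call `search(conn, user_input)` and render each returned path as a link next to its snippet.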
Open the files with Python, then parse them and save the text wherever you want. This is the simplest approach.
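A minimal sketch of that idea, using only the standard library. The names `TextExtractor`, `extract_text` and `iter_pages` are invented for the example, and it assumes the mirrored files are UTF-8 (older mirrors may well use another encoding, so adjust as needed). It extracts all visible text; restricting it to particular DOM zones would need extra logic.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(markup: str) -> str:
    """Return the visible text of one HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def iter_pages(root):
    """Yield (path, text) for every .html/.htm file under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
            markup = path.read_text(encoding="utf-8", errors="replace")
            yield str(path), extract_text(markup)
```

The `(path, text)` pairs can then be written to whatever storage is chosen, for example a database table with a full-text index.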