MySQL
Andrey Strelkov, 2020-11-22 20:03:37

How can I index local HTML files into a database?

Good afternoon. Please tell me how to solve the following problem.
There is a folder on a network drive with many subfolders, which in turn contain many HTML files; in other words, the disk holds many site mirrors.
The task is to let users not only browse the offline site mirrors but also search their text.
That is, we need a service with a search bar where the user enters a search term and the service returns at least links to the matching HTML files so they can be opened.

The simplest approach I can see is to parse all the HTML files (only specific regions defined at the DOM level, plus a link to the file), then store that text in a database (one with full-text search). An ordinary web form would then run a SELECT against the database with the user's filter and display links to the matching files (perhaps even a snippet of the matching text with highlighting).
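The database half of that plan could be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite's FTS5 extension through Python's `sqlite3` module so that it runs without a server (with MySQL the equivalent would be a FULLTEXT index queried with MATCH ... AGAINST), it assumes the local SQLite build includes FTS5, and the table name `pages` and the helper names `build_index`, `to_match_expr` and `search` are made up for the example.

```python
import sqlite3


def build_index(conn, documents):
    """Create a full-text table and load (path, text) pairs into it."""
    # 'path' is stored but not searched; 'body' holds the extracted page text.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(path UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO pages (path, body) VALUES (?, ?)", documents)
    conn.commit()


def to_match_expr(user_input):
    """Quote every word so punctuation is not parsed as FTS5 query syntax."""
    words = user_input.split()
    return " ".join('"' + w.replace('"', '""') + '"' for w in words)


def search(conn, user_input, limit=20):
    """Return (path, highlighted snippet) rows, best matches first."""
    expr = to_match_expr(user_input)
    if not expr:
        return []
    sql = (
        "SELECT path, snippet(pages, 1, '<b>', '</b>', '...', 10) "
        "FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (expr, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for a persistent index
    build_index(conn, [
        ("mirror1/index.html", "Welcome to the offline mirror of the site"),
        ("mirror2/about.html", "Contact details and company history"),
    ])
    for path, snip in search(conn, "mirror"):
        print(path, snip)
```

A web form would only need to pass the user's text to `search` and render each returned path as a link next to its snippet.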

Or maybe a ready-made solution already exists?

3 answer(s)
Dimonchik, 2020-11-22
@strelkov_av

One: sphinxsearch.com/forum/view.html?id=3867
Two: https://github.com/Restream/reindexer
Three: what you suggest yourself,
load the text into the database after stripping the tags (with strip_tags, for example), although I think there will still be false positives.
Four, five: there are also search engines with crawlers, but those take a lot of extra setup.

xmoonlight, 2020-11-22
@xmoonlight

Open the files with Python, then parse them and save the text wherever you want. This is the simplest option.
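A minimal sketch of that idea, using only the standard library, might look like this. The names `TextExtractor`, `html_to_text` and `iter_documents` are invented for the example, and it assumes the files end in `.html` and are mostly UTF-8. It also keeps all visible text rather than selected DOM regions, which is a simplification of what the question asks for.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(markup):
    """Return the text content of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def iter_documents(root):
    """Yield (file path, extracted text) for every .html file under root."""
    for path in sorted(Path(root).rglob("*.html")):
        # errors="replace" keeps the walk going on files with odd encodings.
        markup = path.read_text(encoding="utf-8", errors="replace")
        yield str(path), html_to_text(markup)
```

The pairs produced by `iter_documents` can then be written to whatever store is chosen, whether a database table or a search engine's index.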

Roman Mirilaczvili, 2020-11-24
@2ord

Solr, Sphinx search, Apache Tika,...
