linux
Leonid, 2018-01-11 23:28:15

Full-text search on a site over uploaded files in Word, Excel, Visio and PDF formats: how?

There is a PHP site where users upload files in Word, Excel, Visio and PDF formats. The site needs
full-text search over the contents of these uploaded files.
So it seems you have to convert the Word, Excel, Visio and PDF files to text and put that into the database?
What solutions can be applied? Naturally, command-line utilities are an option, since PHP alone is unlikely to be enough here.

3 answer(s)
santaatnas, 2018-01-12
@santaatnas

You're thinking along the right lines: parse the PDF, Word, Excel, etc. into text, write it to the database, feed it into Sphinx or Elasticsearch = profit. It really is possible to do all of this with PHP alone, and for that matter in any language ...
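
The pipeline described above (extract text, store it, index it, search it) can be sketched roughly like this. It is only an illustration, written in Python rather than PHP, and the in-memory inverted index is a toy stand-in for Sphinx or Elasticsearch. The names `tokenize` and `TextIndex` and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TextIndex:
    """Toy inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The text is assumed to be already extracted from the uploaded file.
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


index = TextIndex()
index.add("report.pdf", "Quarterly sales report for the northern region")
index.add("plan.docx", "Project plan and sales forecast")
hits = index.search("sales report")
```

A real search engine adds stemming, ranking and persistence on top of this idea, which is why a ready-made one is usually the better choice.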

Roman Mirilaczvili, 2018-01-12
@2ord

It's not worth writing your own document parsers: there is Apache Tika (Java), which can output JSON. Tika used to be part of Apache Lucene (search engine).
For text indexing and search: Elasticsearch (Java) and Sphinx Search (C++), as santaatnas noted earlier, plus Solr and Apache Lucene.
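
As a rough sketch of how these pieces could fit together: a running Tika server accepts a document via HTTP `PUT` on its `/tika` endpoint and returns plain text when asked with `Accept: text/plain`, and Elasticsearch accepts a `match` query in its search API. The example is in Python rather than PHP. The URL `http://localhost:9998/tika` (a Tika server on its default port), the field name `content` and the helper names are assumptions for illustration.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998/tika"  # assumption: local Tika server


def build_tika_request(data, url=TIKA_URL):
    """Build a PUT request asking Tika to return plain text for raw file bytes."""
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def extract_text(path, url=TIKA_URL):
    """Send a file to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def build_search_body(query, field="content"):
    """Build an Elasticsearch match query as a JSON string.

    The field name "content" is a hypothetical choice for the extracted text.
    """
    return json.dumps({"query": {"match": {field: query}}})
```

The text returned by `extract_text` would be indexed under the chosen field, and the body from `build_search_body` sent to the index's `_search` endpoint. From PHP the same HTTP calls could be made with any HTTP client.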

Dimonchik, 2018-01-12
@dimonchik2013

The Harvester
