What is the best way to organize text analysis, and with what tools?


Oleg Kulakov, 2017-06-25 09:10:13

data mining


Roughly, the setup is this: a system automatically generates tasks for the helpdesk. After fixing a problem, technicians describe its causes and the actions they took in free form. The goal is to group the technicians' descriptions by failure cause, i.e. take their comments, find the cause of the failure in each one, and group them so that the next time a failure occurs, information on the most common causes is available.

The question is which direction to dig in to implement this (neural networks? decision trees? hash tables? other buzzwords? or just parse against a "white list" of keywords?). Are there ready-made solutions for this (like Hadoop in Java for neural networks, or NumPy in Python for mathematical analysis)? The choice of tools is not critical, but for now the web side is either C/C++ or Java on the backend, with AngularJS (also not critical) on the frontend.
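The "white list of keywords" option mentioned in the question can be sketched in a few lines. This is a minimal sketch, assuming a hand-maintained table of causes and keywords; `CAUSE_KEYWORDS`, `classify` and `top_causes` are hypothetical names, not part of any library. A comment may match several causes, and the most frequent causes are what would be shown the next time a failure occurs.

```python
import re
from collections import Counter

# Hypothetical whitelist: failure cause -> keywords that indicate it.
# In a real helpdesk this table would be maintained by hand.
CAUSE_KEYWORDS = {
    "power": {"power", "psu", "outage", "ups"},
    "network": {"network", "cable", "switch", "router", "dns"},
    "disk": {"disk", "hdd", "ssd", "raid"},
}


def classify(comment: str) -> list[str]:
    """Return every cause whose keywords occur in a technician's comment."""
    words = set(re.findall(r"\w+", comment.lower()))
    return [cause for cause, keys in CAUSE_KEYWORDS.items() if words & keys]


def top_causes(comments: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count causes across all comments and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        counts.update(classify(comment))
    return counts.most_common(n)
```

Exact word matching misses inflected forms and typos, so in practice stemming or lemmatization would likely be needed before the lookup.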


3 answers


dummyman, 2017-06-25
@dummyman

You should dig in the direction of Sphinx.
Neural networks will most likely not help, or will help only with much greater effort and resources.
Sphinx is something like your own Yandex/Google: first it indexes the material, then for a search phrase it returns results sorted by relevance.
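To make the "index first, then rank by relevance" idea concrete, here is a toy inverted index. It is only a minimal sketch and not Sphinx's API or its actual ranking formula: it scores documents with plain TF-IDF, and the `TinyIndex` class and its methods are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings = defaultdict(dict)  # term -> {doc_id: tf}
        self.docs = []                     # doc_id -> original text

    def add(self, text: str) -> int:
        """Index one document and return its id."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        return doc_id

    def search(self, query: str) -> list[tuple[int, float]]:
        """Score documents by summed TF-IDF over query terms, best first."""
        n = len(self.docs)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            posting = self.postings.get(term, {})
            if not posting:
                continue
            # Rarer terms get a higher weight.
            idf = math.log(n / len(posting)) + 1.0
            for doc_id, tf in posting.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A real engine such as Sphinx adds what this sketch skips: stemming, persistence and far more efficient index storage.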


Roman Mirilaczvili, 2017-06-25
@2ord

Extracting objects and facts from texts at Yandex (video lecture)
What is Tomita-parser, and how Yandex can use it ...


Dimonchik, 2017-06-25
@dimonchik2013

The basis for returning answers is Sphinx, or else Elastic(search).
But the query you send to it is where you will have to put real effort into the SYSTEM: in addition to the main question, the system should also produce its synonyms, so that Sphinx can be queried with all of them in parallel. The main problem in search is not returning a relevant answer, but understanding what the user actually wants to ask.
The tools for this range from Tomita and NLTK to clustering and hand-made synonym tables.
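The idea of sending synonyms alongside the main question can be illustrated with the simplest option from that list, a hand-made synonym table. This is a minimal sketch, assuming such a table exists; `SYNONYMS` and `expand_query` are hypothetical names, and this is neither Tomita nor NLTK. Each variant would then be sent to Sphinx or Elasticsearch as a separate query and the results merged.

```python
import re

# Hypothetical hand-made synonym table: term -> alternative wordings.
SYNONYMS = {
    "printer": ["mfp", "plotter"],
    "down": ["offline", "unavailable"],
    "network": ["lan", "ethernet"],
}


def expand_query(query: str) -> list[str]:
    """Return the normalized query plus variants with one term swapped for a synonym."""
    terms = re.findall(r"\w+", query.lower())
    variants = [" ".join(terms)]
    for i, term in enumerate(terms):
        for syn in SYNONYMS.get(term, []):
            variants.append(" ".join(terms[:i] + [syn] + terms[i + 1:]))
    return variants
```

An alternative design is a single query that ORs the synonyms together, if the search engine's query syntax supports it; separate variants are shown here only because they map directly onto "query in parallel".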