Extracting information from a large number of documents. How?
Hi! Here is the task: there are several thousand text documents of the same type that share common logical blocks (not to be confused with the document schema). I need to extract knowledge from these documents and turn it into numbers. Simple tools like regular expressions are not suitable; something more advanced is needed. I have never worked in this area and can't figure out which algorithms and tools could solve such a problem. I gathered that this falls under text mining, but beyond that it's not clear where to look.
It is not entirely clear what numbers you want (or need) to convert the extracted information into.
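One common interpretation of "turning it into numbers" is a per-document table of counts. The sketch below assumes an extraction step has already produced `(label, text)` entity pairs for each document; the labels and document IDs are hypothetical placeholders, and counting is only one of many possible numeric encodings.

```python
from collections import Counter
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (label, text), e.g. ("ORG", "Acme Corp"); hypothetical format


def to_feature_rows(docs: Dict[str, List[Entity]]) -> Tuple[List[str], List[List[int]]]:
    """Turn extracted entities into a numeric matrix: one row per document,
    one column per entity label, each cell counting occurrences of that label."""
    # Collect every label seen in any document, in a stable (sorted) column order.
    labels = sorted({label for ents in docs.values() for label, _ in ents})
    rows: List[List[int]] = []
    for doc_id in sorted(docs):
        counts = Counter(label for label, _ in docs[doc_id])
        # Counter returns 0 for labels absent from this document.
        rows.append([counts[label] for label in labels])
    return labels, rows


if __name__ == "__main__":
    docs = {
        "doc1": [("ORG", "Acme Corp"), ("LOC", "Moscow"), ("ORG", "Globex")],
        "doc2": [("PER", "John Smith")],
    }
    print(to_feature_rows(docs))
```

If the "figures" you need are values found inside the text (amounts, dates), you would instead parse each extracted span into a number rather than count spans.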
In general, the task resembles the problems that named-entity recognition (NER) solves: https://en.wikipedia.org/wiki/Named-entity_recognition
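To show what NER output looks like, here is a toy sketch that labels tokens with BIO tags (B- begins an entity, I- continues it, O is outside) and then groups them into entity spans. It assumes a hand-written dictionary (gazetteer) with made-up entries as a stand-in. The tools listed below learn this tagging statistically from annotated data (Stanford NER, for example, uses a CRF model), so a dictionary lookup like this is only an illustration of the output format, not a substitute for them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer: lowercase token tuples mapped to entity labels.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("acme", "corp"): "ORG",
    ("moscow",): "LOC",
    ("john", "smith"): "PER",
}


def tag_bio(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in the gazetteer."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest phrase first so multi-word entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (n, label)
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{label}"
        i += n
    return tags


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- tags into (label, text) entity spans."""
    result: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)  # continue the current entity
            continue
        if label is not None:
            result.append((label, " ".join(words)))  # close the previous entity
        if tag.startswith("B-"):
            label, words = tag[2:], [token]
        else:
            label, words = None, []
    if label is not None:
        result.append((label, " ".join(words)))
    return result


if __name__ == "__main__":
    tokens = "John Smith joined Acme Corp in Moscow".split()
    tags = tag_bio(tokens, GAZETTEER)
    print(list(zip(tokens, tags)))
    print(extract_spans(tokens, tags))
```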
Known tools:
https://en.wikipedia.org/wiki/OpenNLP
nlp.stanford.edu/software/CRF-NER.shtml
https://en.wikipedia.org/wiki/General_Architecture...
https://ru.wikipedia.org/wiki/UIMA
I suspect UIMA alone will be more than enough for you.