Text analysis
HeadWithoutBrains, 2010-10-11 15:08:44

Text analysis and parsing

Good afternoon. Please help me choose the right direction for my research.
The task is to analyze sentences, that is, to translate them into commands a program can understand. For example, given “On the first Wednesday of January, go to the store”, work out what needs to be done and when. I would be glad of any links to any materials.
In particular, I am interested in algorithms that can achieve this. I understand that this is a very difficult task, but the sentences will be simple and will all share the same fixed structure, so I am mainly asking which direction to dig in.
Thank you.

1 answer(s)
MikhailEdoshin, 2010-10-11

Apple had such an assistant in the Newton: a small system service to which you write, for example, "fax Bob", and it works out that it needs to take the current document, find Bob in the address book and fax the document to him. The principle was simple; details can be found in the Newton Programmer's Reference v2.0, ch. 18, Intelligent Assistant. (The PDF can be found via Google.)
If you write it yourself, I advise you to take a closer look at the CYK parsing method. It is a general bottom-up parsing method that starts by extracting tokens from a string and then reduces (folds) them according to the rules of a grammar. You most likely won't need the whole of CYK, because your main problem is precisely the lack of a grammar, but the basic principle can be used like this:
- Split the string into words.
- Classify each word. For example, let "PN" be an ordinal number, "DN" the day of the week, "M" the month, and "?" an unrecognized word. Your phrase becomes "?-PN-DN-M-?-?-?".
- Look for patterns in the label string (this is, in effect, the reduction phase). In this case the "PN-DN-M" pattern matches, and you will have it registered for the date parser. Another string, for example "Wednesday at the first movie", gives "?-DN-?-PN-?". The "DN-?-PN" pattern will not be registered for dates (such a combination is unlikely to indicate a date), so only "DN" goes to the date parser, and "PN" is either ignored or handed to, say, a TV channel parser.
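The three steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the parser described in this answer: the word lists, the `STOP_WORDS` set, the `PATTERNS` table and all function names are hypothetical. The labels are "PN" for an ordinal number, "DN" for a day of the week, "M" for a month and "?" for anything else. English articles and "of" are dropped so that the example sentence yields the label string "?-PN-DN-M-?-?-?".

```python
import re

# Hypothetical vocabularies; a real system would need far larger ones.
ORDINALS = {"first", "second", "third", "fourth", "last"}
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

# Assumption: these English filler words are dropped before classification.
STOP_WORDS = {"the", "of"}

# Label patterns registered for each downstream parser (here only "date").
PATTERNS = {
    ("PN", "DN", "M"): "date",
    ("DN",): "date",
}


def tokenize(text):
    """Step 1: split the string into lowercase words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def classify(word):
    """Step 2: map one word to its label."""
    if word in ORDINALS:
        return "PN"
    if word in WEEKDAYS:
        return "DN"
    if word in MONTHS:
        return "M"
    return "?"


def find_patterns(text):
    """Step 3: scan the labels left to right, trying the longest pattern first."""
    words = tokenize(text)
    labels = [classify(w) for w in words]
    by_length = sorted(PATTERNS, key=len, reverse=True)
    matches = []
    i = 0
    while i < len(labels):
        for pattern in by_length:
            n = len(pattern)
            if tuple(labels[i:i + n]) == pattern:
                matches.append((PATTERNS[pattern], words[i:i + n]))
                i += n  # skip past the words this pattern consumed
                break
        else:
            i += 1  # no pattern starts here; move on
    return "-".join(labels), matches
```

Calling `find_patterns("On the first Wednesday of January, go to the store")` hands the words "first", "wednesday", "january" to the date parser. For "On Wednesday at the first movie" only "wednesday" is matched, because no pattern containing "DN-?-PN" is registered.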
This approach is convenient because no grammar is needed, and you can define suitable patterns as you process the data. I once wrote such a parser for addresses, and it parsed them well, correctly distinguishing, for example, the different "St" in "St Patrick St". It was not 100% accurate, though: some patterns remained ambiguous.
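As a rough illustration of how a pattern can settle the meaning of an ambiguous word such as "St", here is a minimal Python sketch. It is not the address parser mentioned above. The labels ("ST" for the ambiguous abbreviation, "W" for any other word), the role names and the `PATTERNS` table are all assumptions made for the example.

```python
# Each registered label pattern assigns a role to every position.
# Hypothetical table; a real address parser would have many more entries.
PATTERNS = {
    ("ST", "W", "ST"): ("saint", "name", "street"),
    ("W", "ST"): ("name", "street"),
}


def classify(word):
    """Label the ambiguous abbreviation 'St' (or 'St.') as ST, anything else as W."""
    return "ST" if word.lower().rstrip(".") == "st" else "W"


def resolve(text):
    """Return (word, role) pairs, or None when no registered pattern fits."""
    words = text.split()
    labels = tuple(classify(w) for w in words)
    roles = PATTERNS.get(labels)
    if roles is None:
        return None  # unknown or ambiguous pattern: leave it undecided
    return list(zip(words, roles))
```

With this table, `resolve("St Patrick St")` tags the first "St" as "saint" and the last as "street" purely from their positions in the pattern, while an unregistered sequence such as "St St" is returned as `None` rather than guessed.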
