Andrey Kobyshev, 2018-03-10 18:16:27
Machine learning

How to break text into sentences?

A speech recognition system outputs a continuous stream of text.
This stream needs to be converted automatically into a more or less readable form, with correct punctuation and division into sentences and paragraphs. For simplicity, let's assume for now that this is needed only for Russian or English.
What algorithms, approaches, libraries, existing work, or literature address this problem or parts of it?

4 answers
Dmitry, 2018-03-10
@demon416nds

As stated, the problem is most likely solved only by neural networks after long training,
but IMHO, if the text can be aligned with the audio, you can mark sentence boundaries roughly by the pauses.
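
The rough pause-based marking suggested above could look something like the sketch below. It assumes the recognizer (or a forced aligner) returns a start and end time for every word; the `Word` tuple layout, the `split_on_pauses` name, and the 0.6 second threshold are all illustrative assumptions, not part of any particular library.

```python
from typing import List, Optional, Tuple

# (text, start_sec, end_sec): a hypothetical per-word record, assumed to come
# from the speech recognizer or from a forced aligner.
Word = Tuple[str, float, float]


def _finish(tokens: List[str]) -> str:
    """Join tokens, capitalize the first letter, and close with a period."""
    sentence = " ".join(tokens)
    return sentence[:1].upper() + sentence[1:] + "."


def split_on_pauses(words: List[Word], min_pause: float = 0.6) -> List[str]:
    """Break the word stream wherever the silence between two consecutive
    words lasts at least `min_pause` seconds (an assumed threshold)."""
    sentences: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        if current and prev_end is not None and start - prev_end >= min_pause:
            sentences.append(_finish(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        sentences.append(_finish(current))
    return sentences


if __name__ == "__main__":
    stream = [
        ("hello", 0.0, 0.4), ("world", 0.5, 0.9),
        ("how", 1.8, 2.0), ("are", 2.05, 2.2), ("you", 2.25, 2.5),
    ]
    for sentence in split_on_pauses(stream):
        print(sentence)
```

A single fixed threshold is the crudest possible choice; in practice it would probably have to be tuned per speaker, since speaking rates differ.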

Roman Mirilaczvili, 2018-03-10
@2ord

I think the program should be able to:
As for paragraphs, one can only dream of them. Here's why: "How to break text into paragraphs?"

#, 2018-03-11
@mindtester

Progress in this area is rapid, especially lately,
but the bar you are setting is still quite high, at least for a home user.
On the other hand, as far as I remember, all the good recognition systems (from the very largest vendors) generally cope with the task tolerably well,
unless, of course, you dictate large volumes of text in a monotone. Are you sure you are not being disingenuous somewhere?
Try the API from MS; here a person shares his experience with it.

xmoonlight, 2018-03-12
@xmoonlight

In 2 stages:
1. Based on the audio stream:
   1. Comma: a jump in pitch (upward or downward) without a change in volume, or a short pause.
   2. Period or dash: a long pause.
   3. Question or exclamation mark: a sharp increase in volume followed by a pause. Recognizing intonation requires a trained neural network (without one it will not work here).
2. Based on the meaning of the text and the grammar:
   1. Identify the relevant parts of speech (and their chains) and form the sections of the sentence: compound/subordinate clauses, participial and adverbial participle phrases, etc.
   2. Reconcile the result with the rules of the language and correct punctuation errors.
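
The first (audio) stage above can be sketched as a few threshold rules. This is only an illustration: the `Gap` fields, the threshold constants, and the function names are assumptions made for the example, and the measurements themselves would have to come from some pitch and loudness tracker that is not shown. Since the answer says that telling a question from an exclamation needs a trained neural network, the sketch only emits a `?!` placeholder at that point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gap:
    """Hypothetical prosodic measurements at the boundary after a word."""
    pause_sec: float         # silence before the next word
    pitch_jump: float        # absolute pitch change, in semitones
    volume_change_db: float  # loudness change leading into the boundary


# Assumed thresholds; real values would need tuning on recorded speech.
SHORT_PAUSE = 0.15
LONG_PAUSE = 0.6
PITCH_JUMP = 3.0
VOLUME_JUMP = 6.0


def mark_for_gap(gap: Gap) -> str:
    """Map one boundary to a punctuation mark using the three audio rules."""
    # Sharp rise in volume followed by a pause: question or exclamation.
    # Choosing between the two is left to a trained model (placeholder only).
    if gap.volume_change_db >= VOLUME_JUMP and gap.pause_sec >= SHORT_PAUSE:
        return "?!"
    # Long pause: end of sentence.
    if gap.pause_sec >= LONG_PAUSE:
        return "."
    # Pitch jump without a change in volume, or a short pause: comma.
    steady_volume = abs(gap.volume_change_db) < VOLUME_JUMP
    if (gap.pitch_jump >= PITCH_JUMP and steady_volume) or gap.pause_sec >= SHORT_PAUSE:
        return ","
    return ""


def punctuate(words: List[str], gaps: List[Gap]) -> str:
    """`gaps[i]` describes the boundary after `words[i]`; the last word has
    no gap and simply gets a period."""
    marked = []
    for i, word in enumerate(words):
        mark = mark_for_gap(gaps[i]) if i < len(gaps) else "."
        marked.append(word + mark)
    return " ".join(marked)
```

The output of such rules is only a draft; the second (grammar) stage would still have to check it against the rules of the language.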
