How does speech recognition work in iOS apps?

Denis Marian, 2015-04-12 03:21:56

iOS

Hello, I am a beginner designer. I had an idea for a concept of a personal finance application in which expenses and income can be entered by voice. For example, you say “300 rubles to the phone”, and the application understands the phrase and records it where it belongs.
I have no programming experience, so I am turning to you to stress-test the idea. Even though it is only a concept, it should not be just a set of pictures but a well-thought-out application that could actually be built.
How I imagine it:
For example, a person says “300 rubles for food”. The application understands that 300 is the amount, rubles is the currency, and “food” is the tag, that is, what these 300 rubles were spent on. The application should file the expense under the predefined “food” tag, whatever grammatical form of the word was spoken. It must also determine on its own that this is an expense (the application will ship with pre-installed tags for this).
You can determine the time: “Yesterday I spent 300 rubles on food.”
Recurring transactions: “45,000 rubles salary every month on the 25th.”
Scheduled transactions: “4000 rubles for electricity, remind me tomorrow at 10:00.”
Transfers between accounts: “Withdrew 5000 rubles from the card” moves the money from the “card” account to the “cash” account.
It would also be nice to control the application itself by voice. For example: “delete card account”, “delete last transaction”, or “open settings”.
Is it possible to implement this in an application? How difficult would it be? Have you built similar applications? Would you use one?
Thank you.

4 answer(s)

Peter Bishop, 2015-04-12
@Peter_Bishop

Yandex has a technology for this, the SpeechKit Cloud API. It is designed to be built into Android and iOS applications, and Yandex encourages that use.

azShoo, 2015-04-13
@azShoo

As already mentioned above, there are plenty of ready-made speech recognition tools. Use one of them rather than writing your own.
Next, suppose you have recognized the voice and received, roughly speaking, a String variable containing the phrase spoken into the microphone.
This is where your difficulties begin.
What works in your favor is that you have N tags (pre-built into the application) among which expenses are distributed. For example: Food, Mobile communication, Education, Loans, etc. Their number is finite and known in advance.
Your next goal is to create "dictionaries" for sorting these String values into categories.
The difficulties are varying word order, synonyms and slang, descriptions that are too long or too short, and speech recognition errors.
Solve the following problem for yourself:
You have 15 free-form textual descriptions of expenses (ranging from "today I spent fifty thousand rubles on a jar of delicious black caviar" to "a fiver for the mobile").
You need to assign each of them to an expense category.
How? Dictionaries, keywords, the category with the largest number of matches. Something along those lines.
Once you are done with this, bolt on the speech recognizer, and the recognition part will not add any new problems.
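
The "dictionaries, keywords, maximum number of matches" approach can be sketched roughly as follows. This is only a minimal illustration in Python, not a finished design: the category names come from the answer above, while the keyword sets and the function names `categorize` and `extract_amount` are made up for the example. It assumes the recognizer has already returned plain English text, and it ignores amounts spelled out in words, synonyms, and word forms.

```python
import re

# Hypothetical keyword dictionaries: each pre-built category maps to
# words that hint at it. A real app would need far larger lists.
CATEGORY_KEYWORDS = {
    "Food": {"food", "groceries", "caviar", "lunch", "restaurant"},
    "Mobile communication": {"phone", "mobile", "cell", "sim"},
    "Education": {"course", "tuition", "textbook", "school"},
}


def tokenize(phrase):
    """Lowercase the phrase and split it into alphabetic words."""
    return re.findall(r"[a-z]+", phrase.lower())


def categorize(phrase, default="Uncategorized"):
    """Return the category whose keywords match the most words in the phrase."""
    words = set(tokenize(phrase))
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If nothing matched at all, fall back to the default bucket.
    return best if scores[best] > 0 else default


def extract_amount(phrase):
    """Return the first run of digits as an int, or None if there is none."""
    match = re.search(r"\d+", phrase)
    return int(match.group()) if match else None


if __name__ == "__main__":
    for text in ("300 rubles for food", "500 for the mobile phone", "open settings"):
        print(text, "->", extract_amount(text), categorize(text))
```

Phrases that match no keyword land in the fallback bucket, which is where the user could be asked to pick a tag by hand. Such corrections are also a natural source of new dictionary entries.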

Alexander Shcherbakov, 2015-04-16
@mkll

There is another point that you skipped over by going straight to the technical side of speech recognition: usability. The idea, as I understand it, is to speed up data entry, since speaking is easier than typing on a keyboard, right?
And now let's see the full scenario of using the application:
1. Get the phone
2. Unlock it
3. Launch the application
4. Turn on the input mode in it (voice or text input - it doesn't matter).
And only after that does the actual "simplification" begin. Ask yourself what share of the overall user experience this simplification accounts for. Is the game worth the candle? If the user has already performed so many actions by hand, what prevents them from finishing the job the same way? :)
Unlike Siri, for example, which is part of the operating system and can be activated straight from the lock screen in a couple of taps, your application will require the user to go through all of the steps above.

xmoonlight, 2015-04-12
@xmoonlight

Voice assistants:
Dusya (Android)
Cortana (Win10, WinPhone, iOS and Android planned)