Speech recognition
LakeForest, 2021-08-20 10:21:36

VOSK ASR for Russian: how do I set up my own dictionary, and how do I fix the warnings when generating G.fst?

It complains everywhere about the hard sign (ъ). Where did it come from? It does not appear as a separate word anywhere in my data.
What should I do?

LOG (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
WARNING (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:219) line 610817 [-4.645712 b-0.3890305] skipped: word 'b' not in symbol table
....
WARNING (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:219) line 9460316 [-3.161267 yesterday b] skipped: word 'b' not in symbol table
LOG (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.

LOG (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:149) Reading \4-grams: section.
WARNING (arpa2fst[5.5.958~1-57f8d]:Read():arpa-file-parser.cc:259) Of 2868 parse warnings, 30 were reported. Run program with --max-arpa-warnings=-1 to see all warnings
LOG (arpa2fst[5.5.958~1-57f8d]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 105503353 to 12126947
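The "word ... not in symbol table" warnings mean the ARPA language model contains tokens that are missing from the symbol table (words.txt) built from the lexicon, so arpa2fst skips every n-gram that uses them. A minimal sketch for listing those tokens before running arpa2fst, assuming a standard ARPA file and a Kaldi-style words.txt ("word id" per line); the script name and paths in the comment are only examples:

```python
import sys
from typing import Iterable, List, Set


def load_symbols(lines: Iterable[str]) -> Set[str]:
    """Read a Kaldi-style symbol table: one 'word id' pair per line."""
    symbols = set()
    for line in lines:
        parts = line.split()
        if parts:
            symbols.add(parts[0])
    return symbols


def arpa_unigrams(lines: Iterable[str]) -> Set[str]:
    """Collect the words listed in the \\1-grams: section of an ARPA file."""
    words = set()
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section headers: \data\, \1-grams:, \2-grams:, ..., \end\
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            # Unigram line: logprob, word, optional backoff weight
            parts = line.split()
            if len(parts) >= 2:
                words.add(parts[1])
    return words


def missing_words(arpa_lines: Iterable[str], symbol_lines: Iterable[str]) -> List[str]:
    """Words the LM uses that the symbol table does not know (arpa2fst skips these)."""
    return sorted(arpa_unigrams(arpa_lines) - load_symbols(symbol_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python check_oov.py lm.arpa data/lang/words.txt
    with open(sys.argv[1], encoding="utf-8") as arpa, open(sys.argv[2], encoding="utf-8") as syms:
        for word in missing_words(arpa, syms):
            print(word)
```

Any token this prints either needs a pronunciation in lexicon.txt or should be removed from the text corpus (for example, by fixing the tokenization that produced it) before the LM is rebuilt.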

I added my own lexicon.txt, and the resulting list is very large...
But for some reason (the same happens with a small one), after building the final model with utils/mkgraph.sh --self-loop-scale 1.0 data/lang/ am/ graph/, the recognition quality could hardly be worse...
What is the correct way to add name recognition to the Vosk model?
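Adding names means giving each one a pronunciation in lexicon.txt that uses only phones the acoustic model was trained with. A minimal sketch of merging new entries into an existing Kaldi lexicon ("word phone1 phone2 ..." per line), assuming the allowed phones are taken from a nonsilence_phones.txt-style list; the function names are hypothetical and the phone symbols in any example must be replaced with the ones the Russian model actually uses:

```python
from typing import Iterable, List, Set, Tuple

Entry = Tuple[str, Tuple[str, ...]]


def parse_lexicon(lines: Iterable[str]) -> List[Entry]:
    """Parse Kaldi lexicon lines of the form 'word phone1 phone2 ...'."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            entries.append((parts[0], tuple(parts[1:])))
    return entries


def load_phone_set(lines: Iterable[str]) -> Set[str]:
    """Collect every phone listed in nonsilence_phones.txt-style lines."""
    phones: Set[str] = set()
    for line in lines:
        phones.update(line.split())
    return phones


def merge_lexicon(base: List[Entry], extra: List[Entry], phones: Set[str]) -> List[Entry]:
    """Append new pronunciations, skipping exact duplicates and rejecting unknown phones."""
    merged = list(base)
    seen = set(base)
    for word, pron in extra:
        unknown = [p for p in pron if p not in phones]
        if unknown:
            raise ValueError(f"{word}: unknown phones {unknown}")
        if (word, pron) not in seen:
            merged.append((word, pron))
            seen.add((word, pron))
    return merged


def format_lexicon(entries: List[Entry]) -> List[str]:
    """Render entries back to sorted 'word phone1 phone2 ...' lines."""
    return [word + " " + " ".join(pron) for word, pron in sorted(entries)]
```

A name that is only in the lexicon is still unlikely to be recognized: in a standard Kaldi setup it also has to occur in the language-model text, after which data/lang, G.fst and the graph (mkgraph.sh) are regenerated.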

(I followed this guide, skipping the section "REPLACEMENT OF LANGUAGE MODEL FOR GRAMMAR") https://habr.com/ru/company/cft/blog/558824/


1 answer
nshmyrev, 2021-08-26

We have recently updated the documentation and rebuilt the package:
https://alphacephei.com/vosk/lm
https://alphacephei.com/vosk/models/vosk-model-ru
-... don't follow.
