aLap, 2022-04-13 18:02:50
Speech recognition

How do I fix the vocabulary extension problem with vosk-model-ru-0.22?

Greetings!
I need to extend the vocabulary of the vosk-model-ru-0.22 model.
I downloaded vosk-model-ru-0.22-compile, built Kaldi, installed all the dependencies, and then followed the instructions. I filled in the db/extra.txt file in the required format. When I run compile_graph.sh, I get the following output:

LOG (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 82 [-5.653475  абаимова] skipped: word 'абаимова' not in symbol table
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 84 [-5.653475  абайдуллина] skipped: word 'абайдуллина' not in symbol table
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 100 [-5.653475 абакировна] skipped: word 'абакировна' not in symbol table
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 107 [-5.653475 абакшина] skipped: word 'абакшина' not in symbol table
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 114 [-5.653475 абалмазова] skipped: word 'абалмазова' not in symbol table
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 115 [-5.653475 абалымов] skipped: word 'абалымов' not in symbol table
......
WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:259) Of 15464 parse warnings, 30 were reported. Run program with --max-arpa-warnings=-1 to see all warnings


Then this one:
utils/map_arpa_lm.pl: Processing "\1-grams:\"
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абаимова        -0.004129345
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абайдуллина     -0.004129345
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абакировна      -0.004129345
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абакшина        -0.004129345
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абалмазова      -0.004129345
utils/map_arpa_lm.pl: Warning: OOV line -5.653475       абалымов        -0.004129345
......


As a result, the output contains only the base vocabulary, without the words from extra.txt.
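
To see exactly which words were dropped, the arpa2fst warnings above can be collected with a few lines of Python. This is only a rough sketch: it assumes every warning has the same shape as the lines in the log above, and the name `skipped_words` is made up for illustration, not part of Kaldi or the Vosk scripts. Note that the log itself says only 30 of the 15464 warnings are printed unless the program is run with --max-arpa-warnings=-1.

```python
import re

# Pattern copied from the warning text in the log above (an assumption:
# other Kaldi versions may word the message differently).
SKIPPED = re.compile(r"skipped: word '([^']+)' not in symbol table")


def skipped_words(log_lines):
    """Return the words reported as skipped, in order, without duplicates."""
    found = {}
    for line in log_lines:
        match = SKIPPED.search(line)
        if match:
            found.setdefault(match.group(1), None)
    return list(found)


# Three lines in the shape of the log above, used as sample input.
sample = [
    "LOG (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:149) Reading \\1-grams: section.",
    "WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 82 [-5.653475  абаимова] skipped: word 'абаимова' not in symbol table",
    "WARNING (arpa2fst[5.5.1012~2-dd107]:Read():arpa-file-parser.cc:219) line 107 [-5.653475 абакшина] skipped: word 'абакшина' not in symbol table",
]
print(skipped_words(sample))
```

With a saved log, the same function can be fed `open(path, encoding="utf-8")` directly, since it only needs an iterable of lines.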

I am new to this topic, so please share your experience: what am I doing wrong? Is it a problem with the lexicon? If so, where should I put the generated lexicon.txt?
Thank you!

UPD.
I found that the dict.py script is not working correctly: the words from db/extra.txt do not make it into the lexicon.txt file, only the ones from db/ru.dic. The phonetisaurus.predict call is probably misbehaving. Still investigating...
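
A check like this can be scripted. The sketch below lists the words from extra.txt that have no entry in lexicon.txt. It rests on two assumptions: that extra.txt is plain whitespace-separated text, and that lexicon.txt uses the usual Kaldi layout of one word per line followed by its phones. The function names and the phone strings in the demo are made up for illustration.

```python
def lexicon_words(lexicon_text):
    """Collect the first field of every non-empty line (the word itself)."""
    return {line.split()[0] for line in lexicon_text.splitlines() if line.strip()}


def missing_from_lexicon(extra_text, lexicon_text):
    """Return words from extra_text that have no lexicon entry, in first-seen order."""
    known = lexicon_words(lexicon_text)
    wanted = dict.fromkeys(extra_text.split())  # de-duplicate, keep order
    return [word for word in wanted if word not in known]


# Demo with placeholder data; the phone sequences are NOT real pronunciations.
demo_lexicon = "абак a b a k\nабаимова a b a i m o v a\n"
demo_extra = "абаимова абакшина\nабалымов абаимова"
print(missing_from_lexicon(demo_extra, demo_lexicon))

# With real files (the lexicon.txt location depends on your setup):
# from pathlib import Path
# extra = Path("db/extra.txt").read_text(encoding="utf-8")
# lexicon = Path("path/to/lexicon.txt").read_text(encoding="utf-8")
# print(missing_from_lexicon(extra, lexicon))
```

An empty result would mean every word from extra.txt received a pronunciation, so the problem would have to be further down the pipeline.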

1 answer(s)
aLap, 2022-04-21

Figured it out. The problem was Phonetisaurus running on CentOS. I saw a comment on GitHub saying it had been tested on Debian, so I ran the script on Ubuntu (keeping the same versions, for a clean comparison) and everything worked: the new words were added to the model.
