pocketsphinx_batch man page

pocketsphinx_batch — Run speech recognition in batch mode


pocketsphinx_batch -hmm hmmdir -dict dictfile [ options ]...


Run speech recognition over a list of utterances in batch mode. A list of arguments follows:

Device name for audio input (platform-specific)
Size of audio file header in bytes (headers are ignored)
Input is raw audio data
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
Initial threshold for automatic gain control
Do phoneme recognition
Preemphasis parameter
Print back trace of recognition results
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
Run bestpath (Dijkstra) search over word lattice (3rd pass)
Language model probability weight for bestpath search
Cache senone scores from first pass search
Input is cepstral files, output is log spectral files
Input files directory (prefixed to filespecs in control file)
Input files extension (suffixed to filespecs in control file)
Number of components in the input feature vector
Cepstral mean normalization scheme ('current', 'prior', or 'none')
Initial values (comma-separated) for cepstral mean when 'prior' is used
Compute all senone scores in every frame (can be faster when there are many senones)
Control file listing utterances to be processed
No. of utterances to be processed (after skipping -ctloffset entries)
Do every Nth line in the control file
No. of utterances at the beginning of -ctl file to be skipped
pronunciation dictionary (lexicon) input file
Add 1/2-bit noise
Use double bandwidth filters (same center freq)
Frame GMM computation downsampling ratio
FB Type of mel_scale or log_linear
word pronunciation dictionary input file
Feature stream type, depends on the acoustic model
Filler word transition penalty
Frame rate
Finite state grammar
Force backtrace from FSG final state
finite state grammar control file
Use alternative pronunciations for FSG
(FSG Mode (Mode 2) only) Insert filler words at each state.
Use trigrams in first pass search
Run forward flat-lexicon search over word lattice (2nd pass)
Beam width applied to every frame in second-pass flat search
Minimum number of end frames for a word to be searched in fwdflat search
Language model probability weight for flat lexicon (2nd pass) decoding
Window of frames in lattice to search for successor words in fwdflat search
Beam width applied to word exits in second-pass flat search
Run forward lexicon-tree search (1st pass)
Directory containing acoustic model files.
output file name
output with segmentation file name
Endianness of input data, big or little, ignored if NIST or MS Wav
Maximum number of Gaussians per leaf node in kd-Trees
Maximum depth of kd-Trees to use
kd-Tree file for Gaussian selection
Lattice size
Length of sin-curve for liftering, or 0 for no liftering.
Get input from audio hardware
trigram language model input file
Specify a set of language models

The -hmm and -dict arguments are always required. Either -lm or -fsg is required, depending on whether you are using a statistical language model or a finite-state grammar. To do batch-mode recognition, you will need to specify a control file using -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given in the -cepext argument).
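As an illustration of the control file layout described above, the following sketch builds one from a directory of input files and prints a matching command line instead of running it. The directory ./feats, the extension .mfc, the file names utt01 and utt02, and the model paths are all hypothetical placeholders; only the option names come from this page.

```shell
#!/bin/sh
# Hypothetical layout: input files live in ./feats with extension .mfc.
cepdir=./feats
cepext=.mfc
ctl=./utts.ctl

# Empty placeholder files so the sketch is self-contained.
mkdir -p "$cepdir"
: > "$cepdir/utt01$cepext"
: > "$cepdir/utt02$cepext"

# Write one entry per line: the file name relative to $cepdir,
# with the extension removed.
: > "$ctl"
for f in "$cepdir"/*"$cepext"; do
    basename "$f" "$cepext" >> "$ctl"
done

# Print the invocation rather than executing it; the model paths
# are placeholders, not files shipped with the decoder.
echo pocketsphinx_batch -hmm ./model -dict ./model.dict -lm ./model.lm \
    -ctl "$ctl" -cepdir "$cepdir" -cepext "$cepext"
```

With this layout the decoder would look for each control file entry at the -cepdir path, followed by the entry, followed by the -cepext extension. When using a finite-state grammar, -fsg would take the place of -lm.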

If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:



Written by numerous people at CMU from 1994 onwards. This manual page was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

See Also

pocketsphinx_continuous(1), sphinx_fe(1).
