pocketsphinx_batch man page

pocketsphinx_batch — Run speech recognition in batch mode

Synopsis

pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...

Description

Run speech recognition over a list of utterances in batch mode.  A list of arguments follows:

-adchdr

Size of audio file header in bytes (headers are ignored)

-adcin

Input is raw audio data

-agc

Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')

-agcthresh

Initial threshold for automatic gain control

-allphone

Perform phoneme decoding with phonetic lm

-allphone_ci

Perform phoneme decoding with phonetic lm and context-independent units only

-alpha

Preemphasis parameter

-argfile

Argument file giving extra arguments.

-ascale

Inverse of acoustic model scale for confidence score calculation

-aw

Inverse weight applied to acoustic scores.

-backtrace

Print results and backtraces to log file.

-beam

Beam width applied to every frame in Viterbi search (smaller values mean wider beam)

-bestpath

Run bestpath (Dijkstra) search over word lattice (3rd pass)

-bestpathlw

Language model probability weight for bestpath search

-build_outdirs

Create missing subdirectories in output directory

-cepdir

Input files directory (prefixed to filespecs in control file)

-cepext

Input files extension (suffixed to filespecs in control file)

-ceplen

Number of components in the input feature vector

-cmn

Cepstral mean normalization scheme ('current', 'prior', or 'none')

-cmninit

Initial values (comma-separated) for cepstral mean when 'prior' is used

-compallsen

Compute all senone scores in every frame (can be faster when there are many senones)

-ctl

Control file listing utterances to be processed

-ctlcount

Number of utterances to be processed (after skipping -ctloffset entries)

-ctlincr

Do every Nth line in the control file

-ctloffset

Number of utterances at the beginning of the -ctl file to be skipped

-ctm

Recognition output in CTM file format (may require post-sorting)

-debug

Verbosity level for debugging messages

-dict

Main pronunciation dictionary (lexicon) input file

-dictcase

Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)

-dither

Add 1/2-bit noise

-doublebw

Use double bandwidth filters (same center freq)

-ds

Frame GMM computation downsampling ratio

-fdict

Noise word pronunciation dictionary input file

-feat

Feature stream type, depends on the acoustic model

-featparams

File containing feature extraction parameters.

-fillprob

Filler word transition probability

-frate

Frame rate

-fsg

Sphinx format finite state grammar file

-fsgctl

Control file listing FSG file to use for each utterance

-fsgdir

Base directory for FSG files

-fsgext

File extension for FSG files (including leading dot)

-fsgusealtpron

Add alternate pronunciations to FSG

-fsgusefiller

Insert filler words at each state.

-fwdflat

Run forward flat-lexicon search over word lattice (2nd pass)

-fwdflatbeam

Beam width applied to every frame in second-pass flat search

-fwdflatefwid

Minimum number of end frames for a word to be searched in fwdflat search

-fwdflatlw

Language model probability weight for flat lexicon (2nd pass) decoding

-fwdflatsfwin

Window of frames in lattice to search for successor words in fwdflat search

-fwdflatwbeam

Beam width applied to word exits in second-pass flat search

-fwdtree

Run forward lexicon-tree search (1st pass)

-hmm

Directory containing acoustic model files.

-hyp

Recognition output file name

-hypseg

Recognition output with segmentation file name

-input_endian

Endianness of input data, big or little, ignored if NIST or MS Wav

-jsgf

JSGF grammar file

-keyphrase

Keyphrase to spot

-kws

File with keyphrases to spot, one per line

-kws_delay

Delay to wait for best detection score

-kws_plp

Phone loop probability for keyword spotting

-kws_threshold

Threshold for p(hyp)/p(alternatives) ratio

-latsize

Initial backpointer table size

-lda

File containing transformation matrix to be applied to features (single-stream features only)

-ldadim

Dimensionality of output of feature transformation (0 to use entire matrix)

-lifter

Length of sin-curve for liftering, or 0 for no liftering.

-lm

Word trigram language model input file

-lmctl

Specify a set of language models

-lmname

Which language model in -lmctl to use by default

-lmnamectl

Control file listing LM name to use for each utterance

-logbase

Base in which all log-likelihoods are calculated

-logfn

File to write log messages in

-logspec

Write out logspectral files instead of cepstra

-lowerf

Lower edge of filters

-lpbeam

Beam width applied to last phone in words

-lponlybeam

Beam width applied to last phone in single-phone words

-lw

Language model probability weight

-maxhmmpf

Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)

-maxwpf

Maximum number of distinct word exits at each frame (or -1 for no pruning)

-mdef

Model definition input file

-mean

Mixture gaussian means input file

-mfclogdir

Directory to log feature files to

-min_endfr

Nodes ignored in lattice construction if they persist for fewer than N frames

-mixw

Senone mixture weights input file (uncompressed)

-mixwfloor

Senone mixture weights floor (applied to data from -mixw file)

-mllr

MLLR transformation to apply to means and variances

-mllrctl

Control file listing MLLR transforms to use for each utterance

-mllrdir

Base directory for MLLR transforms

-mllrext

File extension for MLLR transforms (including leading dot)

-mmap

Use memory-mapped I/O (if possible) for model files

-nbest

Number of N-best hypotheses to write to -nbestdir (0 for no N-best)

-nbestdir

Directory for writing N-best hypothesis lists

-nbestext

Extension for N-best hypothesis list files

-ncep

Number of cep coefficients

-nfft

Size of FFT

-nfilt

Number of filter banks

-nwpen

New word transition penalty

-outlatbeam

Minimum posterior probability for output lattice nodes

-outlatdir

Directory for dumping word lattices

-outlatext

Filename extension for dumping word lattices

-outlatfmt

Format for dumping word lattices (s3 or htk)

-pbeam

Beam width applied to phone transitions

-pip

Phone insertion penalty

-pl_beam

Beam width applied to phone loop search for lookahead

-pl_pbeam

Beam width applied to phone loop transitions for lookahead

-pl_pip

Phone insertion penalty for phone loop

-pl_weight

Weight for phoneme lookahead penalties

-pl_window

Phoneme lookahead window size, in frames

-rawlogdir

Directory to log raw audio files to

-remove_dc

Remove DC offset from each frame

-remove_noise

Remove noise with spectral subtraction in mel-energies

-remove_silence

Enables VAD, removes silence frames from processing

-round_filters

Round mel filter frequencies to DFT points

-samprate

Sampling rate

-seed

Seed for random number generator; if less than zero, pick our own

-sendump

Senone dump (compressed mixture weights) input file

-senin

Input is senone score dump files

-senlogdir

Directory to log senone score files to

-senmgau

Senone to codebook mapping input file (usually not needed)

-silprob

Silence word transition probability

-smoothspec

Write out cepstral-smoothed logspectral files

-svspec

Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)

-tmat

HMM state transition matrix input file

-tmatfloor

HMM state transition probability floor (applied to -tmat file)

-topn

Maximum number of top Gaussians to use in scoring.

-topn_beam

Beam width used to determine top-N Gaussians (or a list, per-feature)

-toprule

Start rule for JSGF (first public rule is default)

-transform

Which type of transform to use to calculate cepstra (legacy, dct, or htk)

-unit_area

Normalize mel filters to unit area

-upperf

Upper edge of filters

-uw

Unigram weight

-vad_postspeech

Number of silence frames to keep after the transition from speech to silence.

-vad_prespeech

Number of speech frames to keep before the transition from silence to speech.

-vad_startspeech

Number of speech frames needed to trigger VAD from silence to speech.

-vad_threshold

Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.

-var

Mixture gaussian variances input file

-varfloor

Mixture gaussian variance floor (applied to data from -var file)

-varnorm

Variance normalize each utterance (only if CMN == current)

-verbose

Show input filenames

-warp_params

Parameters defining the warping function

-warp_type

Warping function type (or shape)

-wbeam

Beam width applied to word exits

-wip

Word insertion penalty

-wlen

Hamming window length

To do batch mode recognition, you will need to specify a control file using -ctl.  This is a simple text file containing one entry per line.  Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given in the -cepext argument).
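
As an illustration of the naming rule above, the following Python sketch derives control file entries from feature file paths, maps an entry back to a path, and assembles a command line matching the Synopsis. The helper names (to_ctl_entry, resolve_entry, build_command) and the example paths are hypothetical. Joining the directory, entry, and extension by plain concatenation is an assumption about how the three parts combine; it is not taken from the pocketsphinx source.

```python
import posixpath


def to_ctl_entry(path, cepdir, cepext):
    """Turn a feature file path into a control file entry (hypothetical helper).

    The entry is the path relative to cepdir with the extension removed.
    """
    rel = posixpath.relpath(path, cepdir)
    if not rel.endswith(cepext):
        raise ValueError(f"{path} does not end with {cepext!r}")
    # Slice by explicit length so an empty extension leaves the name intact.
    return rel[: len(rel) - len(cepext)]


def resolve_entry(entry, cepdir, cepext):
    """Assumed inverse mapping: prefix the directory, suffix the extension."""
    return posixpath.join(cepdir, entry + cepext)


def build_command(ctlfile, cepdir, cepext, hyp=None):
    """Build an argument list using only options listed on this page."""
    argv = ["pocketsphinx_batch", "-ctl", ctlfile,
            "-cepdir", cepdir, "-cepext", cepext]
    if hyp is not None:
        argv += ["-hyp", hyp]
    return argv


if __name__ == "__main__":
    # Example (hypothetical) feature files under a "feats" directory.
    files = ["feats/spk1/utt001.mfc", "feats/spk1/utt002.mfc"]
    for f in files:
        print(to_ctl_entry(f, "feats", ".mfc"))
    print(" ".join(build_command("test.ctl", "feats", ".mfc", hyp="test.hyp")))
```

Writing one printed entry per line to a file gives a control file of the form described above; the argument list can be passed to a process launcher such as Python's subprocess.run.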

If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:

FILENAME START-FRAME END-FRAME UTTERANCE-ID
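
A minimal sketch of reading the two entry forms described above (a bare file name, or the four-field form) might look as follows. The CtlEntry type and parse_ctl_line function are hypothetical names. The sketch assumes fields are separated by whitespace and that file names contain no spaces, and it rejects any other field count, which may be stricter than the actual tool.

```python
from typing import NamedTuple, Optional


class CtlEntry(NamedTuple):
    """One control file line (hypothetical container type)."""
    filename: str
    start_frame: Optional[int] = None
    end_frame: Optional[int] = None
    utt_id: Optional[str] = None


def parse_ctl_line(line):
    """Parse 'FILENAME' or 'FILENAME START-FRAME END-FRAME UTTERANCE-ID'."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) == 1:
        return CtlEntry(fields[0])
    if len(fields) == 4:
        name, start, end, utt_id = fields
        return CtlEntry(name, int(start), int(end), utt_id)
    raise ValueError(f"expected 1 or 4 fields, got {len(fields)}: {line!r}")


if __name__ == "__main__":
    # Hypothetical entries: a whole file, then a sub-part of the same file.
    print(parse_ctl_line("spk1/utt001"))
    print(parse_ctl_line("spk1/utt001 0 250 utt001_a"))
```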

Author

Written by numerous people at CMU from 1994 onwards.  This manual page was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

See Also

pocketsphinx_continuous(1), sphinx_fe(1).

2007-08-27