sphinx_fe man page

sphinx_fe — Convert audio files to acoustic feature files

Synopsis

sphinx_fe [ options ]...

Description

This program converts audio files (in Microsoft WAV, NIST Sphere, or raw format) to acoustic feature files for input to batch-mode speech recognition.  The resulting files are also useful for various other things.  A list of options follows:

-alpha

Preemphasis parameter

-argfile

Argument file (e.g. feat.params from an acoustic model) to read parameters from.  Settings in this file override anything set in other command line arguments.

-blocksize

Number of samples to read at a time.

-build_outdirs

Create missing subdirectories in output directory

-c

Control file for batch processing

-cep2spec

Input is cepstral files, output is log spectral files

-di

Input directory; input file names are relative to this, if defined

-dither

Add 1/2-bit noise

-do

Output directory; output files are relative to this

-doublebw

Use double bandwidth filters (same center frequency)

-ei

Extension to be applied to all input files

-eo

Extension to be applied to all output files

-example

Shows example of how to use the tool

-frate

Frame rate

-help

Shows the usage of the tool

-i

Audio input file

-input_endian

Endianness of input data (big or little); ignored for NIST Sphere or Microsoft WAV input

-lifter

Length of sin-curve for liftering, or 0 for no liftering.

-logspec

Write out log spectral files instead of cepstra

-lowerf

Lower edge of filters

-mach_endian

Endianness of machine, big or little

-mswav

Defines input format as Microsoft Wav (RIFF)

-ncep

Number of cepstral coefficients

-nchans

Number of channels of data (interleaved samples assumed)

-nfft

Size of FFT

-nfilt

Number of filter banks

-nist

Defines input format as NIST Sphere

-npart

Number of parts to run in (supersedes -nskip and -runlen if non-zero)

-nskip

If a control file was specified, the number of utterances to skip at the head of the file

-o

Cepstral output file

-ofmt

Format of output files - one of sphinx, htk, text.

-part

Index of the part to run (supersedes -nskip and -runlen if non-zero)

-raw

Defines input format as raw binary data

-remove_dc

Remove DC offset from each frame

-round_filters

Round mel filter frequencies to DFT points

-runlen

If a control file was specified, the number of utterances to process, or -1 for all

-samprate

Sampling rate

-seed

Seed for random number generator; if less than zero, pick our own

-smoothspec

Write out cepstral-smoothed log spectral files

-sndfile

Use libsndfile to read input data

-spec2cep

Input is log spectral files, output is cepstral files

-sph2pipe

Input is NIST sphere (possibly with Shorten), use sph2pipe to convert

-transform

Which type of transform to use to calculate cepstra (legacy, dct, or htk)

-unit_area

Normalize mel filters to unit area

-upperf

Upper edge of filters

-verbose

Show input filenames

-warp_params

Parameters defining the warping function

-warp_type

Warping function type (or shape)

-whichchan

Channel to process (numbered from 1), or 0 to mix all channels

-wlen

Hamming window length
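As a rough illustration of how the batch options above fit together, the following Python sketch assembles an argument list for a batch run over a control file. It only builds and prints the command; it does not run sphinx_fe. The function name build_batch_command and the file names (train.fileids, wav, mfc, feat.params) are hypothetical, and passing the format switch as "-mswav yes" is an assumption about how the tool parses boolean options, so check the output of -help on your installation.

```python
import shlex


def build_batch_command(ctl_file, in_dir, out_dir, argfile=None,
                        in_ext="wav", out_ext="mfc"):
    """Assemble a sphinx_fe argument list for batch processing.

    Only option names listed in this manual page are used. All file
    names and the "yes" value for -mswav are illustrative assumptions.
    """
    cmd = ["sphinx_fe"]
    if argfile is not None:
        # Parameters read from -argfile override other command line settings.
        cmd += ["-argfile", argfile]
    cmd += [
        "-c", ctl_file,   # control file for batch processing
        "-di", in_dir,    # input file names are relative to this directory
        "-do", out_dir,   # output files are relative to this directory
        "-ei", in_ext,    # extension applied to all input files
        "-eo", out_ext,   # extension applied to all output files
        "-mswav", "yes",  # assumed switch syntax for Microsoft WAV input
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_batch_command("train.fileids", "wav", "mfc",
                              argfile="feat.params")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list rather than a single string makes it easy to log the exact flags used for a given set of feature files.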

Currently the only features supported are MFCCs (mel-frequency cepstral coefficients).  Numerous options control the properties of the output features.  It is VERY important that you document the specific set of flags used to create any given set of feature files, since this information is NOT recorded in the files themselves, and any mismatch between the parameters used to extract features for recognition and those used to extract features for training will cause recognition to fail.
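Since the extraction flags are not stored in the feature files, one way to keep track of them is to write them to a sidecar file next to the features and compare them before recognition. The Python sketch below shows one hypothetical approach and is not part of sphinx_fe: the sidecar name features.params.json, the helper names, and the example flag values are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_params(params, path):
    """Write the extraction flags to a JSON sidecar file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2, sort_keys=True)


def load_params(path):
    """Read extraction flags back from a JSON sidecar file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_mismatches(train_params, decode_params):
    """Return {flag: (train_value, decode_value)} for every flag that differs.

    A flag present on one side only is reported with None for the other.
    """
    keys = set(train_params) | set(decode_params)
    return {
        key: (train_params.get(key), decode_params.get(key))
        for key in sorted(keys)
        if train_params.get(key) != decode_params.get(key)
    }


if __name__ == "__main__":
    # Illustrative values only; use the flags you actually passed.
    train = {"-samprate": "16000", "-nfilt": "40", "-ncep": "13"}
    decode = {"-samprate": "8000", "-nfilt": "40", "-ncep": "13"}

    with tempfile.TemporaryDirectory() as workdir:
        sidecar = os.path.join(workdir, "features.params.json")
        save_params(train, sidecar)
        recorded = load_params(sidecar)

    for flag, (trained, current) in find_mismatches(recorded, decode).items():
        print(f"{flag}: training used {trained}, recognition uses {current}")
```

Values are stored as strings so that they compare exactly as they were typed on the command line.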

Author

Written by numerous people at CMU from 1994 onwards.  This manual page was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

Referenced By

pocketsphinx_batch(1), pocketsphinx_continuous(1).

2007-08-27