# esl-mixdchlet - Man Page

fitting mixture Dirichlets to count data

## Synopsis

**esl-mixdchlet fit** [*options*] *Q K in_countfile out_mixdchlet* (train a new mixture Dirichlet)

**esl-mixdchlet score** [*options*] *mixdchlet_file counts_file* (calculate log likelihood of count data, given mixture Dirichlet)

**esl-mixdchlet gen** [*options*] *mixdchlet_file* (generate synthetic count data from mixture Dirichlet)

**esl-mixdchlet sample** [*options*] (sample a random mixture Dirichlet for testing)

## Description

The **esl-mixdchlet** miniapp is for training mixture Dirichlet priors, such as the priors used in HMMER and Infernal. It has four subcommands: **fit, score, gen,** and **sample.** The most important subcommand is **fit,** which fits a new mixture Dirichlet distribution to a collection of count vectors (for example, emission or transition count vectors from Pfam or Rfam training sets).

Specifically, **esl-mixdchlet fit** fits a new mixture Dirichlet distribution with *Q* mixture components to the count vectors (of alphabet size *K* ) in input file *in_countfile,* and saves the mixture Dirichlet into output file *out_mixdchlet.*

The input count vector file *in_countfile* contains one count vector of *K* fields per line, for any number of lines. Blank lines and lines starting with # (comments) are ignored. Fields are nonnegative real values; they do not have to be integers, because they can be weighted counts.
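As an illustration of the count file format just described, here is a minimal Python sketch of a reader. It is not part of Easel; the function name `parse_count_vectors` is hypothetical, and it assumes fields are separated by whitespace and that a comment line may have leading whitespace before the #.

```python
def parse_count_vectors(text, K):
    """Parse count vectors: one vector of K nonnegative reals per line."""
    vectors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comment lines (starting with '#') are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: fields are whitespace-separated.
        fields = [float(tok) for tok in stripped.split()]
        if len(fields) != K:
            raise ValueError(f"line {lineno}: expected {K} fields, got {len(fields)}")
        if any(x < 0 for x in fields):
            raise ValueError(f"line {lineno}: counts must be nonnegative")
        vectors.append(fields)
    return vectors


# Made-up example data for a 4-letter alphabet; counts may be weighted (non-integer).
example = """\
# weighted counts for a 4-letter alphabet
10 0 2.5 1

0.5 3 0 7
"""
vectors = parse_count_vectors(example, K=4)
```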

The format of a mixture Dirichlet file *out_mixdchlet* is as follows. The first line has two fields, *K Q,* where *K* is the alphabet size and *Q* is the number of mixture components. Each of the next *Q* lines has *K+1* fields: the mixture coefficient *q_k* for component *k*, followed by the *K* Dirichlet alpha[k][a] parameters for that component.
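The layout described above can be read with a few lines of Python. This is a hedged sketch, not Easel's own parser: the name `parse_mixdchlet` is hypothetical, fields are assumed to be whitespace-separated, and skipping blank lines is an assumption the format description does not state.

```python
def parse_mixdchlet(text):
    """Return (K, Q, q, alpha) from the text of a mixture Dirichlet file."""
    # Assumption: blank lines, if any, can be skipped.
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    # First line: alphabet size K, then number of components Q.
    K, Q = (int(tok) for tok in rows[0])
    components = rows[1:]
    if len(components) != Q:
        raise ValueError(f"expected {Q} component lines, got {len(components)}")
    q, alpha = [], []
    for row in components:
        if len(row) != K + 1:
            raise ValueError(f"expected {K + 1} fields per component, got {len(row)}")
        values = [float(tok) for tok in row]
        q.append(values[0])       # mixture coefficient for this component
        alpha.append(values[1:])  # K Dirichlet parameters for this component
    return K, Q, q, alpha


# Made-up example: K=4 letters, Q=2 components.
example = """\
4 2
0.7 1.0 1.0 1.0 1.0
0.3 0.5 2.0 0.5 2.0
"""
K, Q, q, alpha = parse_mixdchlet(example)
```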

The **esl-mixdchlet score** subcommand calculates the log likelihood of the count vector data in *counts_file,* given the mixture Dirichlet in *mixdchlet_file.*
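This page does not give the formula that **score** uses. As a rough guide to what such a log likelihood looks like, the sketch below computes the standard Dirichlet-multinomial marginal likelihood of each count vector under each component, mixes the components with their coefficients, and sums over vectors. Whether the program includes the multinomial coefficient, and which log base it reports, are assumptions here; all function names are hypothetical.

```python
import math


def log_dirichlet_multinomial(counts, alpha):
    """log P(counts | alpha), integrating over the multinomial parameters."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient numerator and Dirichlet normalization terms.
    logp = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for c, a in zip(counts, alpha):
        logp += math.lgamma(c + a) - math.lgamma(a) - math.lgamma(c + 1)
    return logp


def log_mixture_likelihood(counts, q, alpha):
    """log sum_k q[k] * P(counts | alpha[k]), using log-sum-exp for stability."""
    # Assumes every mixture coefficient q[k] is strictly positive.
    terms = [math.log(qk) + log_dirichlet_multinomial(counts, ak)
             for qk, ak in zip(q, alpha)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def total_log_likelihood(vectors, q, alpha):
    """Sum of per-vector log likelihoods (natural log)."""
    return sum(log_mixture_likelihood(c, q, alpha) for c in vectors)


# Made-up example: two identical uniform components over a 2-letter alphabet.
q = [0.5, 0.5]
alpha = [[1.0, 1.0], [1.0, 1.0]]
ll = total_log_likelihood([[1, 0], [0, 1]], q, alpha)
```

Counts are passed to `math.lgamma` directly, so weighted (non-integer) counts work without change.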

The **esl-mixdchlet gen** subcommand generates synthetic count data, given a mixture Dirichlet.
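One plausible reading of this generative process, sketched in Python: pick a component according to the mixture coefficients, draw a probability vector from that component's Dirichlet, then draw *M* counts from the resulting multinomial, and repeat for *N* vectors. This is an assumption about the procedure, not a description of Easel's implementation; the function names are hypothetical, and the seed here drives Python's generator, so it will not reproduce the program's output for the same seed value.

```python
import random


def sample_count_vector(rng, q, alpha, M):
    """Draw one count vector with M total counts from a mixture Dirichlet."""
    # Choose a mixture component k with probability q[k].
    k = rng.choices(range(len(q)), weights=q)[0]
    # Sample p ~ Dirichlet(alpha[k]) by normalizing independent gamma draws.
    # (Very small alpha values could underflow to zero; not handled here.)
    gammas = [rng.gammavariate(a, 1.0) for a in alpha[k]]
    total = sum(gammas)
    p = [g / total for g in gammas]
    # Draw M letters from p and tally them into counts.
    counts = [0] * len(p)
    for a in rng.choices(range(len(p)), weights=p, k=M):
        counts[a] += 1
    return counts


def generate(q, alpha, N=1000, M=100, seed=42):
    """Generate N count vectors of M counts each (defaults mirror -N and -M)."""
    rng = random.Random(seed)
    return [sample_count_vector(rng, q, alpha, M) for _ in range(N)]


# Made-up mixture: K=4 letters, Q=2 components.
q = [0.7, 0.3]
alpha = [[1.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.5, 2.0]]
vectors = generate(q, alpha, N=5, M=100, seed=42)
```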

The **esl-mixdchlet sample** subcommand creates a random mixture Dirichlet distribution and outputs it to standard output.

## Options for Fit Subcommand

- **-h** Print brief help specific to the **fit** subcommand.
- **-s** *<seed>* Set random number generator seed to nonnegative integer *<seed>.* Default is 0, which means to use a quasirandom arbitrary seed. Values >0 give reproducible results.

## Options for Score Subcommand

- **-h** Print brief help specific to the **score** subcommand.

## Options for Gen Subcommand

- **-h** Print brief help specific to the **gen** subcommand.
- **-s** *<seed>* Set random number generator seed to nonnegative integer *<seed>.* Default is 0, which means to use a quasirandom arbitrary seed. Values >0 give reproducible results.
- **-M** *<M>* Generate *<M>* counts per sampled vector. (Default 100.)
- **-N** *<N>* Generate *<N>* count vectors. (Default 1000.)

## Options for Sample Subcommand

- **-h** Print brief help specific to the **sample** subcommand.
- **-s** *<seed>* Set random number generator seed to nonnegative integer *<seed>.* Default is 0, which means to use a quasirandom arbitrary seed. Values >0 give reproducible results.
- **-K** *<K>* Set the alphabet size to *<K>.* (Default is 20, for amino acids.)
- **-Q** *<Q>* Set the number of mixture components to *<Q>.* (Default is 9.)

## See Also

http://bioeasel.org/

## Copyright

Copyright (C) 2020 Howard Hughes Medical Institute. Freely distributed under the BSD open source license.

## Author

http://eddylab.org