hmmalign man page

hmmalign — align sequences to a profile HMM


hmmalign [options] <hmmfile> <seqfile>


Perform a multiple sequence alignment of all the sequences in <seqfile> by aligning them individually to the profile HMM in <hmmfile>. The new alignment is output to stdout in Stockholm format.
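Because the result arrives as Stockholm text, a downstream script usually needs to pull the aligned rows back out. The following is a minimal sketch of one way to do that, not part of hmmalign itself. The function name `read_stockholm` is hypothetical, and the sketch assumes well-formed input in which markup lines start with `#`, the alignment ends with `//`, and every other non-blank line is a sequence name followed by aligned text.

```python
def read_stockholm(text):
    """Collect aligned rows from Stockholm text into {name: aligned_sequence}.

    Assumption: every non-markup line is "<name> <aligned text>". Rows for the
    same name that recur in later blocks (interleaved output) are concatenated.
    """
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue              # blank line, header, or #=GF/#=GS/#=GR/#=GC markup
        if line == "//":
            break                 # end-of-alignment marker
        name, chunk = line.split(None, 1)
        rows[name] = rows.get(name, "") + chunk.replace(" ", "")
    return rows
```

Per-residue annotation lines (those beginning with `#=GR` or `#=GC`) are skipped here; a fuller reader would keep them.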

The <hmmfile> should contain only a single profile. If it contains more, only the first profile in the file will be used.

Either <hmmfile> or <seqfile> (but not both) may be '-' (dash), which means that this input is read from stdin rather than from a file.
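The dash rule is easy to get wrong when hmmalign is driven from a script, so a small wrapper can check it before launching the program. This is a sketch under stated assumptions: the function names `build_hmmalign_argv` and `align_from_stdin` are hypothetical, and `hmmalign` is assumed to be on the PATH.

```python
import subprocess


def build_hmmalign_argv(hmmfile, seqfile):
    """Return the argument list for a basic run: hmmalign <hmmfile> <seqfile>."""
    if hmmfile == "-" and seqfile == "-":
        # Only one of the two inputs can come from stdin.
        raise ValueError("only one of <hmmfile> and <seqfile> may be '-'")
    return ["hmmalign", hmmfile, seqfile]


def align_from_stdin(hmmfile, sequence_text):
    """Feed sequences to hmmalign on stdin and return its stdout text."""
    argv = build_hmmalign_argv(hmmfile, "-")
    done = subprocess.run(argv, input=sequence_text, capture_output=True,
                          text=True, check=True)
    return done.stdout
```

For example, `align_from_stdin("model.hmm", fasta_text)` would pass the sequences through a pipe while the profile is read from the (hypothetical) file `model.hmm`.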

The sequences in <seqfile> are aligned in unihit local alignment mode. They should therefore already be known to contain only a single domain (or a fragment of one). The optimal alignment may assign some residues as nonhomologous (N and C states); these residues are still included in the resulting alignment, but are pushed to its outer edges. To trim these unaligned nonhomologous residues from the result, see the --trim option.



-o <f>

Direct the output alignment to file <f>, rather than to stdout.

--mapali <f>

Merge the existing alignment in file <f> into the result. <f> must be exactly the same alignment that was used to build the model in <hmmfile>. The merge uses a map of alignment columns to consensus profile positions that is stored in the <hmmfile>. The multiple alignment in <f> will be exactly reproduced in its consensus columns (as defined by the profile), but the displayed alignment in insert columns may be altered, because insertions relative to a profile are considered by convention to be unaligned data.


--trim

Trim nonhomologous residues (assigned to N and C states in the optimal alignments) from the resulting multiple alignment output.


--amino

Specify that all sequences in <seqfile> are proteins. By default, the alphabet type is autodetected from the residue composition.


--dna

Specify that all sequences in <seqfile> are DNA.


--rna

Specify that all sequences in <seqfile> are RNA.

--informat <s>

Declare that the input <seqfile> is in format <s>. Accepted sequence file formats include FASTA, EMBL, GenBank, DDBJ, UniProt, Stockholm, and SELEX. Default is to autodetect the format of the file.

--outformat <s>

Specify that the output multiple alignment is in format <s>. Currently, the only accepted multiple alignment file formats are Stockholm and SELEX.
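The options above can be combined on one command line, following the order `hmmalign [options] <hmmfile> <seqfile>`. The sketch below assembles such a command from keyword arguments. The function name `hmmalign_argv_with_options` and the file names in the usage note are hypothetical, and the case-insensitive check on the output format is an assumption made for convenience, not documented behavior.

```python
# Output formats named in the text above; compared case-insensitively (assumption).
OUTPUT_FORMATS = {"stockholm", "selex"}


def hmmalign_argv_with_options(hmmfile, seqfile, *, out=None, mapali=None,
                               trim=False, informat=None, outformat=None):
    """Build an hmmalign argument list; options precede the two input files."""
    argv = ["hmmalign"]
    if out is not None:
        argv += ["-o", out]                # write the alignment to a file
    if mapali is not None:
        argv += ["--mapali", mapali]       # merge the model's original alignment
    if trim:
        argv.append("--trim")              # drop N- and C-state residues
    if informat is not None:
        argv += ["--informat", informat]   # skip input format autodetection
    if outformat is not None:
        if outformat.lower() not in OUTPUT_FORMATS:
            raise ValueError("unsupported output format: " + outformat)
        argv += ["--outformat", outformat]
    argv += [hmmfile, seqfile]
    return argv
```

A call such as `hmmalign_argv_with_options("model.hmm", "seqs.fa", out="aln.sto", trim=True)` yields a list that can be passed directly to `subprocess.run`.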

