esl-alimap - Man Page

map two alignments to each other

Synopsis

esl-alimap [options] msafile1 msafile2

Description

esl-alimap is a highly specialized application that determines the optimal alignment mapping of columns between two alignments of the same sequences. An alignment mapping defines, for each column in alignment 1, a matching column in alignment 2. The residues of the aligned sequences that occur in both of two matched columns are considered 'shared' by those two columns.

For example, if the nth residue of sequence i occurs in alignment 1 column x and alignment 2 column y, then only a mapping of alignment 1 and 2 that includes column x mapping to column y would correctly map and share the residue.

The optimal mapping of the two alignments is the mapping which maximizes the sum of shared residues between all pairs of matching columns. The fraction of total residues that are shared is reported as the coverage in the esl-alimap output.
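The notion of shared residues and coverage can be illustrated with a minimal Python sketch. This is not the esl-alimap implementation. The function names, the set of gap characters, and the representation of an alignment as a list of equal-length strings are all assumptions made for illustration.

```python
from collections import Counter

# Assumed gap symbols; the real program relies on Easel's alphabet handling.
GAP_CHARS = set("-._~")


def residue_columns(aligned_seq):
    """Return the 0-based column of each residue, in sequence order."""
    return [col for col, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]


def shared_counts(aln1, aln2):
    """Count, for each column pair (x, y), the residues found in both.

    aln1 and aln2 are lists of aligned strings holding the same
    sequences in the same order.
    """
    shared = Counter()
    for row1, row2 in zip(aln1, aln2):
        cols1 = residue_columns(row1)
        cols2 = residue_columns(row2)
        if len(cols1) != len(cols2):
            raise ValueError("rows do not contain the same number of residues")
        # The nth residue sits in column x of alignment 1 and column y of
        # alignment 2, so it is shared only if x is mapped to y.
        for x, y in zip(cols1, cols2):
            shared[(x, y)] += 1
    return shared


def coverage(shared, mapping, total_residues):
    """Fraction of residues shared under a mapping {column x: column y}."""
    return sum(shared[(x, y)] for x, y in mapping.items()) / total_residues


aln1 = ["AC-GT", "ACGGT"]
aln2 = ["ACG-T", "ACGGT"]
shared = shared_counts(aln1, aln2)
identity_map = {x: x for x in range(5)}
example_coverage = coverage(shared, identity_map, total_residues=9)
```

In this toy case the third residue of the first sequence sits in column 3 of aln1 but column 2 of aln2 (0-based), so the identity mapping shares 8 of the 9 residues.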

Only the first alignments in msafile1 and msafile2 will be mapped to each other. If the files contain more than one alignment, all alignments after the first will be ignored.

The two alignments (one from each file) must contain exactly the same sequences (if they were unaligned, they'd be identical) in precisely the same order. They must also be in Stockholm format.
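The "same sequences, same order" requirement can be checked before running the program with a sketch like the one below. It assumes the aligned rows have already been read from the Stockholm files into lists of strings. The function names, the gap character set, and the choice to ignore letter case are assumptions for illustration, not documented behavior of esl-alimap.

```python
# Assumed gap symbols for illustration only.
GAP_CHARS = set("-._~")


def unaligned(aligned_seq):
    """Strip gap characters; case is ignored here as an assumption."""
    return "".join(ch for ch in aligned_seq if ch not in GAP_CHARS).upper()


def same_sequences(aln1, aln2):
    """True if both alignments hold identical sequences in the same order."""
    if len(aln1) != len(aln2):
        return False
    return all(unaligned(a) == unaligned(b) for a, b in zip(aln1, aln2))


rows1 = ["AC-GU", "G-GCA"]
rows2 = ["A-CGU", "GG-CA"]
ok_same_order = same_sequences(rows1, rows2)
ok_reversed = same_sequences(rows1, list(reversed(rows2)))
```

Here the first comparison succeeds, while the second fails because the sequences appear in a different order.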

The output of esl-alimap differs depending on whether one or both of the alignments contain reference (#=GC RF) annotation. If so, the coverage for residues from nongap RF positions will be reported separately from the total coverage.

esl-alimap uses a dynamic programming algorithm to compute the optimal mapping. The algorithm is similar to the Needleman-Wunsch-Sellers algorithm but the scores used at each step of the recursion are not residue-residue comparison scores but rather the number of residues shared between two columns.
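A dynamic programming recursion of this kind might look like the following Python sketch. It is an illustration under stated assumptions, not the program's actual code: it assumes no gap penalty, breaks ties arbitrarily, and leaves columns that share no residues unmapped. The function name and the `shared` dictionary of per-column-pair counts are hypothetical.

```python
def optimal_mapping(shared, len1, len2):
    """Find an order-preserving column mapping maximizing shared residues.

    shared maps (x, y) column pairs (0-based) to shared residue counts.
    Returns (mapping, total), where mapping[x] is the matched column of
    alignment 2, or None if column x is left unmapped.
    """
    # score[x][y]: best total using the first x columns of alignment 1
    # and the first y columns of alignment 2.
    score = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for x in range(1, len1 + 1):
        for y in range(1, len2 + 1):
            diag = score[x - 1][y - 1] + shared.get((x - 1, y - 1), 0)
            score[x][y] = max(diag, score[x - 1][y], score[x][y - 1])

    # Trace back from the corner to recover one optimal mapping.
    mapping = [None] * len1
    x, y = len1, len2
    while x > 0 and y > 0:
        s = shared.get((x - 1, y - 1), 0)
        if s > 0 and score[x][y] == score[x - 1][y - 1] + s:
            mapping[x - 1] = y - 1
            x, y = x - 1, y - 1
        elif score[x][y] == score[x - 1][y]:
            x -= 1
        else:
            y -= 1
    return mapping, score[len1][len2]


# Shared counts for two toy alignments of 5 columns and 9 residues in total.
toy_shared = {(0, 0): 2, (1, 1): 2, (2, 2): 1, (3, 2): 1, (3, 3): 1, (4, 4): 2}
toy_mapping, toy_total = optimal_mapping(toy_shared, 5, 5)
toy_coverage = toy_total / 9
```

The cell score here is the shared residue count for a column pair rather than a residue-residue substitution score, which is the difference from sequence alignment that the paragraph above describes.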

The --mask-a2a <f>, --mask-a2rf <f>, --mask-rf2a <f>, and --mask-rf2rf <f> options create 'mask' files that pertain to the optimal mapping in slightly different ways. A mask file consists of a single line containing only '0' and '1' characters. These denote which positions of the alignment from msafile1 map to positions of the alignment from msafile2, as described below for each of the four masking options. Using the esl-alimask miniapp, these masks can be used to extract only those columns of the msafile1 alignment that optimally map to columns of the msafile2 alignment. To extract the corresponding set of columns from msafile2 (those that optimally map to columns of the alignment from msafile1), it is necessary to rerun the program with the order of the two msafiles reversed, save new masks, and use esl-alimask again.
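What a mask encodes can be sketched in Python as follows. This only illustrates the idea of keeping the columns marked '1'. It does not replace esl-alimask and does not reproduce its behavior in detail. The function names and the list-of-strings alignment representation are assumptions.

```python
def parse_mask(text):
    """Read a one-line mask and check it holds only '0' and '1'."""
    mask = text.strip()
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty line of '0' and '1'")
    return mask


def mask_from_mapping(mapping):
    """Build an a2a-style mask: '1' where a column has a mapped partner.

    mapping[x] is the matched column in the other alignment, or None.
    """
    return "".join("1" if y is not None else "0" for y in mapping)


def apply_mask(aligned_seqs, mask):
    """Keep only the alignment columns whose mask character is '1'."""
    for row in aligned_seqs:
        if len(row) != len(mask):
            raise ValueError("mask length must equal alignment length")
    return ["".join(ch for ch, bit in zip(row, mask) if bit == "1")
            for row in aligned_seqs]


example_mask = parse_mask("1011\n")
kept = apply_mask(["AC-G", "A-CG"], example_mask)
```

In this example the second column is dropped from every row, leaving a three-column alignment.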

Options

-h: Print brief help; includes version number and summary of all options.
-q: Be quiet; don't print information on the optimal mapping of each column, only report coverage and potentially save masks to optional output files.
--mask-a2a <f>: Save a mask of '0's and '1's to file <f>. A '1' at position x means that position x of the alignment from msafile1 maps to an alignment position in the alignment from msafile2 in the optimal map.
--mask-a2rf <f>: Save a mask of '0's and '1's to file <f>. A '1' at position x means that position x of the alignment from msafile1 maps to a nongap RF position in the alignment from msafile2 in the optimal map.
--mask-rf2a <f>: Save a mask of '0's and '1's to file <f>. A '1' at position x means that nongap RF position x of the alignment from msafile1 maps to an alignment position in the alignment from msafile2 in the optimal map.
--mask-rf2rf <f>: Save a mask of '0's and '1's to file <f>. A '1' at position x means that nongap RF position x of the alignment from msafile1 maps to a nongap RF position in the alignment from msafile2 in the optimal map.
--submap <f>: Specify that all of the columns from the alignment from msafile2 exist identically (contain the same residues from all sequences) in the alignment from msafile1. This makes the task of mapping trivial. However, not all columns of msafile1 must exist in msafile2. Save the mask to file <f>. A '1' at position x of the mask means that position x of the alignment from msafile1 is the same as position y of msafile2, where y is the number of '1's that occur at positions <= x in the mask.
--amino: Assert that msafile1 and msafile2 contain protein sequences.
--dna: Assert that msafile1 and msafile2 contain DNA sequences.
--rna: Assert that msafile1 and msafile2 contain RNA sequences.
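
The position rule given for the --submap mask (y is the number of '1's at positions <= x) can be worked through with a short Python sketch. The function name is hypothetical. Positions are 1-based, to match the wording of the option description.

```python
def submap_positions(mask):
    """Map each masked-in position x of msafile1 to position y of msafile2.

    y is the count of '1' characters at positions <= x (1-based).
    """
    positions = {}
    ones_seen = 0
    for x, bit in enumerate(mask, start=1):
        if bit == "1":
            ones_seen += 1
            positions[x] = ones_seen
    return positions


submap_example = submap_positions("10110")
```

For the mask 10110, positions 1, 3, and 4 of the msafile1 alignment correspond to positions 1, 2, and 3 of the msafile2 alignment.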

Copyright

Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

Author

http://eddylab.org

Info

Nov 2020 Easel 0.48 Easel Manual