esl-construct [options] msafile
esl-construct reports information on existing consensus secondary structure annotation of an alignment or derives new consensus secondary structures based on structure annotation for individual aligned sequences.
The alignment file must contain individual sequence secondary structure annotation (Stockholm #=GR SS), consensus secondary structure annotation (Stockholm #=GC SS_cons), or both. All structure annotation must be in WUSS notation (Vienna dot-parentheses notation is also interpreted correctly). At present, the alignment file must be in Stockholm format and contain RNA or DNA sequences.
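The examples in this section treat a structure as a set of basepairs, each a pair of alignment columns (i, j) with i < j. The following minimal Python sketch converts a WUSS or dot-bracket string into that form. It is not esl-construct's own parser, and the name `wuss_to_pairs` is hypothetical. It assumes nested pairs use `<>`, `()`, `[]` or `{}`, with each closer matching its own opener type. It also assumes pseudoknotted pairs open with an uppercase letter and close with the same letter in lowercase. Every other character is treated as unpaired.

```python
def wuss_to_pairs(ss: str) -> set[tuple[int, int]]:
    """Return basepairs as (i, j) tuples with 1-based columns and i < j.

    Hypothetical helper, a sketch only: nested pairs use <>, (), [] or {}
    (each closer must match its own opener type); pseudoknotted pairs open
    with an uppercase letter and close with the same lowercase letter.
    Every other character (., _, -, :, ,, ~ ...) is treated as unpaired.
    """
    closer_to_opener = {">": "<", ")": "(", "]": "[", "}": "{"}
    stacks: dict[str, list[int]] = {}   # one stack of open columns per opener
    pairs: set[tuple[int, int]] = set()
    for col, ch in enumerate(ss, start=1):
        if ch in "<([{" or ch.isupper():
            stacks.setdefault(ch, []).append(col)
        elif ch in closer_to_opener or ch.islower():
            opener = closer_to_opener.get(ch, ch.upper())
            stack = stacks.get(opener)
            if not stack:
                raise ValueError(f"unmatched {ch!r} at column {col}")
            pairs.add((stack.pop(), col))
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at column {stack[-1]}")
    return pairs


print(sorted(wuss_to_pairs("<<AA..>>aa")))  # prints [(1, 8), (2, 7), (3, 10), (4, 9)]
```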
By default, esl-construct lists the sequences in the alignment that have structure annotation and the number of basepairs in those structures. If the alignment also contains consensus structure annotation, the default output also reports how many of the individual basepairs overlap with the consensus basepairs and how many conflict with a consensus basepair.
For the purposes of this miniapp, a basepair 'conflict' exists between two basepairs in different structures, one between columns i and j and the other between columns k and l, if (i == k and j != l) or (j == l and i != k).
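The following minimal Python sketch makes the default report and the conflict rule concrete. It is not esl-construct's C implementation. It assumes each structure is already a set of (i, j) column pairs with i < j, and the helper names are hypothetical. The rule is applied literally, so two pairs that share a column at opposite ends, such as (1, 5) and (5, 9), are not counted as a conflict.

```python
Pair = tuple[int, int]


def conflicts(a: Pair, b: Pair) -> bool:
    """Literal conflict rule: same left column but different right columns,
    or same right column but different left columns."""
    (i, j), (k, l) = a, b
    return (i == k and j != l) or (j == l and i != k)


def compare_to_consensus(indi: set[Pair], cons: set[Pair]) -> tuple[int, int]:
    """Return (overlapping, conflicting) basepair counts for one sequence.

    A basepair that conflicts with several consensus pairs is counted once.
    """
    overlap = len(indi & cons)
    conflicting = sum(1 for bp in indi if any(conflicts(bp, c) for c in cons))
    return overlap, conflicting


cons = {(1, 10), (2, 9), (3, 8)}
indi = {(1, 10), (2, 8), (4, 7)}
print(compare_to_consensus(indi, cons))  # prints (1, 1)
```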
esl-construct can also derive a new consensus structure from the structure annotation of individual sequences in the alignment, using any of the following options: -x, -r, -c, --indi <s>, --ffreq <x>, or --fmin. These options are described below. Each of them requires the -o <f> option, which names the new alignment file <f> to be created. The new alignment(s) will differ from the input alignment(s) only in the consensus secondary structure (#=GC SS_cons) annotation and possibly the reference (#=GC RF) annotation.
- -h
Print brief help; includes version number and summary of all options, including expert options.
- -a
Print to the screen all alignment positions that are involved in at least one conflicting basepair in at least one sequence, and then exit.
- -v
Be verbose; with no other options, list individual sequence basepair conflicts as well as summary statistics.
- -x
Compute a new consensus structure as the largest set of basepairs (the greatest number of basepairs), chosen from all individual structures, that contains no conflicts. Output the alignment with the new SS_cons annotation. This option must be used in combination with the -o option.
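The following minimal Python sketch illustrates the -x objective. It pools the basepairs from every individual structure and searches exhaustively, with a simple bound, for the largest subset that has no conflicts under the rule given earlier. This is not necessarily how esl-construct computes it. The search is exponential in the worst case and is intended only for small inputs. The function names are hypothetical, and structures are assumed to be sets of (i, j) column pairs.

```python
Pair = tuple[int, int]


def conflicts(a: Pair, b: Pair) -> bool:
    """Literal conflict rule from the description above."""
    (i, j), (k, l) = a, b
    return (i == k and j != l) or (j == l and i != k)


def max_conflict_free(structures: list[set[Pair]]) -> set[Pair]:
    """Largest set of pooled basepairs containing no conflicting pair.

    Exhaustive branch-and-bound sketch: exponential in the worst case.
    """
    pool = sorted(set().union(*structures))
    best: list[Pair] = []

    def search(idx: int, chosen: list[Pair]) -> None:
        nonlocal best
        # Prune: even taking every remaining pair cannot beat the best so far.
        if len(chosen) + (len(pool) - idx) <= len(best):
            return
        if idx == len(pool):
            best = list(chosen)
            return
        bp = pool[idx]
        if not any(conflicts(bp, c) for c in chosen):
            chosen.append(bp)        # branch 1: include this pair
            search(idx + 1, chosen)
            chosen.pop()
        search(idx + 1, chosen)      # branch 2: leave it out

    search(0, [])
    return set(best)


s1 = {(1, 10), (2, 9), (3, 8)}
s2 = {(1, 10), (2, 8), (4, 7)}
print(sorted(max_conflict_free([s1, s2])))  # prints [(1, 10), (2, 9), (3, 8), (4, 7)]
```

In the example, keeping (2, 8) would exclude both (2, 9) and (3, 8), so the search prefers the larger alternative.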
- -r
Remove any consensus basepairs that conflict with at least one individual basepair, and output the alignment with the new SS_cons annotation. This option must be used in combination with the -o option.
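The -r filter amounts to a single set comprehension. The following minimal Python sketch shows it, assuming structures are sets of (i, j) column pairs and applying the conflict rule literally. The function name is hypothetical.

```python
Pair = tuple[int, int]


def conflicts(a: Pair, b: Pair) -> bool:
    """Literal conflict rule from the description above."""
    (i, j), (k, l) = a, b
    return (i == k and j != l) or (j == l and i != k)


def remove_conflicting(cons: set[Pair], structures: list[set[Pair]]) -> set[Pair]:
    """Keep only the consensus pairs that conflict with no individual basepair."""
    return {
        c for c in cons
        if not any(conflicts(c, bp) for s in structures for bp in s)
    }


cons = {(1, 10), (2, 9), (3, 8)}
print(remove_conflicting(cons, [{(1, 10), (2, 8)}, {(3, 8)}]))  # prints {(1, 10)}
```

Here (2, 9) is dropped because of (2, 8). (3, 8) is also dropped because it shares its right column with (2, 8), even though a second sequence contains (3, 8) exactly.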
- -c
Define a new consensus secondary structure as the individual structure annotation that has the maximum number of consistent basepairs with the existing consensus secondary structure annotation. This option must be used in combination with the -o option.
- --rfc
With -c, set the reference annotation (#=GC RF) to the sequence whose individual structure becomes the consensus structure.
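The following minimal Python sketch illustrates the -c selection. It makes two assumptions that the text does not settle. First, a "consistent" basepair is one that also appears in SS_cons. Second, ties go to the sequence that comes first in alignment order. Structures are assumed to be a dict from sequence name to a set of (i, j) column pairs, and the function name is hypothetical.

```python
Pair = tuple[int, int]


def pick_by_consensus_agreement(
    structures: dict[str, set[Pair]], cons: set[Pair]
) -> tuple[str, set[Pair]]:
    """Return (name, pairs) for the sequence sharing the most basepairs with SS_cons.

    Assumptions: "consistent" means the pair is also in SS_cons; ties go to
    the first sequence in alignment (dict insertion) order, because max()
    returns the first maximal item.
    """
    best = max(structures, key=lambda name: len(structures[name] & cons))
    return best, structures[best]


structures = {
    "seqA": {(1, 10)},
    "seqB": {(1, 10), (2, 9)},
    "seqC": {(3, 8)},
}
name, new_cons = pick_by_consensus_agreement(structures, {(1, 10), (2, 9), (3, 8)})
print(name)  # prints seqB
```

Under --rfc, the returned name identifies the sequence whose residues would become the #=GC RF line.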
- --indi <s>
Define a new consensus secondary structure as the individual structure annotation from sequence named <s>. This option must be used in combination with the -o option.
- --rfindi
With --indi <s>, set the reference annotation (#=GC RF) to the sequence named <s>.
- --ffreq <x>
Define a new consensus structure as the set of basepairs i:j (between columns i and j) that are present in more than a fraction <x> of the individual sequence structures. This option must be used in combination with the -o option.
- --fmin
Same as --ffreq <x>, except find the minimal <x> that gives a consistent consensus structure. A consistent structure has each base (alignment position) as a member of at most one basepair.
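The following minimal Python sketch covers both thresholds. It assumes each individual structure is a set of (i, j) column pairs, and it takes the fraction over the sequences that carry individual structure annotation, which is an assumption. It models --fmin as a search for the lowest threshold whose --ffreq structure is consistent. That reading is also an assumption, based on the flag name and on the fact that, given valid individual structures, any x >= 0.5 always yields a consistent structure. All names are hypothetical.

```python
from collections import Counter

Pair = tuple[int, int]


def ffreq_consensus(structures: list[set[Pair]], x: float) -> set[Pair]:
    """Basepairs present in more than a fraction x of the structures."""
    counts = Counter(bp for s in structures for bp in s)
    n = len(structures)
    return {bp for bp, c in counts.items() if c / n > x}


def is_consistent(pairs: set[Pair]) -> bool:
    """True if every column belongs to at most one basepair."""
    cols = [col for bp in pairs for col in bp]
    return len(cols) == len(set(cols))


def fmin_consensus(structures: list[set[Pair]]) -> tuple[float, set[Pair]]:
    """Lowest threshold x (and its structure) whose --ffreq result is consistent."""
    n = len(structures)
    # The kept set only changes at thresholds k/n, so scan those from low to high.
    for k in range(n):
        x = k / n
        cons = ffreq_consensus(structures, x)
        if is_consistent(cons):
            return x, cons
    return 1.0, set()  # reached only when there are no structures


structures = [{(1, 10), (2, 9)}, {(1, 10), (2, 8)}, {(1, 10), (3, 9)}]
x, cons = fmin_consensus(structures)
print(round(x, 3), cons)  # prints 0.333 {(1, 10)}
```

Scanning only the thresholds k/n avoids probing a fine grid of x values. Between two consecutive values of k/n, the set of kept basepairs cannot change.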
- -o <f>
Output the alignment(s) with new consensus structure annotation to file <f>.
- --pfam
With -o, write the output alignment in Pfam format, a special non-interleaved Stockholm format in which each sequence appears on a single line.
- -l <f>
Create a new file <f> that lists the sequences that have at least one basepair that conflicts with a consensus basepair.
- --lmax <n>
With -l, write to the list file only those sequences that have more than <n> basepairs conflicting with the consensus structure.
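The following minimal Python sketch shows the -l and --lmax selection. It assumes that -l on its own behaves like a threshold of 0, so any sequence with at least one conflicting basepair is listed. It also assumes structures are given as a dict from sequence name to a set of (i, j) column pairs. The function name and its `lmax` default are hypothetical.

```python
Pair = tuple[int, int]


def conflicts(a: Pair, b: Pair) -> bool:
    """Literal conflict rule from the description above."""
    (i, j), (k, l) = a, b
    return (i == k and j != l) or (j == l and i != k)


def sequences_to_list(
    structures: dict[str, set[Pair]], cons: set[Pair], lmax: int = 0
) -> list[str]:
    """Names of sequences with more than lmax basepairs conflicting with cons."""
    listed = []
    for name, pairs in structures.items():
        n_conflicts = sum(1 for bp in pairs if any(conflicts(bp, c) for c in cons))
        if n_conflicts > lmax:
            listed.append(name)
    return listed


cons = {(1, 10), (2, 9)}
structures = {"a": {(1, 10)}, "b": {(2, 8)}, "c": {(1, 9), (2, 8)}}
print(sequences_to_list(structures, cons))          # prints ['b', 'c']
print(sequences_to_list(structures, cons, lmax=1))  # prints ['c']
```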
Copyright (C) 2020 Howard Hughes Medical Institute. Freely distributed under the BSD open source license.