esl-shuffle - Man Page

shuffling sequences or generating random ones

Synopsis

esl-shuffle [options] seqfile
  (shuffle sequences)

esl-shuffle -G [options]
  (generate random sequences)

esl-shuffle -A [options] msafile
  (shuffle multiple sequence alignments)

Description

esl-shuffle has three different modes of operation.

By default, esl-shuffle reads individual sequences from seqfile, shuffles them, and outputs the shuffled sequences. Unless another shuffling option is given, the shuffle preserves monoresidue composition exactly; other options are listed below.

With the -G option, esl-shuffle generates random sequences instead of reading input. The -N option sets the number of sequences (default is 1), the -L option sets their length (default is 0), and the --amino, --dna, and --rna options select the alphabet.

With the -A option, esl-shuffle reads one or more multiple alignments from msafile, shuffles them, and outputs the shuffled alignments. By default, the alignment is shuffled columnwise (i.e. column order is permuted). Other options are listed below.

General Options

-h: Print brief help; includes version number and summary of all options, including expert options.
-o <f>: Direct output to a file named <f> rather than to stdout.
-N <n>: Generate <n> sequences, or perform <n> independent shuffles per input sequence or alignment.
-L <n>: Generate sequences of length <n>, or truncate output shuffled sequences or alignments to a length of <n>.

Sequence Shuffling Options

These options only apply in default (sequence shuffling) mode. They are mutually exclusive.

-m: Monoresidue shuffling (the default): preserve monoresidue composition exactly. Uses the Fisher/Yates algorithm (aka Knuth's "Algorithm P").
-d: Diresidue shuffling; preserve diresidue composition exactly. Uses the Altschul/Erickson algorithm (Altschul and Erickson, 1986). A more efficient algorithm (Kandel and Winkler 1996) is known but has not yet been implemented in Easel.
-0: 0th order Markov generation: generate a sequence of the same length with the same 0th order Markov frequencies. Such a sequence will approximately preserve the monoresidue composition of the input.
-1: 1st order Markov generation: generate a sequence of the same length with the same 1st order Markov frequencies. Such a sequence will approximately preserve the diresidue composition of the input.
-r: Reversal; reverse each input.
-w <n>: Regionally shuffle the input in nonoverlapping windows of size <n> residues, preserving exact monoresidue composition in each window.
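
The following Python sketch illustrates three of the strategies described above: exact monoresidue shuffling (-m), windowed shuffling (-w), and 0th order Markov generation (-0). It is not Easel's implementation; the function names, the use of plain strings for sequences, and the use of Python's random module are assumptions made for illustration.

```python
import random
from collections import Counter

def fisher_yates_shuffle(seq, rng):
    """Return a permutation of seq; monoresidue composition is preserved exactly."""
    residues = list(seq)
    # Walk from the last position down, swapping each position with a
    # uniformly chosen position at or before it.
    for i in range(len(residues) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)

def window_shuffle(seq, window, rng):
    """Shuffle each nonoverlapping window of `window` residues independently."""
    return "".join(
        fisher_yates_shuffle(seq[start:start + window], rng)
        for start in range(0, len(seq), window)
    )

def markov0_generate(seq, rng):
    """Sample a same-length sequence from the input's residue frequencies."""
    if not seq:
        return ""
    counts = Counter(seq)
    alphabet = sorted(counts)
    weights = [counts[a] for a in alphabet]
    # Composition is preserved only approximately, because each residue
    # is drawn independently.
    return "".join(rng.choices(alphabet, weights=weights, k=len(seq)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
original = "ACGTACGTTTGGAACC"
shuffled = fisher_yates_shuffle(original, rng)
windowed = window_shuffle(original, 4, rng)
generated = markov0_generate(original, rng)
```

The shuffled and windowed results contain exactly the same residues as the input (per window, in the windowed case), while the generated sequence only matches the input composition on average.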

Multiple Alignment Shuffling Options

-b: Sample columns with replacement, in order to generate a bootstrap-resampled alignment dataset.
-v: Shuffle residues within each column independently; i.e., permute residue order in each column ("vertical" shuffling).
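
The three alignment shuffling modes (default columnwise, -b, and -v) can be sketched in Python as follows. This is an illustration only: the alignment is assumed to be a list of equal-length strings, the function names are hypothetical, and gap characters are assumed to be treated like any other symbol, which the text above does not specify.

```python
import random

def shuffle_columns(msa, rng):
    """Permute column order; the content of each column stays intact."""
    order = list(range(len(msa[0])))
    rng.shuffle(order)
    return ["".join(row[c] for c in order) for row in msa]

def bootstrap_columns(msa, rng):
    """Sample columns with replacement to build a same-width alignment."""
    ncols = len(msa[0])
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(row[c] for c in picks) for row in msa]

def shuffle_within_columns(msa, rng):
    """Permute the residues inside each column independently ("vertical")."""
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back from columns to rows.
    return ["".join(row) for row in zip(*columns)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
msa = ["ACG-T", "ACGAT", "TCG-A"]
by_column = shuffle_columns(msa, rng)
bootstrapped = bootstrap_columns(msa, rng)
vertical = shuffle_within_columns(msa, rng)
```

Column shuffling keeps the set of columns and changes their order, bootstrapping may repeat or drop columns, and vertical shuffling keeps each column's composition while breaking the relationship between rows.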

Sequence Generation Options

One of these must be selected if -G is used.

--amino: Generate amino acid sequences.
--dna: Generate DNA sequences.
--rna: Generate RNA sequences.
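
As a rough illustration of what -G with -N and -L produces, the Python sketch below generates random sequences over a chosen alphabet. The uniform residue frequencies, the function name, and the alphabet table are assumptions; this page does not state which residue distribution esl-shuffle uses.

```python
import random

# Canonical residue sets; assumed here for illustration.
ALPHABETS = {
    "amino": "ACDEFGHIKLMNPQRSTVWY",
    "dna": "ACGT",
    "rna": "ACGU",
}

def generate_sequences(alphabet, n=1, length=0, seed=None):
    """Return n random sequences of the given length (uniform residues assumed)."""
    rng = random.Random(seed)
    residues = ALPHABETS[alphabet]
    return [
        "".join(rng.choice(residues) for _ in range(length))
        for _ in range(n)
    ]

# Loosely analogous to: esl-shuffle -G --rna -N 3 -L 20
seqs = generate_sequences("rna", n=3, length=20, seed=7)
```

The defaults in the sketch mirror the documented ones (one sequence of length 0), so a useful run sets the length explicitly.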

Expert Options

--informat <s>: Assert that the input seqfile is in format <s>, bypassing format autodetection. Common choices for <s> include: fasta, embl, genbank. Alignment formats also work; common choices include: stockholm, a2m, afa, psiblast, clustal, phylip. For more information, and for codes for some less common formats, see the main documentation. The string <s> is case-insensitive (fasta or FASTA both work).
--seed <n>: Seed the random number generator with <n>, a nonnegative integer. A seed greater than zero makes the results of esl-shuffle reproducible. If <n> is 0, the random number generator is seeded arbitrarily, and results will vary from run to run. Arbitrary seeding (0) is the default.
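
The seeding convention can be mimicked in Python as shown below. This is a sketch of the documented behavior only, not of Easel's random number generator; make_rng and shuffled are hypothetical helper names.

```python
import random

def make_rng(seed=0):
    """Seed > 0 gives a reproducible generator; 0 asks for arbitrary seeding."""
    if seed > 0:
        return random.Random(seed)
    return random.Random()  # seeded from the operating system or clock

def shuffled(seq, seed=0):
    """Return a shuffled copy of seq using a generator built from seed."""
    residues = list(seq)
    make_rng(seed).shuffle(residues)
    return "".join(residues)

first = shuffled("ACGTACGTACGT", seed=42)
second = shuffled("ACGTACGTACGT", seed=42)  # same seed, same result
```

With the same nonzero seed, the two calls return identical output; with seed 0, repeated calls are free to differ.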

Copyright

Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

Author

http://eddylab.org

Info

Nov 2020 Easel 0.48 Easel Manual