esl-shuffle [options] seqfile (shuffle sequences)
esl-shuffle -G [options] (generate random sequences)
esl-shuffle -A [options] msafile (shuffle multiple sequence alignments)
esl-shuffle has three different modes of operation.
By default, esl-shuffle reads individual sequences from seqfile, shuffles them, and outputs the shuffled sequences. The default shuffle preserves monoresidue composition exactly; other options are listed below.
With the -G option, esl-shuffle generates random sequences instead of reading input. The -N option sets the number of sequences (default is 1), the -L option sets their length (default is 0), and the --amino, --dna, and --rna options select the alphabet.
With the -A option, esl-shuffle reads one or more multiple alignments from msafile, shuffles them, and outputs the shuffled alignments. By default, the alignment is shuffled columnwise (i.e. column order is permuted). Other options are listed below.
- -h
Print brief help; includes version number and summary of all options, including expert options.
- -o <f>
Direct output to a file named <f> rather than to stdout.
- -N <n>
Generate <n> sequences, or perform <n> independent shuffles per input sequence or alignment.
- -L <n>
Generate sequences of length <n>, or truncate output shuffled sequences or alignments to a length of <n>.
Sequence Shuffling Options
These options only apply in default (sequence shuffling) mode. They are mutually exclusive.
- -m
Monoresidue shuffling (the default): preserve monoresidue composition exactly. Uses the Fisher/Yates algorithm (aka Knuth's "Algorithm P").
- -d
Diresidue shuffling: preserve diresidue composition exactly. Uses the Altschul/Erickson algorithm (Altschul and Erickson, 1986). A more efficient algorithm (Kandel and Winkler 1996) is known but has not yet been implemented in Easel.
- -0
0th order Markov generation: generate a sequence of the same length with the same 0th order Markov frequencies. Such a sequence will approximately preserve the monoresidue composition of the input.
- -1
1st order Markov generation: generate a sequence of the same length with the same 1st order Markov frequencies. Such a sequence will approximately preserve the diresidue composition of the input.
- -r
Reversal: reverse each input.
- -w <n>
Regionally shuffle the input in nonoverlapping windows of size <n> residues, preserving exact monoresidue composition in each window.
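The difference between exact shuffling, Markov generation, and windowed shuffling can be illustrated with a minimal Python sketch. This is not Easel's C implementation; the function names are hypothetical, and the handling of a trailing partial window (shuffled on its own) is an assumption rather than documented behavior.

```python
import random
from collections import Counter

def mono_shuffle(seq, rng):
    """Fisher-Yates shuffle: monoresidue composition is preserved exactly."""
    residues = list(seq)
    for i in range(len(residues) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)

def markov0_generate(seq, rng):
    """0th order Markov generation: each position is sampled independently
    from the input's residue frequencies, so composition is only approximate."""
    if not seq:
        return ""
    counts = Counter(seq)
    alphabet = sorted(counts)
    weights = [counts[a] for a in alphabet]
    return "".join(rng.choices(alphabet, weights=weights, k=len(seq)))

def window_shuffle(seq, w, rng):
    """Shuffle inside nonoverlapping windows of w residues.
    Assumption: a final window shorter than w is shuffled on its own."""
    if w <= 0:
        raise ValueError("window size must be positive")
    return "".join(mono_shuffle(seq[i:i + w], rng)
                   for i in range(0, len(seq), w))
```

A shuffled sequence always has the same residue counts as its input, whereas a Markov-generated sequence matches them only in expectation; the windowed form additionally keeps local composition intact.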
Multiple Alignment Shuffling Options
- -b
Sample columns with replacement, in order to generate a bootstrap-resampled alignment dataset.
- -v
Shuffle residues within each column independently; i.e., permute residue order in each column ("vertical" shuffling).
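The three alignment operations (default columnwise shuffling, bootstrap resampling, and vertical shuffling) can be sketched as follows. This is an illustration only, not Easel's code: the alignment is modeled as a list of equal-length strings, all function names are hypothetical, and gap characters are treated like any other symbol, which is an assumption.

```python
import random

def _ncol(msa):
    """Return the alignment width, checking that all rows have equal length."""
    if not msa:
        return 0
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned rows must have the same length")
    return width

def shuffle_columns(msa, rng):
    """Columnwise shuffle: permute column order; each column stays intact."""
    order = list(range(_ncol(msa)))
    rng.shuffle(order)
    return ["".join(row[c] for c in order) for row in msa]

def bootstrap_columns(msa, rng):
    """Bootstrap: draw as many columns as the input has, with replacement."""
    ncol = _ncol(msa)
    picks = [rng.randrange(ncol) for _ in range(ncol)]
    return ["".join(row[c] for c in picks) for row in msa]

def shuffle_within_columns(msa, rng):
    """Vertical shuffle: permute the residues of each column independently."""
    if _ncol(msa) == 0:
        return list(msa)
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]
```

Columnwise shuffling and bootstrapping keep every column's contents together and so preserve per-column conservation, while vertical shuffling keeps each column's composition but breaks the association between residues and their sequences.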
Sequence Generation Options
One of these must be selected, if -G is used.
- --amino
Generate amino acid sequences.
- --dna
Generate DNA sequences.
- --rna
Generate RNA sequences.
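As a rough illustration of generation mode, the following Python sketch produces random sequences over a chosen alphabet. The function name and the ALPHABETS table are hypothetical, and drawing residues uniformly is an assumption; this page does not state which residue frequencies esl-shuffle uses. The defaults mirror those documented for -N (1) and -L (0).

```python
import random

# Hypothetical alphabet table keyed by the names of the three alphabet options.
ALPHABETS = {
    "amino": "ACDEFGHIKLMNPQRSTVWY",
    "dna": "ACGT",
    "rna": "ACGU",
}

def generate_sequences(alphabet, n=1, length=0, rng=None):
    """Generate n random sequences of the given length.
    Assumption: residues are drawn uniformly from the alphabet."""
    if rng is None:
        rng = random.Random()
    symbols = ALPHABETS[alphabet]
    return ["".join(rng.choice(symbols) for _ in range(length))
            for _ in range(n)]
```

With the default length of 0 this sketch returns empty sequences, so a nonzero length must be passed to get any residues.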
Other Options
- --informat <s>
Assert that input seqfile is in format <s>, bypassing format autodetection. Common choices for <s> include: fasta, embl, genbank. Alignment formats also work; common choices include: stockholm, a2m, afa, psiblast, clustal, phylip. For more information, and for codes for some less common formats, see the main documentation. The string <s> is case-insensitive (fasta or FASTA both work).
- --seed <n>
Specify the seed for the random number generator, where the seed <n> is an integer greater than zero. This can be used to make the results of esl-shuffle reproducible. If <n> is 0, the random number generator is seeded arbitrarily and stochastic simulations will vary from run to run. Arbitrary seeding (0) is the default.
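The seeding convention described above (a positive seed gives reproducible output, 0 gives arbitrary seeding) can be mimicked in a short Python sketch. The helper name make_rng is hypothetical, and Python's generator will not reproduce the actual numbers Easel produces for the same seed; only the convention is illustrated.

```python
import random

def make_rng(seed=0):
    """Return a random generator following the convention described above:
    seed > 0 is reproducible, seed == 0 is seeded arbitrarily."""
    if seed < 0:
        raise ValueError("seed must be a nonnegative integer")
    if seed == 0:
        return random.Random()  # arbitrary seed; output varies between runs
    return random.Random(seed)  # fixed seed; output repeats between runs
```

Two generators built with the same positive seed yield identical streams, which is what makes a shuffle repeatable.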
Copyright (C) 2020 Howard Hughes Medical Institute. Freely distributed under the BSD open source license.