convert-background-model
Interconversions between formats of background models supported by different programs (oligo-analysis, dyad-analysis, matrix-scan, patser, consensus, MotifLocator, MotifSampler, MEME, ...).
Background models can be automatically loaded from the RSAT collection of pre-calculated background models.
This is a temporary version, with only some formats supported. Additional formats will be progressively added.
util
convert-background-model [-i inputfile] [-o outputfile] [-v]
Note: some formats are supported for both input and output, but some formats are supported only for input or for output.
Oligonucleotide frequency table, as exported by the RSAT program oligo-analysis. The simplest format is a two-column file with the oligonucleotide in the first column, and the frequency in the second one.
An oligo file containing k-mers frequencies corresponds to a Markov model of order m=k-1. Frequencies are converted to transition probabilities by calculating the relative frequencies of all oligomers of size k having the same prefix (first m letters).
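The conversion described above can be sketched as follows. This is a minimal illustration, not the program's actual code: the function name oligos_to_transitions and the toy dinucleotide counts are hypothetical, and no pseudo frequency is applied.

```python
from collections import defaultdict


def oligos_to_transitions(freqs):
    """Turn k-mer frequencies into transition probabilities P(suffix | prefix).

    freqs maps each k-mer (length k = m + 1) to a frequency or a count.
    The prefix is the first m letters, the suffix is the last letter.
    """
    # Sum the frequencies of all k-mers sharing the same prefix.
    prefix_sums = defaultdict(float)
    for kmer, freq in freqs.items():
        prefix_sums[kmer[:-1]] += freq

    # Divide each k-mer frequency by the sum for its prefix.
    transitions = defaultdict(dict)
    for kmer, freq in freqs.items():
        prefix, suffix = kmer[:-1], kmer[-1]
        total = prefix_sums[prefix]
        transitions[prefix][suffix] = freq / total if total > 0 else 0.0
    return {prefix: dict(row) for prefix, row in transitions.items()}


# Hypothetical dinucleotide counts (k=2), giving a Markov model of order m=1.
dinuc = {"aa": 30, "ac": 10, "ag": 40, "at": 20,
         "ca": 5, "cc": 5, "cg": 5, "ct": 5}
table = oligos_to_transitions(dinuc)
print(table["a"])
```

Each row of the resulting table sums to 1, since the values in a row are the relative frequencies of the k-mers sharing one prefix.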
Oligonucleotide frequency table, as exported by the RSAT program dyad-analysis. Dyads with spacing 0 are converted to oligos, and frequencies are converted into transition probabilities as for the format ``oligos''.
This format can be used by the various programs of the software suite INCLUSive developed by Gert Thijs (http://homes.esat.kuleuven.be/~thijs/download.html).
MotifSampler background files can be generated with the program CreateBackgroundModel, which creates a Markov model from a background sequence.
MEME (http://meme.sdsc.edu/meme/) is a matrix-based pattern discovery program developed by Tim Bailey, which supports Markov models.
MEME background files can be generated with the program fasta-get-markov, which creates a Markov model from a background sequence.
Export the Markov model in patser format. patser is a pattern matching program developed by Jerry Hertz.
WARNING: patser only supports Bernoulli models! If the Markov order is greater than 0, this function issues a fatal error.
Transition table with one row per prefix, one column per suffix, each cell indicating the transition frequency.
Export tables with various statistics:
- oligo frequencies
- frequencies: oligomer frequencies, presented in a prefix/suffix table
- relative frequencies: relative oligomer frequencies. These differ from the raw frequencies only if the input file contained non-normalized frequencies (e.g. pattern occurrences).
- transition frequencies: see the format "transitions" above.
This output is convenient for understanding the different steps for the conversion between oligomer frequencies (oligomer length k=m+1) and transition tables.
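The successive steps can be illustrated with a small sketch. The names conversion_steps and format_table, the tab-separated layout and the toy counts are assumptions made for illustration; the actual layout of the exported tables may differ.

```python
def conversion_steps(counts, alphabet="acgt"):
    """Return (relative frequencies, transition table) from raw k-mer counts."""
    total = sum(counts.values())
    # Step 1: relative frequencies (normalise so that all k-mers sum to 1).
    relative = {kmer: value / total for kmer, value in counts.items()}

    # Step 2: arrange by prefix (row) and suffix (column), then normalise
    # each row to obtain the transition frequencies.
    transitions = {}
    for prefix in sorted({kmer[:-1] for kmer in counts}):
        row = {s: relative.get(prefix + s, 0.0) for s in alphabet}
        row_sum = sum(row.values())
        transitions[prefix] = {
            s: (value / row_sum if row_sum > 0 else 0.0)
            for s, value in row.items()
        }
    return relative, transitions


def format_table(rows, alphabet="acgt", decimals=3):
    """Render a prefix/suffix table as tab-separated text."""
    lines = ["prefix\t" + "\t".join(alphabet)]
    for prefix in sorted(rows):
        cells = [f"{rows[prefix].get(s, 0.0):.{decimals}f}" for s in alphabet]
        lines.append(prefix + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Hypothetical dinucleotide occurrences (non-normalised input).
counts = {"aa": 6, "ac": 2, "ag": 8, "at": 4,
          "ta": 5, "tc": 5, "tg": 5, "tt": 5}
relative, transitions = conversion_steps(counts)
print(format_table(transitions))
```

With counts as input, the relative frequencies differ from the raw values; with already normalised frequencies, step 1 leaves them unchanged.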
Level of verbosity (detail in the warning messages during execution)
Display full help message
Same as -h
There are two ways to specify a background model file:
-i inputfile
If no input file is specified, the standard input is used. This allows the command to be used within a pipe.
Note: Specify the format of the background model with the '-from' option. Accepted values are oligos and dyad.
-org organism_name
Specifies an organism installed in RSAT. To obtain the list of organisms supported in RSAT, type the following command: supported-organisms
Markov order for the background model.
No overlapping. See oligo-analysis for more details on this option.
Strand-sensitive background model
Strand-insensitive background model
If no output file is specified, the standard output is used. This allows the command to be used within a pipe.
Input format. Supported: oligo-analysis, MotifSampler, meme, dyads.
Output format. Supported: transitions (default), tables, oligo-analysis, patser, MotifSampler.
Pseudo frequency for the background models. The value must be a real number between 0 and 1 (default: 0.01). If the training sequence length (L) is known, the value can be set to sqrt(L) / (L + sqrt(L)).
Number of decimals to print for the transition probabilities.
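As a quick illustration of the length-based suggestion, the sketch below computes the square root of L divided by (L plus the square root of L). The function name suggested_pseudo_frequency is hypothetical, and the sketch only computes the value; it does not show how the program applies it to the model.

```python
import math


def suggested_pseudo_frequency(seq_len):
    """Return sqrt(L) / (L + sqrt(L)) for a training sequence of length L."""
    if seq_len <= 0:
        raise ValueError("training sequence length must be positive")
    root = math.sqrt(seq_len)
    return root / (seq_len + root)


# Hypothetical training sequence lengths.
for length in (100, 10_000, 1_000_000):
    print(length, round(suggested_pseudo_frequency(length), 5))
```

The value decreases as L grows, and for L = 10000 it is close to the default of 0.01.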
Convert to a strand-insensitive background model, by averaging transition frequencies on each pair of reverse complements.
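The averaging over reverse complement pairs can be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, and the sketch averages a table keyed by oligomer; whether the program averages before or after deriving the transition table is not specified here.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(oligo):
    """Return the reverse complement of a DNA oligomer."""
    return oligo.translate(_COMPLEMENT)[::-1]


def average_reverse_complements(freqs):
    """Give each oligomer the mean of its own value and that of its reverse complement."""
    # Include reverse complements that are absent from the input (treated as 0).
    oligos = set(freqs) | {reverse_complement(o) for o in freqs}
    return {
        oligo: (freqs.get(oligo, 0.0)
                + freqs.get(reverse_complement(oligo), 0.0)) / 2
        for oligo in oligos
    }


# Hypothetical strand-sensitive values for three dinucleotides.
strand_sensitive = {"aa": 0.4, "tt": 0.2, "at": 0.1}
strand_insensitive = average_reverse_complements(strand_sensitive)
print(sorted(strand_insensitive.items()))
```

Reverse-palindromic oligomers such as "at" are their own reverse complement, so their value is unchanged.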
Calculate the probability of a sequence given a background model.
Scan sequences with a PSSM, accepting various background models.
Scan sequences with a PSSM. Program developed by Jerry Hertz.