RSAT - convert-background-model manual



NAME

convert-background-model


DESCRIPTION

Interconversions between formats of background models supported by different programs (oligo-analysis, dyad-analysis, matrix-scan, patser, consensus, MotifLocator, MotifSampler, MEME, ...).

Background models can be automatically loaded from the RSAT collection of pre-calculated background models.

This is a temporary version, with only some formats supported. Additional formats will be progressively added.


AUTHORS

Jacques van Helden Jacques.van-Helden@univ-amu.fr
Morgane Thomas-Chollier morgane@bigre.ulb.ac.be
Jean Valery Turatsinze


CATEGORY

util


USAGE

convert-background-model [-i inputfile] [-o outputfile] [-v]


BACKGROUND MODEL FORMATS

Note: some formats are supported for both input and output, but some formats are supported only for input or for output.

oligos (input/output)

Oligonucleotide frequency table, as exported by the RSAT program oligo-analysis. The simplest format is a two-column file with the oligonucleotide in the first column, and the frequency in the second one.

An oligo file containing k-mers frequencies corresponds to a Markov model of order m=k-1. Frequencies are converted to transition probabilities by calculating the relative frequencies of all oligomers of size k having the same prefix (first m letters).
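
The conversion described above can be sketched in Python as follows. This is a minimal illustration, not the RSAT implementation: the function name and the example counts are hypothetical, and all k-mers are assumed to have the same length.

```python
from collections import defaultdict

def oligos_to_transitions(oligo_freq):
    """Convert k-mer frequencies into Markov transition probabilities.

    oligo_freq maps each k-mer (all of the same length k) to a frequency
    or an occurrence count. The result maps each prefix (first m = k-1
    letters) to a dictionary {suffix: P(suffix | prefix)}.
    """
    # Sum the frequencies of all k-mers sharing the same prefix.
    prefix_sum = defaultdict(float)
    for oligo, freq in oligo_freq.items():
        prefix_sum[oligo[:-1]] += freq

    # Divide each k-mer frequency by the sum for its prefix.
    transitions = defaultdict(dict)
    for oligo, freq in oligo_freq.items():
        prefix, suffix = oligo[:-1], oligo[-1]
        total = prefix_sum[prefix]
        transitions[prefix][suffix] = freq / total if total > 0 else 0.0
    return dict(transitions)

# Hypothetical dinucleotide counts (k=2, hence Markov order m=1).
example = {"aa": 4, "ac": 2, "ag": 1, "at": 1}
print(oligos_to_transitions(example))
```

With k=1 the prefix is the empty string, so the same function yields the residue probabilities of a Bernoulli model (order 0).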

dyads (input)

Dyad frequency table, as exported by the RSAT program dyad-analysis. Dyads with spacing 0 are converted to oligos, and frequencies are converted into transition probabilities as for the format "oligos".
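
As an illustration of the spacing-0 rule, the sketch below keeps only dyads with spacing 0 and joins their two halves into a single oligo. It assumes that dyads are written as two monads separated by "n{spacing}" (for example "aaan{0}ttt"); the function name and this notation are assumptions made for the example, and the actual parser in RSAT may differ.

```python
import re

# Assumed dyad notation: monad, "n{spacing}", monad (e.g. "aaan{0}ttt").
DYAD_PATTERN = re.compile(r"^([acgt]+)n\{(\d+)\}([acgt]+)$", re.IGNORECASE)

def dyads_to_oligos(dyad_freq):
    """Keep dyads with spacing 0 and merge their two halves into an oligo."""
    oligo_freq = {}
    for dyad, freq in dyad_freq.items():
        match = DYAD_PATTERN.match(dyad)
        if match and int(match.group(2)) == 0:
            oligo = (match.group(1) + match.group(3)).lower()
            oligo_freq[oligo] = oligo_freq.get(oligo, 0.0) + freq
    return oligo_freq

# Hypothetical frequencies: only the spacing-0 dyad is retained.
print(dyads_to_oligos({"aaan{0}ttt": 0.002, "aaan{5}ttt": 0.001}))
```

The resulting oligo frequencies can then be converted to transition probabilities exactly as for the "oligos" format.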

MotifSampler (input/output)

This format can be used by the various programs of the software suite INCLUSive developed by Gert Thijs (http://homes.esat.kuleuven.be/~thijs/download.html).

MotifSampler background files can be generated with the program CreateBackgroundModel, which creates a Markov model from a background sequence.

MEME (input/output)

MEME (http://meme.sdsc.edu/meme/) is a matrix-based pattern discovery program developed by Tim Bailey, which supports Markov background models.

MEME background files can be generated with the program fasta-get-markov, which creates a Markov model from a background sequence.

patser (output)

Export the Markov model in patser format. patser is a pattern matching program developed by Jerry Hertz.

WARNING: patser only supports Bernoulli models! If the Markov order is greater than 0, this function issues a fatal error.
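
The order check can be pictured with the following sketch. It assumes a model stored as a dictionary {prefix: {suffix: probability}}, where the Markov order is the prefix length; the function name and the data structure are hypothetical, and the patser file layout itself is deliberately not reproduced here.

```python
def bernoulli_residue_probabilities(transitions):
    """Return residue probabilities, refusing models of order > 0."""
    # The Markov order equals the length of the prefixes.
    order = max((len(prefix) for prefix in transitions), default=0)
    if order > 0:
        raise ValueError(
            f"Markov order {order} not supported: only Bernoulli models (order 0)"
        )
    # In a Bernoulli model the only prefix is the empty string.
    return dict(transitions.get("", {}))

bernoulli = {"": {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3}}
print(bernoulli_residue_probabilities(bernoulli))
```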

transitions (output)

Transition table with one row per prefix, one column per suffix, each cell indicating the transition frequency.
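
To make the layout concrete, here is a small sketch that prints such a table from a dictionary {prefix: {suffix: probability}}. The tab-delimited layout, the header row and the function name are assumptions for illustration; the file actually written by the program may contain additional comment lines or columns.

```python
def format_transition_table(transitions, alphabet="acgt", decimals=3):
    """Format transitions with one row per prefix and one column per suffix."""
    lines = ["\t".join(["prefix"] + list(alphabet))]
    for prefix in sorted(transitions):
        cells = [
            f"{transitions[prefix].get(suffix, 0.0):.{decimals}f}"
            for suffix in alphabet
        ]
        lines.append("\t".join([prefix] + cells))
    return "\n".join(lines)

# Hypothetical order-1 model restricted to two prefixes.
model = {
    "a": {"a": 0.5, "c": 0.25, "g": 0.125, "t": 0.125},
    "c": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
}
print(format_transition_table(model))
```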

tables (output)

Export tables with various statistics:

 - oligo frequencies
 - frequencies: oligomer frequencies, presented in a
   prefix/suffix table
 - relative frequencies: relative oligomer frequencies. These differ
   from the raw frequencies only if the input file contained
   non-normalized frequencies (e.g. pattern occurrences).
 - transition frequencies: see the format "transitions" above.

This output is convenient for understanding the different steps for the conversion between oligomer frequencies (oligomer length k=m+1) and transition tables.
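
The normalization step that distinguishes raw from relative frequencies can be sketched as below. The function name and the counts are hypothetical; the point is only that occurrence counts are rescaled to sum to 1, whereas frequencies that already sum to 1 are left unchanged.

```python
def relative_frequencies(oligo_values):
    """Rescale oligomer values (counts or frequencies) so that they sum to 1."""
    total = sum(oligo_values.values())
    if total <= 0:
        raise ValueError("the sum of oligomer values must be positive")
    return {oligo: value / total for oligo, value in oligo_values.items()}

# Hypothetical occurrence counts (non-normalized input).
counts = {"aa": 6, "ac": 2, "ca": 1, "cc": 1}
print(relative_frequencies(counts))
```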


OPTIONS

-v #

Level of verbosity (detail in the warning messages during execution)

-h

Display full help message

-help

Same as -h

BACKGROUND MODEL FILE

There are two ways to specify a background model file:

By giving the path of the file:

-i inputfile

If no input file is specified, the standard input is used. This allows the command to be used within a pipe.

By loading a precalculated background model from the RSAT collection:

Note: Specify the format of the background model with the '-from' option. Accepted values are oligos and dyads.

-org organism_name

Specifies an organism installed in RSAT. To obtain the list of supported organisms in RSAT, type the following command: supported-organisms

-markov #

Markov order for the background model.

-noov

No overlapping. See oligo-analysis for more details on this option.

-1str

Strand-sensitive background model

-2str

Strand-insensitive background model

-o outputfile

If no output file is specified, the standard output is used. This allows the command to be used within a pipe.

-from input_format

Input format. Supported: oligo-analysis, MotifSampler, meme, dyads.

-to output_format

Output format. Supported: transitions (default), tables, oligo-analysis, patser, MotifSampler.

-bg_pseudo #

Pseudo-frequency for the background model. The value must be a real number between 0 and 1 (default: 0.01). If the training sequence length (L) is known, the value can be set to sqrt(L) / (L + sqrt(L)).
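
The suggested value is easy to compute for a given training sequence length. The helper below is a hypothetical convenience function, not part of RSAT; it only evaluates the square-root formula given above.

```python
import math

def suggested_bg_pseudo(seq_length):
    """Return sqrt(L) / (L + sqrt(L)) for a training sequence of length L."""
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    root = math.sqrt(seq_length)
    return root / (seq_length + root)

# Print the suggested pseudo-frequency for a few sequence lengths.
for length in (100, 10_000, 1_000_000):
    print(length, round(suggested_bg_pseudo(length), 5))
```

For L = 10,000 the formula gives 100/10,100, about 0.0099, which is close to the default of 0.01; longer training sequences yield smaller pseudo-frequencies.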

-decimals #

Number of decimals to print for the transition probabilities.

-to2str

Convert to a strand-insensitive background model, by averaging transition frequencies on each pair of reverse complements.
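
One plausible reading of this averaging is sketched below. Note the assumption: the average is applied here to oligomer frequencies (each k-mer and its reverse complement receive the mean of their two values), after which transition probabilities can be recomputed from the averaged frequencies. Whether RSAT averages at this level or directly on the transition table is not stated here, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(oligo):
    """Return the reverse complement of a lowercase DNA oligomer."""
    return oligo.translate(COMPLEMENT)[::-1]

def average_reverse_complements(oligo_freq):
    """Give each oligomer the mean of its own and its reverse complement's value.

    A reverse complement missing from the input is counted as 0 (an
    assumption of this sketch). Palindromic oligomers keep their value.
    """
    averaged = {}
    for oligo, freq in oligo_freq.items():
        rc_freq = oligo_freq.get(reverse_complement(oligo), 0.0)
        averaged[oligo] = (freq + rc_freq) / 2
    return averaged

# Hypothetical strand-sensitive frequencies.
print(average_reverse_complements({"aa": 0.4, "tt": 0.2, "at": 0.4}))
```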


SEE ALSO

seq-proba

Calculate the probability of a sequence given a background model.

matrix-scan

Scan sequences with a PSSM, accepting various background models.

patser

Scan sequences with a PSSM. Program developed by Jerry Hertz.