RSA-tools - Tutorials - Random models

The prediction of regulatory elements carries a risk of false predictions. A good motif discovery program reports statistical parameters (E-value, information content, significance index, ...) to estimate the reliability of each prediction. These estimators may however be more or less adequate, depending on how well the underlying statistical model fits the type of biological sequences analyzed.

A practical way to test the rate of false positives returned by motif discovery programs is to submit sequences that are not supposed to be co-regulated. In such a case, a good program should return a negative answer, i.e. no pattern at all.
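The negative-control idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name false_positive_rate and the input layout (one list of reported patterns per random sequence set) are hypothetical and are not part of RSA-tools.

```python
def false_positive_rate(patterns_per_set):
    """Fraction of negative-control sets for which at least one pattern was reported.

    patterns_per_set: one list of reported patterns per random sequence set
    (hypothetical layout, chosen for this sketch only).
    """
    if not patterns_per_set:
        raise ValueError("need at least one random set")
    # Any pattern reported on a set that is not co-regulated counts as a false positive.
    sets_with_hits = sum(1 for patterns in patterns_per_set if patterns)
    return sets_with_hits / len(patterns_per_set)


# Made-up results for four random sets: only the second one returned a pattern.
results = [[], ["ACGTGA"], [], []]
print(false_positive_rate(results))
```

Counting per set, rather than per pattern, matches the expectation stated above that a good program returns no pattern at all for such input.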

This web site offers two tools for generating random sequence sets, in order to estimate the rates of false positives obtained with different programs: random-sequence and random-genes.

  1. random-sequence. This program generates random sequences on the basis of various probabilistic background models. One interesting option is to use Markov chain models, calibrated on the complete set of upstream sequences of an organism. This means that the generated sequences do not exist as biological sequences, but they have the same oligonucleotide composition as the chosen background.

  2. random-genes. This program selects a random set of gene identifiers in the selected genome. The selected genes can then be piped to the program retrieve-seq, which retrieves the upstream sequences of the randomly selected genes. In this case, you analyze real biological sequences, in contrast to the output of random-sequence. The analysis of random gene sets is very useful for evaluating the rate of false positives of motif discovery programs.
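To make the two strategies above more concrete, here is a minimal Python sketch. It is not the implementation used by the tools on this web site: the function names (train_markov, generate_sequence, select_random_genes), the toy upstream sequences and the gene identifiers are all assumptions made for illustration. The first part calibrates a Markov chain of a given order on a set of sequences and generates a new sequence from it; the second part draws gene identifiers at random without replacement.

```python
import random
from collections import Counter, defaultdict


def train_markov(sequences, order):
    """Count how often each residue follows each prefix of length `order`."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            transitions[seq[i:i + order]][seq[i + order]] += 1
    return transitions


def draw(counts, rng):
    """Draw one key of a Counter with probability proportional to its count."""
    return rng.choices(list(counts), weights=list(counts.values()))[0]


def generate_sequence(transitions, order, length, rng):
    """Generate one random sequence from the trained transition counts."""
    prefix_counts = Counter({p: sum(c.values()) for p, c in transitions.items()})
    residue_counts = sum(transitions.values(), Counter())
    seq = draw(prefix_counts, rng)  # starting prefix (empty string if order is 0)
    while len(seq) < length:
        counts = transitions.get(seq[len(seq) - order:])
        # Fall back to overall residue frequencies for a prefix never seen in training.
        seq += draw(counts if counts else residue_counts, rng)
    return seq[:length]


def select_random_genes(gene_ids, n, rng):
    """Pick n distinct gene identifiers at random (sampling without replacement)."""
    return rng.sample(list(gene_ids), n)


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
upstream = ["ACGTACGTTGCAAGCT", "TTGACAGCTAGCTAGG", "GATATATAGCGCATAT"]  # toy data
model = train_markov(upstream, order=2)
print(generate_sequence(model, order=2, length=40, rng=rng))
print(select_random_genes(["geneA", "geneB", "geneC", "geneD"], 2, rng))
```

With real data, the training set would be the complete set of upstream sequences of the organism, and the selected identifiers would be used to retrieve real upstream sequences rather than generated ones.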

The analysis of random gene selections gives more realistic (and more severe) conditions than randomly generated sequences, which rely on a rather strong assumption about the model underlying sequence composition (e.g. for a Markov chain: which order is the most appropriate? Do biological sequences really resemble the result of a Markov process? ...).

Next tutorials

  1. Random sequence generation
  2. Random gene selections

You can now come back to the tutorial main page and follow the next tutorials.
