RSA-tools - Tutorials - random-seq
Prerequisite
This tutorial does not require any prior knowledge, but to get the most out of it, you should first follow the tutorials on sequence retrieval and oligo-analysis.
Introduction
Random sequences are commonly used in bioinformatics to estimate the rate of false positives returned by a given pattern discovery program.
The program random-sequence generates random sequences on the basis of different probabilistic models. This tutorial describes the different ways to generate random sequences, following either a Bernoulli or a Markov chain process.
Examples of use
Independent and equiprobable nucleotides
We will start this tutorial by generating a random sequence according to the simplest possible model: independent and equiprobable nucleotides.
Independent means that, at each position of the sequence, a nucleotide is added without taking into account its context, i.e. the preceding and following nucleotides. Equiprobable means that each nucleotide has the same probability, 0.25, of occurring at any position. Note that this is a very poor model for simulating biological sequences: most biological sequences show a bias in A/T versus G/C composition. This option is only supported for the sake of comparison with more realistic models.
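To make the model concrete, here is a minimal Python sketch of independent and equiprobable sampling. It is not the code of random-sequence itself; the function name random_equiprobable_sequence and the fixed seed are assumptions made for this illustration.

```python
import random


def random_equiprobable_sequence(length, rng):
    """Draw each position independently and uniformly from A, C, G, T.

    Hypothetical helper for illustration, not part of RSA-tools.
    """
    return "".join(rng.choice("ACGT") for _ in range(length))


# Fixed seed (an arbitrary choice) so that the sketch is reproducible.
rng = random.Random(42)

# Same dimensions as in this tutorial: 10 sequences of 1000 nucleotides.
sequences = [random_equiprobable_sequence(1000, rng) for _ in range(10)]
print(len(sequences), len(sequences[0]))
```

Because every position is drawn with the same call and no memory of its neighbours, the sequence is both independent and equiprobable in the sense described above.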
- In the left frame, click on random sequence.
- Select the option Independent and equiprobable nucleotides.
- Click GO.
The program generates a set of 10 random sequences of length 1000 each. We will now use oligo-analysis to check the nucleotide composition of these sequences.
- At the bottom of the result page, a series of buttons allows you to send these sequences to various programs. Click oligo-analysis.
- In the oligo-analysis form, we need to deactivate some of the default options in order to calculate nucleotide frequencies.
- Select an oligonucleotide size of 1 (single nucleotides).
- Count on single strand.
- In the Return table, uncheck the boxes Binomial proba and Significance, and check the box Frequencies.
- Click GO.
After a few seconds, the result appears. Do the frequencies correspond to your expectations?
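The same check can be sketched outside the web interface. The following Python snippet counts single nucleotides on one strand and converts the counts to relative frequencies. It only approximates what oligo-analysis reports for an oligonucleotide size of 1; the function name nucleotide_frequencies and the two toy sequences are assumptions made for this illustration.

```python
from collections import Counter


def nucleotide_frequencies(sequences):
    """Relative frequency of A, C, G, T over all sequences (single strand).

    Hypothetical helper for illustration, not part of RSA-tools.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts[n] for n in "ACGT")
    if total == 0:
        raise ValueError("no A, C, G or T found in the input")
    return {n: counts[n] / total for n in "ACGT"}


# Two toy sequences: A and C occur 3 times each, G and T once each.
freqs = nucleotide_frequencies(["ACGT", "AACC"])
print(freqs)  # {'A': 0.375, 'C': 0.375, 'G': 0.125, 'T': 0.125}
```

For 10 equiprobable sequences of 1000 nucleotides, each frequency should be close to 0.25 but not exactly equal to it: with 10,000 independent draws, the standard deviation of an observed frequency is about 0.004.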
Independent nucleotides with distinct probabilities
A slightly more elaborate random model allows you to assign a specific expected frequency to each nucleotide.
As an exercise, generate a random sequence with 90% of A/T and 10% of G/C. Check the frequencies of nucleotides with oligo-analysis.
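As a point of comparison for the exercise, here is a minimal Python sketch of independent nucleotides with distinct probabilities. It is not the implementation used by random-sequence. The exercise only fixes the A/T and G/C totals, so the even split within each pair (A = T = 0.45, C = G = 0.05) is an assumption, as are the function name random_biased_sequence and the seed.

```python
import random


def random_biased_sequence(length, probs, rng):
    """Draw each position independently with nucleotide-specific probabilities.

    Hypothetical helper for illustration, not part of RSA-tools.
    """
    letters = list(probs)
    weights = [probs[n] for n in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


# Assumption: 90% A/T and 10% G/C, split evenly within each pair.
probs = {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}

rng = random.Random(1)  # arbitrary fixed seed, for reproducibility only
seq = random_biased_sequence(10000, probs, rng)

# Observed A/T fraction; it should be near 0.9, with sampling fluctuations.
at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
print(round(at_fraction, 3))
```

The positions are still independent of one another; only the per-nucleotide probabilities differ from the equiprobable model.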
Additional exercises
You can now return to the tutorial main page and follow the next tutorials.
For suggestions please post an issue on GitHub or contact the