RSA-tools - Tutorials - Word counts


  1. Prerequisite
  2. Introduction
  3. Questions
  4. Example of utilization
  5. Interpreting the result
  6. Additional exercises
  7. Bibliography


The theoretical background required for this tutorial can be found in the RSAT course.

In particular, we recommend reading the following slides before starting this tutorial.


In this tutorial, we will get familiar with the concept of word occurrences (i.e. the number of instances of a given oligonucleotide) in DNA sequences.
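To make the notion of word occurrences concrete, here is a minimal Python sketch that counts overlapping occurrences of a word in a DNA sequence. It is not RSAT code, and the function names are hypothetical. The optional reverse-strand scan is an illustrative assumption: whether the reverse strand is also scanned changes the count.

```python
# Minimal sketch (not RSAT code): count overlapping occurrences of a word in DNA.
# Assumption: one strand by default; both_strands=True also counts the reverse
# complement of the word, skipping it when the word is its own reverse complement.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(seq: str, word: str, both_strands: bool = False) -> int:
    """Count overlapping occurrences of `word` in `seq` (case-insensitive)."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    n = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)
    if both_strands:
        rc = reverse_complement(word)
        if rc != word:  # a palindromic word would otherwise be counted twice
            n += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == rc)
    return n


if __name__ == "__main__":
    s = "TTGATACAGGATACATGTATCAA"
    print(count_occurrences(s, "GATACA"))                     # → 2
    print(count_occurrences(s, "GATACA", both_strands=True))  # → 3 (TGTATC on the forward strand)
```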


  1. Assuming a 5th-order Markov background model calibrated on all upstream non-coding sequences of the yeast Saccharomyces cerevisiae, how many occurrences of the word GATACA would you expect by chance in a 5 kb sequence?

  2. Using the same background model, generate 1,000 random sequences of length L = 5000 bp and compute the frequency distribution of the word GATACA. Does the observed mean correspond to your expectation?

  3. What fraction of the sequences contain at least 3 occurrences of the word GATACA?
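The questions above are meant to be answered with RSAT's own tools. As an independent cross-check, the following pure-Python sketch does the same calculation outside RSAT. It is not RSAT code, and all function names are hypothetical. It works in five steps:

- It estimates an order-m Markov model with Laplace pseudocounts.
- It computes the expected count as (L - k + 1) * P(w), where P(w) = P(w[1..m]) * prod P(w[i] | w[i-m..i-1]).
- It simulates random sequences from the model.
- It reports the observed mean and the fraction of sequences with at least 3 hits.
- It adds a Poisson estimate of that fraction, for comparison only.

Several assumptions apply. Counts are on one strand only, and overlapping matches are counted. The m-mer frequency is used as the probability of the word's prefix, which is a stationarity approximation. The demo's training data is a uniform random placeholder; to answer the questions, substitute the yeast upstream non-coding sequences.

```python
import math
import random
from collections import Counter
from itertools import accumulate, product

ALPHABET = "ACGT"


def train_markov(seqs, order, pseudo=1.0):
    """Estimate an order-`order` Markov model with Laplace pseudocounts.

    Returns (prefix, transitions): prefix[c] is the relative frequency of the
    m-mer c; transitions[c] lists P(A|c), P(C|c), P(G|c), P(T|c).
    """
    mmers, trans_counts = Counter(), Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - order + 1):
            mmers[s[i:i + order]] += 1
        for i in range(len(s) - order):
            trans_counts[s[i:i + order + 1]] += 1
    contexts = ["".join(p) for p in product(ALPHABET, repeat=order)]
    total = sum(mmers[c] for c in contexts)
    prefix = {c: (mmers[c] + pseudo) / (total + pseudo * len(contexts)) for c in contexts}
    transitions = {}
    for c in contexts:
        counts = [trans_counts[c + b] + pseudo for b in ALPHABET]
        transitions[c] = [x / sum(counts) for x in counts]
    return prefix, transitions


def word_probability(word, order, prefix, transitions):
    """P(w) = P(w[:m]) * prod_i P(w[i] | w[i-m:i]); requires len(word) >= order."""
    word = word.upper()
    p = prefix[word[:order]]
    for i in range(order, len(word)):
        p *= transitions[word[i - order:i]][ALPHABET.index(word[i])]
    return p


def expected_count(word, seq_len, order, prefix, transitions):
    """Expected occurrences on one strand: (L - k + 1) * P(w)."""
    return (seq_len - len(word) + 1) * word_probability(word, order, prefix, transitions)


def simulate_counts(word, n_seq, length, order, prefix, transitions, seed=0):
    """Draw n_seq random sequences from the model and count `word` in each (one strand)."""
    rng = random.Random(seed)
    contexts = list(prefix)
    prefix_weights = [prefix[c] for c in contexts]
    cumulative = {c: list(accumulate(p)) for c, p in transitions.items()}
    word, k = word.upper(), len(word)
    counts = []
    for _ in range(n_seq):
        ctx = rng.choices(contexts, weights=prefix_weights)[0]  # first m bases
        parts = [ctx]
        for _ in range(length - order):
            base = rng.choices(ALPHABET, cum_weights=cumulative[ctx])[0]
            parts.append(base)
            ctx = (ctx + base)[1:] if order else ""  # slide the m-base context
        seq = "".join(parts)[:length]
        counts.append(sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word))
    return counts


def summarize(counts, expected):
    """Observed mean, observed fraction with >= 3 hits, and a Poisson estimate of that fraction."""
    n = len(counts)
    mean = sum(counts) / n
    frac_ge3 = sum(1 for c in counts if c >= 3) / n
    # P(X >= 3) = 1 - e^(-lambda) * (1 + lambda + lambda^2 / 2), with lambda = expected count
    poisson_ge3 = 1 - math.exp(-expected) * (1 + expected + expected ** 2 / 2)
    return mean, frac_ge3, poisson_ge3


if __name__ == "__main__":
    # Hypothetical training data: substitute the yeast upstream non-coding sequences here.
    rng = random.Random(1)
    training = ["".join(rng.choice(ALPHABET) for _ in range(200_000))]
    prefix, transitions = train_markov(training, order=5)
    exp = expected_count("GATACA", 5000, 5, prefix, transitions)
    # The question asks for 1,000 sequences; 100 keeps this pure-Python demo fast.
    counts = simulate_counts("GATACA", 100, 5000, 5, prefix, transitions)
    mean, frac, pois = summarize(counts, exp)
    print(f"expected={exp:.2f} mean={mean:.2f} P(>=3): simulated={frac:.3f} Poisson={pois:.3f}")
```

With a 5th-order model and a 6-letter word, the product has a single transition term: P(GATACA) = P(GATAC) * P(A | GATAC). Scanning both strands would add the expected count of the reverse complement TGTATC, since GATACA is not palindromic.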


  1. By default, the program dna-pattern returns the matching positions of the query patterns in the input sequences, but its options can be changed to obtain a count table indicating the number of occurrences of each pattern in each input sequence.
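The sketch below does not call dna-pattern, and it does not reproduce dna-pattern's options or output format. It only illustrates what such a count table contains: one row per input sequence and one column per query pattern, with each cell holding the number of overlapping occurrences. It is plain Python, and the function names are hypothetical. Scanning the reverse strand is an optional assumption, off by default here.

```python
import io
from typing import Iterator, List, TextIO, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_word(seq: str, word: str) -> int:
    """Count overlapping occurrences of `word` in `seq`."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_table(handle: TextIO, patterns: List[str],
                both_strands: bool = False) -> List[Tuple[str, List[int]]]:
    """Return one (seq_id, [count per pattern]) row per FASTA record."""
    pats = [p.upper() for p in patterns]
    rows = []
    for seq_id, seq in read_fasta(handle):
        row = []
        for p in pats:
            n = count_word(seq, p)
            rc = p.translate(COMPLEMENT)[::-1]
            if both_strands and rc != p:  # skip palindromes to avoid double counting
                n += count_word(seq, rc)
            row.append(n)
        rows.append((seq_id, row))
    return rows


if __name__ == "__main__":
    fasta = io.StringIO(">seq1\nTTGATACAGG\nATACATGTATC\n>seq2\nACGTACGT\n")
    patterns = ["GATACA"]
    print("#seq_id\t" + "\t".join(patterns))
    for seq_id, row in count_table(fasta, patterns, both_strands=True):
        print(seq_id + "\t" + "\t".join(map(str, row)))
```

On the toy input, this prints a count of 3 for seq1 (two GATACA plus one TGTATC) and 0 for seq2.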



Next steps

You can now come back to the tutorial main page and follow the next tutorials.

For suggestions please post an issue on GitHub or contact the