- Open the toolset Build control sets of the RSAT
	    toolbox, and click random sequence. Adapt
	    the options to get 1,000 sequences of 5,000 bp each.
	 
- Choice of the background model: select the
	    organism Saccharomyces cerevisiae, check the
	    option "DNA sequences calibrated on non-coding upstream
	    sequences" with an oligonucleotide size
	    of 6 (this corresponds to a Markov model of
	    order m = k-1 = 5), and click GO.
	 - After a few seconds, the result page appears. The sequences
	  are not displayed, to avoid a massive transfer (you do not
	  need to download 5 Mb before submitting the
	  resulting sequences to the next analysis steps). If you
	  want, you can check the sequences by clicking on the link
	  under Result file(s). 
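The link between oligonucleotide size and Markov order chosen for the background model can be illustrated with a minimal sketch. Given counts of oligonucleotides of size k, the probability of a residue given the k-1 preceding residues is the count of the oligonucleotide divided by the summed counts of all oligonucleotides sharing the same prefix. The function name and the toy dinucleotide counts (k = 2, hence order 1) are hypothetical; this is not RSAT's own code, and the real model uses hexanucleotide frequencies from yeast upstream sequences.

```python
from collections import defaultdict

def transition_probabilities(kmer_counts):
    """Convert k-mer counts into Markov transitions of order m = k - 1.

    Returns {prefix: {next_residue: probability}}, where each prefix
    has length k - 1.
    """
    # Sum the counts of all k-mers that share the same (k-1)-long prefix.
    prefix_totals = defaultdict(float)
    for kmer, count in kmer_counts.items():
        prefix_totals[kmer[:-1]] += count

    # P(residue | prefix) = count(prefix + residue) / total(prefix)
    transitions = defaultdict(dict)
    for kmer, count in kmer_counts.items():
        prefix, residue = kmer[:-1], kmer[-1]
        transitions[prefix][residue] = count / prefix_totals[prefix]
    return dict(transitions)

# Hypothetical toy counts for k = 2 (Markov order 1), not real yeast data.
toy_counts = {"AA": 30, "AC": 10, "AG": 20, "AT": 40,
              "CA": 25, "CC": 25, "CG": 5, "CT": 45}
model = transition_probabilities(toy_counts)
order = len(next(iter(toy_counts))) - 1
```

With hexanucleotides (k = 6), the same computation would yield prefixes of length 5, i.e. a Markov model of order 5.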
- In the result page, click the
	    button dna-pattern. 
	      - Enter GATACA in
		the box Query pattern(s).
- For the Search strands option,
		  select direct only, and deactivate the
		  option prevent overlapping matches. 
- Deactivate the default return
		options match positions and sequence
		  limits. 
- Activate the options match count
		  table, match rank and sort.
		
	      - Justification for the chosen options:
	      - The option match count table will return a
		table with one row per sequence, and one column per
		pattern (in this case we submitted a single pattern,
		but the tool would also allow you to analyze different
		patterns in a single run).
	      
- The option sort will sort the sequences by
		decreasing number of occurrences, so you will immediately see the
		maximum number of occurrences in the random trial. 
- The option match rank will indicate the rank
		of each sequence according to the number of pattern
		occurrences, and thus allow us to check the number of
		sequences having 0, 1, 2, ... occurrences. 
 
 - At this stage, you can already count the number of
	      sequences having 6, 5, 4, ... 0 occurrences of the
	      pattern by browsing the result table from top to
	      bottom. We will however use another program to compute
	      it automatically, and display the result
	      graphically. 
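The dna-pattern settings used in this step (direct strand only, overlapping matches allowed, count table sorted and ranked) can be mimicked with a minimal sketch. The function names are hypothetical, the rank is a plain ordinal rank (the way dna-pattern handles ties may differ), and the sequences are drawn with equiprobable residues, which is an assumption: the tutorial uses a Markov model calibrated on yeast upstream sequences.

```python
import random

def count_overlapping(sequence, pattern):
    """Count occurrences of pattern on the given strand, overlaps allowed."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Move one position forward only, so overlapping matches are kept.
        start = sequence.find(pattern, start + 1)
    return count

def ranked_count_table(sequences, pattern):
    """Return (rank, sequence_id, count) rows sorted by decreasing count."""
    counts = [(seq_id, count_overlapping(seq, pattern))
              for seq_id, seq in sequences.items()]
    counts.sort(key=lambda row: row[1], reverse=True)
    return [(rank, seq_id, n)
            for rank, (seq_id, n) in enumerate(counts, start=1)]

# Illustration on equiprobable random sequences (an assumption, see above).
rng = random.Random(0)
sequences = {f"seq_{i}": "".join(rng.choice("ACGT") for _ in range(5000))
             for i in range(1, 101)}
table = ranked_count_table(sequences, "GATACA")
```

The first row of `table` holds the sequence with the maximum number of occurrences, which is what the sort option makes visible at the top of the result page.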
- At the bottom of the dna-pattern result page,
	    click the button Frequency distribution. Set
	    the class interval to 1, the Data column to
	    3 (this column contains the counts of GATACA) and
	    click GO. The result table shows you the number of
	    sequences containing 0, 1, 2, ... occurrences,
	    respectively. Read the header to understand the content of
	    the columns. At the bottom of the frequency table, you have
	    some statistics, including the mean number of occurrences
	    per sequence (when I ran this test, I obtained 1.331, but
	    this number is expected to fluctuate between
	    trials). Compare this observed mean to the expected number
	    of occurrences.
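As a point of comparison for the observed mean, the expected number of occurrences per sequence can be sketched as the number of possible start positions multiplied by the probability of the pattern. The sketch below assumes independent, equiprobable residues, which is a simplification: under the Markov model calibrated on yeast upstream sequences the probability of GATACA differs, and computing it would require the actual oligonucleotide frequencies, not reproduced here. The function name is hypothetical.

```python
def expected_occurrences(seq_length, pattern, residue_probs=None):
    """Expected count of pattern on one strand of one sequence, overlaps allowed.

    Assumes independent residues (Bernoulli model); equiprobable by default.
    """
    if residue_probs is None:
        residue_probs = {base: 0.25 for base in "ACGT"}
    pattern_prob = 1.0
    for base in pattern:
        pattern_prob *= residue_probs[base]
    positions = seq_length - len(pattern) + 1  # possible start positions
    return positions * pattern_prob

expected = expected_occurrences(5000, "GATACA")
print(f"{expected:.3f}")
```

For 5,000 bp sequences and a hexanucleotide, this gives 4995 / 4096, i.e. about 1.22 occurrences per sequence under the equiprobable assumption.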
	 
- We will now generate a graphical representation of the
	    frequency distribution. For this,
	    click XYgraph at the bottom of the frequency
	    distribution result page. Set the Data column for X
	    axis to 1, leave all other parameters unchanged and
	    click GO.
- On the resulting graph, 
	      - the blue curve displays the
		number n of sequences (ordinate) presenting X
		occurrences (0, 1, 2, ...) of GATACA; 
- the green curve
		(n_cum) indicates the cumulative distribution,
		i.e. the number of sequences containing at most X
		occurrences;
- the pink curve
		(n_dcum) indicates the decreasing cumulative
		distribution, i.e. the number of sequences
		containing at least X occurrences.
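The three curves can be derived from the per-sequence counts as in the following minimal sketch. The names n, n_cum and n_dcum follow the curve labels above; the function name and the ten input counts are hypothetical and serve only as an illustration.

```python
from collections import Counter

def occurrence_distribution(counts_per_sequence):
    """Return rows (x, n, n_cum, n_dcum) for x = 0 .. max count."""
    tally = Counter(counts_per_sequence)
    total = len(counts_per_sequence)
    rows = []
    n_cum = 0
    for x in range(max(counts_per_sequence) + 1):
        n = tally.get(x, 0)
        n_dcum = total - n_cum  # sequences with at least x occurrences
        n_cum += n              # sequences with at most x occurrences
        rows.append((x, n, n_cum, n_dcum))
    return rows

# Hypothetical counts for 10 sequences, for illustration only.
rows = occurrence_distribution([0, 0, 1, 1, 1, 2, 2, 3, 0, 1])
for row in rows:
    print(*row, sep="\t")
```

For each X, n_cum and n_dcum both include the sequences with exactly X occurrences, so their sum exceeds the total number of sequences by n.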
	    
 
- The distribution graph above indicates absolute
	    frequencies (i.e. numbers of sequences). To
	    display the distributions of relative frequencies, go
	    back to the XYgraph form, and type 7,8,9 in the
	    box Data columns for Y axis. Optionally, you can
	    also choose to specify a log base of 10 for the Y
	    axis. This will better highlight the low frequencies
	    associated with large occurrence numbers.
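The effect of the logarithmic Y axis can be seen in a small sketch that converts absolute frequencies to relative frequencies and takes their base-10 logarithm. The function name and the absolute frequencies are hypothetical, chosen only so that they sum to 1,000; they are not results from the tutorial.

```python
import math

def relative_and_log(absolute_counts):
    """Return (x, relative frequency, log10 of it or None) per occurrence class."""
    total = sum(absolute_counts)
    rows = []
    for x, n in enumerate(absolute_counts):
        freq = n / total
        # The logarithm is undefined for empty classes.
        log_freq = math.log10(freq) if freq > 0 else None
        rows.append((x, freq, log_freq))
    return rows

# Hypothetical absolute frequencies for 0, 1, 2, ... occurrences (sum = 1000).
rows = relative_and_log([300, 350, 220, 90, 30, 9, 1])
for x, freq, log_freq in rows:
    print(x, freq, log_freq)
```

On a linear axis, a relative frequency of 0.001 is barely distinguishable from zero next to 0.35, whereas on the log scale the two values sit at -3 and about -0.46, so the tail of the distribution remains readable.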