RSA-tools - Tutorials - microarrays
Prerequisite
This tutorial assumes that you have already followed the tutorials on motif discovery.
Introduction
One of the most common uses of the tools provided on this web site is the prediction of regulatory motifs from clusters of co-expressed genes. For this tutorial, we will use clusters of co-expressed genes from one of the most spectacular experiments published in the early days of microarrays: Spellman's paper on the yeast cell cycle.
Spellman, P.T. et al (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9: 3273-3297.
Note that the complete data set is available at Stanford (http://genome-www.stanford.edu/cellcycle/), but you don't need to download it for this tutorial.
Spellman and co-workers measured the expression level of the ~6000 yeast genes at different stages of the cell cycle, and selected ~800 genes showing periodic fluctuations. They applied hierarchical clustering to group genes with similar expression profiles, and identified several clusters of co-expressed genes, corresponding to functional classes (Figure 1 of the original paper).
Example of utilization
We selected some of the clusters of co-expressed genes defined by Spellman et al. (1998), and stored their member genes and upstream sequences on this web site.
Motif discovery
- Click on the link below to open a new window with the data sets.
- Open the file with the composition of the CLN2 cluster (spellman_cln2.fam). Select the whole content of the file, and copy it to the clipboard.
- In the left frame of the Regulatory Sequence Analysis Tools, click on retrieve-seq. Paste the genes and get their sequences.
- Use oligo-analysis to detect over-represented motifs in this set of upstream sequences.
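To give an idea of what an over-representation analysis involves, here is a minimal Python sketch. It is not the oligo-analysis implementation: it counts overlapping hexanucleotides on a single strand, takes the background probabilities as a user-supplied dictionary (the `background` argument and all function names are assumptions made for this illustration), and scores each word with a binomial P-value. The significance is computed as -log10 of the E-value (P-value multiplied by the number of tested words), which is our understanding of how the sig score is defined.

```python
import math
from collections import Counter

def count_oligos(sequences, k=6):
    """Count overlapping k-letter words on one strand; return (counts, positions)."""
    counts, n_positions = Counter(), 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            n_positions += 1
            if set(word) <= set("ACGT"):  # ignore words with N or other symbols
                counts[word] += 1
    return counts, n_positions

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    return sum(math.exp(log_pmf(i)) for i in range(k, n + 1))

def score_oligos(sequences, background, k=6):
    """Return (word, occurrences, sig) tuples sorted by decreasing significance.

    background: hypothetical dict mapping each tested word to its expected
    probability per position (e.g. estimated from all upstream sequences).
    """
    counts, n_positions = count_oligos(sequences, k)
    n_tests = len(background)
    results = []
    for word, prob in background.items():
        occ = counts.get(word, 0)
        e_value = binomial_tail(n_positions, occ, prob) * n_tests
        sig = -math.log10(e_value) if e_value > 0 else math.inf
        results.append((word, occ, sig))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Toy input, for illustration only (not the CLN2 cluster sequences).
toy_sequences = ["TTACGCGTAA", "GGACGCGTCC", "ACGCGTACGCGT"]
toy_background = {"ACGCGT": 1 / 4096, "AAAAAA": 1 / 4096}
for word, occ, sig in score_oligos(toy_sequences, toy_background):
    print(word, occ, round(sig, 2))
```

The steps of this tutorial rely on the oligo-analysis web form itself; the sketch only illustrates the principle of comparing observed and expected word counts.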
A few (16) oligonucleotides are returned, with a very high score (the top scoring pattern, ACGCGT, has a significance value of 32!). These patterns can be assembled to form different variants of the motif, whose most conserved consensus sequences are:
aaACGCGTtt
aaACGCGAaa
Feature map drawing
We will now draw a feature map to analyze the position of these motifs.
- Use dna-pattern and feature-map (as in the previous tutorials) to draw a graphical representation of the occurrences of the significant motifs.
- How do you interpret the map? Are the patterns present in all the upstream sequences? Do they show a positional preference?
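The kind of information a feature map displays can be sketched in a few lines of Python. The example below is an illustration, not the dna-pattern program: it searches each pattern on both strands and reports coordinates relative to the end of the sequence, under the assumption that each upstream sequence ends just before the start codon (so that -1 is the last upstream base). The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def find_occurrences(name, seq, patterns):
    """List (name, pattern, strand, start, end) for every match of each pattern.

    Coordinates are negative and relative to the end of the sequence,
    assuming the sequence ends right before the start codon.
    Strand "D" is the direct strand, "R" the reverse complement.
    """
    seq = seq.upper()
    hits = []
    for pattern in patterns:
        pattern = pattern.upper()
        rc = reverse_complement(pattern)
        # A palindromic pattern (e.g. ACGCGT) is searched only once.
        strands = [("D", pattern)] if rc == pattern else [("D", pattern), ("R", rc)]
        for strand, word in strands:
            start = seq.find(word)
            while start != -1:
                end = start + len(word) - 1
                hits.append((name, pattern, strand, start - len(seq), end - len(seq)))
                start = seq.find(word, start + 1)
    return sorted(hits, key=lambda h: h[3])

# Toy upstream sequence, for illustration only.
toy_seq = "TTTTACGCGTAAAATCGCGTGG"
for hit in find_occurrences("gene1", toy_seq, ["ACGCGT", "ACGCGA"]):
    print(*hit, sep="\t")
```

Sorting the hits by position makes it easy to see whether matches concentrate in a particular region of the upstream sequences, which is the question the feature map answers graphically.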
Matching discovered patterns with a library
The motif discovered in the CLN2 cluster is highly significant, and the feature-map shows that it is present in almost all the upstream sequences of this cluster. We seem to have isolated a motif which is quite specific for this cluster of genes. The next question is which transcription factor binds this motif. To answer it, we will use the motif matching tool of the Saccharomyces cerevisiae Promoter Database (SCPD).
- Click on the link below to access the SCPD motif matching tool.
- Search the existing motifs with the most significant hexanucleotide returned by oligo-analysis (ACGCGT).
The search returns 8 matches, all corresponding to the transcription factor MCB. If you try the alternative motif returned by oligo-analysis (ACGCGA), you will get two additional matches for MCB.
Strictly speaking, MCB (MluI cell cycle box) is the name of the DNA element; it is bound by MBF, a protein complex formed by Mbp1 and Swi6. This complex controls many genes required for DNA synthesis during the G1/S transition (see MIPS for more details).
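The principle of matching a discovered pattern against a library of known sites can be sketched as follows. This is a simplified illustration and not the SCPD search: the library is a hand-typed toy dictionary with two consensus strings (it is not copied from SCPD), and a match is declared when the query, or its reverse complement, contains or is contained in a library entry. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def match_library(query, library):
    """Return (name, site, strand) for each library entry matching the query.

    library: dict mapping a motif name to a consensus string.
    Strand "D" means the query matched as given, "R" as reverse complement.
    """
    query = query.upper()
    matches = []
    for name, site in library.items():
        site = site.upper()
        for strand, q in (("D", query), ("R", reverse_complement(query))):
            if q in site or site in q:
                matches.append((name, site, strand))
                break  # report each library entry at most once
    return matches

# Toy library, for illustration only (an assumption, not the SCPD content).
toy_library = {"MCB": "ACGCGT", "SCB": "CACGAAA"}
print(match_library("ACGCGT", toy_library))
print(match_library("TTTCGTG", toy_library))
```

Exact substring matching is deliberately strict. A real database search compares the query with individual annotated sites and may tolerate degenerate positions, which is why a variant such as ACGCGA can still retrieve matches there.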
Additional exercises
If you want to go further with the analysis of microarray clusters, we suggest the following exercises.
- Perform the same analysis with the other clusters of Spellman's paper (MET, MAT, MCM, CLB2, SIC1, histone, Yprime).
- Use the other motif discovery tools (dyad-analysis, consensus, gibbs) to detect motifs in the upstream sequences of the CLN2 cluster. Compare the motifs returned by the respective algorithms.
Next steps
You can now return to the tutorial main page and follow the next tutorials.
For suggestions or information request, please contact