2009 by Matthieu Defrance
info-gibbs is a motif discovery program based on a Gibbs sampling strategy. Given a set of sequences, a motif length, and a background model, it searches for the motifs (PSSMs) that have the highest relative entropy (information content).
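As an illustration of the quantity being optimized, here is a minimal sketch of the relative entropy of a count matrix against a background. It assumes a simple order-0 background (one prior per residue), natural logarithms, and a pseudo-count distributed according to the priors; the function name and these choices are illustrative and not necessarily those of info-gibbs.

```python
import math

def information_content(counts, background, pseudo=1.0):
    """Relative entropy of a motif, summed over its columns.

    counts     -- list of dicts, one per column, e.g. {"A": 3, "C": 1}
    background -- dict of prior residue probabilities (assumed order 0)
    pseudo     -- pseudo-count weight, shared out according to the priors
    """
    total = 0.0
    for column in counts:
        n = sum(column.values())
        for base, prior in background.items():
            # Corrected frequency of this residue in the column.
            freq = (column.get(base, 0) + pseudo * prior) / (n + pseudo)
            if freq > 0:  # a zero frequency contributes nothing
                total += freq * math.log(freq / prior)
    return total

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = information_content([{"A": 4}], uniform, pseudo=0.0)
flat = information_content([{"A": 1, "C": 1, "G": 1, "T": 1}], uniform)
```

A fully conserved column scores highest against a uniform background, while a column that matches the background scores zero.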
Defrance M. and van Helden J. info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling. Bioinformatics. 2009;25:2715-2722.
The sequences to be analyzed. Multiple sequences can be entered at once, in any of several sequence formats.
Input sequence format. Various standards are supported.
Only A, C, G, and T residues are accepted. Oligomers that contain undefined (N) or partly defined (IUPAC code) nucleotides are discarded.
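The filtering rule can be sketched as follows. This is only an illustration: the function name is hypothetical, and treating lowercase input as equivalent to uppercase is an assumption.

```python
def valid_oligomers(sequence, length):
    """Return (position, word) pairs for oligomers made only of A, C, G, T."""
    allowed = set("ACGT")
    seq = sequence.upper()  # assumption: case is ignored
    kept = []
    for start in range(len(seq) - length + 1):
        word = seq[start:start + length]
        # Discard any word containing N or another IUPAC ambiguity code.
        if set(word) <= allowed:
            kept.append((start, word))
    return kept

kept = valid_oligomers("ACGNTTA", 3)
```

Here every 3-mer overlapping the N is dropped, so only the words at positions 0 and 4 remain.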
Purge sequences (highly recommended)
When checked, large duplicated regions (alignments of at least 40 bp with fewer than 3 mismatches) are filtered out before analysis. Purging is essential for any motif discovery process, to avoid a bias due to non-independence of sequences. Purging is performed with the programs mkvtree and vmatch developed by Stefan Kurtz (firstname.lastname@example.org).
Search both strands: (single or both strands)
When "search both strands" is selected, occurrences of the motif are searched on both strands. This makes it possible to detect elements that act in an orientation-insensitive way (as is generally the case for yeast upstream elements).
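A minimal sketch of what searching both strands means in practice: each window of the input is considered both as read and as its reverse complement. The function names and the site representation are hypothetical, chosen for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def candidate_sites(seq, length, both_strands=True):
    """List candidate sites as (position, strand, word) tuples."""
    sites = []
    for start in range(len(seq) - length + 1):
        word = seq[start:start + length].upper()
        sites.append((start, "+", word))
        if both_strands:
            # Same window, read on the opposite strand.
            sites.append((start, "-", reverse_complement(word)))
    return sites

sites = candidate_sites("AACG", 3)
```

With both strands enabled, the number of candidate sites per sequence doubles, which is why an orientation-insensitive element can be found whichever way it is inserted.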
The length of the motif. For the detection of regulatory sites, we recommend starting with a length between 6 and 16.
Expected number of matches per sequence:
This option specifies the number of motif occurrences expected to be found in each of the input sequences.
Number of motifs to extract:
This option makes it possible to search for more than one motif in the input sequences.
Maximum number of iterations:
This option sets the maximum number of iterations of the algorithm.
Number of runs:
Due to its stochastic behavior, info-gibbs can return different results each time it is run. The core algorithm should therefore be repeated a sufficient number of times to produce useful results.
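The idea of repeating a stochastic search and keeping the best outcome can be sketched as below. The `run_once` callable is a hypothetical stand-in for a single sampling run that returns a (score, motif) pair; it is not part of info-gibbs.

```python
import random

def best_of_runs(run_once, n_runs, seed=0):
    """Repeat a stochastic search and keep the highest-scoring result.

    run_once -- hypothetical callable taking a random.Random instance and
                returning a (score, motif) pair
    """
    rng = random.Random(seed)  # fixed seed so the whole series is reproducible
    best = None
    for _ in range(n_runs):
        score, motif = run_once(rng)
        if best is None or score > best[0]:
            best = (score, motif)
    return best

# Toy stand-in for a sampling run: returns pre-set scores in order.
scores = iter([1.0, 3.5, 2.0])
best = best_of_runs(lambda rng: (next(scores), "motif"), n_runs=3)
```

More runs raise the chance that at least one of them escapes a poor local optimum, at the cost of a proportionally longer computation.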
The background model used to compute expected frequencies can be computed from the input sequences or loaded from a predefined Markov model. In the latter case, a target organism and a Markov order should be selected.
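For the first case, estimating a Markov background from the input sequences amounts to counting which residue follows each prefix of a given length. The sketch below illustrates this; the function name, the pseudo-count scheme, and the handling of ambiguous residues are assumptions, not the actual info-gibbs implementation.

```python
from collections import defaultdict

def markov_background(sequences, order=1, pseudo=1.0):
    """Estimate transition probabilities P(base | prefix) of a Markov model."""
    counts = defaultdict(lambda: dict.fromkeys("ACGT", 0.0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            prefix, base = seq[i - order:i], seq[i]
            # Skip windows containing N or other ambiguity codes.
            if set(prefix + base) <= set("ACGT"):
                counts[prefix][base] += 1
    model = {}
    for prefix, row in counts.items():
        total = sum(row.values()) + 4 * pseudo
        model[prefix] = {b: (c + pseudo) / total for b, c in row.items()}
    return model

model = markov_background(["ACAC"], order=1, pseudo=0.0)
```

With order 0 the prefix is empty and the model reduces to single-residue frequencies; higher orders capture more sequence context but need more data to estimate reliably.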