RSAT - pattern search manual

Name

1997-98 by

Description

Searches for all occurrences of a pattern within DNA sequences. The pattern can be entered as a plain nucleotide sequence, or it can include degenerate nucleotide codes and regular expressions.

Options

Query pattern(s)

Example:

        TTGTT   TGbox
        GATWA   GATAbox
        CCCCT   STRE
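Each line of the example has a pattern followed by an identifier. The sketch below reads such lines into (pattern, identifier) pairs. It assumes the columns are whitespace-separated, and it uses the pattern itself when no identifier is given. That fallback is a hypothetical rule for illustration, not documented RSAT behaviour.

```python
def parse_patterns(text):
    """Parse lines of the form 'PATTERN  IDENTIFIER' into (pattern, id) pairs.

    Assumption: columns are separated by whitespace. A line holding only a
    pattern reuses the pattern as its identifier (hypothetical rule).
    """
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pattern = fields[0].upper()
        identifier = fields[1] if len(fields) > 1 else pattern
        patterns.append((pattern, identifier))
    return patterns


example = """
        TTGTT   TGbox
        GATWA   GATAbox
        CCCCT   STRE
"""
print(parse_patterns(example))
```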

Ambiguous nucleotide codes

                A                       (Adenine)
                C                       (Cytosine)
                G                       (Guanine)
                T                       (Thymine)
                R       = A or G        (puRines)
                Y       = C or T        (pYrimidines)
                W       = A or T        (Weak hydrogen bonding)
                S       = G or C        (Strong hydrogen bonding)
                M       = A or C        (aMino group at common position)
                K       = G or T        (Keto group at common position)
                H       = A, C or T     (not G)
                B       = G, C or T     (not A)
                V       = G, A or C     (not T)
                D       = G, A or T     (not C)
                N       = G, A, C or T  (aNy)
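Each degenerate code in the table stands for a set of nucleotides, so it maps directly onto a regular-expression character class. The following is a minimal sketch of that translation under a few assumptions. It is not RSAT's own code, and the function name `iupac_to_regex` is hypothetical. Characters that are not IUPAC codes (brackets, braces, digits, commas) are copied unchanged. Patterns are assumed to contain no backslash escapes, because the input is upper-cased.

```python
import re

# IUPAC degenerate nucleotide codes, as listed in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]",
    "H": "[ACT]", "B": "[CGT]", "V": "[ACG]", "D": "[AGT]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate degenerate codes into regex character classes.

    Outside brackets, a code becomes a full class such as [AT]. Inside an
    existing [...] class, only the letters are inserted, so [RY] -> [AGCT].
    Any other character (e.g. '{', '}', digits, ',') is copied unchanged.
    """
    out = []
    in_class = False
    for ch in pattern.upper():
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch in IUPAC:
            code = IUPAC[ch]
            out.append(code.strip("[]") if in_class else code)
        else:
            out.append(ch)
    return "".join(out)


print(iupac_to_regex("GATWAG"))   # GAT[AT]AG
print(iupac_to_regex("CGGN{11}CCG"))
```

Mapping codes to character classes lets an ordinary regex engine handle degenerate positions and the quantifiers shown below in a single pattern.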

GAT[TA]AG means "GATAAG or GATTAG" (equivalent to GATWAG).
CGGN{11}CCG means CGG followed by 11 N followed by CCG.
GATAAGN{0,30}GATAAG means two GATAAG sites separated by 0 to 30 nucleotides.
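The sketch below shows how patterns like these can be matched at every position of a sequence, including occurrences that overlap. It makes two assumptions of its own, and it does not describe how RSAT counts matches. First, N is the only degenerate code it expands. Second, it wraps the regex in a zero-width lookahead so that one match (the longest, because the spacer is greedy) is reported for each start position. The helper name `find_all` is hypothetical.

```python
import re


def find_all(pattern, sequence):
    """Return (start, matched_text) for every occurrence of pattern.

    Only N is expanded (to [ACGT]); other regex syntax such as {11} or
    {0,30} is used as-is. The lookahead (?=(...)) consumes no characters,
    so overlapping occurrences are all reported, one per start position.
    """
    regex = re.compile("(?=(" + pattern.upper().replace("N", "[ACGT]") + "))")
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in regex.finditer(seq)]


print(find_all("ATA", "ATATA"))                    # overlapping hits at 0 and 2
print(find_all("GATAAGN{0,30}GATAAG", "GATAAGGATAAG"))
```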

Input sequence

Input sequence format:

raw: the raw sequence without any identifier or comment.
multi: several raw sequences concatenated.
IG: IntelliGenetics format.
FastA: the sequence format used by FastA, BLAST, the Gibbs sampler and many other bioinformatics programs.
Wconsensus: the format defined by Jerry Hertz for his programs (patser, consensus, wconsensus).
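In the FastA format, each sequence starts with a header line beginning with `>`, and the sequence lines that follow it belong to that record. Below is a minimal reader sketch. Taking the first word after `>` as the identifier is an assumption made here (other tools keep the whole header line). The function name `read_fasta` is hypothetical.

```python
def read_fasta(text):
    """Parse FastA text into a list of (identifier, sequence) pairs.

    The identifier is the first word after '>' (an assumption). Sequence
    lines are concatenated with all whitespace removed. Sequence lines that
    appear before the first header are ignored.
    """
    records = []
    identifier, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                records.append((identifier, "".join(chunks)))
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append("".join(line.split()))
    if identifier is not None:
        records.append((identifier, "".join(chunks)))
    return records


fasta = ">seq1 upstream region\nGATAAG\nTTGTT\n>seq2\nCCCCT\n"
print(read_fasta(fasta))
```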

Search strands

Substitutions

Return

Flanking

Origin