oligo-diff
$program_version
Compare oligonucleotide frequencies between two input sequence files, and return the oligos that are significantly enriched in one file relative to the other.
Jacques.van-Helden@univ-amu.fr
util
oligo-diff [-i inputfile] [-o outputfile] [-v #] [...]
The program takes as input a pair of sequence files in fasta format.
The output is a tab-delimited file with one row per oligonucleotide and one column per statistic. The column contents are described in the header of the output (the header is only printed if the verbosity is at least 1).
The programs oligo-diff and oligo-analysis serve related purposes: discovering exceptional oligonucleotides. The difference is that oligo-analysis considers a single sequence file, and compares observed oligo frequencies with those expected from a background model (Bernoulli or Markov). This background model is generally estimated from a set of background sequences.
When comparing a small sequence file (e.g. 50 promoters of co-expressed genes) to a large one (e.g. the 6000 other promoters of the considered organism), oligo-diff should return approximately the same results as oligo-analysis with a background model based on the large file. Slight differences come from the use of hypergeometric (oligo-diff) versus binomial (oligo-analysis) statistics.
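To illustrate why the two statistics give close but not identical results, here is a minimal Python sketch that computes an upper-tail probability under both models. The function names and all counts are hypothetical, and the way the sketch maps word occurrences onto the model parameters is an assumption; it is not taken from the oligo-diff or oligo-analysis source code.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p). Suitable for small illustrative inputs only."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def hypergeometric_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement
    from N items, K of which are 'marked'."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical counts (assumptions for illustration only):
N = 60000   # word positions in both files together
n = 500     # word positions in the small file
K = 300     # occurrences of the oligo in both files together
k = 10      # occurrences of the oligo in the small file

p_hyper = hypergeometric_tail(k, n, K, N)
p_binom = binomial_tail(k, n, K / N)
print(f"hypergeometric: {p_hyper:.3e}   binomial: {p_binom:.3e}")
```

When the small file is a tiny fraction of the total, drawing without replacement (hypergeometric) is nearly equivalent to drawing with replacement (binomial), so the two probabilities are close.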
oligo-diff calls the program count-words to count oligonucleotide occurrences in the two input sequence files. The program count-words is part of the RSAT suite (it is written in C, and has to be compiled as explained in the RSAT installation guide).
Level of verbosity (detail in the warning messages during execution)
Display full help message
Same as -h
First sequence file.
Second sequence file.
Oligonucleotide length.
Count oligonucleotides on a single strand only.
Alternative option: -2str
Sum oligonucleotides on both strands.
More precisely, each pair of reverse complements is counted as a single motif (the count is performed on a single strand, but pairs of reverse complements are merged).
Alternative option: -1str
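The merging of reverse-complement pairs described above can be sketched in Python as follows. This is an illustration only, not the implementation used by count-words: the function names are hypothetical, and choosing the alphabetically smaller word as the representative of each pair is an assumption.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]

def count_single_strand(sequence, k):
    """Count every k-letter word on the given strand (overlaps allowed)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def merge_strands(counts):
    """Merge each word with its reverse complement into a single motif.
    The alphabetically smaller word is used as the key (an assumption)."""
    merged = Counter()
    for word, n in counts.items():
        merged[min(word, reverse_complement(word))] += n
    return merged

single = count_single_strand("AAATTT", 3)   # AAA, AAT, ATT, TTT: one each
both = merge_strands(single)                # AAA/TTT and AAT/ATT are merged
print(dict(both))
```

A word that is its own reverse complement (e.g. ACGT) forms a pair with itself and keeps its single-strand count.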
Do not accept overlap between successive occurrences of the same word. Only renewing occurrences are counted.
E.g.: TATATATATATA is counted as 2 occurrences of TATATA
Alternative option: -ovlp
Count all occurrences of self-overlapping words.
E.g.: TATATATATATA is counted as 4 occurrences of TATATA
Alternative option: -noov
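The difference between the two counting modes can be reproduced with a short Python sketch. The function below is a hypothetical helper written for illustration; it is not the code used by count-words.

```python
def count_occurrences(sequence, word, allow_overlap):
    """Count occurrences of word in sequence, scanning left to right.
    Without overlap, the search resumes after the end of each match,
    so only renewing occurrences are counted."""
    count = 0
    start = 0
    while True:
        pos = sequence.find(word, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1 if allow_overlap else pos + len(word)

print(count_occurrences("TATATATATATA", "TATATA", allow_overlap=False))  # 2
print(count_occurrences("TATATATATATA", "TATATA", allow_overlap=True))   # 4
```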
If no output file is specified, the standard output is used. This allows the command to be used within a pipe.
Lower threshold on some output field.
Supported fields for threshold: occ,sig
Upper threshold on some output field.