NAME

variation-scan

VERSION

2.0

DESCRIPTION

Scan variant sequences with position-specific scoring matrices (PSSMs) and report variations that affect the binding score, in order to predict regulatory variants.

AUTHORS

Walter Santana-Garcia
Jacques van Helden
Alejandra Medina-Rivera

INPUT DATA

Sequence file

variation-scan takes as input a .varseq file in the format produced by retrieve-variation-seq. For details about this format and how to create a .varseq file, see retrieve-variation-seq output format.

Matrix file

A list of matrices in a valid format. Supported formats: alignance, assembly, cis-bp, clustal, cluster-buster, consensus, encode, feature, footprintDB, gibbs, homer, infogibbs, jaspar, meme, meme_block, motifsampler, mscan, sequences, stamp, stamp-transfac, tab, transfac and uniprobe.

Background file

A background model file in oligo-analysis format, as produced by oligo-analysis. For details about this format and how to create an oligo-analysis file, see oligo-analysis output format.

OUTPUT FORMAT

A tab-delimited file with the following columns.

1. matrix_ac: Name of the matrix (generally the transcription factor name)
2. var_id: ID of the variation
3. var_class: Variation type, according to Sequence Ontology (SO) nomenclature
4. var_coord: Coordinates of the variation
5. best_w: Best weight for the putative site
6. worst_w: Worst weight for the putative site
7. w_diff: Difference between best and worst weight
8. best_pval: P-value of the best putative site
9. worst_pval: P-value of the worst putative site
10. pval_ratio: Ratio between worst and best P-value (pval_ratio = worst_pval/best_pval)
11. best_allele: Allele in the best putative site
12. worst_allele: Allele in the worst putative site
13. best_offset: Offset of the best putative site
14. worst_offset: Offset of the worst putative site
15. min_offset_diff: Minimal offset difference between best and worst putative site
16. best_strand: Strand of the best putative site
17. worst_strand: Strand of the worst putative site
18. str_change: Indicates whether the strand changes between the best and worst putative sites at the offsets used for min_offset_diff
19. best_seq: Sequence of the best putative site
20. worst_seq: Sequence of the worst putative site
21. minor_alle_freq: Minor allele frequency
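
As an illustration, the output could be read in Python along the following lines. This is a minimal sketch, not part of variation-scan: the function name read_variation_scan is hypothetical, columns are addressed by their position in the list above, and skipping lines that start with ';' or '#' is an assumption, since this manual does not describe header or comment lines. The example row is made up.

```python
import csv
import io

# Zero-based positions of a few columns, following the numbered list above
# (column 1 in the list is index 0 here).
FIELDS = {
    "matrix_ac": 0,
    "var_id": 1,
    "best_w": 4,
    "worst_w": 5,
    "w_diff": 6,
    "best_pval": 7,
    "worst_pval": 8,
    "pval_ratio": 9,
}
NUMERIC = {"best_w", "worst_w", "w_diff", "best_pval", "worst_pval", "pval_ratio"}


def read_variation_scan(handle):
    """Yield one dict per data row of a variation-scan result.

    Assumption: lines starting with ';' or '#' are comments or headers
    and can be skipped; the manual does not specify this.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith((";", "#")):
            continue
        record = {}
        for name, index in FIELDS.items():
            value = row[index]
            record[name] = float(value) if name in NUMERIC else value
        yield record


# Made-up example row with 21 tab-separated fields, for illustration only.
example = "\t".join([
    "MA0000.1", "rs0000", "SNV", "chr1:100-100+",
    "6.0", "2.0", "4.0", "1e-4", "1e-2", "100",
    "A", "G", "-3", "-3", "0", "D", "D", "0",
    "ACGTACGT", "ACGTGCGT", "0.1",
]) + "\n"
records = list(read_variation_scan(io.StringIO("#header\n" + example)))
```

For a real result file, the same function can be called on an open file handle, for example `open(path, newline="")`.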

OPTIONS

Matrix collections

A matrix or a list of matrices in a valid format. Supported formats: alignance, assembly, cis-bp, clustal, cluster-buster, consensus, encode, feature, footprintDB, gibbs, homer, infogibbs, jaspar, meme, meme_block, motifsampler, mscan, sequences, stamp, stamp-transfac, tab, transfac and uniprobe.

Input personal collection of motifs: Paste your own motif collection as text or upload it as a file.
Select a collection in list: Select a single motif from one of the collections in the list.
Select one motif collection: Select a whole motif collection from the list.

Input variation sequences

Background

Background model estimation

Organism-specific

Estimate background model with genome sequences from a specific organism.

Markov order: Markov order of the background model estimation.
Organism: Name of the organism whose genome sequences will be used to estimate the background model.
Sequence type: Sequence type of the selected organism genome that will be used to estimate the background model.
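
To illustrate what the Markov order means, the sketch below estimates transition probabilities P(next base | previous bases) from a set of sequences by simple counting. It is a generic illustration under stated assumptions, not the procedure used by RSAT: the function name markov_background is hypothetical, no pseudo-counts are applied, and only the given strand is counted.

```python
from collections import Counter, defaultdict


def markov_background(sequences, order):
    """Estimate P(next base | previous `order` bases) by counting.

    Illustrative only: no pseudo-counts, single strand.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            prefix = seq[i - order:i]  # empty string when order == 0
            counts[prefix][seq[i]] += 1
    model = {}
    for prefix, nexts in counts.items():
        total = sum(nexts.values())
        model[prefix] = {base: n / total for base, n in nexts.items()}
    return model


# Order 1: each base depends on the single preceding base.
model = markov_background(["ACGTACGT", "AACC"], order=1)
```

With order 0 the model reduces to plain base frequencies, stored under the empty prefix; higher orders condition each base on more preceding bases and need more sequence to estimate reliably.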

Input your own background file

Input a previously estimated background model file.

Format: Specify the background model file format.
File Upload: Upload the background model file.
URL of file available on a Web server: Input the background model file from a URL.

Scanning parameters

Length of sequence around the variant

Length of the longest matrix in the input list used to scan the current variant sequences. This value has to be consistent with the value used for retrieving the variant sequences, and the longest matrix in the input list must never be longer than this value. For details of this option, see the Length of flanking sequence on each side of the variant option at retrieve-variation-seq.
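
The constraint above can be checked before scanning. A minimal sketch, assuming the widths of the input matrices are already known; the function name check_flank_length and its arguments are hypothetical and not part of variation-scan.

```python
def check_flank_length(matrix_widths, flank_length):
    """Return the longest matrix width, or raise if it exceeds flank_length."""
    longest = max(matrix_widths)
    if longest > flank_length:
        raise ValueError(
            f"longest matrix ({longest}) exceeds the length of sequence "
            f"around the variant ({flank_length})"
        )
    return longest


# Hypothetical widths of three matrices, checked against a length of 30.
longest_width = check_flank_length([8, 12, 21], flank_length=30)
```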

Thresholds

Only output hits with values passing the thresholds.

Weight of predicted sites: This value will be used as a lower threshold and all scanned variation records with best weight (best_w) greater than this value will be reported.
Weight difference between variants: This value will be used as a lower threshold and all scanned variation records with weight difference (w_diff) between alleles greater than this value will be reported.
P-value of predicted sites: This value will be used as an upper threshold and all scanned variation records with best p-value (best_pval) smaller than this value will be reported.
P-value ratio between variants: This value will be used as a lower threshold and all scanned variation records with p-value ratio (pval_ratio) between alleles greater than this value will be reported.
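
The four thresholds can be pictured as a filter over the output records. The sketch below is an assumption-laden illustration, not the tool's implementation: the function and parameter names are hypothetical, a record must pass every threshold that is set, and comparisons are strict ("greater than", "smaller than") as worded above, so behaviour for values exactly equal to a threshold may differ in variation-scan.

```python
def passes_thresholds(record, min_weight=None, min_weight_diff=None,
                      max_pval=None, min_pval_ratio=None):
    """Return True if the record passes every threshold that is not None."""
    checks = [
        (min_weight, lambda t: record["best_w"] > t),          # lower threshold
        (min_weight_diff, lambda t: record["w_diff"] > t),     # lower threshold
        (max_pval, lambda t: record["best_pval"] < t),         # upper threshold
        (min_pval_ratio, lambda t: record["pval_ratio"] > t),  # lower threshold
    ]
    return all(check(t) for t, check in checks if t is not None)


# Made-up record, for illustration only.
rec = {"best_w": 6.0, "w_diff": 4.0, "best_pval": 1e-4, "pval_ratio": 100.0}
kept = passes_thresholds(rec, min_weight=1, max_pval=1e-3, min_pval_ratio=10)
```

A threshold left unset (None) is simply ignored, so calling the function with no thresholds keeps every record.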

CONTACT

For further inquiries, please contact Jacques van Helden (Jacques.van-Helden@univ-amu.fr) or ask a question to the RSAT team.

RSAT - variation-scan manual