RSAT - variation-scan manual

NAME

variation-scan

VERSION

2.0

DESCRIPTION

Scan variant sequences with position-specific scoring matrices (PSSMs) and report the variations that affect the binding score, in order to predict regulatory variants.
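The core idea can be illustrated with a minimal sketch: each allele is inserted between the flanking sequences, every window that covers the variant is scored on both strands with a log-odds weight matrix, and the alleles are compared through their best-scoring site. This is not RSAT's implementation. The toy count matrix, the background probabilities, the natural-log weights, the pseudo-count split according to the background, the restriction to single-base alleles (SNVs), and the function names `weight_matrix`, `best_site` and `compare_alleles` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy count matrix: one dict per motif position (A, C, G, T counts).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]
# Hypothetical Bernoulli (Markov order 0) background probabilities.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def weight_matrix(counts, bg, pseudo=1.0):
    """Log-odds weights ln(f / p), with the pseudo-count split according to bg."""
    weights = []
    for col in counts:
        total = sum(col.values()) + pseudo
        weights.append({b: math.log((col[b] + pseudo * bg[b]) / total / bg[b])
                        for b in "ACGT"})
    return weights


def best_site(weights, seq, var_pos):
    """Highest-weight window (either strand) that covers position var_pos."""
    width = len(weights)
    best = None
    first = max(0, var_pos - width + 1)
    last = min(var_pos, len(seq) - width)
    for start in range(first, last + 1):
        site = seq[start:start + width]
        # "D" = direct strand, "R" = reverse complement of the same window.
        for strand, s in (("D", site), ("R", site.translate(COMPLEMENT)[::-1])):
            w = sum(col[b] for col, b in zip(weights, s))
            if best is None or w > best[0]:
                best = (w, start, strand, s)
    return best


def compare_alleles(weights, left, alleles, right):
    """Score each single-base allele and report best/worst allele and weights."""
    var_pos = len(left)
    per_allele = {a: best_site(weights, left + a + right, var_pos) for a in alleles}
    best_allele = max(per_allele, key=lambda a: per_allele[a][0])
    worst_allele = min(per_allele, key=lambda a: per_allele[a][0])
    best_w = per_allele[best_allele][0]
    worst_w = per_allele[worst_allele][0]
    return {"best_allele": best_allele, "worst_allele": worst_allele,
            "best_w": best_w, "worst_w": worst_w, "w_diff": best_w - worst_w}


if __name__ == "__main__":
    W = weight_matrix(COUNTS, BACKGROUND)
    print(compare_alleles(W, "TTT", ["A", "C"], "CGTT"))
```

In this toy example the A allele recreates the motif ACG, so it is reported as the best allele and the C allele as the worst, with a positive weight difference.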

AUTHORS

Walter Santana-Garcia
Jacques van Helden
Alejandra Medina-Rivera

CATEGORY

Genetic variations

INPUT DATA

Sequence file

variation-scan takes as input a .varseq file in the format produced by retrieve-variation-seq. For details about this format and how to create a .varseq file, see retrieve-variation-seq output format.

Matrix file

A list of matrices in a valid format. Supported formats: alignace, assembly, cis-bp, clustal, cluster-buster, consensus, encode, feature, footprintDB, gibbs, homer, infogibbs, jaspar, meme, meme_block, motifsampler, mscan, sequences, stamp, stamp-transfac, tab, transfac and uniprobe.

Background file

A background model file in oligo-analysis format, as produced by oligo-analysis. For details about this format and how to create an oligo-analysis file, see oligo-analysis output format.

OUTPUT FORMAT

A tab-delimited file with the following columns.

1. matrix_ac

Name of the matrix (generally the transcription factor name)

2. var_id

ID of the variation

3. var_class

Variation type, according to Sequence Ontology (SO) nomenclature

4. var_coord

Coordinates of the variation

5. best_w

Best weight for the putative site

6. worst_w

Worst weight for the putative site

7. w_diff

Difference between best and worst weight

8. best_pval

P-value of the best putative site

9. worst_pval

P-value of the worst putative site

10. pval_ratio

Ratio between the worst and best p-values (pval_ratio = worst_pval / best_pval)

11. best_allele

Allele in the best putative site

12. worst_allele

Allele in the worst putative site

13. best_offset

Offset of the best putative site

14. worst_offset

Offset of the worst putative site

15. min_offset_diff

Minimal offset difference between the best and worst putative sites

16. best_strand

Strand of the best putative site

17. worst_strand

Strand of the worst putative site

18. str_change

Indicates whether the strand changes between the best and worst putative sites, at the offsets used to compute min_offset_diff

19. best_seq

Sequence of the best putative site

20. worst_seq

Sequence of the worst putative site

21. minor_alle_freq

Minor allele frequency
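To work with this output in a script, the rows can be read into dictionaries keyed by the column names above, and the two derived columns can be checked against their definitions (w_diff = best_w - worst_w, pval_ratio = worst_pval / best_pval). The sketch below is hypothetical: it assumes the usual RSAT conventions that comment lines start with `;` and the header line starts with `#`, reads columns by position in the order listed above (only the first 12), and uses loose tolerances because printed values are rounded. The names `read_variation_scan` and `check_derived` are illustrative.

```python
import csv
import math

# First 12 of the 21 columns, in the order listed in this manual.
COLUMNS = ["matrix_ac", "var_id", "var_class", "var_coord", "best_w", "worst_w",
           "w_diff", "best_pval", "worst_pval", "pval_ratio",
           "best_allele", "worst_allele"]
NUMERIC = ("best_w", "worst_w", "w_diff", "best_pval", "worst_pval", "pval_ratio")


def read_variation_scan(lines):
    """Yield one dict per data row of a variation-scan result file.

    Assumes comment lines start with ';' and the header line starts with '#'.
    Columns are read by position, not by header name.
    """
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith((";", "#")):
            continue
        row = dict(zip(COLUMNS, fields))
        for key in NUMERIC:
            row[key] = float(row[key])
        yield row


def check_derived(row, rel_tol=0.05, abs_tol=0.01):
    """Check w_diff = best_w - worst_w and pval_ratio = worst_pval / best_pval.

    Tolerances are loose because the printed values are rounded.
    """
    ok_w = math.isclose(row["w_diff"], row["best_w"] - row["worst_w"],
                        rel_tol=rel_tol, abs_tol=abs_tol)
    ok_p = math.isclose(row["pval_ratio"], row["worst_pval"] / row["best_pval"],
                        rel_tol=rel_tol)
    return ok_w and ok_p
```

A result file can be passed directly, e.g. `with open(path, newline="") as fh: rows = list(read_variation_scan(fh))`.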

OPTIONS

Matrix collections

A matrix or a list of matrices in a valid format. Supported formats: alignace, assembly, cis-bp, clustal, cluster-buster, consensus, encode, feature, footprintDB, gibbs, homer, infogibbs, jaspar, meme, meme_block, motifsampler, mscan, sequences, stamp, stamp-transfac, tab, transfac and uniprobe.

Input personal collection of motifs

Enter your own motif collection as text, or upload it as a file.

Select a collection in list

Select a motif collection from the list.

Select one motif collection

Select one of the available motif collections from the list.

Input variation sequences

variation-scan takes as input a .varseq file in the format produced by retrieve-variation-seq. For details about this format and how to create a .varseq file, see retrieve-variation-seq output format.

Background

Background model estimation

Organism-specific

Estimate the background model from the genome sequences of a specific organism.

Markov order

Markov order of the background model.
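A Markov model of order k gives the probability of each base given the k preceding bases. As a rough illustration of what the Markov order controls, the hypothetical sketch below estimates such transition probabilities from raw sequences. It is not how RSAT computes its background files: it counts a single strand only, skips words containing letters other than A, C, G and T (e.g. N), and applies no pseudo-counts. The function name `markov_background` is illustrative.

```python
from collections import Counter, defaultdict


def markov_background(seqs, order):
    """Estimate P(next base | previous `order` bases) from raw sequences.

    Sketch only: one strand, non-ACGT words skipped, no pseudo-counts.
    """
    counts = defaultdict(Counter)
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - order):
            word = seq[i:i + order + 1]
            if set(word) <= set("ACGT"):
                # Prefix = the `order` preceding bases, key = the next base.
                counts[word[:-1]][word[-1]] += 1
    return {prefix: {b: n / sum(c.values()) for b, n in c.items()}
            for prefix, c in counts.items()}
```

With order 0 the single prefix is the empty string and the model reduces to base frequencies; each increase of the order multiplies the number of prefixes by four, so higher orders need more sequence to be estimated reliably.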

Organism

Name of the organism whose sequences will be used to estimate the background model.

Sequence type

Type of genome sequence of the selected organism used to estimate the background model.

Input your own background file

Input a previously estimated background model file.

Format

Specify the background model file format.

File Upload

Upload the background model file.

URL of file available on a Web server

Input the background model file from a URL.

Scanning parameters

Length of sequence around the variant

Length of the sequence flanking each side of the variant, which should correspond to the length of the longest matrix in the input list used to scan the variant sequences. This value must be consistent with the value used to retrieve the variant sequences, and no matrix in the input list may be longer than it. For details, see the option Length of flanking sequence on each side of the variant in retrieve-variation-seq.
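The constraint exists because every window that covers the variant must fit inside the retrieved sequence: for a matrix of width w, the windows covering a single-nucleotide variant start from w - 1 positions before it up to the variant itself, so each flank needs at least w - 1 bases. The hypothetical sketch below checks the constraint and lists those window offsets. It assumes an SNV sequence laid out as left flank + variant base + right flank, with 0-based offsets; the function names are illustrative.

```python
def check_flank_length(matrix_widths, flank_length):
    """Return the longest matrix width; fail if it exceeds the flank length."""
    longest = max(matrix_widths)
    if longest > flank_length:
        raise ValueError(
            f"longest matrix ({longest} bp) exceeds flank length "
            f"({flank_length} bp); retrieve the sequences with a longer flank")
    return longest


def covering_offsets(flank_length, width):
    """0-based start offsets of windows of `width` that cover the variant."""
    var_pos = flank_length  # the variant base follows the left flank
    return list(range(var_pos - width + 1, var_pos + 1))
```

For example, with 30 bp flanks a 3 bp matrix can cover the variant from offsets 28, 29 and 30, and a 31 bp matrix would be rejected.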

Thresholds

Only output hits with values passing the thresholds.

Weight of predicted sites

This value is used as a lower threshold: only variation records with a best weight (best_w) greater than this value are reported.

Weight difference between variants

This value is used as a lower threshold: only variation records with a weight difference between alleles (w_diff) greater than this value are reported.

P-value of predicted sites

This value is used as an upper threshold: only variation records with a best p-value (best_pval) smaller than this value are reported.

P-value ratio between variants

This value is used as a lower threshold: only variation records with a p-value ratio between alleles (pval_ratio) greater than this value are reported.
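The four thresholds combine as a logical AND: a record is reported only if it passes every threshold that is set. The hypothetical sketch below applies them to a record stored as a dictionary keyed by the output column names, using the strict comparisons stated above (greater than for the lower thresholds, smaller than for the p-value). The function and parameter names are illustrative, not the tool's own option names; `None` means the threshold is not set.

```python
def passes_thresholds(row, min_w=None, min_w_diff=None,
                      max_pval=None, min_pval_ratio=None):
    """Return True if a record passes every threshold that is set."""
    if min_w is not None and not row["best_w"] > min_w:
        return False
    if min_w_diff is not None and not row["w_diff"] > min_w_diff:
        return False
    if max_pval is not None and not row["best_pval"] < max_pval:
        return False
    if min_pval_ratio is not None and not row["pval_ratio"] > min_pval_ratio:
        return False
    return True
```

Combining a p-value threshold with a p-value ratio threshold keeps only variants where at least one allele yields a significant site and the alleles differ markedly in affinity.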

CONTACT

For further inquiries, please contact Jacques van Helden (Jacques.van-Helden@univ-amu.fr) or ask a question to the RSAT team.