RSAT - variation-info manual

NAME

variation-info

VERSION

2.0

DESCRIPTION

Taking as input variation IDs (rs numbers) or regions in a given genome, variation-info retrieves the variant information associated with the queries and returns it in varBed format.

AUTHORS

Walter Santana-Garcia
Jacques van Helden
Alejandra Medina-Rivera

CATEGORY

Genetic variations

INPUT DATA

Genomic coordinate file

A genomic coordinate file in bed format can be used as input to retrieve overlapping variants from Ensembl databases. The tool only takes into account the first three columns of the bed file, which specify the genomic coordinates.

Example of bed file
 chr1   3473041 3473370
 chr1   4380371 4380650
 chr1   4845581 4845781
 chr1   4845801 4846260

The definition of the BED format is provided on the UCSC Genome Browser web site (http://genome.ucsc.edu/FAQ/FAQformat#format1).
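As an illustration of how only the coordinate columns matter, here is a minimal Python sketch that reads the first three columns of a bed file and ignores any others. The function name read_bed_regions is hypothetical and not part of RSAT. The sketch assumes whitespace-separated fields and skips the "track" and "browser" header lines that UCSC bed files may contain.

```python
def read_bed_regions(lines):
    """Return (chrom, start, end) tuples from the first three bed columns.

    Hypothetical helper for illustration; not part of RSAT.
    """
    regions = []
    for line in lines:
        fields = line.split()  # tolerates tabs, runs of spaces, leading blanks
        # Skip empty lines, comments, and UCSC header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append((chrom, start, end))
    return regions


# The first two lines of the example bed file above.
example = """ chr1   3473041 3473370
 chr1   4380371 4380650
"""
regions = read_bed_regions(example.splitlines())
```

Any additional bed columns (name, score, strand, ...) are simply dropped, which mirrors the behaviour described for this tool.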

This tool only takes into account the first three columns, which specify the genomic coordinates:

1. chrom

The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).

2. chromStart

The starting position of the feature in the chromosome or scaffold. For RSAT programs, the first base in a chromosome is numbered 1 (this differs from the UCSC-specific zero-based notation for the start).

Note from Jacques van Helden: the UCSC genome browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), and the end position is considered not included in the selection. This is equivalent to having a zero-based coordinate for the start and a one-based coordinate for the end. We find this representation completely counter-intuitive, and we therefore decided to adopt a "normal" convention, where:

start and end positions represent the first and last positions included in the region of interest;

start and end positions are provided in one-based notation (the first base of a chromosome or contig has coordinate 1).

3. chromEnd

The ending position of the feature in the chromosome or scaffold.
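The relation between the two conventions can be made concrete with a short Python sketch. The function name ucsc_to_one_based is hypothetical and not part of RSAT. It only assumes the UCSC convention described above (zero-based start, end position not included) and produces coordinates in the one-based convention where both start and end are included.

```python
def ucsc_to_one_based(chrom, start0, end_excl):
    """Convert a UCSC-style interval to a one-based, fully included interval.

    Hypothetical helper for illustration; not part of RSAT.
    start0   : zero-based start (UCSC convention)
    end_excl : end position, not included in the selection (UCSC convention)
    """
    if start0 < 0 or end_excl <= start0:
        raise ValueError("invalid UCSC interval")
    # Only the start shifts by one: an excluded zero-based end has the same
    # numeric value as an included one-based end.
    return chrom, start0 + 1, end_excl


# The first 100 bases of chr1: UCSC writes (0, 100), one-based writes (1, 100).
first_hundred = ucsc_to_one_based("chr1", 0, 100)
```

The length of the region is end_excl - start0 in the UCSC convention and end - start + 1 in the one-based convention; both give 100 in the example.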

Variation ID list

A list of variation IDs (dbSNP rs numbers), one ID per line in a single column.
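A one-column ID list can be checked before submission with a sketch like the following. The function name read_variation_ids is hypothetical and not part of RSAT. The sketch assumes that every ID is a dbSNP rs number ("rs" followed by digits); the tool itself may accept other identifiers.

```python
import re

# Assumption: only dbSNP rs numbers are expected, e.g. "rs" followed by digits.
RS_ID = re.compile(r"^rs\d+$")


def read_variation_ids(lines):
    """Return the list of IDs found in a one-column file, one ID per line.

    Hypothetical helper for illustration; not part of RSAT.
    """
    ids = []
    for line in lines:
        token = line.strip()
        if not token:
            continue  # ignore empty lines
        if not RS_ID.match(token):
            raise ValueError(f"not an rs number: {token!r}")
        ids.append(token)
    return ids


# Placeholder IDs, made up for the example.
ids = read_variation_ids(["rs123\n", "\n", " rs456 \n"])
```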

OUTPUT FORMAT

The varBed format is a tab-delimited file format that facilitates access to relevant variant information. The file includes the following columns:

1) chr

Chromosome number (without "chr").

2) start

Start position of the variation.

3) end

End position of the variation.

4) strand

Strand where the variation is found.

5) id

Variant ID, rs number.

6) ref

Reference allele.

7) alt

Alternative allele(s).

8) so_term

Sequence Ontology term of the variant.

9) validate

Validation status of the variant: 1 if the variant is supported by evidence, 0 otherwise.

10) minor_allele_freq

Frequency of the alternative allele.
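The ten columns listed above can be read into named records with a minimal Python sketch. The names VarBedRecord and parse_varbed_line are hypothetical and not part of RSAT. The sketch converts only start and end to integers and keeps the other fields as text, since this manual does not specify how multiple alternative alleles or missing frequencies are encoded. The sample line is invented for illustration and is not real variant data.

```python
from collections import namedtuple

# Column names in the order given in this manual.
VARBED_COLUMNS = [
    "chr", "start", "end", "strand", "id",
    "ref", "alt", "so_term", "validate", "minor_allele_freq",
]
VarBedRecord = namedtuple("VarBedRecord", VARBED_COLUMNS)


def parse_varbed_line(line):
    """Split one tab-delimited varBed line into a VarBedRecord.

    Hypothetical helper for illustration; not part of RSAT.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VARBED_COLUMNS):
        raise ValueError(f"expected {len(VARBED_COLUMNS)} columns, got {len(fields)}")
    record = VarBedRecord(*fields[:len(VARBED_COLUMNS)])
    # Only the coordinates are converted; all other fields stay as strings.
    return record._replace(start=int(record.start), end=int(record.end))


# Invented sample line (not real data), with the ten columns separated by tabs.
sample = "1\t1000\t1000\t+\trs0000\tA\tG\tSNV\t1\t0.25"
record = parse_varbed_line(sample)
```

Fields are then accessible by name, for example record.id or record.so_term.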

OPTIONS

Organism

Name of the organism in whose genome the queries will be searched.

Variations IDs or genomic regions of interest

Input data, provided either as a list of variant IDs (dbSNP) or as a set of genomic coordinates (BED file).

Input format

Format of the supplied queries. Supported values: bed or id.

CONTACT

For further inquiries, please contact Jacques van Helden (Jacques.van-Helden@univ-amu.fr) or ask a question to the RSAT team.