variation-info
2.0
Given variation IDs (rs numbers) or genomic regions in a given genome, variation-info retrieves the variant information associated with the queries and returns it in varBed format.
A genomic coordinate file in bed format can be used as input to retrieve overlapping variants from the Ensembl databases. The tool only takes into account the first three columns of the bed file, which specify the genomic coordinates.
Example of bed file
chr1 3473041 3473370
chr1 4380371 4380650
chr1 4845581 4845781
chr1 4845801 4846260
The definition of the BED format is provided on the UCSC Genome Browser web site (http://genome.ucsc.edu/FAQ/FAQformat#format1).
The three columns used by this tool are the following:
The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
The starting position of the feature in the chromosome or scaffold. For RSAT programs, the first base in a chromosome is numbered 1 (this differs from the UCSC-specific zero-based notation for the start).
Note from Jacques van Helden: the UCSC genome browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), and the end position is considered not included in the selection. This is equivalent to having a zero-based coordinate for the start and a one-based coordinate for the end. We find this representation completely counter-intuitive, and we therefore decided to adopt a "normal" convention, where the first base of a chromosome or scaffold is numbered 1.
The ending position of the feature in the chromosome or scaffold.
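The coordinate arithmetic described above can be illustrated with a short Python sketch. It reads the first three columns of a bed line and converts a UCSC-style interval (zero-based start, end not included) into a one-based interval in which both positions are included. The function names are hypothetical and are not part of variation-info; whether and how the tool itself applies such a conversion is not stated here.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from the first three columns of a bed line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    # Any additional columns are ignored, as only the coordinates matter here.
    return fields[0], int(fields[1]), int(fields[2])


def ucsc_to_one_based(start, end):
    """Convert a zero-based, end-excluded interval to one-based, end-included.

    The start shifts by one; the end keeps the same value, because an
    excluded zero-based end equals an included one-based end.
    """
    return start + 1, end


bed_text = """chr1 3473041 3473370
chr1 4380371 4380650
"""

# Parse every non-empty line of the example.
regions = [parse_bed_line(l) for l in bed_text.splitlines() if l.strip()]
```

With this sketch, an interval written as 0 10 in UCSC notation covers the same ten bases as positions 1 to 10 in one-based notation.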
A list of variation IDs (dbSNP rs numbers) in a single column, one ID per line.
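As a minimal sketch of how such a one-column list could be checked before submission, the Python snippet below keeps one ID per line and rejects anything that does not look like an rs number ("rs" followed by digits). The function name and the strictness of the check are assumptions made for illustration; variation-info may accept or reject input differently. The IDs in the example are placeholders.

```python
import re

# An rs number: the prefix "rs" followed by one or more digits.
RS_ID = re.compile(r"^rs\d+$")


def read_id_list(text):
    """Return the rs numbers found in a one-column text, one ID per line."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not RS_ID.match(line):
            raise ValueError(f"not an rs number: {line!r}")
        ids.append(line)
    return ids


# Placeholder IDs, used only to exercise the function.
ids = read_id_list("rs1\nrs22\n\n")
```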
The varBed format is a tab-delimited file that facilitates access to relevant variant information. The file includes the following columns:
Chromosome number (without "chr").
Start position of the variation.
End position of the variation.
Strand where the variation is found.
Variant ID, rs number.
Reference allele.
Alternative allele(s).
Sequence Ontology term of the variant.
Validation status of the variant: 1 if it has supporting evidence, 0 otherwise.
Frequency of the alternative allele.
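The column list above can be turned into a small record type. The Python sketch below splits one tab-delimited line into the ten columns in the order listed. The class name, field names and the example line are hypothetical (the line is invented and is not real variant data), and the sketch assumes plain data lines without header or comment lines.

```python
from dataclasses import dataclass


@dataclass
class VarBedRecord:
    chrom: str        # chromosome, without "chr"
    start: int        # start position of the variation
    end: int          # end position of the variation
    strand: str       # strand where the variation is found
    variant_id: str   # rs number
    ref: str          # reference allele
    alt: str          # alternative allele(s), kept as written
    so_term: str      # Sequence Ontology term
    validated: bool   # True for 1, False for 0
    alt_freq: str     # frequency of the alternative allele, kept as text


def parse_varbed_line(line):
    """Parse one tab-delimited line into a VarBedRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    chrom, start, end, strand, vid, ref, alt, so, valid, freq = fields[:10]
    return VarBedRecord(chrom, int(start), int(end), strand, vid,
                        ref, alt, so, valid == "1", freq)


# Invented line for illustration only.
example = "1\t100\t100\t+\trs0000\tA\tG\tSNV\t1\t0.25"
record = parse_varbed_line(example)
```

The frequency and allele columns are deliberately left as text, since their exact formatting is not specified in this description.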
Name of the organism where the queries will be searched.
Input data, provided either as a list of variant IDs (dbSNP) or as a set of genomic coordinates (bed file).
Format of the supplied queries. Supported values: bed or id.