retrieve-variation-seq
2.0
Given a set of variants in varBed format (see convert-variations), a list of dbSNP IDs, or genomic coordinates in bed format, retrieve the corresponding variants and their flanking sequences so that they can be scanned with the tool variation-scan.
A genomic coordinate file in bed format. The program only takes into account the first three columns of the bed file, which specify the genomic coordinates.
Note (from Jacques van Helden): the UCSC genome browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), but the end position is considered not included in the selection. This is equivalent to having a zero-based coordinate for the start and a 1-based coordinate for the end.
chr1 3473041 3473370
chr1 4380371 4380650
chr1 4845581 4845781
chr1 4845801 4846260
The definition of the BED format is provided on the UCSC Genome Browser web site (http://genome.ucsc.edu/FAQ/FAQformat#format1).
This program only takes into account the first three columns, which specify the genomic coordinates.
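As an illustration of what "only the first three columns" means in practice, here is a minimal Python sketch that reads chromosome, start and end from bed lines and ignores any further columns. The names `BedInterval` and `read_bed3` are hypothetical and are not part of RSAT; skipping `track` and `browser` header lines is an assumption based on common UCSC bed files.

```python
from typing import Iterable, Iterator, NamedTuple


class BedInterval(NamedTuple):
    """The three coordinate columns of one bed line (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def read_bed3(lines: Iterable[str]) -> Iterator[BedInterval]:
    """Yield chrom/start/end from bed lines, ignoring columns 4 and beyond."""
    for line in lines:
        fields = line.split()  # accepts tabs or spaces between columns
        # Skip blank lines, comments and UCSC header lines (assumption).
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"bed line needs at least 3 columns: {line!r}")
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]))


if __name__ == "__main__":
    example = [
        "chr1 3473041 3473370",
        "chr1 4380371 4380650",
        "chr1 4845581 4845781",
        "chr1 4845801 4846260",
    ]
    for interval in read_bed3(example):
        print(interval.chrom, interval.start, interval.end)
```

Coordinates are returned exactly as written in the file; no coordinate convention is applied at this stage.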
The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
The starting position of the feature in the chromosome or scaffold. For RSAT programs, the first base in a chromosome is numbered 1 (this differs from the UCSC-specific zero-based notation for the start).
Note from Jacques van Helden: the UCSC genome browser adopts a somewhat inconsistent convention for start and end coordinates: the start position is zero-based (the first nucleotide of a chromosome/scaffold has coordinate 0), and the end position is considered not included in the selection. This is equivalent to having a zero-based coordinate for the start and a 1-based coordinate for the end. We find this representation completely counter-intuitive, and we therefore decided to adopt a "normal" convention, where:
The ending position of the feature in the chromosome or scaffold.
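The arithmetic behind the two conventions described above can be sketched as follows. This is only an illustration of the coordinate relationship (zero-based start with excluded end versus 1-based start with included end); the function names are hypothetical, and this text does not state whether or where the program itself performs such a conversion.

```python
from typing import Tuple


def ucsc_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a zero-based, end-excluded interval to 1-based, end-included.

    Only the start shifts by one: an excluded zero-based end has the same
    numeric value as an included 1-based end.
    """
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    return start + 1, end


def one_based_to_ucsc(start: int, end: int) -> Tuple[int, int]:
    """Inverse conversion: 1-based, end-included to zero-based, end-excluded."""
    if start < 1 or end < start - 1:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    return start - 1, end


if __name__ == "__main__":
    # The first 10 nucleotides of a chromosome in both notations.
    print(ucsc_to_one_based(0, 10))   # (1, 10)
    print(one_based_to_ucsc(1, 10))   # (0, 10)
```

In both notations the interval covers the same nucleotides; the length is `end - start` in the zero-based form and `end - start + 1` in the 1-based form.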
A variation file in varBed format, see convert-variations.
A list of dbSNP IDs.
A tab delimited file with the following column content.
The name of the chromosome (e.g. 1, X, 8...)
The starting position of the feature in the chromosome
The ending position of the feature in the chromosome
The strand of the feature in the chromosome
ID of the variation
SO term of the variation
Allele of the variation in the reference sequence
Allele of the variation in the variant (alternative) sequence
Allele frequency
Sequence of the current variant, flanked by a user-specified neighbouring region
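The column list above can be read with a short Python sketch such as the one below. The column labels in `COLUMNS` are hypothetical names chosen for this example (the text only describes the column contents), and treating lines that start with `#` or `;` as comments is an assumption. All values are kept as strings, since the exact formatting of each field is not specified here.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Hypothetical labels for the ten columns described above, in order.
COLUMNS = [
    "chr", "start", "end", "strand", "id",
    "so_term", "ref_allele", "var_allele", "allele_freq", "seq",
]


def read_variation_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-delimited row, keyed by the labels in COLUMNS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and comment/header lines (assumed prefixes).
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        yield dict(zip(COLUMNS, fields))


if __name__ == "__main__":
    # Invented placeholder row, for illustration only (not real tool output).
    sample = "1\t100\t100\t+\tvar_x\tSNV\tA\tG\t0.5\tACGTA\n"
    for row in read_variation_rows(io.StringIO(sample)):
        print(row["id"], row["ref_allele"], row["var_allele"], row["seq"])
```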
Name of the genome organism where the flanking sequences will be retrieved.
Set of variants in varBed format (see convert-variations), list of dbSNP IDs, or genomic coordinates in bed format that will be used to retrieve the corresponding variants and their flanking sequences.
The data can be provided either as text or as a file.
Format of the current input data (varBed, bed or id).
Length of the flanking sequences that will be retrieved around the variants.
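To make the flank length concrete, the sketch below extracts a variant together with its neighbouring region from an in-memory sequence. It is an illustration under stated assumptions, not the program's implementation: the function name `flanked_sequence` is hypothetical, coordinates are taken as 1-based with the end included, the same flank length is applied on both sides, and flanks are truncated at the sequence boundaries.

```python
def flanked_sequence(sequence: str, start: int, end: int, flank: int) -> str:
    """Return sequence[start..end] (1-based, end included) plus `flank`
    nucleotides on each side, truncated at the sequence boundaries."""
    if flank < 0:
        raise ValueError("flank length must be non-negative")
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"interval {start}-{end} is outside the sequence")
    left = max(1, start - flank)               # first position kept (1-based)
    right = min(len(sequence), end + flank)    # last position kept (1-based)
    # Python slices are zero-based with an excluded end.
    return sequence[left - 1:right]


if __name__ == "__main__":
    toy = "ACGTACGTAC"
    print(flanked_sequence(toy, 5, 5, 2))  # variant at position 5 with 2 nt flanks
    print(flanked_sequence(toy, 1, 1, 3))  # left flank truncated at the boundary
```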