retrieve-seq-bed
$program_version
Retrieve sequences for a set of genomic coordinates provided in bed, gff or vcf format.
This script is a wrapper around bedtools getfasta, an efficient tool that retrieves sequences from a FASTA-formatted sequence file (e.g. all genome sequences), given a file of coordinates defined on the sequences of that FASTA file. Note that BED coordinates are zero-based.
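As an illustration of the zero-based convention, here is a minimal Python sketch. It is not part of the script. It relies on the standard BED convention that the start is zero-based and the end is exclusive. The function names and the toy sequence are hypothetical.

```python
def slice_bed_interval(sequence, start, end):
    """Return the bases covered by a BED interval.

    BED starts are zero-based and BED ends are exclusive, which matches
    Python slicing, so no offset is needed.
    """
    return sequence[start:end]


def bed_to_one_based(start, end):
    """Convert a BED interval to 1-based, inclusive coordinates (as in GFF)."""
    return start + 1, end


toy_chromosome = "ACGTACGTAC"  # hypothetical sequence, for illustration only
print(slice_bed_interval(toy_chromosome, 0, 4))  # first four bases
print(bed_to_one_based(0, 4))                    # same region, 1-based inclusive
```

In this sketch, the BED interval 0-4 and the 1-based interval 1-4 describe the same four bases.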
The wrapper generates the bedtools getfasta command in order to retrieve the sequences corresponding to genomic coordinates from one of the locally supported genomes.
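The kind of command the wrapper generates could look like the following Python sketch. This is an assumption-laden illustration, not the script's actual code. The file names are hypothetical. The only bedtools getfasta options used are -fi (input FASTA), -bed (coordinate file) and -fo (output file).

```python
import shlex


def build_getfasta_command(genome_fasta, coord_file, output_fasta):
    """Assemble a bedtools getfasta call as an argument list.

    All three paths are placeholders supplied by the caller; how the real
    wrapper locates the genome FASTA for an organism is not shown here.
    """
    return [
        "bedtools", "getfasta",
        "-fi", genome_fasta,   # FASTA file holding the genome sequences
        "-bed", coord_file,    # coordinates in BED, GFF or VCF format
        "-fo", output_fasta,   # FASTA file to write
    ]


cmd = build_getfasta_command("genome.fa", "peaks.bed", "peaks.fasta")
print(shlex.join(cmd))
```

The argument list can be passed to subprocess.run(cmd, check=True) on a machine where bedtools is installed.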
Bruno Contreras Moreira <bcontreras@eead.csic.es> Jacques.van-Helden@univ-amu.fr
retrieve-seq-bed -org organism_name -i inputfile [-o outputfile] [-v #] [...]
The genomic coordinate file will be used as input by bedtools getfasta, and must be compliant with the supported formats: BED/GFF/VCF.
A sequence file in fasta format (produced by bedtools getfasta).
Send the request to a remote RSAT server via the Web services. This option makes it possible to retrieve fasta sequences from any RSAT server without having to install the genomes locally.
Extend the peaks by a given length on the upstream side (-extend_up), the downstream side (-extend_down) or both sides (-extend). The side is adapted according to the strand.
Flank extension is done via bedtools flank.
The extended coordinates are exported with the same name as the output file, supplemented with the suffix _flanks.bed.
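The strand dependence of the extension can be illustrated with a small Python sketch. It is neither the script's code nor the bedtools implementation. The function name, its parameters and the clipping at the sequence boundaries are assumptions made for illustration.

```python
def extend_interval(start, end, strand, up=0, down=0, seq_length=None):
    """Extend a zero-based, half-open interval in a strand-aware way.

    On the "+" strand, upstream is towards smaller coordinates; on the "-"
    strand, upstream is towards larger coordinates, so the two lengths swap
    sides. Any other strand value is treated like "+" (an assumption).
    """
    if strand == "-":
        new_start, new_end = start - down, end + up
    else:
        new_start, new_end = start - up, end + down
    new_start = max(0, new_start)            # do not run past the sequence start
    if seq_length is not None:
        new_end = min(seq_length, new_end)   # do not run past the sequence end
    return new_start, new_end


print(extend_interval(100, 200, "+", up=10, down=5))
print(extend_interval(100, 200, "-", up=10, down=5))
```

Under these assumptions, the same up and down values widen the interval on opposite sides depending on the strand.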
Level of verbosity (detail in the warning messages during execution)
Display full help message
Same as -h
Genomic coordinates, in one of the formats supported by bedtools getfasta: BED, GFF, VCF.
Output file (in fasta format), where the sequences will be saved. This argument is mandatory, since it is required by bedtools getfasta.
Organism name, which must correspond to one organism supported on the local RSAT instance.
Use repeat-masked version of the genome.