EnsEMBL::IO meeting - Tue 20 May 14:00 Copper Room -------------------------------------------------- - Talked to Anne about possible work. - Pointed me to Miguel, Compara might need some parsers. - Miguel suggested BLAST parser, developed independently by different groups, pointed to Harpreet for web parser, told me what/where Compara and genebuilder use. - Talked to Harpreet, checked out private sanger_plugins repository for NCBI BLAST parser used by web team. - Created feature/blast_parser branch. - Explored scope and contexts of existing BLAST parsers. Web --- Use NCBIBLAST module in private sanger-plugins repo. It simply parses the tab result file produced by the ncbi blast program "blast-formatter". Output: a sequence of lines, each line/record composed of at least 13 fields: 0: qid (qseqid) 1: qstart (query start) 2: qend 3: tid (sseqid) 4: tstart (sstart) 5: tend (send) 6: score 7: evalue 8: pident 9: len (length) 10: aln 11: qframe 12: tframe (sframe) Note: aln option doesn't seem to be a valid format specifier for blat-formatter of BLAST2.2.25. Ouptut format used to produce test files is: 7 qseqid qstart qend sseqid sstart send score evalue pident length qframe sframe Compara ------- Parser is Bio::EnsEMBL::Compara::RunnableDB::BlastAndParsePAF. Run ncbi_blastp and store the output into PeptideAlignFeature objects which are then stored in the compara database. blastp is run with -outfmt option (custom). Output: a sequence of lines/records, each record composed of 14/16 fields 7 qacc sacc evalue score nident pident qstart qend sstart send length positive ppos [qseq sseq] | --> code meaning: "tabular with comment lines" Genebuilders ------------ Use Bio::EnsEMBL::Analysis::Tools::BPlite, a port of bioperl 0.7 BPlite blast parser. The description says: "There are differences in the format of the various flavors of BLAST". BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX reports from both high-perf WU-BLAST and the more generic NCBI-BLAST. BPlite have accompanying modules in Bio::EnsEMBL::Analysis::Tools::BPlite::[HSP,Iteration,Sbjct] Iteration represents object for parsing a single iteration of a PSI-BLAST report. and related Bio::EnsEMBL::Analysis::Tools::FilterBPlite. BPlite inherits from Bio::Root::IO and Bio::SeqAnalysisParserI from Bioperl. A subject is a BLAST hit, which may have several alignments (HSP, high scoring pairs) associated with it.