NAME

compare-features

DESCRIPTION

Compare two or more sets of features.

This program takes as input several feature files (two or more), and calculates the intersection, union and difference between features. It also computes contingency tables and comparison statistics.

AUTHORS

Jean Valery Turatsinze <jturatsi@ulb.ac.be>
Jacques van Helden <Jacques.van-Helden@univ-amu.fr>

USAGE


compare-features -i inputfile_1 -i inputfile_2 [-i inputfile_3 ...]
[-o outputfile] [-v]

INPUT FORMAT

The default input format is .ft (the same as for feature-map). Other formats are also supported ($supported_input_formats).

Feature format

Each feature is represented by a single line, which should provide the following information:

Input file columns:

map label (e.g. gene name)
feature type
feature identifier (e.g. GATAbox, Abf1_site)
strand (D for Direct, R for Reverse)
feature start position
feature end position
(optional) description
(optional) score

The standard input format assumes that these fields are provided in this order, separated by tabs. Start and end positions can be positive or negative.
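
As an illustration of the column layout described above, here is a minimal Python sketch that reads one feature line into a record. It is not part of compare-features: the names Feature and parse_ft_line are hypothetical, the score is kept as text because its type is not specified here, and skipping lines that start with ';' is an assumption based on the comment convention mentioned for the stats output.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One feature line (hypothetical record, fields in the documented column order)."""
    map_label: str
    ftype: str
    identifier: str
    strand: str                      # "D" (direct) or "R" (reverse)
    start: int                       # may be negative
    end: int                         # may be negative
    description: Optional[str] = None
    score: Optional[str] = None      # kept as text: the score type is an assumption


def parse_ft_line(line: str) -> Optional[Feature]:
    """Parse one tab-separated feature line.

    Returns None for blank lines and for lines starting with ';'
    (assumed here to be comments).
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith(";"):
        return None
    fields = line.split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated fields, got {len(fields)}")
    map_label, ftype, identifier, strand, start, end = fields[:6]
    description = fields[6] if len(fields) > 6 else None
    score = fields[7] if len(fields) > 7 else None
    return Feature(map_label, ftype, identifier, strand,
                   int(start), int(end), description, score)
```

For instance, parse_ft_line("gene1\tsite\tGATAbox\tD\t-120\t-115") yields a record whose start is -120 and whose description and score are None.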

OUTPUT TYPES

inter

Intersections between features (pairwise comparisons). For each intersection between two features, a feature of type "inter" is created.

The ID of an "inter" feature indicates the files to which the intersecting features belong. For example, "f1.and.f3" means that the intersection feature was obtained from a feature of the first input file and a feature of the third input file.
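
The idea of an "inter" feature can be sketched in Python as follows. This is an illustration only, not the program's actual algorithm: the function name intersect is hypothetical, coordinates are assumed to be inclusive, both features are assumed to lie on the same map, and any overlap of at least one position is assumed to count.

```python
from typing import Optional, Tuple


def intersect(a: Tuple[int, int], b: Tuple[int, int],
              i: int, j: int) -> Optional[Tuple[str, int, int]]:
    """Return (id, start, end) for the overlap of two features, or None.

    a and b are (start, end) pairs taken from input files number i and j.
    Coordinates are treated as inclusive (an assumption of this sketch).
    """
    a_lo, a_hi = sorted(a)           # tolerate start > end
    b_lo, b_hi = sorted(b)
    lo = max(a_lo, b_lo)
    hi = min(a_hi, b_hi)
    if lo > hi:                      # no shared position
        return None
    return (f"f{i}.and.f{j}", lo, hi)
```

With these assumptions, a feature spanning 10..20 in file 1 and a feature spanning 15..30 in file 3 give an intersection labelled "f1.and.f3" that spans 15..20.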

diff

Pairwise differences between files. For each pair of files, a feature of type "diff" is created.

The ID of the "diff" feature indicates the numbers of the files containing and not containing the feature, respectively. For example, the ID "f1.not.f3" indicates a feature found in file 1 and without any intersection with features of file 3.
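
A minimal Python sketch of the "diff" logic, under the same assumptions as any simple interval comparison (inclusive coordinates, features on the same map, one shared position is enough to count as an intersection). The names overlaps and diff_features are hypothetical and do not come from compare-features.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive (start, end) intervals share at least one position."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return max(a_lo, b_lo) <= min(a_hi, b_hi)


def diff_features(features_i: List[Interval], features_j: List[Interval],
                  i: int, j: int) -> List[Tuple[str, int, int]]:
    """Features of file i that intersect no feature of file j, labelled fI.not.fJ."""
    label = f"f{i}.not.f{j}"
    return [
        (label, start, end)
        for (start, end) in features_i
        if not any(overlaps((start, end), other) for other in features_j)
    ]
```

For example, if file 1 holds features 1..5 and 10..20 and file 3 holds a single feature 4..8, only 10..20 is reported, with the ID "f1.not.f3".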

stats

Calculate statistics about the intersections between the features of each pair of input files.

OUTPUT FORMAT

The output depends on the return type(s), which can be specified with the option -return.

inter,diff

The intersection and differences are reported as features. Different output formats can be specified with the option -oformat (supported: $supported_output_formats).

stats

Matching statistics are exported as tab-delimited tables. Each row starts with the comment character ';', so that the statistics are ignored when the output is used as input for feature-map.

These comment characters can easily be removed if the result has to be used by other programs. Try for example:

perl -pe 's/^;//' outfile
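
The same clean-up can be done from a script. The following Python sketch mirrors the Perl one-liner by removing a single leading ';' from each line; the function name strip_comment_prefix is hypothetical.

```python
import re
from typing import Iterable, List


def strip_comment_prefix(lines: Iterable[str]) -> List[str]:
    """Remove one leading ';' from each line, leaving all other lines unchanged."""
    return [re.sub(r"^;", "", line) for line in lines]
```

As with the Perl command, any whitespace that follows the ';' is left in place.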

OPTIONS

-v #: Level of verbosity (detail in the warning messages during execution)
-h: Display full help message
-help: Display options
-i inputfile: This option can be used iteratively to specify several input files. It must be used at least 2 times, since the comparison requires at least two feature files.
-files inputfile_1 inputfile_2 ...: Specify multiple input files. All the arguments following the option -files are considered as input files.
-ref reference_file: Specify a reference file. Only one reference file can be specified. All the other input files (specified with -i or -files) are then compared to the reference file. When the option '-return stats' is combined with a reference file, some additional statistics are calculated (PPV, sensitivity, accuracy).
-o outputfile: If no output file is specified, the standard output is used. This allows the command to be used within a pipe.
-oft outputfeaturefile: In addition to the output, export a feature file containing the type and chromosomal location of each feature. This option is compatible with -return inter.
-iformat input_format: Input feature format (Supported: $supported_input_formats)
-oformat output_format: Output feature format (Supported: $supported_output_formats)
-self: Also perform comparison between features in the same file (self-comparison). This can be useful to detect redundancy between annotated features.
-return output1[,output2,...]: Specify the output type(s). Supported output types: stats,inter,diff
-lth parameter value: Specify the value of the lower threshold on some parameter. Examples: