compare-features
Compare two or more sets of features.
This program takes as input several feature files (two or more), and calculates the intersection, union and difference between features. It also computes contingency tables and comparison statistics.
util
compare-features -i inputfile_1 -i inputfile_2 [-i inputfile_3 ...] [-o outputfile] [-v]
The default input format is .ft (the same as for feature-map). Other formats are also supported ($supported_input_formats).
Each feature is represented by a single line, which should provide the following information:
Input file columns:
The standard input format assumes that these fields are provided in this order, separated by tabs. Start and end positions can be positive or negative.
Intersections between features (pairwise comparisons). For each intersection between two features, a feature of type "inter" is created.
The ID of an "inter" feature indicates the files to which the intersecting features belong. For example, "f1.and.f3" means that the intersection feature was obtained from a feature of the first input file and a feature of the third input file.
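The pairwise intersection described above can be sketched in Python as follows. This is an illustrative sketch, not the program's own code: the Feature tuple, the intersect function and the inclusive start/end coordinates are assumptions made for the example, and strand is ignored.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical minimal feature record (not the full .ft column set)."""
    seq_id: str   # sequence carrying the feature, e.g. a chromosome
    start: int    # first position (assumed inclusive, may be negative)
    end: int      # last position (assumed inclusive, may be negative)


def intersect(a: Feature, b: Feature, file_a: int, file_b: int) -> Optional[dict]:
    """Return an "inter" feature for a and b, or None if they do not overlap."""
    if a.seq_id != b.seq_id:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    if start > end:
        return None
    return {
        "type": "inter",
        # ID built from the numbers of the two input files
        "id": f"f{file_a}.and.f{file_b}",
        "seq_id": a.seq_id,
        "start": start,
        "end": end,
    }


# A feature from file 1 compared with a feature from file 3
inter = intersect(Feature("chr1", 100, 200), Feature("chr1", 150, 300), 1, 3)
print(inter)
```

In this example the two features share positions 150 to 200, so the resulting "inter" feature carries the ID "f1.and.f3".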
Pairwise differences between files. For each pair of files, a feature of type "diff" is created.
The ID of the "diff" feature indicates the numbers of the files containing and not containing the feature, respectively. For example, the ID "f1.not.f3" indicates a feature found in file 1 and without any intersection with features of file 3.
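A minimal Python sketch of this pairwise difference is given below. The function names and the (seq_id, start, end) tuple layout are hypothetical, coordinates are assumed to be inclusive, and the program itself may proceed differently.

```python
def overlaps(a, b):
    """True if two (seq_id, start, end) features share at least one position."""
    return a[0] == b[0] and max(a[1], b[1]) <= min(a[2], b[2])


def difference(features_a, features_b, file_a, file_b):
    """Features of file_a without any intersection with features of file_b."""
    diff_id = f"f{file_a}.not.f{file_b}"
    return [
        {"type": "diff", "id": diff_id, "seq_id": f[0], "start": f[1], "end": f[2]}
        for f in features_a
        if not any(overlaps(f, g) for g in features_b)
    ]


file1 = [("chr1", 100, 200), ("chr1", 500, 600)]
file3 = [("chr1", 150, 300)]
diffs = difference(file1, file3, 1, 3)
print(diffs)
```

Here only the second feature of file 1 (positions 500 to 600) is reported, with the ID "f1.not.f3", because the first one intersects a feature of file 3.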
Calculate statistics about the intersections between features of each pair of input files.
The output depends on the return type(s), which can be specified with the option -return.
The intersection and differences are reported as features. Different output formats can be specified with the option -oformat (supported: $supported_output_formats).
Matching statistics are exported as tab-delimited tables. Each row starts with a comment character ';', so that the statistics are ignored when the output is used as input for feature-map.
These comment characters can easily be removed if the result has to be used by other programs. Try for example:
perl -pe 's/^;//' outfile
Level of verbosity (detail in the warning messages during execution)
Display full help message
display options
This option can be used iteratively to specify several input files. It must be used at least 2 times, since the comparison requires at least two feature files.
Specify multiple input files. All the arguments following the option -files are considered as input files.
Specify a reference file. Only one reference file can be specified.
All the other input files (specified with -i or -files) are then compared to the reference file. When the option '-return stats' is combined with a reference file, some additional statistics are calculated (PPV, sensitivity, accuracy).
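The exact definitions used by the program are not given here. As a rough illustration only, the sketch below computes PPV and sensitivity at the feature level, assuming the usual definitions: PPV is the fraction of query features that intersect at least one reference feature, and sensitivity is the fraction of reference features intersected by at least one query feature. The function name and the (seq_id, start, end) tuples are hypothetical, and accuracy is left out because its definition is not stated above.

```python
def reference_stats(query, reference):
    """Feature-level PPV and sensitivity of a query set against a reference set.

    Assumed definitions (not taken from the program):
      PPV         = query features hitting the reference / all query features
      sensitivity = reference features hit by the query  / all reference features
    """
    def overlaps(a, b):
        # Same sequence and at least one shared position (inclusive coordinates)
        return a[0] == b[0] and max(a[1], b[1]) <= min(a[2], b[2])

    query_hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    ref_hits = sum(any(overlaps(r, q) for q in query) for r in reference)
    ppv = query_hits / len(query) if query else float("nan")
    sensitivity = ref_hits / len(reference) if reference else float("nan")
    return ppv, sensitivity


reference = [("chr1", 100, 200), ("chr1", 400, 500)]
query = [("chr1", 150, 250), ("chr1", 450, 460),
         ("chr1", 600, 700), ("chr1", 800, 900)]
ppv, sensitivity = reference_stats(query, reference)
print(ppv, sensitivity)
```

In this toy data set, two of the four query features intersect the reference and both reference features are recovered.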
If no output file is specified, the standard output is used. This allows the command to be used within a pipe.
In addition to the output, export a feature file containing the type and the chromosomal location of each feature. This option is compatible with -return inter.
Input feature format (Supported: $supported_input_formats)
Output feature format (Supported: $supported_output_formats)
Also perform comparison between features in the same file (self-comparison). This can be useful to detect redundancy between annotated features.
Specify the output type(s).
Supported output types: stats,inter,diff
Specify the value of the lower threshold on some parameter.
Examples:
Supported parameters:
Length (in residues) of the intersection between two features.
Coverage of the intersection between two features. The coverage (inter_cov) is defined as
inter_cov = inter_len / pair_len
where inter_len is the length of the intersection, and pair_len is the total length covered by the pair of intersecting features.
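As a worked illustration of the coverage, the following Python sketch computes inter_len, pair_len and their ratio for two features, then applies a lower threshold. The function name, the inclusive coordinates and the threshold value 0.25 are assumptions chosen for the example.

```python
def inter_cov(a_start, a_end, b_start, b_end):
    """Coverage of the intersection between two features on the same sequence.

    Coordinates are assumed to be inclusive. Returns 0.0 when the
    features do not intersect.
    """
    inter_len = min(a_end, b_end) - max(a_start, b_start) + 1
    if inter_len <= 0:
        return 0.0
    # Total length covered by the pair: from the leftmost start to the rightmost end
    pair_len = max(a_end, b_end) - min(a_start, b_start) + 1
    return inter_len / pair_len


# Features 10..50 and 30..80: inter_len = 21, pair_len = 71
cov = inter_cov(10, 50, 30, 80)
print(round(cov, 4))

# Hypothetical lower threshold of 0.25 on inter_cov
passes = cov >= 0.25
print(passes)
```

With these coordinates the coverage is 21/71, about 0.296, so the intersection passes a lower threshold of 0.25 but would be discarded with a threshold of 0.5.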