RSAT - Help about contingency-stats
This program takes as input a contingency table and calculates various
matching statistics between its rows and columns. These statistics are
described in Brohee and van Helden, 2006.
INPUT FORMAT
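A contingency table is an N*M table used to compare the contents of two
classifications. The N rows represent the clusters of the first
classification (considered as the reference), and the M columns represent
the clusters of the second classification (the query). Each cell X_{i,j}
contains the number of elements shared by reference cluster i and query
cluster j.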
Contingency tables can be generated with the program contingency-table,
or with compare-classes (option -matrix QR).
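To illustrate what the cells of a contingency table contain, here is a
minimal Python sketch that builds one from two classifications. It assumes
each classification is given as a dict mapping a cluster name to the set of
its elements (a hypothetical in-memory format, not the RSAT file format), so
overlapping clusters are allowed. The function name contingency_table is
hypothetical and this is not the RSAT implementation; the tab-delimited
printout is only indicative and may not match the exact header layout that
contingency-stats expects.

```python
def contingency_table(reference, query):
    """Build a contingency table from two classifications (illustrative sketch).

    reference, query: dicts mapping a cluster name to the set of its elements
    (hypothetical input format). Returns (row_names, col_names, table), where
    table[i][j] is the number of elements shared by reference cluster i (row)
    and query cluster j (column).
    """
    rows = sorted(reference)
    cols = sorted(query)
    # X_{i,j} = size of the intersection of reference cluster i and query cluster j
    table = [[len(reference[r] & query[c]) for c in cols] for r in rows]
    return rows, cols, table


# Toy classifications (illustrative data only).
reference = {"R1": {"a", "b"}, "R2": {"c", "d"}}
query = {"Q1": {"a", "b", "c"}, "Q2": {"d"}}

rows, cols, table = contingency_table(reference, query)
print("", *cols, sep="\t")              # header line: query cluster names
for name, counts in zip(rows, table):   # one line per reference cluster
    print(name, *counts, sep="\t")
```

Here reference cluster R1 shares two elements with Q1 and none with Q2, so
its row is 2, 0.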
OUTPUT FORMAT
A tab-delimited text file with one row per statistic.
STATISTICS
Sn
Sensitivity. This parameter indicates the fraction of each reference
cluster (row) covered by its best matching query cluster (column).
Sensitivity is calculated at the level of each cell (cell-wise Sn),
of each row (row-wise Sn) and of the whole contingency table
(table-wise Sn).
Cell-wise sensitivity
Sn_{i,j} = X_{i,j}/SUM_j(X_{i,j})
Row-wise sensitivity
Sn_{i.} = MAX_j(Sn_{i,j})
The row-wise sensitivity of a row is the maximal value of
sensitivity for all the cells of this row.
Table-wise sensitivity
Sn = SUM_i(Sn_{i.})/N
The table-wise sensitivity is the average of the row-wise
sensitivity over all the rows of the contingency table.
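The three levels of sensitivity can be sketched in a few lines of Python. This
sketch assumes the contingency table is a list of N rows (reference clusters),
each a list of M counts (query clusters), and that no row sums to zero
(otherwise the cell-wise division is undefined). The function name
sensitivity is hypothetical; this is not the RSAT code.

```python
def sensitivity(table):
    """Return (cell-wise, row-wise, table-wise) sensitivity (illustrative sketch).

    table: list of N rows (reference clusters), each a list of M counts
    (query clusters). Rows with a zero total are assumed absent.
    """
    cell = []
    for row in table:
        total = sum(row)                        # row total: SUM_j X_{i,j}
        cell.append([x / total for x in row])   # Sn_{i,j} = X_{i,j} / SUM_j X_{i,j}
    row_wise = [max(r) for r in cell]           # Sn_{i.} = MAX_j Sn_{i,j}
    table_wise = sum(row_wise) / len(row_wise)  # Sn = SUM_i Sn_{i.} / N
    return cell, row_wise, table_wise


table = [[2, 0],
         [1, 1]]
cell, row_wise, sn = sensitivity(table)
print(row_wise, sn)  # [1.0, 0.5] 0.75
```

The first reference cluster is entirely covered by one query cluster
(Sn_{1.} = 1), while the second is split evenly (Sn_{2.} = 0.5), giving a
table-wise Sn of 0.75.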
PPV
Positive Predictive Value. This parameter indicates the fraction of
each query cluster (column) covered by its best matching reference
cluster (row).
PPV is calculated at the level of each cell (cell-wise PPV), of each
column (column-wise PPV) and of the whole contingency table
(table-wise PPV).
Cell-wise PPV
PPV_{i,j} = X_{i,j}/SUM_i(X_{i,j})
Column-wise PPV
PPV_{.j} = MAX_i(PPV_{i,j})
The column-wise PPV of a column is the maximal value of PPV for
all the cells of this column.
Table-wise PPV
PPV = SUM_j(PPV_{.j})/M
The table-wise PPV is the average of the column-wise PPV over
all the columns of the contingency table.
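PPV mirrors sensitivity with the roles of rows and columns swapped. The sketch
below makes the same assumptions as for sensitivity (table as a list of N rows
of M counts) and additionally assumes that no column sums to zero. The
function name ppv is hypothetical, and the column transposition relies on
Python's built-in zip(*table).

```python
def ppv(table):
    """Return (cell-wise, column-wise, table-wise) PPV (illustrative sketch).

    table: list of N rows (reference clusters), each a list of M counts
    (query clusters). Columns with a zero total are assumed absent.
    """
    col_totals = [sum(col) for col in zip(*table)]   # SUM_i X_{i,j} for each column j
    # PPV_{i,j} = X_{i,j} / SUM_i X_{i,j}
    cell = [[x / col_totals[j] for j, x in enumerate(row)] for row in table]
    col_wise = [max(col) for col in zip(*cell)]      # PPV_{.j} = MAX_i PPV_{i,j}
    table_wise = sum(col_wise) / len(col_wise)       # PPV = SUM_j PPV_{.j} / M
    return cell, col_wise, table_wise


table = [[2, 0],
         [1, 1]]
cell, col_wise, ppv_value = ppv(table)
print(col_wise, ppv_value)
```

For this table, the first query cluster draws two thirds of its elements from
its best-matching reference cluster and the second query cluster draws all of
them, so the table-wise PPV is (2/3 + 1)/2 = 5/6.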
Acc.geom
Geometric accuracy. This reflects the tradeoff between sensitivity
and positive predictive value, by computing the geometric mean of the
table-wise Sn and PPV.
Acc.geom = sqrt(Sn*PPV)
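A short Python sketch of Acc.geom, applied to two arbitrary (Sn, PPV) pairs
that have the same arithmetic mean, 0.5. The function name
geometric_accuracy is hypothetical. The example shows why the geometric mean
expresses a tradeoff: an unbalanced pair scores lower than a balanced one.

```python
import math


def geometric_accuracy(sn, ppv):
    """Acc.geom = sqrt(Sn * PPV), the geometric mean of table-wise Sn and PPV."""
    return math.sqrt(sn * ppv)


# Illustrative values only: same arithmetic mean, different balance.
print(geometric_accuracy(0.5, 0.5))   # 0.5
print(geometric_accuracy(0.9, 0.1))   # about 0.3
```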
Sep
Separation.
The separation is defined at the level of each cell (cell-wise
separation) as the product of the cell-wise Sn and PPV.
Cell-wise separation
sep_{i,j} = Sn_{i,j}*PPV_{i,j}
Column-wise separation
Column-wise separation is defined at the level of each column,
as the sum of the separation values of all the cells of this column.
sep_{.j} = SUM_i(sep_{i,j})
Row-wise separation
Row-wise separation is defined at the level of each row, as the
sum of the separation values of all the cells of this row.
sep_{i.} = SUM_j(sep_{i,j})
Table-wise separation
Three table-wise statistics are computed for separation.
average column-wise separation
sep_c = AVG_j(sep_{.j})
average row-wise separation
sep_r = AVG_i(sep_{i.})
table-wise separation
sep = sqrt(sep_r*sep_c)
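The separation statistics can be sketched in Python as follows, again
assuming a table given as a list of N rows of M counts with no zero row or
column totals. The function name separation is hypothetical and this is not
the RSAT implementation. A non-square table is used so that the two averages
differ.

```python
import math


def separation(table):
    """Return (sep_r, sep_c, sep) for a contingency table (illustrative sketch).

    table: list of N rows (reference clusters), each a list of M counts
    (query clusters). Rows and columns with a zero total are assumed absent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Cell-wise separation: sep_{i,j} = Sn_{i,j} * PPV_{i,j}
    sep = [[(x / row_totals[i]) * (x / col_totals[j]) for j, x in enumerate(row)]
           for i, row in enumerate(table)]
    sep_row = [sum(row) for row in sep]          # row-wise: sep_{i.}
    sep_col = [sum(col) for col in zip(*sep)]    # column-wise: sep_{.j}
    sep_r = sum(sep_row) / len(sep_row)          # average over the N rows
    sep_c = sum(sep_col) / len(sep_col)          # average over the M columns
    return sep_r, sep_c, math.sqrt(sep_r * sep_c)


table = [[2, 0, 1],
         [1, 1, 0]]
sep_r, sep_c, sep = separation(table)
print(round(sep_r, 4), round(sep_c, 4), round(sep, 4))
```

Since the row-wise and column-wise values sum the same cells, writing S for
the sum of all cell-wise separations gives sep_r = S/N, sep_c = S/M and
sep = S/sqrt(N*M); here S = 13/9, N = 2 and M = 3.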
REFERENCES
Brohee, S. & van Helden, J. (2006). Evaluation of clustering algorithms
for protein-protein interaction networks. BMC Bioinformatics 7, 488.
RETURN FIELDS
stats
table-wise statistics
rowstats
row-wise statistics (one line per row of the contingency table)
colstats
column-wise statistics (one line per column of the contingency
table)
tables
full tables for each statistic (counts, Sn, PPV, separation).
margins
marginal statistics alongside the tables (requires the tables to be
returned).