The metabolic pathfinder enumerates metabolic pathways between a set of start nodes
and a set of end nodes, where start and end nodes may be compounds, reactions or enzymes (which are mapped to the reactions
they catalyze). With suitable parameter values (the defaults), the metabolic pathways found are biochemically relevant
with high probability.
The accuracy of path finding in metabolic networks (as in other biological networks) is diminished by the presence of
hub nodes (highly connected compounds such as ATP, NADPH or CO2) in the network.
Path finding algorithms will traverse the network preferentially via the hub nodes,
thereby inferring biochemically irrelevant pathways.
Different strategies have been devised to overcome this problem. Arita introduced the mapping and tracing of atoms
from substrates to products [1]. This strategy is also applied in the Pathway Hunter Tool available at http://pht.tu-bs.de/PHT/.
Other tools rely on rules to avoid hub nodes, e.g. the pathway prediction system at UMBBD (http://umbbd.msi.umn.edu/predict/).
Didier Croes et al. used weighted graphs to avoid highly connected nodes [5],[6].
The functionality of Didier Croes' tool is covered by the metabolic pathfinder (with the weighted reaction network).
The metabolic pathfinder relies on a mixed strategy: on the one hand, it uses weighted graphs to avoid irrelevant
hub nodes; on the other hand, it integrates the KEGG RPAIR annotation [18] to favor, for each traversed
reaction, main compounds over side compounds.
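The effect of weighting on hub nodes can be illustrated with a minimal Python sketch. It is not the Pathfinder implementation: the toy graph, the node names and the choice of node degree as weight are assumptions made for illustration only.

```python
import heapq

def lightest_path(graph, source, target, node_weight):
    """Dijkstra search where a path costs the sum of the weights of the nodes it enters."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor in sorted(graph[node]):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + node_weight(neighbor), neighbor, path + [neighbor]))
    return None

# Toy undirected graph (hypothetical): ATP is a hub that offers a shortcut from A to D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "ATP"), ("ATP", "D")]
edges += [("ATP", f"X{i}") for i in range(1, 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# Every node costs 1: the search takes the shortcut through the hub.
unweighted = lightest_path(graph, "A", "D", lambda n: 1)
# Every node costs its degree (assumed weighting): the hub becomes expensive.
weighted = lightest_path(graph, "A", "D", lambda n: len(graph[n]))
print(unweighted)
print(weighted)
```

In this toy example the unweighted search passes through ATP, whereas the degree-weighted search follows the longer route via B and C.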
KEGG RPAIR is a database that divides reactions into reactant pairs (substrate-product pairs) and classifies the reactant
pairs according to their role in the reaction. For instance, the cofac reactant pair A00001 couples NADP+ with NADPH.
Main reactant pairs connect main compounds and should be traversed preferentially by path finding algorithms.
The KEGG RPAIR annotation is integrated by construction of the undirected RPAIR network, which consists of 7,058 reactant pairs,
4,297 compounds and 14,116 edges for KEGG version 41.0. Alternatively, two other networks are available: the directed reaction network evaluated in
[6] and an undirected reaction-specific RPAIR network, in which each reaction is divided into its reactant pairs.
Note that in more recent KEGG versions, identifiers of reactant pairs start with RP instead of A.
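As a rough illustration of how such a network can be assembled, the following Python sketch links each reactant pair node to its two compounds. It is a sketch under assumptions: the function names are hypothetical, only three reactant pairs mentioned in this chapter are used, and the actual construction in the tool may differ. The node and edge counts given above are consistent with two edges per reactant pair (2 x 7,058 = 14,116).

```python
def build_rpair_network(reactant_pairs):
    """Return adjacency sets of an undirected graph with reactant pair and compound nodes."""
    adjacency = {}
    for pair_id, substrate, product in reactant_pairs:
        # Each reactant pair node is linked to its substrate and to its product.
        for compound in (substrate, product):
            adjacency.setdefault(pair_id, set()).add(compound)
            adjacency.setdefault(compound, set()).add(pair_id)
    return adjacency

def count_edges(adjacency):
    """Count undirected edges; each edge appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2

# Reactant pairs taken from this chapter (identifiers as in KEGG version 41.0).
pairs = [
    ("A00386", "Pregnenolone", "Progesterone"),
    ("A02045", "Progesterone", "11-Deoxycorticosterone"),
    ("A00001", "NADP+", "NADPH"),
]
network = build_rpair_network(pairs)
print(len(network), "nodes,", count_edges(network), "edges")
```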
In this chapter, we will recover the aldosterone pathway using first the RPAIR network and then the reaction network. Note that the case study was carried out with data from KEGG LIGAND version 41.0. Results might differ for more recent KEGG versions.
Aldosterone is a human steroid hormone involved in the regulation of ion uptake in the kidney and of blood pressure. It is synthesized from progesterone. We aim to recover the aldosterone biosynthesis pathway by providing its start and end reaction.
In the right panel, you should now see a form entitled ``Metabolic pathfinder''.
The metabolic pathfinder form is now filled with the start and end reaction of the aldosterone biosynthesis pathway. In addition, information on this pathway is displayed.
This table lists, for each reaction, the reactant pair identifier(s) associated with it. Note that reaction
R02724 is associated with two reactant pairs.
The seed node selection form allows you to select the correct compound among all compounds matching your query string in case you provided a partial compound name. If you give KEGG compound identifiers, it displays the name of each compound. For EC numbers, it lists the associated reactions or reactant pairs. The seed node selection form also warns you if you provide problematic identifiers.
The computation should take no more than one minute.
Then, a table is displayed, which lists the found paths in the order of their weight.
The table may be sorted according to other criteria by clicking the respective column header.
Each path node is linked to its corresponding KEGG entry for easy inspection of results.
If you set Output format in the metabolic pathfinder form to ``Graph'', you obtain an image of the inferred pathway, generated by the program dot from the Graphviz tool suite, and a link to the pathway in the selected graph format.
To see how results change with the choice of the graph, you can repeat steps 1 and 2. In the metabolic pathfinder form, select Reaction graph instead of RPAIR graph (which is selected by default) and follow steps 3 to 5. You will notice in the seed node selection form that the reaction identifiers are no longer mapped to reactant pairs.
This section assumes that you have installed the RSAT/NeAT command line tools.
The metabolic pathfinder is a web application built on top of Pathfinder. You may run metabolic path finding on the command line by launching the Pathfinder command line tool on the RPAIR and reaction networks, which are provided in the KEGG graph repository reachable from the metabolic pathfinder manual page.
Type the following command in one line to find paths in the RPAIR network:
java -Xmx800m graphtools.algorithms.Pathfinder -g RPAIRGraph_allRPAIRs_undirected.txt -f flat -s 'A02437' -t 'A02894' -b -y rpairs
To repeat path finding in the reaction network, type in one line:
java -Xmx800m graphtools.algorithms.Pathfinder -g ReactionGraph_directed.txt -d -f flat -s 'R02724>/R02724<' -t 'R03263>/R03263<' -b -y con
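If you prefer to script these calls, the two commands above can be assembled in Python as follows. This is only a convenience sketch: the helper name and its parameters are hypothetical, and it uses no options other than those shown in the commands above.

```python
import shlex

def pathfinder_command(graph_file, source, target, y_option, directed=False):
    """Assemble the Pathfinder call as an argument list (hypothetical helper)."""
    args = ["java", "-Xmx800m", "graphtools.algorithms.Pathfinder", "-g", graph_file]
    if directed:
        # -d is only present in the reaction network command above.
        args.append("-d")
    # y_option is the value passed to -y, copied from the commands above.
    args += ["-f", "flat", "-s", source, "-t", target, "-b", "-y", y_option]
    return args

rpair_cmd = pathfinder_command(
    "RPAIRGraph_allRPAIRs_undirected.txt", "A02437", "A02894", "rpairs")
reaction_cmd = pathfinder_command(
    "ReactionGraph_directed.txt", "R02724>/R02724<", "R03263>/R03263<", "con",
    directed=True)

# shlex.join quotes arguments containing shell metacharacters such as > and <.
print(shlex.join(rpair_cmd))
print(shlex.join(reaction_cmd))
```

The argument lists can be passed to `subprocess.run(rpair_cmd, check=True)`, assuming that Java and the graph files are available and that the classpath is set as described in the installation instructions.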
The first-ranked path does not exactly reproduce the annotated pathway. Instead, it suggests a deviation via 21-hydroxypregnenolone, bypassing progesterone. This path might be a valid alternative, as it appears on the KEGG map for C21-Steroid hormone metabolism in human. One of the two second-ranked paths corresponds to the annotated pathway.
First ranked path:
A02437 (1.14.15.6) Pregnenolone A03407 (1.14.99.10) 21-Hydroxypregnenolone A00731 (1.1.1.145, 5.3.3.1) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894
Second ranked paths:
A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone A02045 (1.14.99.10) 11-Deoxycorticosterone A03469 (1.14.15.4) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894
A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone A02047 (1.14.15.4) 11beta-Hydroxyprogesterone A03467 (1.14.99.10) Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894
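To see exactly where two of the listed paths diverge, the path lines can be split into reactant pairs and compounds and then compared. The Python sketch below does this for the first-ranked path and the second-ranked path via 11-deoxycorticosterone. It assumes the layout printed above (EC numbers in parentheses, compound names without spaces, identifiers of the form A plus five digits) and is not a parser for any official Pathfinder output format.

```python
import re

PAIR_ID = re.compile(r"A\d{5}")

def split_path(line):
    """Split a printed path into reactant pair identifiers and compound names."""
    # Drop the EC numbers, which are printed in parentheses.
    tokens = re.sub(r"\([^)]*\)", " ", line).split()
    pairs = [t for t in tokens if PAIR_ID.fullmatch(t)]
    compounds = [t for t in tokens if not PAIR_ID.fullmatch(t)]
    return pairs, compounds

first = ("A02437 (1.14.15.6) Pregnenolone A03407 (1.14.99.10) 21-Hydroxypregnenolone "
         "A00731 (1.1.1.145, 5.3.3.1) 11-Deoxycorticosterone A03469 (1.14.15.4) "
         "Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894")
second = ("A02437 (1.14.15.6) Pregnenolone A00386 (1.1.1.145, 5.3.3.1) Progesterone "
          "A02045 (1.14.99.10) 11-Deoxycorticosterone A03469 (1.14.15.4) "
          "Corticosterone A02893 (1.14.15.5) 18-Hydroxycorticosterone A02894")

first_pairs, first_compounds = split_path(first)
second_pairs, second_compounds = split_path(second)

# Compounds and reactant pairs that occur in only one of the two paths.
only_first = [c for c in first_compounds if c not in second_compounds]
only_second = [c for c in second_compounds if c not in first_compounds]
pairs_only_first = [p for p in first_pairs if p not in second_pairs]
print(only_first, only_second, pairs_only_first)
```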
The first- and second-ranked paths traverse a side compound, namely adrenal ferredoxin. Therefore, none of these paths is biochemically valid. In the weighted reaction graph, all highly connected side compounds such as ATP and water are avoided. However, adrenal ferredoxin is a rare side compound, so weighting is not sufficient to bypass it.
First ranked path:
R02724 Reduced adrenal ferredoxin R03262
18-Hydroxycorticosterone R03263
Second ranked paths:
R02724 Oxidized adrenal ferredoxin R02726
Reduced adrenal ferredoxin R03262
18-Hydroxycorticosterone R03263
R02724 Oxidized adrenal ferredoxin R02725
Reduced adrenal ferredoxin R03262
18-Hydroxycorticosterone R03263
The metabolic pathfinder provides k shortest path finding in metabolic networks constructed from KEGG LIGAND and KEGG RPAIR. It is coupled with a mirror of the KEGG database to allow quick identification of partial compound names and to annotate results.
By default, the optimal parameter values are set. However, if you set your own values, they might fall outside the supported value range. Check the metabolic pathfinder manual.
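The idea of k shortest path finding can be pictured with a small Python sketch that enumerates simple paths in order of increasing weight by best-first search. This is an illustrative technique on a hypothetical toy graph with made-up edge weights, not necessarily the algorithm used by Pathfinder.

```python
import heapq

def k_lightest_paths(graph, source, target, k):
    """Return up to k simple paths from source to target, lightest first."""
    queue = [(0, [source])]
    found = []
    while queue and len(found) < k:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            # With non-negative weights, complete paths are popped in order of weight.
            found.append((cost, path))
            continue
        for neighbor, weight in graph[node].items():
            if neighbor not in path:  # keep paths simple (no repeated node)
                heapq.heappush(queue, (cost + weight, path + [neighbor]))
    return found

# Hypothetical toy graph: graph[u][v] is the weight of the edge between u and v.
toy_graph = {
    "s": {"a": 1, "b": 4},
    "a": {"s": 1, "b": 1, "t": 5},
    "b": {"s": 4, "a": 1, "t": 1},
    "t": {"a": 5, "b": 1},
}
ranked = k_lightest_paths(toy_graph, "s", "t", 3)
for weight, path in ranked:
    print(weight, " ".join(path))
```

Exhaustive best-first enumeration is easy to follow but grows quickly with network size, so it is suited to small examples only.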
This occurs when you provide identifiers that do not match any KEGG identifier, EC number or KEGG compound name. Check your identifiers or in case you provided a compound name, check whether the compound is present in KEGG.
As stated in the Weaknesses section, the RPAIR network does not contain all KEGG compounds due to incomplete coverage of the RPAIR database. Try to search paths for this compound in the reaction network.
This may happen in the RPAIR network because in this network reactant pairs belonging to the same reaction exclude each other. Try the reaction-specific RPAIR network or the reaction network instead.
This may occur when requesting a large number of paths with the reactant subreaction and compound weighting schemes set to unweighted. In general, when setting the weighting schemes to unweighted, biochemically irrelevant paths will be returned. Use another weighting scheme or reduce the number of requested paths to avoid this error.