Hhugin: Hypergraph heuristic for general shortest source-sink hyperpaths
Overview
Signaling pathways and metabolic networks are cornerstones of molecular and cellular biology. Hypergraphs and hyperpaths provide a precise model for these networks and pathways but have been underutilized, due to their theoretical intractability and the lack of practical algorithms. The best previously-available method to find a shortest hyperpath, which is NP-complete, can only find acyclic hyperpaths, even though cycles are abundant in real signaling pathways.Methods
We give here source code for several new algorithms that find pathways in metabolic and signaling networks. A new heuristic of finding shortest hyperpaths which for the first time may contain cycles. A new hyperpath enumeration algorithm, which exhaustively explores all possible s,t-hyperpaths. We also provide the datasets and the hypergraphs constructed from them, for all datasets referenced in our papers.Hyperpath heuristic
Our fast heuristic often finds a provably optimal hyperpath on real signaling pathways. Based on Dijkstra's algorithm for shortest paths in ordinary graphs, our hyperpath heuristic finds a possibly-cyclic hyperpath in under a minute, even on large cell-signaling hypergraphs with nearly 10,000 hyperedges.
Hyperpath enumeration algorithm
Our hyperpath enumeration algorithm maintains a queue of hyperpath problem instances consisting of insets and outsets. An outset for a problem instance is a set of hyperedges that are not allowed to be in the solution. An inset is a set of hyperedges that cannot be placed in the outset of child problem instances spawned from the current instance. Together, these sets ensure that the solution to a problem instance is ever computed twice.
Hyperpath heuristic performance
On datasets from two common pathway databases, Reactome and NCI-PID, our hyperpath heuristic (in less than a minute of computation) finds a demonstrably optimal solution on 99.5% of all possible source, target instances. Furthermore, these experiments reveal the best prior methods, which were limited to acyclic hyperpaths, failed to find any solution for many instances where all source-target hyperpaths are in fact cyclic. For all such cyclic biological instances, our heuristic hyperpath is optimal.
Publications
The methods implemented in Hhugin are given in the following publications- Spencer Krieger and John Kececioglu, “Heuristic shortest hyperpaths in cell signaling hypergraphs,” Algorithms for Molecular Biology, vol. 17, no. 1, p. 12, 2022. link
- Spencer Krieger and John Kececioglu, “Fast approximate shortest hyperpaths for inferring cell signaling pathways”, Proceedings of the 21st International Workshop on Algorithms in Bioinformatics (WABI 2020), 20:1-20. link
Source code
The heuristic and hyperpath enumeration source code is available on GitHub.Videos
The following video was presented at WABI 2020 and gives more detailed information on the heuristic:
|
A shorter version was presented at SCS 2021:
|