Experimental results

Overview

We evaluated the heuristic on datasets from two common pathway databases, Reactome and NCI-PID. These datasets are large: NCI-PID has 9000 different proteins, complexes, and other entities, and 8500 reactions; Reactome has 20,000 molecules and 12,000 reactions. These respectively map to the vertices and hyperedges in our hypergraphs, resulting in a hypergraph built from NCI-PID with 9000 vertices and 8500 hyperedges and a hypergraph built from Reactome with 20,000 vertices and 12,000 hyperedges. Any vertex without in-edges is treated as a source, and any vertex without out-edges is treated as a target. An instance of the problem is to find the shortest hyperpath from all of the sources to a single target, resulting in 2600 NCI-PID instances and 5000 Reactome instances.

Accuracy

Our hyperpath heuristic finds a demonstrably optimal solution on 99.5% of all instances. Furthermore, these experiments reveal that the best prior methods, which were limited to acyclic hyperpaths, fail to find any solution for many instances where all source-target hyperpaths are cyclic. For all such cyclic biological instances, our heuristic hyperpath is optimal, as shown by enumerating all hyperpaths and comparing to the shortest.

Running time

The hyperpath heuristic runs in under a minute, averaged over all instances.


Source code

The heuristic and hyperpath enumeration source code is available on GitHub.