An iterative algorithm for metabolic network-based drug target identification

Double Iterative Optimization for Metabolic Network-Based Drug Target Identification

Drug discovery aims finding molecules that manipulate enzymes in order to increase or decrease the production of desired compounds while incurring minimum side-effects. An important part of this problem is the identification of the target enzymes, i.e., the enzymes that will be inhibited by the drug molecules. Finding the right set of target enzymes is essential for developing a successful drug. The relationship between enzymes and compounds through reactions is defined using metabolic networks. Finding the best set of target enzymes requires a careful analysis of the underlying metabolic network.
This paper presents the problem of finding the set of enzymes, whose inhibition stops the production of a given set of target compounds, while eliminating minimal number of non-target compounds. Here, target compounds are the compounds whose presence cause the underlying disorder. The non-target compounds are all the remaining compounds. We call this problem Target Identification by Enzymes (TIE). An exhaustive evaluation of all possible enzyme combinations in the metabolic network to find the optimal solution is computationally infeasible for very large metabolic networks. We developed a scalable iterative method which computes a sub-optimal solution within reasonable time-bounds. The method consists of two phases: Iteration and Fusion Phases. The Iteration Phase is based on the intuition that a good solution can be found by tracing backward from the target compounds. It initially evaluates the immediate precursors of the target compounds and iteratively moves backwards to identify the enzymes whose inhibition incurs less side-effects. This phase converges to a sub-optimal solution after a small number of iterations. The Fusion Phase takes the union of a set of sub-optimal results found at the Iteration Phase. Each set, here, is a potential solution. It then increases this set by inserting a small subset of the remaining enzymes randomly. The size of the final set is bounded by the time allowed for the exhaustive search. The Fusion Phase exhaustively searches the final set to find the optimal subset of enzymes from this set. It, then, recursively creates a new set by inserting random enzymes to the optimal solution found so far and exhaustively searches this set again until a predefined number of iterations are performed.
The experiments on the E. coli metabolic network show that the average accuracy of the Iteration Phase alone deviates from that of the exhaustive search only by 0.02 %. The Iteration Phase is highly scalable. It can solve the problem for the entire metabolic network of \emph{Escherichia coli} in less than 10 seconds. The Fusion Phase improves the accuracy of the Iteration Phase by 19.3 %.