(462e) Creating an in silico Drug Discovery Pipeline for Faster Drug Discovery
AIChE Annual Meeting
2015
2015 AIChE Annual Meeting Proceedings
Computational Molecular Science and Engineering Forum
Data Mining and Machine Learning in Molecular Sciences I
Wednesday, November 11, 2015 - 10:00am to 10:15am
Drug candidates are extremely difficult to find because they make up a very small percentage of known compounds. To find these rare candidates, extensive compound libraries are screened against a single target in search of the few active compounds. To increase search efficiency, researchers use cheminformatic models to conduct virtual high-throughput screens that filter out unpromising compounds. This filtering removes many potential dead ends before the experimental screen, so the compound library contains only compounds that show promise. Historically, the main weakness of this approach was the lack of experimental data available for cheminformatic model development.
With modern technological advances, more experimental data is being produced, and with increased connectivity that data is increasingly accessible in databases such as NCBI’s PubChem BioAssay database and EMBL-EBI’s ChEMBL database. Available data also includes compound databases such as NCBI’s PubChem Compound database and the ZINC compound database. Researchers can take advantage of the data in these databases to create and train virtual high-throughput screening models against any target of interest represented in them.
To discover drug candidates faster, we created a drug discovery pipeline based on the ideas and databases described above. The pipeline uses a fragment-based descriptor called Signature, which has been used in several studies over the past two decades, including structural elucidation and property-based molecular design of industrial reactants and products (1-4), of ICAM-1 inhibitors (5), and of Factor XIa inhibitors (6,7). Signature is a low-degeneracy descriptor capable of capturing and reproducing most of a compound’s chemical and structural information (8,9). The Signature system decomposes compounds into atomic Signature fragments. By combining a genetic algorithm (GA) with support vector machines (SVMs), we stochastically create many different models, each testing a different combination of fragments. The models are ranked by predictive power and accuracy, and only the best are kept. Compound databases are then broken down into molecular Signature fragments and fed into the models to find potential drug candidates.
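The decomposition into atomic Signature fragments can be pictured with a minimal Python sketch. It assumes a simplified height-1 form in which each atom is described only by its element and the elements of its direct neighbors; bond orders, hydrogens, greater heights, and the canonical notation of the published descriptor are deliberately left out. The function names and the string format are illustrative assumptions, not the pipeline's actual code.

```python
from collections import Counter

def atomic_signatures(atoms, bonds):
    """Return one simplified height-1 signature string per atom.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j) index pairs; bond order is ignored in this sketch.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])
    # Sorting the neighbor elements makes the string independent of bond order in the input.
    return [f"{atoms[i]}({','.join(sorted(neighbors[i]))})" for i in range(len(atoms))]

def molecular_signature(atoms, bonds):
    """Count how often each atomic signature occurs in the molecule."""
    return Counter(atomic_signatures(atoms, bonds))

# Ethanol with hydrogens suppressed: C-C-O
ethanol = molecular_signature(["C", "C", "O"], [(0, 1), (1, 2)])
print(ethanol)
```

The resulting counts form a vector of fragment occurrences per compound, which is the kind of input a classifier such as an SVM can be trained on.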
To verify our drug discovery pipeline, we examined the Cathepsin-L assay (PubChem BioAssay AID 825). Cathepsin-L is a protease implicated in several disease pathways, including Ebola virus infection. We used data from the assay to train our models and ran PubChem’s Compound database through them. Based on confidence and predicted activity, we identified 35 potential candidates, 15 of which were commercially available. These compounds were tested for efficacy using the protocol from AID 825 to validate our pipeline.
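Training models on assay data through a stochastic search over fragment combinations can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a leave-one-out nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the fragment counts and activity labels are invented, and all names and GA settings (`loo_accuracy`, `evolve`, population size, mutation rate) are hypothetical.

```python
import random

def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)
    using only the fragment columns switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                dists[label] = float("inf")
                continue
            # Squared distance from the held-out compound to the class centroid.
            dists[label] = sum(
                (X[i][j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in cols
            )
        pred = min(dists, key=dists.get)
        correct += pred == y[i]
    return correct / len(X)

def evolve(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Genetic algorithm over boolean fragment masks; returns the fittest mask."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: loo_accuracy(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]          # keep the best half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: loo_accuracy(X, y, m))

# Invented toy data: rows are compounds, columns are fragment counts.
# Column 0 separates actives (1) from inactives (0); the rest is noise.
X = [[3, 0, 1, 0], [3, 1, 0, 1], [3, 0, 0, 1], [3, 1, 1, 0],
     [0, 0, 1, 1], [0, 1, 0, 0], [0, 1, 1, 1], [0, 0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

best = evolve(X, y)
print(best, loo_accuracy(X, y, best))
```

In a real setting the fitness function would be replaced by cross-validated SVM performance on the assay data, and the retained models would then score the compounds of a screening database.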
1 Chemmangattuvalappil, N. G. & Eden, M. R. A Novel Methodology for Property-Based Molecular Design Using Multiple Topological Indices. Ind Eng Chem Res 52, 7090-7103, doi:10.1021/ie302516v (2013).
2 Chemmangattuvalappil, N. G., Solvason, C. C., Bommareddy, S. & Eden, M. R. Reverse problem formulation approach to molecular design using property operators based on signature descriptors. Comput Chem Eng 34, 2062-2071, doi:10.1016/j.compchemeng.2010.07.009 (2010).
3 Dev, V. A., Chemmangattuvalappil, N. G. & Eden, M. R. Structure Generation of Candidate Reactants Using Signature Descriptors. Comput-Aided Chem En 33, 151-156 (2014).
4 Dev, V. A., Namikis, R., Chemmangattuvalappil, N. G. & Eden, M. R. Molecular Synthesis of Candidate Reactant Structures Using Signature Descriptors. Proceedings of the 6th International Conference On Process Systems Engineering (PSE ASIA), 25-27 (2013).
5 Churchwell, C. J. et al. The signature molecular descriptor. 3. Inverse-quantitative structure-activity relationship of ICAM-1 inhibitory peptides. J Mol Graph Model 22, 263-273, doi:10.1016/j.jmgm.2003.10.002 (2004).
6 Li, H., Visco, D. P. & Leipzig, N. D. Confirmation of predicted activity for factor XIa inhibitors from a virtual screening approach. AIChE Journal 60, 2741-2746, doi:10.1002/aic.14508 (2014).
7 Weis, D. C., Visco, D. P., Jr. & Faulon, J. L. Data mining PubChem using a support vector machine with the Signature molecular descriptor: classification of factor XIa inhibitors. J Mol Graph Model 27, 466-475, doi:10.1016/j.jmgm.2008.08.004 (2008).
8 Faulon, J. L., Visco, D. P., Jr. & Pophale, R. S. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J Chem Inf Comput Sci 43, 707-720, doi:10.1021/ci020345w (2003).
9 Faulon, J. L., Churchwell, C. J. & Visco, D. P., Jr. The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. J Chem Inf Comput Sci 43, 721-734, doi:10.1021/ci020346o (2003).