(203j) Virtual High-Throughput Screening Pipeline: Size and Classification Distribution Effects on Experimentally Validated Hit-Rates
- Conference: AIChE Annual Meeting
- Year: 2017
- Proceeding: 2017 Annual Meeting
- Group: Pharmaceutical Discovery, Development and Manufacturing Forum
- Time: Monday, October 30, 2017 - 3:15pm-4:45pm
Previously, we developed a virtual high-throughput screening (vHTS) pipeline: models predicting class and activity are trained on available experimental data and applied to compound databases to identify compounds likely to be active, so that experimental efforts can be focused on them (Chen and Visco, 2016). The models can optionally be retrained to improve performance (hit-rate) or to identify more leads. The pipeline has been applied to several NCBI PubChem BioAssay datasets: AID 825 (target: Cathepsin L; 1st iteration hit-rate: 19%, 2nd iteration hit-rate: 75%) (Chen and Visco, 2016), AID 728 (target: Factor XIIa; 1st iteration hit-rate: 43%, 2nd iteration hit-rate: 100%) (Chen and Visco, in preparation), and AID 846 (target: Factor XIa; 1st iteration hit-rate: 27%, 2nd iteration hit-rate: 62%) (Chen and Visco, in preparation).
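The hit-rates quoted above are the fraction of vHTS-selected compounds that are confirmed active when assayed. A minimal Python sketch of that arithmetic follows. The compound counts in the usage are hypothetical (only percentages are reported), and the ratio of 2nd- to 1st-iteration hit-rate is simply one convenient way to express the gain from retraining, not a metric defined by the authors.

```python
def hit_rate(n_confirmed_active: int, n_tested: int) -> float:
    """Fraction of experimentally tested vHTS picks confirmed active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_confirmed_active <= n_tested:
        raise ValueError("n_confirmed_active must lie in [0, n_tested]")
    return n_confirmed_active / n_tested


def iteration_gain(first: float, second: float) -> float:
    """Ratio of the 2nd-iteration hit-rate to the 1st-iteration hit-rate."""
    if first <= 0:
        raise ValueError("first-iteration hit-rate must be positive")
    return second / first


# Hit-rates as reported above: (1st iteration, 2nd iteration).
reported = {
    "AID 825 (Cathepsin L)": (0.19, 0.75),
    "AID 728 (Factor XIIa)": (0.43, 1.00),
    "AID 846 (Factor XIa)": (0.27, 0.62),
}

if __name__ == "__main__":
    # Hypothetical counts: 15 confirmed actives out of 20 compounds tested.
    print(f"example hit-rate: {hit_rate(15, 20):.0%}")
    for aid, (first, second) in reported.items():
        print(f"{aid}: 2nd/1st hit-rate ratio = {iteration_gain(first, second):.2f}")
```

On the reported figures, retraining raised the hit-rate by a factor of roughly 2.3 to 3.9 across the three targets.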
To characterize the hit-rate enrichment ability of the pipeline, we have applied it to additional datasets, selected specifically to examine how the pipeline responds to datasets of different sizes and classification distributions. Determining the effect of these two dataset parameters will indicate on which datasets the pipeline provides the most enrichment and what enrichment can be expected for a given dataset.
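One common way to express hit-rate enrichment, assumed here rather than taken from the abstract, is the ratio of the vHTS hit-rate to the active fraction of the screening dataset, i.e. the hit-rate that random selection from that dataset would be expected to achieve. Under this definition the classification distribution sets the baseline directly, which is one reason it may affect the achievable enrichment. The function names and dataset counts below are hypothetical.

```python
def active_fraction(n_active: int, n_total: int) -> float:
    """Classification distribution summarised as the fraction labelled active."""
    if n_total <= 0 or not 0 <= n_active <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_active <= n_total")
    return n_active / n_total


def enrichment_factor(vhts_hit_rate: float, baseline: float) -> float:
    """vHTS hit-rate divided by the hit-rate expected from random selection."""
    if baseline <= 0:
        raise ValueError("baseline active fraction must be positive")
    return vhts_hit_rate / baseline


if __name__ == "__main__":
    # Two hypothetical datasets of equal size but different class balance.
    sparse = active_fraction(300, 60_000)      # 0.5% active
    balanced = active_fraction(3_000, 60_000)  # 5% active
    rate = 0.19  # the same vHTS hit-rate on both, for illustration only
    print(f"sparse:   {enrichment_factor(rate, sparse):.1f}x")
    print(f"balanced: {enrichment_factor(rate, balanced):.1f}x")
```

With this definition the same 19% hit-rate corresponds to an enrichment factor ten times larger on the sparser dataset, so size and class balance would need to be reported alongside any hit-rate for it to be interpretable.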
In this poster, we present work characterizing the effects of size and classification distribution: one parameter (size or classification distribution) is held constant while the other is varied. Experimental validation follows the protocol specified by the original dataset to determine vHTS hit-rates. Because all hit-rates are determined with the same experimental protocol, they can be compared directly, which identifies the hit-rate enrichment ability of the pipeline for a given dataset. The results will clarify on which datasets the pipeline has the most impact and what hit-rate can be expected for a given dataset. We also aim to show, indirectly, that the pipeline's hit-rate enrichment ability is repeatable and that the pipeline is robust enough to handle many different kinds of datasets.
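One way to hold one parameter fixed while varying the other, assumed here for illustration since the abstract does not state how the datasets were chosen, is to draw stratified subsets from a single labelled parent dataset, which also keeps the target and assay protocol constant. The sketch below uses only the Python standard library; the `subsample` and `design_series` names and the synthetic parent dataset are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (compound identifier, label) with label 1 = active and 0 = inactive.
Compound = Tuple[str, int]


def subsample(parent: Sequence[Compound], size: int, active_fraction: float,
              seed: int = 0) -> List[Compound]:
    """Draw `size` compounds without replacement at a fixed active fraction."""
    if size <= 0 or not 0.0 <= active_fraction <= 1.0:
        raise ValueError("size must be positive and active_fraction in [0, 1]")
    actives = [c for c in parent if c[1] == 1]
    inactives = [c for c in parent if c[1] == 0]
    n_active = round(size * active_fraction)  # nearest whole number of actives
    n_inactive = size - n_active
    if n_active > len(actives) or n_inactive > len(inactives):
        raise ValueError("parent dataset is too small for this design point")
    rng = random.Random(seed)  # seeded so each subset is reproducible
    subset = rng.sample(actives, n_active) + rng.sample(inactives, n_inactive)
    rng.shuffle(subset)
    return subset


def design_series(sizes: Sequence[int], fractions: Sequence[float],
                  fixed_size: int, fixed_fraction: float):
    """Two one-factor-at-a-time series of (size, active_fraction) points."""
    vary_size = [(s, fixed_fraction) for s in sizes]
    vary_fraction = [(fixed_size, f) for f in fractions]
    return vary_size, vary_fraction


if __name__ == "__main__":
    # Synthetic parent: 200 actives and 2,000 inactives.
    parent = [(f"A{i}", 1) for i in range(200)] + [(f"I{i}", 0) for i in range(2000)]
    vary_size, vary_fraction = design_series(
        sizes=[250, 500, 1000], fractions=[0.02, 0.05, 0.10],
        fixed_size=1000, fixed_fraction=0.05)
    for size, frac in vary_size + vary_fraction:
        subset = subsample(parent, size, frac)
        n_act = sum(label for _, label in subset)
        print(f"size={size:5d}  target={frac:.2f}  actives={n_act}")
```

Drawing every subset from one parent means a change in hit-rate across a series can be attributed to the varied parameter rather than to a different target or assay; the trade-off is that the largest size and highest active fraction are capped by what the parent contains.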
Chen, J.J.F., Visco, D.P. Jr., 2016. Developing an in silico pipeline for faster drug candidate discovery: Virtual high throughput screening with the Signature molecular descriptor using support vector machine models. Chemical Engineering Science (2 March 2016). http://dx.doi.org/10.1016/j.ces.2016.02.037.
Dobson, C.M., 2004. Chemical space and biology. Nature 432, 824-828.
Chen, J.J.F., Visco, D.P. Jr. Identifying Novel Factor XIIa Inhibitors With PCA-GA-SVM Developed vHTS Models. Manuscript in preparation.
Chen, J.J.F., Visco, D.P. Jr. Identifying Novel Factor XIa Inhibitors With PCA-GA-SVM Developed vHTS Models. Manuscript in preparation.