(95f) Automating the Search for New Drugs: From Prediction to Characterization and Back Again | AIChE

Authors 

Koscher, B., Massachusetts Institute of Technology
Lee, F., Massachusetts Institute of Technology
Jensen, K., Massachusetts Institute of Technology

Discovery of new drug molecules typically involves identifying a drug target and then screening combinatorial libraries of compounds to find hit molecules. Hits then undergo lead optimization to produce candidates for clinical trials. This process can be tedious, with substantial effort invested in screening compounds that contribute little to understanding the drug property landscape. In addition to being efficacious, medicines must be relatively non-toxic and bioavailable (e.g., water-soluble), attributes that are often not considered until lead optimization. We have previously demonstrated a molecular discovery platform that reduces the experimental burden of lead optimization for dye molecules (as a proxy for drugs) by iteratively proposing, synthesizing, testing, and analyzing new molecules using several machine learning and automation tools. Since then, we have worked to expand the platform to map organic chemistry more generally. The platform attempts to learn the property space, defined here by drug activity, toxicity, and water-octanol partitioning, by autonomously choosing candidates that are easily synthesized and that maximize the rate at which the drug-property landscape is uncovered. Once the location of the summit of the property landscape can be estimated, the established automated lead optimization workflow can produce a lead molecule.
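The selection step described above can be illustrated with a toy loop. The loop prefers candidates that are easy to make and that reveal the most about the landscape. This is a minimal sketch under stated assumptions, not the platform's implementation. All of the following are hypothetical: the one-dimensional descriptor `x`, the `synth_score` (a stand-in for something like a retrosynthesis-derived feasibility estimate), the distance-to-nearest-measurement uncertainty proxy, the product scoring rule, the `min_synth` threshold, and `hidden_landscape`.

```python
"""Toy closed loop: select -> "synthesize and assay" -> update.

Hypothetical sketch only: the 1-D descriptor, the synthesizability score,
the distance-based uncertainty, and the hidden landscape are illustrative
stand-ins, not the platform's models.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    x: float            # hypothetical 1-D stand-in for a position in chemical space
    synth_score: float  # hypothetical 0..1 ease-of-synthesis estimate


def hidden_landscape(x: float) -> float:
    """Stand-in for an experiment; the selection rule never calls it."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def uncertainty(x: float, measured_xs: list[float]) -> float:
    """Crude proxy: distance to the nearest measured (or already chosen) point."""
    if not measured_xs:
        return 1.0
    return min(abs(x - m) for m in measured_xs)


def select_batch(pool, measured_xs, batch_size, min_synth=0.5):
    """Greedily pick easy-to-make candidates in the least-explored regions."""
    feasible = [c for c in pool if c.synth_score >= min_synth]
    xs = list(measured_xs)
    chosen = []
    while feasible and len(chosen) < batch_size:
        best = max(feasible, key=lambda c: uncertainty(c.x, xs) * c.synth_score)
        chosen.append(best)
        feasible.remove(best)
        xs.append(best.x)  # treat as pending so the batch spreads out
    return chosen


def run_campaign(pool, rounds, batch_size):
    """Iterate selection and (simulated) testing; return name -> measured value."""
    remaining = list(pool)
    measured: dict[str, float] = {}
    measured_xs: list[float] = []
    for _ in range(rounds):
        batch = select_batch(remaining, measured_xs, batch_size)
        if not batch:
            break
        for c in batch:
            measured[c.name] = hidden_landscape(c.x)  # "synthesize and assay"
            measured_xs.append(c.x)
            remaining.remove(c)
    return measured


# Toy usage: 11 evenly spaced candidates with made-up synthesizability scores.
synth = [0.9, 0.2, 0.8, 0.6, 0.3, 0.95, 0.7, 0.4, 0.85, 0.5, 0.65]
pool = [Candidate(f"m{i}", i / 10, s) for i, s in enumerate(synth)]
results = run_campaign(pool, rounds=3, batch_size=2)
print(sorted(results))
```

Within a batch, the selection is updated greedily: each pick is treated as a pending measurement, which keeps a batch from clustering in one unexplored region. Candidates below the synthesizability threshold are never proposed, however uncertain they are.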

We chose histone deacetylase inhibitors (HDIs) as an accessible drug activity because of the scarcity of literature examples and their relevance to cancer and neurological diseases. Several hurdles must be overcome to realize efficient, automated, and targeted screening. First, we need a model that provides both a prediction of HDI activity, to climb toward the summit, and a measure of prediction uncertainty, to fill in uncharted parts of the map. For this task, we use a Chemprop model (Chemprop is a machine learning model architecture capable of predicting many different molecular properties) that is pretrained on drug molecules and fine-tuned on known HDIs, with predicted binding affinities supplied as an additional input feature. Second, we require a way to explore general chemical space so that we can interpolate between currently known inhibitors (which fall into five chemically distinct classes) and extrapolate to new modes of inhibition. To this end, class clustering and molecular generation restricted by graph edit distance are used to step through chemical space in a controlled fashion. Finally, the synthesis and analysis of proposed molecules must be automated. ASKCOS serves as our retrosynthesis planner, and the routes it predicts are executed on a physical platform consisting of an automated liquid handler, HPLC, and plate reader, together with auxiliary reactors, storage, and materials handling. Altogether, the integrated platform attempts to learn multi-property landscapes by iteratively proposing, selecting, synthesizing, and analyzing potential HDIs.
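One common way to obtain both a prediction and an uncertainty estimate is an ensemble. Several models are trained, their mean serves as the prediction, and their spread serves as the uncertainty. The sketch below assumes that approach and pairs it with an upper-confidence-bound (UCB) score, where `kappa` trades off climbing toward the summit against exploring uncertain regions. The ensemble, the UCB rule, and the names `predict_with_uncertainty`, `ucb`, and `rank_candidates` are illustrative assumptions. The abstract does not state which uncertainty method or acquisition rule the platform uses. The lookup-table "models" stand in for trained predictors and do not represent Chemprop's API.

```python
"""Ensemble prediction with an uncertainty estimate, ranked by UCB.

Hypothetical sketch: each model is a plain callable standing in for one
member of an ensemble of trained property predictors. Not Chemprop's API.
"""
import statistics
from typing import Callable, Sequence

Model = Callable[[str], float]


def predict_with_uncertainty(models: Sequence[Model], mol_id: str) -> tuple[float, float]:
    """Mean prediction and ensemble spread (population std) for one molecule."""
    preds = [m(mol_id) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


def ucb(mean: float, std: float, kappa: float) -> float:
    """Upper confidence bound: kappa = 0 is pure exploitation; larger kappa explores."""
    return mean + kappa * std


def rank_candidates(models: Sequence[Model], mol_ids: Sequence[str], kappa: float = 1.0) -> list[str]:
    """Return molecule IDs sorted from highest to lowest UCB score."""
    scored = []
    for mol_id in mol_ids:
        mu, sigma = predict_with_uncertainty(models, mol_id)
        scored.append((ucb(mu, sigma, kappa), mol_id))
    scored.sort(reverse=True)
    return [mol_id for _, mol_id in scored]


# Toy three-member "ensemble": made-up predictions per hypothetical molecule.
# mol_A: confident, moderately active. mol_B: slightly lower mean, members disagree.
table = {"mol_A": (1.0, 1.0, 1.0), "mol_B": (0.4, 0.8, 1.2)}
models = [lambda mol_id, i=i: table[mol_id][i] for i in range(3)]

print(rank_candidates(models, ["mol_A", "mol_B"], kappa=0.0))  # exploit
print(rank_candidates(models, ["mol_A", "mol_B"], kappa=1.0))  # explore
```

With `kappa=0.0` the confidently predicted `mol_A` ranks first. With `kappa=1.0` the uncertain `mol_B` moves ahead, because its mean plus one standard deviation (about 1.13) exceeds `mol_A`'s score of 1.0.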