(51d) Designing and Synthesizing Novel Dye Molecules Using Generative Modeling and Data-Driven Synthesis Planning | AIChE

Authors 

Koscher, B., Massachusetts Institute of Technology
Greenman, K. P., Massachusetts Institute of Technology
Gomez-Bombarelli, R., Massachusetts Institute of Technology
Jensen, K. F., Massachusetts Institute of Technology
Development of new products often relies on the discovery of novel molecules. Traditional molecular design uses expert chemical knowledge to propose, synthesize, and test new molecules, but this process is costly and time-consuming, which limits the number of molecules that can reasonably be tested. We recently developed a generative modeling framework capable of producing novel molecules that are optimized with respect to multiple objectives or constraints. In this work, we combined this framework with our previously developed, open-source synthesis-planning tool, ASKCOS, to generate novel dye molecules and their synthetic routes. We chose to focus on dye design because dyes have rich chemical diversity and serve a range of important applications, including LED displays, functional coatings, and labels for fundamental biological studies. We focused on the problem of designing a set of novel dyes, each with an absorbance maximum at one of a range of target wavelengths. First, the generative modeling framework was used to generate a large number of dye molecules, with a focus on producing dyes that absorb at longer wavelengths (for which there are fewer examples in the literature). Then, top candidates for each target wavelength were screened for their synthetic accessibility (based on a synthetic accessibility score) and their novelty (based on comparisons with general and dye-specific datasets). ASKCOS was used to design synthetic routes for all generated molecules that were 1) predicted to absorb light at or near a target wavelength, 2) found to be synthetically accessible, and 3) found to be novel. Finally, a high-throughput synthesis and testing platform was used to synthesize each dye molecule and verify that it absorbed maximally near its target wavelength.
This work illustrates how generative models and synthesis-planning tools can be used together to discover new molecules, and it represents an important step toward closed-loop molecular discovery.
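The three screening criteria applied before synthesis planning (predicted absorbance near a target wavelength, synthetic accessibility, and novelty) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` class, the `screen` function, the tolerance of 10 nm, and the score cutoff of 4.0 are all hypothetical, the sketch assumes a score convention in which lower values mean easier synthesis, and novelty is reduced here to exact-match lookup of already-canonicalized SMILES strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one generated molecule."""
    smiles: str               # assumed to be canonicalized upstream
    predicted_peak_nm: float  # predicted absorbance maximum, in nm
    sa_score: float           # assumed convention: lower = easier to make


def screen(candidates, target_nm, known_smiles,
           tolerance_nm=10.0, max_sa_score=4.0):
    """Keep candidates that pass all three filters, closest to target first.

    tolerance_nm and max_sa_score are illustrative defaults, not values
    taken from the abstract.
    """
    selected = [
        c for c in candidates
        if abs(c.predicted_peak_nm - target_nm) <= tolerance_nm  # 1) near target
        and c.sa_score <= max_sa_score                           # 2) accessible
        and c.smiles not in known_smiles                         # 3) novel
    ]
    return sorted(selected, key=lambda c: abs(c.predicted_peak_nm - target_nm))


# Toy placeholder inputs (not dyes, and not data from this work).
pool = [
    Candidate("CCO", 505.0, 2.5),   # passes all filters
    Candidate("CCN", 498.0, 6.0),   # rejected: score above cutoff
    Candidate("CCC", 501.0, 3.0),   # rejected: already in the known set
    Candidate("CCCl", 650.0, 2.0),  # rejected: far from the target
    Candidate("CCBr", 499.0, 3.5),  # passes all filters
]
known = {"CCC"}

survivors = screen(pool, target_nm=500.0, known_smiles=known)
print([c.smiles for c in survivors])  # prints ['CCBr', 'CCO']
```

In a pipeline like the one described, the survivors for each target wavelength would then be passed to a synthesis-planning tool; that call is deliberately left out here because its interface is not described in the abstract.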