(683b) ChemLG - a Smart and Massively Parallel Code to Accelerate the Molecular Library Generation | AIChE

Afzal, M. A. F. - Presenter, University at Buffalo, SUNY
Hachmann, J., University at Buffalo, SUNY
The discovery of new compounds, materials, and chemical reactions with exceptional properties is key to progress in chemistry. This process can be dramatically accelerated by virtual high-throughput screening of large-scale candidate libraries. The central challenge is that chemical space is practically infinite, so any approach that surveys it or enumerates parts of it has to address the problem of combinatorial complexity. A number of software packages are now available for the enumeration of compound space. However, most of these packages target small, drug-like molecules. By far the largest small-molecule database, GDB-17, contains 166.4 billion molecules, generated combinatorially from up to 17 atoms of C, N, O, S, and halogens. These screening libraries have great utility in drug-related research, but many other applications require larger molecules. Exhaustively enumerating large molecules, however, would lead to a combinatorial explosion of candidates and would thus be impractical for screening.
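
To make the scale of the combinatorial explosion concrete, the following is a minimal sketch under a deliberately simplified assumption: molecules are treated as linear chains of interchangeable building blocks, so a library of b blocks yields b^n chains of length n. The function name and the choice of 20 blocks are hypothetical, and the count ignores symmetry and chemical validity, so it is not how GDB-17 or ChemLG count structures.

```python
def count_linear_sequences(num_blocks: int, length: int) -> int:
    """Number of ordered chains of `length` blocks drawn from `num_blocks`
    choices (toy model: no symmetry reduction, no validity checks)."""
    return num_blocks ** length


# Assumed, illustrative library size of 20 building blocks.
for length in (2, 4, 8, 9):
    print(length, count_linear_sequences(20, length))
```

In this toy model, chains of nine blocks already outnumber the 166.4 billion entries of GDB-17, which is why restricting or guiding the enumeration becomes necessary for larger molecules.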

Our work aims to extend and generalize library generation to identify molecular lead candidates and reaction networks in other application areas, such as functional polymers, optoelectronics, and catalysis. Our massively parallel generator ChemLG is part of our ChemHTPS program suite for automated, virtual high-throughput screening studies. It offers a multitude of options to customize and restrict the scope of the enumerated chemical space and thus tailor it to the demands of specific applications. To streamline the non-combinatorial exploration of chemical space, we incorporate genetic algorithms into the framework. Genetic algorithms have been shown to be effective in optimizing chemical structures and generating useful compounds for different target applications. We built the code in Python and implemented parallelization using the mpi4py library. In addition to implementing smarter algorithms, we also focus on ease of use, workflow, and code integration to make this technology more accessible to the community.
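
The genetic-algorithm idea mentioned above can be illustrated with a minimal, serial sketch. Everything in it is an assumption made for illustration: the building-block labels, the per-block scores that stand in for a computed property, and the specific operators (truncation selection, one-point crossover, point mutation, elitism). It does not reproduce ChemLG's actual representation, operators, or interface.

```python
import random

# Hypothetical building blocks (labels only; not ChemLG's actual inputs).
BLOCKS = ["A", "B", "C", "D", "E", "F"]
# Toy per-block scores standing in for a computed or predicted property.
SCORES = {"A": 0.1, "B": 0.4, "C": 0.9, "D": 0.3, "E": 0.7, "F": 0.2}


def fitness(candidate):
    """Toy objective: sum of block scores (placeholder for a real property)."""
    return sum(SCORES[b] for b in candidate)


def crossover(parent_a, parent_b, rng):
    """One-point crossover between two equal-length candidates."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(candidate, rng, rate=0.1):
    """Replace each position by a random block with probability `rate`."""
    return tuple(rng.choice(BLOCKS) if rng.random() < rate else b
                 for b in candidate)


def evolve(length=5, pop_size=20, generations=30, seed=0):
    """Evolve fixed-length block sequences; return the best candidate and
    the best fitness recorded at the start of each generation."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(BLOCKS) for _ in range(length))
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), best_per_generation


if __name__ == "__main__":
    best, history = evolve()
    print("best candidate:", "-".join(best))
    print("best fitness per generation:", history)
```

Only a small fraction of the 6^5 possible sequences is ever scored, which is the point of a guided search over an exhaustive one. In a parallel setting, the per-candidate fitness evaluation is the natural step to distribute across processes, for example across MPI ranks, although how ChemLG divides its work is not specified here.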