(463b) Framework for Computer-Aided Molecular Design

Computer-Aided Molecular Design (CAMD) is a powerful technique for identifying compounds that are well suited to a given application. CAMD is the reverse of property prediction: it uses property prediction techniques to design molecules that meet specified property criteria. CAMD techniques combine molecular fragments in an optimal way to design a molecule that maximizes its potential for the application while remaining feasible in terms of environmental and physical stability.

This paper presents a generic framework for CAMD together with its computational implementation. The motivation for this work is to create a single implementation that can be applied across a wide range of CAMD problems.

The framework consists of three stages: 1) preliminary screening, 2) detailed structural analysis, and 3) extension to other properties.

The preliminary screening generates molecular candidates that fit a relaxed range of the defined property targets. The targets are relaxed because property prediction is less accurate at this stage. Generating molecules with Mixed Integer Linear Programming (MILP) addresses the major difficulty in exploring the search space, which grows exponentially with molecular size. Because of this growth, previous works have been limited to enumerating a small part of the possible solutions. The MILP approach allows us to exploit the efficient, fast solution techniques available today for such problems and does not restrict molecules to a specific class.
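The screening idea above can be illustrated with a minimal sketch. It is not the paper's implementation: the four groups, their contribution values, and the names `is_acyclic_feasible`, `first_order_estimate`, and `screen` are hypothetical, and exhaustive enumeration stands in for an MILP solver so that the example needs only the standard library. The point it shows is that both the valence rule for acyclic molecules and the first-order property estimate are linear in the integer group counts, which is what makes an MILP formulation possible.

```python
from itertools import product

# Hypothetical group table: name -> (free valence, contribution to a generic
# property). The contributions are placeholders, NOT GC+ parameters.
GROUPS = {
    "CH3": (1, 1.0),
    "CH2": (2, 0.8),
    "CH": (3, 0.5),
    "OH": (1, 2.5),
}


def is_acyclic_feasible(counts):
    """Valence rule for an acyclic molecule: sum_i n_i * (2 - v_i) == 2."""
    if sum(counts.values()) < 2:
        return False
    return sum(n * (2 - GROUPS[g][0]) for g, n in counts.items()) == 2


def first_order_estimate(counts):
    """Linear first-order estimate: sum_i n_i * c_i."""
    return sum(n * GROUPS[g][1] for g, n in counts.items())


def screen(lower, upper, max_per_group=3):
    """Return every feasible group-count vector whose estimate lies in
    the relaxed target range [lower, upper].

    Enumeration is used here only as a stand-in for an MILP solver.
    """
    names = list(GROUPS)
    hits = []
    for combo in product(range(max_per_group + 1), repeat=len(names)):
        counts = dict(zip(names, combo))
        if not is_acyclic_feasible(counts):
            continue
        if lower <= first_order_estimate(counts) <= upper:
            hits.append(counts)
    return hits


if __name__ == "__main__":
    for counts in screen(4.0, 4.5):
        print(counts, round(first_order_estimate(counts), 3))
```

The valence rule used here is only a necessary condition for connecting the groups into a tree; chemical stability is not checked. In an MILP formulation the same two expressions would appear as linear constraints over integer variables, so the solver would not have to visit every combination.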

The second stage of the framework screens the molecules further by applying structure-dependent corrections to the estimated property values. Because the CAMD approach is based on property prediction models, the diversity and incongruity among the models used for different properties create a major hurdle for a generic approach. The proposed framework allows us to use an assortment of models that may differ in their basic molecular descriptors. The MILP approach is again employed to determine the specific structure of the candidates and thus add the higher-order group contributions. The property method used for these stages is the GC+ method developed by Gani and coworkers [1, 2].
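As a rough illustration of why the specific structure matters at this stage, the sketch below adds one structure-dependent correction to a first-order estimate. It is an assumption-laden toy, not the GC+ method: the contribution values are placeholders, only a single second-order pattern (a CH group bonded to at least two CH3 groups) is detected, and the names `count_isopropyl` and `corrected_estimate` are hypothetical.

```python
# Placeholder contributions (NOT GC+ parameters).
FIRST_ORDER = {"CH3": 1.0, "CH2": 0.8, "CH": 0.5, "OH": 2.5}
SECOND_ORDER = {"(CH3)2CH": -0.2}


def count_isopropyl(groups, bonds):
    """Count CH groups bonded to at least two CH3 groups.

    groups: list of group names, indexed by position in the molecule.
    bonds:  list of (i, j) index pairs describing which groups are bonded.
    """
    neighbours = {i: [] for i in range(len(groups))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return sum(
        1
        for i, g in enumerate(groups)
        if g == "CH" and sum(groups[j] == "CH3" for j in neighbours[i]) >= 2
    )


def corrected_estimate(groups, bonds):
    """First-order sum plus the single illustrative second-order correction."""
    first = sum(FIRST_ORDER[g] for g in groups)
    second = SECOND_ORDER["(CH3)2CH"] * count_isopropyl(groups, bonds)
    return first + second


# Two isomers built from the same first-order groups.
GROUP_LIST = ["CH3", "CH3", "CH", "CH2", "OH"]
ISOMER_A = [(0, 2), (1, 2), (2, 3), (3, 4)]  # (CH3)2CH-CH2-OH
ISOMER_B = [(0, 2), (2, 4), (2, 3), (3, 1)]  # CH3-CH(OH)-CH2-CH3

if __name__ == "__main__":
    print("isomer A:", corrected_estimate(GROUP_LIST, ISOMER_A))
    print("isomer B:", corrected_estimate(GROUP_LIST, ISOMER_B))
```

Both isomers share the same group counts, so a first-order model cannot tell them apart; once the connectivity is fixed, the correction term separates them.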

The third stage demonstrates another important feature of the framework, namely its ability to incorporate additional properties or constraints to further screen the solutions. Because the molecular structure and primary properties have been accurately determined in the previous stages, various other property correlations and empirical methods can be used in this stage to further constrain the solution set.
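One possible way to organize such a stage is as a list of interchangeable constraints applied to candidates whose primary properties are already known. The sketch below assumes this design; the `Constraint` type, the `apply_constraints` function, the candidate data, and the bounds are all hypothetical, and the only derived quantity used is the ratio of boiling to critical temperature.

```python
from typing import Callable, NamedTuple


class Constraint(NamedTuple):
    """A named bound on any quantity computable from a candidate record."""
    name: str
    evaluate: Callable[[dict], float]
    lower: float
    upper: float


def apply_constraints(candidates, constraints):
    """Keep only the candidates that satisfy every constraint."""
    return [
        c
        for c in candidates
        if all(k.lower <= k.evaluate(c) <= k.upper for k in constraints)
    ]


# Placeholder candidates with made-up primary properties (in kelvin).
CANDIDATES = [
    {"name": "candidate-1", "Tb": 300.0, "Tc": 500.0},
    {"name": "candidate-2", "Tb": 350.0, "Tc": 450.0},
]

# Illustrative constraint: bound the ratio Tb / Tc.
CONSTRAINTS = [
    Constraint("Tb/Tc", lambda c: c["Tb"] / c["Tc"], 0.55, 0.70),
]

if __name__ == "__main__":
    for kept in apply_constraints(CANDIDATES, CONSTRAINTS):
        print(kept["name"])
```

Because each constraint is just a function of the candidate record, a new correlation or empirical method can be added to the list without changing the earlier stages.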

The proposed framework has been applied to the design of secondary refrigerants and yielded many previously unknown compounds [3]. We will discuss this application as well as additional applications of the framework, along with the issue of stability of the predicted molecules.


1. Marrero J. and R. Gani, A group contribution based estimation of pure component properties, Fluid Phase Equilibria, 183, 183-208, 2001.

2. Gani R., P. Harper, and M. Hostrup, Automatic Creation of Missing Groups through Connectivity Index for Pure-Component Property Prediction, Industrial & Engineering Chemistry Research, 44 (18), 7262-7269, 2005.

3. Samudra A. and N. Sahinidis, Design of Secondary Refrigerants: a Combined Optimization-Enumeration Approach, to be published in Proceedings of FOCAPD 2009.