(295a) Group Contribution Method for Organosilicon Compounds

Sun, Y., Carnegie Mellon University
Sahinidis, N., Carnegie Mellon University
The prediction of physico-chemical properties of a compound from its structure plays a significant role in computer-aided molecular design (CAMD) problems. Among quantitative structure-property relationships, group contribution (GC) methods are the most commonly used. GC methods operate under the assumption that each molecular structure can be broken down into a set of functional groups, where each group has a property-dependent contribution. Key molecular properties satisfy the group additivity principle: each can be estimated by summing, over all groups in a molecule, the product of the group's number of occurrences and its contribution, and then applying a transformation function (usually nonlinear) chosen to closely match experimental data. GC methods enable the representation of a diverse chemical design space as a combination of sub-functional groups. One of the most widely used GC methods in CAMD is that of Marrero and Gani [1,2], which introduces multiple levels of groups to capture the proximity effects of neighboring groups. This method has been used successfully in many molecular design methodologies, including that of [3]. For silicon-containing structures, however, predictions from GC methods usually show a high average absolute deviation, especially for melting point. As a result, organosilicon compounds are usually excluded from CAMD applications, despite their wide use in commercial products.
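The additivity principle described above can be sketched in a few lines of Python. This is only an illustration: the group names, contribution values, scaling constant `T0`, and the logarithmic transformation are all hypothetical placeholders, not fitted parameters from [1,2] or from this work.

```python
import math

def gc_estimate(counts, contributions, inverse_transform):
    """Sum N_i * C_i over all groups, then map the sum back to the property."""
    total = sum(n * contributions[group] for group, n in counts.items())
    return inverse_transform(total)

# Hypothetical contributions (illustrative only, not fitted values).
contributions = {"CH3": 1.0, "CH2": 0.5, "Si": 2.0}

# Hypothetical group counts for a small molecule with four CH3 groups and one Si.
counts = {"CH3": 4, "Si": 1}

T0 = 200.0  # hypothetical scaling constant, in kelvin

def log_transform(group_sum):
    """Assumed nonlinear transformation: property = T0 * ln(sum of contributions)."""
    return T0 * math.log(group_sum)

estimate = gc_estimate(counts, contributions, log_transform)
print(f"Estimated property: {estimate:.2f} K")
```

Passing the transformation in as a function keeps the additive part property-independent, so the same routine could serve several properties with different contribution tables.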

In this work, we aim to build group contribution models for silicon-based compounds to predict the following properties: boiling point, melting point, heat of vaporization, and liquid viscosity. The key to building an accurate and reliable group contribution model is to identify an optimal set of functional groups that forms the basis for representing the properties. For each property, we use an information-criterion-based model selection method to determine a unique set of optimal groups that minimizes the chosen information criterion, prevents overfitting, and reduces the root mean square error over the training data. Unlike most GC models, which are linear only in the number of occurrences of groups, we use nonlinear basis functions, in addition to the linear function of occurrences, to capture the effect of group interactions on physical properties. We also explicitly model the contribution of a variety of structural features. A hierarchical regression method is used to determine the contributions of all present groups, interaction terms, and features. The resulting GC models are embedded in a CAMD framework [3] to generate exclusively organosilicon compounds that can be used as electronics coolants.
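As a rough illustration of information-criterion-based selection, the sketch below runs a greedy forward search that minimizes the Bayesian information criterion (BIC) for an ordinary least-squares fit. The abstract does not specify which criterion or search strategy is used, so BIC, the greedy search, the absence of an intercept, and the synthetic data (four group-count columns plus one pairwise interaction column) are all assumptions made for this example.

```python
import numpy as np

def bic(X, y):
    """BIC for an OLS fit with Gaussian errors: n*ln(RSS/n) + k*ln(n)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * np.log(rss / n) + k * np.log(n)

def forward_select(X, y):
    """Greedily add the column that lowers BIC most; stop when none does."""
    n, p = X.shape
    selected = []
    # BIC of the empty model (no columns, prediction is zero).
    best = n * np.log(max(float(np.sum(y ** 2)), 1e-12) / n)
    while True:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        scores = {j: bic(X[:, selected + [j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return sorted(selected)

# Synthetic data: columns 0-3 are group counts, column 4 is an interaction term.
rng = np.random.default_rng(0)
group_counts = rng.integers(0, 4, size=(60, 4)).astype(float)
interaction = group_counts[:, 0] * group_counts[:, 1]
X = np.column_stack([group_counts, interaction])

# Only columns 0, 1, and 4 truly affect the synthetic property.
y = (2.0 * group_counts[:, 0] + 1.0 * group_counts[:, 1]
     + 0.5 * interaction + rng.normal(0.0, 0.05, size=60))

selected = forward_select(X, y)
print("Selected columns:", selected)
```

The penalty term `k*ln(n)` is what discourages adding groups that only marginally reduce the training error; a different criterion would change only the `bic` function.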

[1] J. Marrero and R. Gani, Group-contribution based estimation of pure component properties, Fluid Phase Equilibria, 183–184, 183–208, 2001

[2] J. Marrero and R. Gani, Group-contribution based estimation of octanol/water partition coefficient and aqueous solubility, Industrial & Engineering Chemistry Research, 41, 6623–6633, 2002

[3] A. Samudra and N. V. Sahinidis, Optimization-based framework for computer-aided molecular design, AIChE Journal, 59, 3686–3701, 2013