(273b) Estimation of Thermodynamic Properties of Polycyclic Molecules by a Linear Regression Model
To address this issue, we have developed an improved model to predict the thermochemistry of polycyclic species. A regularized linear model was chosen to retain the simplicity and interpretability of the additivity method. However, instead of following the chemical-group definitions of Benson's scheme, we generated a comprehensive list of identifiers encoding local (atoms, bonds, and angles) and/or nonlocal (rings) structural information and used these identifiers to compose feature vectors for molecules. Identifiers carrying nonessential information were eliminated by L1 regularization during model training, so the identifiers remaining in the final model represent an optimum set of chemical units for calculating the thermodynamic property of interest, including the contributions of cyclic structures. These chemical units are human-interpretable and conceptually equivalent to the chemical groups defined in Benson's scheme, but they are selected objectively by the model without human intervention.
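The selection mechanism described above can be illustrated with a small, self-contained sketch. It assumes each molecule has already been reduced to a list of identifier strings (extracting real atom, bond, angle, and ring identifiers would require a cheminformatics toolkit and is not shown), turns those lists into count vectors, and fits an L1-regularized least-squares model (lasso) by cyclic coordinate descent. The identifier names, their "true" contributions, the data, the solver, and the regularization strength `alpha` are all hypothetical choices for illustration; the abstract does not specify any of them.

```python
from collections import Counter
from itertools import product


def build_vocabulary(molecules):
    """Sorted list of every identifier that occurs in at least one molecule."""
    return sorted({ident for mol in molecules for ident in mol})


def featurize(molecule, vocab):
    """Feature vector: occurrence count of each vocabulary identifier."""
    counts = Counter(molecule)
    return [float(counts[ident]) for ident in vocab]


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty; returns exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X, y, alpha, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)*||y - b - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Center features and target so the (unpenalized) intercept drops out.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    w = [0.0] * p
    resid = [v - y_mean for v in y]  # residual for w = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # identifier with constant count carries no information
            # Correlation of column j with the partial residual that excludes w[j].
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(molecule, vocab, w, b):
    """Additive estimate: intercept plus count-weighted identifier contributions."""
    return b + sum(wj * xj for wj, xj in zip(w, featurize(molecule, vocab)))


# Synthetic toy data (hypothetical identifiers and contributions, not from the study).
TRUE = {"ring:6": 3.0, "bond:C-O": -2.0, "angle:C-C-C": 0.5}
molecules, y = [], []
for a, c_o, ccc in product((1, 2), repeat=3):
    # "ring:fused-6-6" is a distractor with zero true contribution.
    fused = 2 if (a - 1.5) * (c_o - 1.5) * (ccc - 1.5) > 0 else 1
    molecules.append(["ring:6"] * a + ["bond:C-O"] * c_o
                     + ["angle:C-C-C"] * ccc + ["ring:fused-6-6"] * fused)
    y.append(10.0 + TRUE["ring:6"] * a + TRUE["bond:C-O"] * c_o
             + TRUE["angle:C-C-C"] * ccc)

vocab = build_vocabulary(molecules)
X = [featurize(m, vocab) for m in molecules]
w, b = fit_lasso(X, y, alpha=0.2)
selected = {ident: round(wj, 3) for ident, wj in zip(vocab, w) if wj != 0.0}
print(selected)  # {'bond:C-O': -1.2, 'ring:6': 2.2}
```

In this deliberately orthogonal toy design every centered column has `col_sq = 0.25`, so each surviving weight is simply shrunk to sign(w)·(|w| − 4·alpha). With `alpha=0.2` the distractor identifier and the small `angle:C-C-C` contribution are both driven to exactly zero, while the retained contributions are shrunk (2.2 instead of 3.0, −1.2 instead of −2.0). The fitted model stays additive like Benson's scheme: `predict` is just the intercept plus counts times contributions. In practice `alpha` would typically be tuned (e.g. by cross-validation), and the selected identifiers are sometimes refit without the penalty to remove shrinkage; whether the authors did either is not stated in the abstract.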
For a training set of 25,716 cyclic and polycyclic organic molecules composed of C, H, and O atoms, the trained model retained 408 identifiers for calculating formation enthalpies. The transferability of the model was validated on an independent test set of 2,858 molecules, yielding a mean absolute error of 1.88 kcal/mol.