Predicting Multiphasic Chemical Properties with Machine Learning | AIChE

Authors 

Luxon, A. - Presenter, Virginia Commonwealth University
Ferri, J. K., Virginia Commonwealth University
McQuade, T., Virginia Commonwealth University
Le, Q., Virginia Commonwealth University
Parish, C., University of Richmond
Although the physics governing chemistry is largely understood, the features that determine the properties or performance of a product or formulation are too numerous and complex to wrangle for a real system of interest, such as an emulsion. This complexity is compounded because both intrinsic molecular features and extrinsic processing parameters contribute to product performance. To model the effect of intrinsic features such as molecular structure or composition, engineers make simplifying assumptions and apply mechanistic relationships to gain insight into aspects of formulation design. These models are often limited in scope and resolution. Machine learning (ML) can tackle the high-dimensional complexity of chemical systems, but it needs large data sets of physically relevant molecular descriptions to produce a useful model. A quantum mechanical (QM) wave function describes a molecule at the most fundamental level, and a high-resolution approximation of the wave function can be calculated using density functional theory (DFT). We performed calculations on 4200 molecules for which experimental octanol-water partition coefficient (logP) values exist. The DFT geometry optimizations were carried out in the gas, water, and 1-octanol phases using the Q-Chem software package. Quantum chemical descriptors of each molecule were extracted from the resulting wave functions. Supervised ML models were trained using logP as the target response and the QM descriptors as the model features. Multiple ML algorithms were used and evaluated on prediction accuracy criteria. The resulting models have higher prediction accuracy and lower error than current models that use other molecular representations. Polar surface area and polarizability were found to be the most influential molecular features for predicting logP.
Because DFT is relatively time-consuming, the tradeoff between model accuracy and calculation time was investigated to elucidate the relationship between intrinsic descriptor resolution and predictive capacity in formulations consisting of multiphase systems.
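
The evaluation step described above, in which several supervised models are trained on descriptors and compared on prediction error, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the algorithms, the error metric, or the validation scheme, so k-fold cross-validation with RMSE, the two candidate models, and every function name below are assumptions. The data are synthetic placeholders generated from an arbitrary linear trend, not real descriptors or measured logP values.

```python
import math
import random
import statistics


def make_synthetic_data(n=60, seed=0):
    """Return (descriptor, target) pairs. Synthetic placeholders only:
    these are NOT real polar surface areas or measured logP values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        psa = rng.uniform(0.0, 150.0)  # hypothetical descriptor value
        target = 4.0 - 0.03 * psa + rng.gauss(0.0, 0.3)  # arbitrary trend plus noise
        rows.append((psa, target))
    return rows


def fit_mean(train):
    """Baseline model: always predict the mean target of the training set."""
    mean_y = statistics.fmean([y for _, y in train])
    return lambda x: mean_y


def fit_linear(train):
    """One-descriptor ordinary least squares fit."""
    mean_x = statistics.fmean([x for x, _ in train])
    mean_y = statistics.fmean([y for _, y in train])
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_rmse(rows, fit, k=5):
    """Pool squared errors over k held-out folds and return the RMSE."""
    folds = [rows[i::k] for i in range(k)]
    squared_errors = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = fit(train)
        squared_errors.extend((predict(x) - y) ** 2 for x, y in test)
    return math.sqrt(statistics.fmean(squared_errors))


data = make_synthetic_data()
scores = {
    "mean baseline": cross_validated_rmse(data, fit_mean),
    "linear regression": cross_validated_rmse(data, fit_linear),
}
# Rank the candidate models from lowest to highest held-out error.
for name, rmse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {rmse:.3f}")
```

Scoring each candidate only on molecules it was not trained on keeps the comparison between algorithms fair. In a real study the single synthetic descriptor would be replaced by the full set of QM descriptors and the two toy models by the algorithms under evaluation.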

This paper has an Extended Abstract file available; you must purchase the conference proceedings to access it.
