(197v) Novel Sustainable Materials Design with Thermodynamics-Informed Machine Learning
AIChE Annual Meeting
2023
Computational Molecular Science and Engineering Forum
Poster Session: Computational Molecular Science and Engineering Forum
Monday, November 6, 2023 - 3:30pm to 5:00pm
Data-driven methods such as machine learning are promising alternatives to classical trial-and-error methodologies. These approaches train machine learning models on existing datasets of known materials and then apply them to guide the development of new compounds by predicting the physicochemical properties relevant to a given target application. However, a severe limitation of this approach lies in the representation of molecules in a machine-intelligible way (i.e., converting molecular structures into sets of numerical values that machine learning models can process). In fact, most molecular representations proposed in the past lead to overly complex models that require a very large volume of experimental data to be properly trained.
To bridge the gap between small, scarce datasets and data-driven approaches, this work aims to design novel sustainable materials using thermodynamics-informed machine learning models, with a particular focus on green solvents. Thermodynamic information was incorporated in two separate ways. The first was to use Gaussian processes (GPs) and active learning (AL) to describe activity coefficients (a single property), which can then be combined with thermodynamic models to predict many different solvent-related properties (e.g., phase equilibria). Using synthetic data generated from an excess Gibbs energy model, GPs were found to accurately describe the activity coefficients of several binary mixtures across large composition and temperature ranges. Moreover, GPs could estimate their own uncertainty and identify the composition/temperature regions where activity coefficient data provide the most information to the models. This was leveraged to build AL algorithms targeted at modelling phase equilibria. In many cases, a single data point acquired through active learning was sufficient to describe the phase diagrams studied. The ability of AL to greatly reduce the amount of data needed to obtain accurate models was further verified on experimental case studies, namely individual ion activity coefficients, the solid-liquid and vapor-liquid equilibria of deep eutectic solvents, and phase equilibria in ternary mixtures.
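The GP and active learning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-parameter Margules model, the parameter `a`, the RBF kernel, its length scales, and the maximum-variance acquisition rule are all assumptions chosen for illustration, and the GP is written directly in NumPy to keep the example self-contained.

```python
import numpy as np

SIGNAL_VAR = 1.0                 # assumed GP prior variance
LENGTH = np.array([0.3, 0.5])    # assumed length scales for (x1, scaled T)

def ln_gamma1(x1, T, a=500.0):
    """Synthetic data: one-parameter Margules model, ln(gamma_1) = (a/T) * x2^2.
    The value of `a` (in K) is hypothetical."""
    return (a / T) * (1.0 - x1) ** 2

def scale(x1, T):
    """Map (x1, T) to roughly [0, 1]^2, assuming T lies in 300-400 K."""
    return np.column_stack([x1, (T - 300.0) / 100.0])

def rbf(A, B):
    """Squared-exponential kernel between two sets of scaled inputs."""
    d = (A[:, None, :] - B[None, :, :]) / LENGTH
    return SIGNAL_VAR * np.exp(-0.5 * np.sum(d ** 2, axis=-1))

def gp_predict(X_train, y_train, X_test, jitter=1e-6):
    """Zero-mean GP posterior mean and variance at X_test."""
    K = rbf(X_train, X_train) + jitter * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    Ks = rbf(X_train, X_test)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = SIGNAL_VAR - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def run_active_learning(n_queries=8):
    """Start from one point, then repeatedly 'measure' where the GP is least certain.
    Returns a list of (n_training_points, max_abs_error, max_variance)."""
    x1g, Tg = np.meshgrid(np.linspace(0, 1, 21), np.linspace(300, 400, 11))
    x1g, Tg = x1g.ravel(), Tg.ravel()
    X_pool, y_pool = scale(x1g, Tg), ln_gamma1(x1g, Tg)
    picked = [len(X_pool) // 2]  # initial point: x1 = 0.5, T = 350 K
    history = []
    for step in range(n_queries + 1):
        mean, var = gp_predict(X_pool[picked], y_pool[picked], X_pool)
        history.append((len(picked),
                        float(np.max(np.abs(mean - y_pool))),
                        float(var.max())))
        if step < n_queries:
            picked.append(int(np.argmax(var)))  # maximum-variance acquisition
    return history

if __name__ == "__main__":
    for n, err, var in run_active_learning():
        print(f"{n} points: max |error| = {err:.3f}, max variance = {var:.3f}")
```

Maximum predictive variance is only one possible acquisition function; an AL scheme targeted at phase equilibria, as in the abstract, would more plausibly score candidate points by their effect on the predicted phase diagram.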
The second thermodynamics-informed approach was to develop a molecular descriptor that encodes thermodynamic information, namely molecular polarity, and that has a small, fixed size. To this end, sigma profiles, a type of molecular descriptor obtained through quantum chemistry calculations, were used to train convolutional neural networks (CNNs) that accurately correlate and predict a wide range of physicochemical properties (molar masses, normal boiling temperatures, vapor pressures, densities, refractive indices, and aqueous solubilities). To boost their performance, the architecture and hyperparameters of each CNN were optimized using a battery of algorithms, particularly Bayesian Optimization and Local Search. Furthermore, it was shown that thermodynamic conditions, namely temperature, can also be used as additional inputs to broaden the applicability of the models.
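To make the data flow concrete, the following is a minimal, untrained sketch of how a fixed-size sigma profile and a temperature could be fed through a small one-dimensional CNN. Everything specific here is an assumption rather than the authors' architecture: the 51-bin profile length, the filter count and kernel width, the global max pooling, the temperature scaling, and the random weights. The placeholder profile is a Gaussian bump, not a real quantum chemistry result.

```python
import numpy as np

N_BINS = 51                  # assumed sigma-profile length (fixed-size descriptor)
N_FILTERS, KERNEL = 4, 5     # hypothetical architecture choices

def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation: x (n,), kernels (c, k) -> (c, n - k + 1)."""
    k = kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return kernels @ windows.T + bias[:, None]

def init_params(rng):
    """Random, untrained weights; a real model would be fitted to property data."""
    return {
        "kernels": rng.normal(0.0, 0.1, (N_FILTERS, KERNEL)),
        "conv_bias": np.zeros(N_FILTERS),
        "dense_w": rng.normal(0.0, 0.1, N_FILTERS + 1),  # +1 for temperature
        "dense_b": 0.0,
    }

def forward(sigma_profile, temperature_K, params):
    """Convolution -> ReLU -> global max pool -> append temperature -> linear output."""
    maps = conv1d(sigma_profile, params["kernels"], params["conv_bias"])
    pooled = np.maximum(maps, 0.0).max(axis=1)
    t_scaled = (temperature_K - 298.15) / 100.0      # assumed scaling
    features = np.append(pooled, t_scaled)
    return float(features @ params["dense_w"] + params["dense_b"])

rng = np.random.default_rng(0)
params = init_params(rng)
# Placeholder profile: a smooth bump centred on the middle (neutral) bin.
sigma = np.exp(-0.5 * ((np.arange(N_BINS) - 25) / 4.0) ** 2)
prediction = forward(sigma, 323.15, params)
```

Appending temperature after the pooling layer is one simple way to add a thermodynamic condition as an extra input; the architecture and hyperparameter search mentioned in the abstract (Bayesian Optimization, Local Search) would tune choices such as `N_FILTERS` and `KERNEL` rather than fixing them by hand.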
Finally, inverse design workflows based on this type of thermodynamic information (active learning on the sigma profile space) were also constructed. These enable the computational design of tailor-made materials constrained by a set of physicochemical properties selected for a given application of interest. The results of this work constitute a major breakthrough in the way novel materials can be designed, as the machine learning models developed are robust, accurate, and generic enough to be applied across several families of compounds and materials.