(197v) Novel Sustainable Materials Design with Thermodynamics-Informed Machine Learning | AIChE

Authors 

Maginn, E., University of Notre Dame
Scientists have traditionally designed novel materials by trial and error, often complemented by basic heuristic rules or chemical intuition (e.g., “like dissolves like”). However, to date, this approach has led to the discovery and characterization of only a small fraction of all synthesizable compounds. The limitation is aggravated for mixtures, particularly those that show potential for minimizing the environmental impact of the chemical industry, as the number of possible molecule combinations is virtually infinite. Exploring this chemical landscape experimentally would demand more time and resources than is feasible.

Data-driven methods such as machine learning are promising alternatives to classical trial-and-error methodologies. These approaches train machine learning models on existing datasets of known materials and then apply them to guide the development of new compounds by predicting the physicochemical properties relevant to a given target application. However, a severe limitation of this approach lies in the representation of molecules in a machine-intelligible way (i.e., converting molecular structures to sets of numerical values that can be understood by mathematical machine learning models). In fact, most molecular representations proposed in the past lead to overly complex models that require a tremendous volume of experimental data to be properly trained.

To bridge the gap between small, scarce datasets and data-driven approaches, this work aims to design novel sustainable materials using thermodynamics-informed machine learning models, with a particular focus on green solvents. Thermodynamic information was incorporated in two separate ways. The first was to use Gaussian processes (GPs) and active learning (AL) to describe activity coefficients (a single property), which can then be combined with thermodynamic models to predict many different solvent-related properties (e.g., phase equilibria). Trained on synthetic data generated from an excess Gibbs energy model, GPs were found to accurately describe the activity coefficients of several binary mixtures across large composition and temperature ranges. Moreover, GPs could estimate their own uncertainty and identify the composition/temperature regions where activity coefficient data provide the most information to the models. This was leveraged to build AL algorithms targeted at modelling phase equilibria. In many cases, a single data point acquired through active learning was sufficient to describe the phase diagrams studied. The ability of AL to greatly reduce the amount of data needed to obtain accurate models was further verified on experimental case studies, namely individual ion activity coefficients, the solid-liquid and vapor-liquid equilibria of deep eutectic solvents, and phase equilibria in ternary mixtures.
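
The GP and active learning loop described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the work itself: the one-parameter Margules model (ln γ1 = A·x2²) stands in for the excess Gibbs energy model, the kernel is a fixed squared-exponential with hand-picked hyperparameters, only composition (not temperature) is varied, and the acquisition rule simply queries the composition with the largest predictive variance.

```python
import numpy as np


def margules_ln_gamma1(x1, A=1.5):
    """Synthetic 'experiment' (assumed): one-parameter Margules, ln(gamma_1) = A * x2^2."""
    return A * (1.0 - x1) ** 2


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of compositions."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean GP posterior mean and variance at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance at each query point minus the part explained by the data.
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)


def active_learning(n_queries=4):
    """Start from the pure-component limits, then query where the GP is least certain."""
    grid = np.linspace(0.0, 1.0, 101)
    x_train = np.array([0.0, 1.0])
    y_train = margules_ln_gamma1(x_train)
    for _ in range(n_queries):
        _, var = gp_posterior(x_train, y_train, grid)
        x_next = grid[np.argmax(var)]  # maximum-variance acquisition
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, margules_ln_gamma1(x_next))
    mean, var = gp_posterior(x_train, y_train, grid)
    return grid, mean, var, x_train


if __name__ == "__main__":
    grid, mean, var, x_train = active_learning()
    print("queried compositions:", x_train)
    print("largest remaining predictive variance:", var.max())
```

In a phase-equilibrium setting, the acquisition rule would presumably target uncertainty in the predicted phase boundary rather than in the activity coefficient itself; the variance-based rule here is only the simplest stand-in.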

The second thermodynamics-informed approach was to develop a molecular descriptor that encodes thermodynamic information, namely molecular polarity, and that possesses a small, fixed size. To this end, sigma profiles, a type of molecular descriptor obtained through quantum chemistry calculations, were used to train convolutional neural networks (CNNs) that accurately correlate and predict a wide range of physicochemical properties (molar masses, normal boiling temperatures, vapor pressures, densities, refractive indices, and aqueous solubilities). To boost their performance, the architecture and hyperparameters of each CNN were optimized using a battery of algorithms, particularly Bayesian Optimization and Local Search. Furthermore, it was shown that thermodynamic conditions, namely temperature, can also be used as additional inputs to broaden the applicability of the models.
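
To make the idea of a small, fixed-size descriptor concrete, the following is a rough sketch of how a sigma profile and a temperature could be passed through a one-dimensional convolutional network. It shows only an untrained forward pass with random weights. The 51-bin profile length, the layer sizes, the temperature scaling, and the Gaussian-shaped example profile are all assumptions for illustration; the architectures actually used in the work were tuned by hyperparameter optimization and are not reproduced here.

```python
import numpy as np

N_BINS = 51  # assumed sigma-profile length (fixed for every molecule)


def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation. x: (n,), kernels: (n_filters, width)."""
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)
    return windows @ kernels.T + bias  # shape (n - width + 1, n_filters)


def init_params(rng, n_filters=8, width=5, n_hidden=16):
    """Random, untrained weights (hypothetical layer sizes)."""
    return {
        "k": rng.normal(0.0, 0.1, (n_filters, width)),
        "kb": np.zeros(n_filters),
        "W1": rng.normal(0.0, 0.1, (n_filters + 1, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, 1)),
        "b2": np.zeros(1),
    }


def predict(params, sigma_profile, temperature_K):
    """Convolution -> ReLU -> global average pooling -> dense layers.

    Temperature is appended to the pooled features as an extra input.
    """
    feat = np.maximum(conv1d(sigma_profile, params["k"], params["kb"]), 0.0)
    pooled = feat.mean(axis=0)
    z = np.concatenate([pooled, [temperature_K / 1000.0]])  # crude scaling
    h = np.maximum(z @ params["W1"] + params["b1"], 0.0)
    return float((h @ params["W2"] + params["b2"])[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    # Hypothetical profile: a single Gaussian-shaped peak over the bins.
    sigma = np.linspace(-0.025, 0.025, N_BINS)
    profile = np.exp(-0.5 * (sigma / 0.005) ** 2)
    print(predict(params, profile, temperature_K=298.15))
```

Because every molecule maps to a vector of the same length, the same network can be applied across compound families without changing its input layer.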

Finally, inverse design workflows based on this type of thermodynamic information (active learning on the sigma profile space) were also constructed. These workflows enable the computational design of tailor-made materials bounded by a set of physicochemical properties selected for a given application of interest. The results of this work constitute a major breakthrough in the way novel materials can be designed, as the machine learning models developed are robust, accurate, and generic enough to be applied across several families of compounds and materials.