(105b) Machine Learning-Enabled Optimization of Classical Molecular Models | AIChE

Befort, B. - Presenter, University of Notre Dame
DeFever, R. S., Clemson University
Maginn, E., University of Notre Dame
Dowling, A., University of Notre Dame
In this work, we integrate molecular simulations, data-driven optimization, machine learning, and applied statistics to accelerate the discovery of new ionic liquid (IL) separating agents for azeotropic separation of hydrofluorocarbon (HFC) refrigerant mixtures. As mandated by the 1987 Montreal Protocol, chlorofluorocarbon (CFC) refrigerants have been gradually replaced by HFCs to prevent ozone depletion. Many of these second-generation HFC mixtures, however, have a high global warming potential (GWP), and the 2016 Kigali Amendment and more recent US regulations require their gradual phase-out [1][2]. Because these HFC refrigerants are often azeotropic, existing methods for separating the high-GWP components from low-GWP HFCs are currently impractical, yet incinerating HFC mixtures is wasteful because some HFC components have low GWPs and can be recycled. We hypothesize that custom ILs can be designed to remove low-GWP HFC components from specific HFC mixtures [3][4]. However, millions of potential IL separating agents exist [5], making trial-and-error molecular discovery intractable. Instead, we are developing a data-to-design framework that integrates molecular simulations, experimental measurements, and process optimization to concurrently design novel separating agents and processes for azeotropic HFC mixtures. Within this framework, we obtain binary and ternary data on HFC solubility in ILs from either experiments or predictive molecular simulations. We fit thermodynamic models to the data and conduct preliminary process design, process optimization, uncertainty quantification, and techno-economic analyses. The resulting insights are then fed back to our experimental and molecular modeling collaborators, guiding the identification of ILs with ideal properties for HFC separations.
This data-to-design framework offers an opportunity to investigate how tools from one engineering community can be harnessed to address long-standing challenges in others.

As an example, we investigate harnessing machine learning, data science, and optimization techniques commonly used by the process systems engineering community to address a long-standing challenge within the molecular modeling and simulation community: efficiently calibrating molecular models. Molecular simulation, a powerful tool for studying the thermodynamic and dynamic properties of materials, can be used within our integrated multiscale modeling and design framework to screen vast molecular design spaces and to optimize the use of resource-intensive experimental characterization. Molecular simulation capabilities have progressed rapidly, driven by recent molecular engineering initiatives (including designer solvents such as ILs, porous materials, and biological systems) and by advances in computational power. However, using molecular simulation in a quantitative and predictive capacity, as is necessary for a data generator and high-throughput screening tool within a data-to-design workflow, requires accurate molecular models, called force fields. Developing force fields is laborious and time-consuming because the objective function, that is, the error between the simulation output and the property the simulation is trying to capture, is expensive to evaluate. Though off-the-shelf force fields offer accurate predictions for some systems, they lack quantitative accuracy across the extraordinary range of chemistries found in the natural and synthetic world. Thus, further manual parameter tuning, an effort that often takes months, is standard practice to ensure the model has the required accuracy for the molecule(s) and properties of interest [6]. The lack of tools to rapidly calibrate force field models is a barrier to innovation enabled by predictive modeling.
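
To make the objective function described above concrete, here is a minimal Python sketch. It assumes the error is a mean absolute percent error (MAPE) between simulated and experimental property values; the abstract does not name the metric, so this is one plausible choice rather than the authors' definition. The function name and all numbers are hypothetical and for illustration only.

```python
def mean_absolute_percent_error(simulated, experimental):
    """Average of |sim - exp| / |exp| over all state points, in percent.

    Assumed error metric: the abstract only says "error between the
    simulation output and the property".
    """
    if len(simulated) != len(experimental):
        raise ValueError("simulated and experimental must have equal length")
    total = sum(abs(s - e) / abs(e) for s, e in zip(simulated, experimental))
    return 100.0 * total / len(experimental)


# Made-up liquid densities (kg/m^3) at three temperatures, NOT real data.
experimental = [1000.0, 950.0, 900.0]
simulated = [1010.0, 950.0, 882.0]

# Per-point errors are 1 %, 0 %, and 2 %, so the mean is about 1 %.
mape = mean_absolute_percent_error(simulated, experimental)
print(f"MAPE = {mape:.3f} %")
```

In a real calibration, each entry of `simulated` would come from a full molecular simulation at one candidate parameter set, which is why every evaluation of this function is costly.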

Thus, we propose a machine learning-enabled force field optimization framework (see Figure 1). While force fields can be developed by fitting parameters to data from quantum calculations, here we show that surrogate-assisted optimization, which integrates Gaussian process regression models with Bayesian optimization, facilitates rapid “top-down” calibration of van der Waals interaction parameters within a force field to reproduce experimental property measurements. We frame this calibration workflow as a multi-objective optimization problem and use dominated versus non-dominated criteria to find multiple parameter sets with low error in reproducing multiple thermophysical properties. This work challenges the existing idea within the molecular modeling community that a single objective “best” force field exists. We show that surrogate models are effective for force field calibration and can identify accurate force fields using limited training data. This is because the physics-based nature of force fields, including physically interpretable parameters and functional forms, makes them extensible to multiple properties, including those to which they were not calibrated [6]. We compare the efficiency of parameter space exploration and exploitation for two strategies: iterative surrogate model updates driven by large samples of parameter space, and automated Bayesian sampling (see Figure 2). We find that Bayesian optimization finds lower-error parameter sets in half the time [7]. As a demonstration case, we show that these methods provide a quick and efficient route to optimized molecular models that reproduce vapor-liquid equilibrium (VLE) properties for two environmentally relevant hydrofluorocarbons, HFC-32 and HFC-125.
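
The dominated versus non-dominated criterion mentioned above can be illustrated with a short Python sketch. It assumes each candidate parameter set carries one error value per property and that lower is better for every property. The candidate names and error values are invented for illustration; they are not results from this work, and the filter is a generic Pareto test rather than the authors' implementation.

```python
def dominates(a, b):
    """True if error vector a is no worse than b in every property
    and strictly better in at least one (lower error is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o["errors"], c["errors"])
                   for o in candidates if o is not c)
    ]


# Hypothetical parameter sets with made-up percent errors in
# (liquid density, vapor pressure).
candidates = [
    {"name": "A", "errors": (1.0, 5.0)},
    {"name": "B", "errors": (2.0, 2.0)},
    {"name": "C", "errors": (4.0, 1.0)},
    {"name": "D", "errors": (3.0, 3.0)},  # worse than B in both properties
    {"name": "E", "errors": (1.0, 6.0)},  # same density error as A, worse pressure
]

front = non_dominated(candidates)
print([c["name"] for c in front])  # prints ['A', 'B', 'C']
```

Sets A, B, and C each trade accuracy in one property for accuracy in the other, so none can be called strictly best. This is the sense in which a multi-objective view yields several acceptable parameter sets rather than one.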

Finally, we evaluate the insights into classical molecular modeling paradigms and the guidance these tools provide to experimental and process modeling collaborators within the data-to-design framework. Because we found multiple low-error parameter sets for each HFC that were well spread throughout parameter space, we performed analyses using traditional applied statistics techniques to understand the apparent over-parameterization of the system. For example, we performed a local identifiability analysis and showed that the force field models for both HFCs were not fully identifiable when tuned to a single property (liquid density), but became so when full VLE data were included in the calibration workflow. This indicates the importance of measuring a variety of data and including uncorrelated properties in an optimization procedure. Additionally, we investigate using model selection techniques while simultaneously calibrating multiple HFC force fields to determine the number of parameters that balances the complexity of the model optimization procedure against the quality of fit. We perform preliminary efficiency analyses of the formulation of the Bayesian optimization workflow to determine the most effective method for automatically finding optimal parameter sets. As further capabilities are developed, this tool will be harnessed to rapidly calibrate molecular models for other systems of interest and for multiscale design schemes, facilitating rapid molecular and process design and optimization. We emphasize the significance of this work for the molecular modeling community, as rapid generation of accurate molecular models is the key to harnessing the full quantitative predictive capacity of molecular simulations.
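
The idea behind the local identifiability analysis can be sketched in Python with a two-parameter toy problem. Everything here is an assumption made for illustration: the two "property models" are invented algebraic functions, not force field simulations, and measurement variances are taken as one so that the Fisher information matrix reduces to J^T J, where J is the sensitivity matrix of predictions with respect to parameters. A near-zero eigenvalue of that matrix signals a parameter direction the data cannot pin down.

```python
import math


def toy_density(theta, T):
    # Invented model: depends on the parameters only through their product.
    sigma, eps = theta
    return 1.5 * sigma * eps - 0.002 * T


def toy_vapor_pressure(theta, T):
    # Invented model: depends on the two parameters in a different way.
    sigma, eps = theta
    return sigma + eps * T / 100.0


def jacobian(models, theta, temps, h=1e-6):
    """Central-difference sensitivities; one row per (model, temperature)."""
    rows = []
    for model in models:
        for T in temps:
            row = []
            for j in range(len(theta)):
                up, dn = list(theta), list(theta)
                up[j] += h
                dn[j] -= h
                row.append((model(up, T) - model(dn, T)) / (2.0 * h))
            rows.append(row)
    return rows


def fim_eigenvalues(J):
    """Eigenvalues (small, large) of the 2x2 matrix J^T J."""
    a = sum(r[0] * r[0] for r in J)
    b = sum(r[0] * r[1] for r in J)
    c = sum(r[1] * r[1] for r in J)
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - disc) / 2.0, (tr + disc) / 2.0


theta0 = (3.0, 0.5)            # hypothetical nominal parameter values
temps = [250.0, 275.0, 300.0]  # hypothetical state points

lo_d, hi_d = fim_eigenvalues(jacobian([toy_density], theta0, temps))
lo_all, hi_all = fim_eigenvalues(
    jacobian([toy_density, toy_vapor_pressure], theta0, temps))

print("density only, eigenvalue ratio:", lo_d / hi_d)
print("density + vapor pressure, eigenvalue ratio:", lo_all / hi_all)
```

With the toy density alone, every row of the sensitivity matrix points in the same direction, so the smallest eigenvalue is zero up to rounding and the two parameters cannot be separated. Adding the second toy property introduces rows in a different direction and the matrix becomes full rank, mirroring the qualitative behavior reported above for liquid density versus full VLE data.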


[1] United Nations Environment Programme. Ozone Secretariat. (2006). Handbook for the Montreal Protocol on substances that deplete the ozone layer. UNEP/Earthprint.

[2] EPA. Phasedown of hydrofluorocarbons: Establishing the allowance allocation and trading program under the American Innovation and Manufacturing Act (DoA: 2021).

[3] Plechkova, N. V., & Seddon, K. R. (2008). Applications of ionic liquids in the chemical industry. Chemical Society Reviews, 37(1), 123-150.

[4] Chávez-Islas, L. M., Vasquez-Medrano, R., & Flores-Tlacuahuac, A. (2011). Optimal molecular design of ionic liquids for high-purity bioethanol production. Industrial & Engineering Chemistry Research, 50(9), 5153-5168.

[5] Holbrey, J. D., & Seddon, K. R. (1999). Ionic liquids. Clean Products and Processes, 1(4), 223-236.

[6] Befort, B. J., DeFever, R. S., Tow, G. M., Dowling, A. W., & Maginn, E. J. (2021). Machine learning directed optimization of classical molecular modeling force fields. Journal of Chemical Information and Modeling, 61(9), 4400-4414.

[7] Befort, B. J., DeFever, R. S., Maginn, E. J., & Dowling, A. W. (2021). Machine learning-enabled optimization of force fields for hydrofluorocarbons. PSE2021+. Accepted for publication.