(420e) Active Learning for Data-Efficient Training of Machine Learning Models to Predict Adsorption in Metal-Organic Frameworks (MOFs).

Conference

AIChE Annual Meeting

Year

2023

Proceeding

2023 AIChE Annual Meeting

Group

Computational Molecular Science and Engineering Forum

Session

Automated Molecular and Materials Discovery: Integrating Machine Learning, Simulation, and Experiment II

Time

Tuesday, November 7, 2023 - 4:30pm to 4:45pm

Authors

Osaro, E. - Presenter, University of Notre Dame

Fajardo-Rojas, F., Colorado School of Mines

Gomez Gualdron, D., Colorado School of Mines

Colón, Y.

Metal-organic frameworks (MOFs) are a promising class of porous, crystalline materials for numerous applications. For instance, a MOF with the “right” adsorption properties could enable replacing a given thermal-based chemical separation process with an adsorption-based one, which could in turn bring about a 10-fold increase in energy efficiency. As chemical separations account for roughly 15% of U.S. energy usage, and about 80% of these separations are done thermally, finding the “right” MOF for each separation could potentially reduce U.S. energy expenditure by around 11%. Given i) the overwhelmingly large MOF “design space” (with trillions of potential designs), and ii) the thousands of chemical separations, each of which could potentially be performed at a variety of operating conditions (OC, e.g., temperature, pressure, relative proportion of components), computation is expected to play a central role in identifying, for each chemical separation, the most promising MOF and its corresponding “optimal” operating condition.
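The figure of around 11% can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that every thermal separation is replaced and that a 10-fold efficiency increase means the replacement uses one tenth of the energy; both are simplifying assumptions, and the function name is illustrative rather than taken from the work.

```python
# Approximate figures quoted in the abstract.
separation_share = 0.15  # fraction of U.S. energy used by chemical separations
thermal_share = 0.80     # fraction of those separations done thermally
efficiency_gain = 10.0   # assumed: adsorption-based process uses 1/10 of the energy


def potential_savings(separation_share, thermal_share, efficiency_gain):
    """Fraction of total energy saved if all thermal separations were replaced."""
    thermal_energy = separation_share * thermal_share
    return thermal_energy * (1.0 - 1.0 / efficiency_gain)


savings = potential_savings(separation_share, thermal_share, efficiency_gain)
print(f"Potential reduction in total energy use: {savings:.1%}")
```

Under these assumptions the result is 10.8%, consistent with the rounded value quoted above.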

The challenge is that classical simulation methods to predict adsorption (e.g., grand canonical Monte Carlo (GCMC)) are just “fast enough” to make thousands to hundreds of thousands of adsorption predictions in a reasonable timeframe. However, finding the optimal MOF-OC combination for each chemical separation of interest is a task that would probably entail trillions of adsorption predictions. Thus, faster methods such as machine learning (ML) are better poised to take on such a task. In earlier work, some of us demonstrated the ability of multilayer perceptron (MLP) models to learn to predict adsorption at multiple conditions and for multiple molecules when provided with GCMC-generated training data for adsorption of different molecules at different pressures in different MOFs. However, this demonstration was limited to near-spherical, non-polar molecules, and extension to a wider class of molecules requires increasing the diversity and size of the training data. Because of the computational resources needed to generate training data and to train the ML model, there is a critical need to keep the training dataset as small as possible.

Active learning (AL) can play a very important role in efficiently and “smartly” navigating the “adsorption space” to limit the burden of data generation while enabling the training of highly predictive ML models. In this work, we first establish the implementation of a Gaussian process regression (GPR) framework to model pure-component adsorption of nitrogen at 77 K from 10^-5 to 1 bar, methane at 298 K from 10^-5 to 100 bar, carbon dioxide at 298 K from 10^-5 to 100 bar, and hydrogen at 77 K from 10^-5 to 100 bar on a diverse set of eleven MOFs. In this GPR framework, a first model is trained with an initial data set known as the “prior.” Subsequent models are then retrained as adsorption data are added to the dataset, with the points to add selected according to the uncertainty of the GP model evaluated on a new data set. Here, we tested three different “prior” selection schemes and recommend the best prior selection scheme for the 44 adsorbate-adsorbent pairs. The recommendation is based primarily on the mean absolute error and the total number of data points required for the ML model's predictions to converge.
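The loop described above (train on a prior, evaluate uncertainty, add data, retrain) can be sketched as follows. This is a minimal illustration and not the authors' implementation: `toy_isotherm` is a hypothetical Langmuir-type stand-in for a GCMC simulation, the GP uses a fixed squared-exponential kernel written directly in NumPy, and the rule "add the candidate with the largest predictive standard deviation until it falls below `tol`" is an assumed acquisition and stopping criterion.

```python
import numpy as np


def toy_isotherm(log_p):
    """Hypothetical stand-in for GCMC: Langmuir uptake vs log10(pressure / bar)."""
    p = 10.0 ** log_p
    return 10.0 * p / (1.0 + p)


def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit variance between 1-D arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-4):
    """GP posterior mean and standard deviation (fixed kernel, centered targets)."""
    y_mean = y_train.mean()
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = Ks.T @ alpha + y_mean
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def active_learning(pool, prior_idx, tol=0.05, max_points=30):
    """Grow the training set from the prior by querying the most uncertain point."""
    chosen = list(prior_idx)
    while True:
        x = pool[chosen]
        mean, std = gp_predict(x, toy_isotherm(x), pool)
        worst = int(np.argmax(std))
        if std[worst] < tol or len(chosen) >= max_points:
            return chosen, mean, std
        chosen.append(worst)  # "run the simulation" at the most uncertain condition


# Candidate conditions: log10(pressure) from 1e-5 to 100 bar.
pool = np.linspace(-5.0, 2.0, 50)
chosen, mean, std = active_learning(pool, prior_idx=[0, 25, 49])
mae = float(np.mean(np.abs(mean - toy_isotherm(pool))))
print(f"points used: {len(chosen)} of {len(pool)}, MAE: {mae:.3f}")
```

In a realistic setting the kernel hyperparameters would be optimized at each retraining step, and the choice of `prior_idx` corresponds to the prior selection schemes compared in this work.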

Having established the GPR framework, we extended the methodology to alchemical molecules. These hypothetical species can be characterized by two main sets of features: intermolecular potential parameters (e.g., the well depth and the distance at which the intermolecular potential between two particles is zero) and intramolecular properties, such as bond lengths and charges.
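To make the two intermolecular parameters concrete, the sketch below assumes a 12-6 Lennard-Jones pair potential, which the abstract does not name explicitly. The `AlchemicalSite` class and the numerical values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AlchemicalSite:
    """Hypothetical container for one interaction site of an alchemical species."""
    epsilon: float       # well depth (e.g., in K)
    sigma: float         # distance at which the pair potential is zero (e.g., in angstrom)
    charge: float = 0.0  # partial charge (e.g., in units of e)


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at separation r (assumed functional form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


site = AlchemicalSite(epsilon=100.0, sigma=3.5)  # illustrative values only

# The potential crosses zero at r = sigma ...
u_zero = lennard_jones(site.sigma, site.epsilon, site.sigma)
# ... and reaches its minimum, -epsilon, at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0) * site.sigma
u_min = lennard_jones(r_min, site.epsilon, site.sigma)
print(u_zero, r_min, u_min)
```

Varying `epsilon`, `sigma`, and `charge` continuously, rather than restricting them to values of real molecules, is what makes such a species "alchemical" in the sense used here.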

A previously developed MLP model was trained on approximately 5 million GCMC data points obtained for 1800 topologically and chemically diverse ToBaCCo-generated MOFs, using several single- and multiple-site alchemical species at different fugacities. That model has advanced adsorption studies by providing accurate results for a diverse set of real molecules. Using the established AL framework, with the developed MLP model as a substitute for GCMC, we show that we can build accurate GPR models that predict the isotherms of all alchemical species across these 1800 diverse MOFs, evaluated on a separate test set spanning fugacity and alchemical parameters. Our results show a data saving of 57.5%, indicating that only around 2.2 million simulations are needed to train a new MLP model for adsorption.
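As a quick consistency check on these figures, the sketch below relates the 57.5% saving to the dataset size. The exact size of the original dataset is not stated (only "approximately 5 million"), so the implied total is an inference from the quoted numbers, not a reported value.

```python
saving = 0.575        # fraction of data saved, as quoted
total_approx = 5.0e6  # "approximately 5 million" GCMC data points
needed_quoted = 2.2e6  # "around 2.2 million" simulations

# Simulations needed if the original dataset were exactly 5 million points.
needed_from_total = total_approx * (1.0 - saving)

# Dataset size implied by the quoted 2.2 million and the 57.5% saving.
implied_total = needed_quoted / (1.0 - saving)

print(f"needed from 5.0M total: {needed_from_total / 1e6:.3f}M")
print(f"total implied by 2.2M needed: {implied_total / 1e6:.2f}M")
```

The two quoted numbers are consistent if the full dataset is slightly larger than 5 million points, which fits the "approximately" in the text.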

Topics

Adsorption
