(118c) Transfer Learning Using Large-Scale ML Models for Catalyst/Molecular Datasets

Authors 

Kolluru, A. - Presenter, Carnegie Mellon University
Ulissi, Z., Carnegie Mellon University
The development of renewable energy technologies relies heavily on efficient and cost-effective catalysts. Unfortunately, screening the millions to billions of unique candidate catalysts is infeasible with traditional ab initio techniques, and ML model development has been limited by the availability of large, diverse datasets spanning composition and chemical space. To address this, we present a transfer learning approach that uses the learned representations of a large pre-trained model to aid training on alternative, smaller datasets.

Pre-trained models from the Open Catalyst 2020 (OC20) dataset, trained on 130M+ calculations spanning 55 elements and 80 adsorbates, were used as starting points for training on smaller, densely sampled datasets. The initial layers of ML models tend to learn basic representations of the input features, in this case atomic embeddings. These representations and embeddings can be assumed to change little across different catalytic systems, so a pre-trained model provides a good prior for the new task.

We demonstrate this approach on datasets from two different domains: a literature CO adsorption dataset in catalysis, and the MD17 dataset of small molecules. We show that the pre-trained model yields a 15% improvement on the in-domain CO dataset and gives better results when tested on unseen molecules from MD17. This work aims to serve as a benchmark for how pre-trained ML models can be used for predictions on a range of small-scale catalyst or molecular datasets.
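As a concrete illustration, the sketch below shows one common way such a transfer learning setup can be implemented in PyTorch: load a pre-trained checkpoint, freeze the early layers that hold the atomic embeddings, and fine-tune the remaining layers on the small target dataset. The `ToyAtomisticModel` class, the checkpoint path, and the choice of which layers to freeze are hypothetical stand-ins for illustration, not the actual OC20 model architectures or the exact configuration used in this work.

```python
import torch
import torch.nn as nn

# Schematic GNN-style energy model: an atomic embedding layer followed by
# residual update blocks and an output head. The real OC20 models are far
# larger; this stand-in only illustrates the layer structure that the
# transfer learning approach exploits.
class ToyAtomisticModel(nn.Module):
    def __init__(self, num_elements=100, hidden=64, num_blocks=3):
        super().__init__()
        self.embedding = nn.Embedding(num_elements, hidden)  # per-element features
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU())
             for _ in range(num_blocks)]
        )
        self.head = nn.Linear(hidden, 1)  # per-atom energy contribution

    def forward(self, atomic_numbers):
        h = self.embedding(atomic_numbers)
        for block in self.blocks:
            h = h + block(h)              # residual updates
        return self.head(h).sum(dim=-2)   # total energy = sum over atoms

model = ToyAtomisticModel()

# Hypothetical checkpoint path; in practice this would hold weights
# pre-trained on the large OC20 dataset.
# model.load_state_dict(torch.load("oc20_pretrained.pt"))

# Freeze the early layers (embeddings and first block), on the assumption
# that the basic atomic representations transfer across chemical systems.
for param in model.embedding.parameters():
    param.requires_grad = False
for param in model.blocks[0].parameters():
    param.requires_grad = False

# Fine-tune only the remaining parameters on the small target dataset.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One illustrative fine-tuning step on a made-up batch (atomic numbers
# for H, C, O and a dummy target energy).
atomic_numbers = torch.tensor([[1, 6, 8]])
target_energy = torch.tensor([[-1.0]])
loss = nn.functional.mse_loss(model(atomic_numbers), target_energy)
loss.backward()
optimizer.step()
```

Freezing only the early layers reflects the assumption stated above: the atomic embeddings learned on the large dataset act as a transferable prior, while the later layers adapt to the distribution of the smaller target dataset.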