(684g) Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery
AIChE Annual Meeting
Friday, November 18, 2022 - 9:48am to 10:06am
Recent developments in machine learning (ML) models and the curation of large-scale catalyst datasets have pushed the field of computational heterogeneous catalysis towards more accurate machine learning potentials (MLPs). ML models have evolved from being developed for specific chemistries and element types to being trained on large-scale, generalized datasets. A universal MLP has the potential to accelerate catalyst discovery across various applications (e.g., CO2 reduction and NH3 production) without additional specialized training. In this perspective, we discuss the challenges of training multi-million-parameter Graph Neural Networks (GNNs) on the recent Open Catalyst 2020 (OC20) dataset, which consists of 200M+ adsorbate-catalyst Density Functional Theory (DFT) calculations spanning 55 elements and 82 adsorbates. We review the progress in model development on OC20 since its release and examine how the errors of these models are distributed across material types and adsorbates. Lastly, we discuss the challenges in constructing tasks and metrics for the prediction of energies, forces, and positions.