Catalyst design can be accelerated by finding physical descriptors that predict the adsorption strength of molecules and atoms on metal and metal-oxide surfaces. These descriptors can be constructed from physical intuition when the systems under investigation are relatively simple, i.e., when a single physical phenomenon dominates the interaction between the surface and the adsorbate. However, it becomes increasingly difficult to design such descriptors for complex systems in which multiple phenomena influence the interaction between the adsorbate and the surface. Take, for example, the binding energy between a metal atom and an oxide support, which is a critical metric for controlling the supported metal's cluster-size distribution, sintering rate, and stability. This interaction is influenced by several charge-transfer processes occurring between the metal, the oxide surface, surface defects (e.g., vacancies or heteroatom dopants), and adsorbates from the reaction environment. This complexity cannot be captured by simple physical descriptors derived from chemical intuition alone. This issue can be addressed with statistical learning (SL) techniques that apply feature engineering (FE) and feature selection (FS) to build complex descriptors, as shown in our previous work [1-2]. These studies demonstrate the capabilities of various state-of-the-art FS methods, yet we found that these methods tend to suffer from two critical deficiencies. First, the FE procedures generate highly correlated feature spaces that expose weaknesses in typical FS techniques (e.g., LASSO), leading to higher error and an earlier onset of overfitting. Second, the FS methods require significant computational resources to handle the large feature spaces that are necessary for constructing features with sufficient complexity. In this work, we developed a new FE and FS methodology, called iterative Bayesian additive regression trees (iBART), to address these issues.
iBART reduces the size of the candidate feature space from millions or billions of features to hundreds without sacrificing feature complexity, which speeds up computation and reduces the required system memory. We applied iBART to derive metal/oxide binding energy descriptors from our previously published data [1-2], and the results demonstrate that iBART improves descriptor accuracy and significantly reduces computational expense.
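
To make the two deficiencies concrete, the following minimal sketch shows how quickly an operator-based FE step inflates a feature space and how strongly the engineered features correlate with one another. It is not the iBART algorithm. The operator set (+, -, *, /), the synthetic data, and the helper names `expand_features`, `expanded_count`, and `max_abs_offdiag_corr` are assumptions made only for illustration.

```python
import itertools
import numpy as np

# Hypothetical operator set; the real FE procedure may use different operators.
OPERATORS = (("+", np.add), ("-", np.subtract), ("*", np.multiply), ("/", np.divide))

def expand_features(X, names):
    """Keep the input columns and append op(a, b) for every column pair and operator."""
    cols = [X[:, j] for j in range(X.shape[1])]
    new_names = list(names)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        for sym, op in OPERATORS:
            cols.append(op(X[:, i], X[:, j]))
            new_names.append(f"({names[i]}{sym}{names[j]})")
    return np.column_stack(cols), new_names

def expanded_count(p, n_ops=len(OPERATORS)):
    """Number of features after one expansion of p features: p + n_ops * C(p, 2)."""
    return p + n_ops * p * (p - 1) // 2

def max_abs_offdiag_corr(X):
    """Largest absolute pairwise Pearson correlation between distinct columns."""
    C = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(C, 0.0)
    return float(np.max(np.abs(C)))

# Synthetic primary features, strictly positive so that division is well defined.
rng = np.random.default_rng(0)
X0 = rng.uniform(1.0, 2.0, size=(50, 5))
names0 = [f"x{k}" for k in range(5)]

X1, names1 = expand_features(X0, names0)

print("features after one expansion: ", X1.shape[1])               # 45
print("features after two expansions:", expanded_count(X1.shape[1]))  # 4005
print("max |corr| after one expansion:", max_abs_offdiag_corr(X1))
```

Even with five primary features, two rounds of expansion yield thousands of candidates, and engineered pairs such as a product and a sum of the same two inputs are nearly collinear. This is the regime in which sparse linear selectors such as LASSO become unstable.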
[1] Nolan J. O'Connor, A. S. M. Jonayat, Michael J. Janik, and Thomas P. Senftle. “Interaction Trends between Single Metal Atoms and Oxide Supports Identified with Density Functional Theory and Statistical Learning.” Nature Catalysis, 1 (2018): 531–39.
[2] Chun-Yen Liu, Shijia Zhang, Daniel Martinez, Meng Li, and Thomas P. Senftle. “Using Statistical Learning to Predict Interactions between Single Metal Atoms and Modified MgO(100) Supports.” npj Computational Materials, 6 (2020): 102.