(342am) Improving Feature Selection Methods for Heterogeneous Catalysis | AIChE

Authors 

Liu, C. Y. - Presenter, Rice University
Ye, S., Rice University
Li, M., Rice University
We developed a novel feature selection (FS) method, iterative Bayesian additive regression trees (iBART), for deriving robust physical descriptors in heterogeneous catalysis. Traditional FS approaches first collect the chemical properties of each constituent in the system as primary features, then apply a feature engineering (FE) step to construct complex, non-linear secondary features. The FE step can be repeated indefinitely to generate features of arbitrary complexity. FS is then used to identify the features in the resulting candidate pool that best predict the property of interest. The drawback is that the FS step must operate on an enormous feature space containing millions of candidate features that are highly correlated by construction. This leads to a high rate of false positives and requires significant computational resources. We developed iBART to address these issues, and we show that it generates more accurate models at lower computational cost than other state-of-the-art FS methods. iBART achieves this by alternating the two steps: an FS step based on Bayesian additive regression trees (BART) first selects the best-performing features, and only those features proceed to the FE step. This iterative FS/FE process can be repeated indefinitely without the candidate feature pool growing beyond the limits of the available computational resources. High-quality descriptors can therefore be constructed in a reduced feature space containing hundreds of candidates rather than millions or billions. We demonstrate the improved performance of iBART on a simulated data set (i.e., a statistical simulation) and on our previously published data sets describing metal atom adsorption on oxide surfaces. In both cases, iBART produces more accurate descriptors than those found in our previous work using traditional FS methods.
iBART thus provides a powerful FS tool that can run on a laptop computer instead of requiring high-performance computing resources.
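The iterative FS/FE loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the selection step here ranks features by absolute Pearson correlation with the target, which is only a placeholder for BART, and the operator set (squares, pairwise products and sums), the function names, and the synthetic data are all assumptions made for illustration. The point it shows is structural: because selection runs before each engineering step, the candidate pool never exceeds keep^2 + keep features, however many rounds are run.

```python
import itertools
import math
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    syy = sum((y - my) * (y - my) for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no signal
    return abs(sxy) / math.sqrt(sxx * syy)


def select(features, y, keep):
    """Placeholder FS step: keep the `keep` columns most correlated with y.

    iBART uses BART for this step; correlation ranking is a stand-in.
    """
    ranked = sorted(features, key=lambda name: pearson_abs(features[name], y),
                    reverse=True)
    return {name: features[name] for name in ranked[:keep]}


def engineer(features):
    """FE step (assumed operators): add squares, pairwise products and sums."""
    pool = dict(features)  # selected features stay in the pool
    for name, col in features.items():
        pool[f"({name})^2"] = [v * v for v in col]
    for (a, ca), (b, cb) in itertools.combinations(features.items(), 2):
        pool[f"({a}*{b})"] = [u * v for u, v in zip(ca, cb)]
        pool[f"({a}+{b})"] = [u + v for u, v in zip(ca, cb)]
    return pool


def iterative_fs_fe(primary, y, keep=3, rounds=2):
    """Alternate selection and engineering; return best feature and pool sizes."""
    pool = dict(primary)
    pool_sizes = []
    for _ in range(rounds):
        selected = select(pool, y, keep)   # shrink the pool first ...
        pool = engineer(selected)          # ... then grow it from the survivors
        pool_sizes.append(len(pool))
    best_name = next(iter(select(pool, y, 1)))
    return best_name, pool_sizes


# Synthetic example: six primary features, target is the product of two of them.
random.seed(0)
n_samples = 40
primary = {f"x{i}": [random.uniform(1.0, 2.0) for _ in range(n_samples)]
           for i in range(6)}
target = [a * b for a, b in zip(primary["x0"], primary["x1"])]

best, sizes = iterative_fs_fe(primary, target, keep=3, rounds=2)
print("best descriptor:", best)
print("pool size after each round:", sizes)
```

With `keep=3`, each engineered pool holds at most 12 candidates. Applying the same operators to all six primary features without intermediate selection would already give 42 candidates after one round and well over a thousand after two, which is the growth the iterative scheme is meant to avoid.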
