(584d) Combining Density Functional Theory Calculations with Machine Learning for Studying Complex Chemistries on Surfaces

Heyden, A., University of South Carolina
Global climate concerns, depletion of fossil resources, and increased demand for second-generation renewable fuels and chemicals have stimulated research on the utilization of biomass derivatives. However, the complexity of the reaction networks for the heterogeneously catalyzed conversion of biomass derivatives, which often involve hundreds of intermediates and transition states, greatly hinders the computational study and design of new catalysts for biomass utilization. To reduce the large cost of computing adsorption energies and transition state energies on various active site models, linear scaling relationships have been developed for predicting adsorption and transition state energies, such as transition state scaling (TSS) and Brønsted-Evans-Polanyi (BEP) relations. In these linear models, only a few descriptor values are needed to calculate the energies of various intermediate species and transition states. However, these linear relationships are usually designed for small molecules, and their effectiveness and applicability to more complex biomass-based chemistries are unknown. Therefore, we systematically studied the performance of linear scaling versus advanced non-linear machine learning (ML) models, such as support vector regression (SVR), kernel ridge regression (KRR), and Gaussian process (GP) modeling, for the hydrodeoxygenation (HDO) of organic acids. We first trained our models using data from the HDO of propanoic acid (Pac) to obtain the best models and descriptors that can be applied to larger reactants, i.e., the HDO of succinic acid (SUCC). We show that with the help of ML models and proper selection of descriptors, we are able to extrapolate between molecules with different numbers of carbon and oxygen atoms.
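The comparison between a linear scaling relation and a non-linear kernel model can be illustrated with a minimal sketch. It is not the workflow used in this study: the descriptor (a single reaction energy), the function `toy_barrier`, the hyperparameters `gamma` and `lam`, and all numbers are hypothetical placeholders chosen only to show how a BEP-type linear fit and a kernel ridge regression would each be trained on one data set and evaluated on another.

```python
import math

def fit_linear(x, y):
    """Least-squares fit of y ~ alpha * x + beta (BEP-type relation, one descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    return alpha, my - alpha * mx

def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_krr(X, y, gamma=1.0, lam=1e-3):
    """Kernel ridge regression: solve (K + lam * I) w = y, return a predictor."""
    K = [[rbf(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    w = solve(K, y)
    return lambda q: sum(wi * rbf(q, xi, gamma) for wi, xi in zip(w, X))

def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def toy_barrier(dE):
    """Hypothetical, mildly non-linear barrier vs. reaction energy (eV). Not DFT data."""
    return 0.9 + 0.6 * dE + 0.15 * dE ** 2

# Synthetic stand-ins for a training set and a held-out evaluation set.
train_dE = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
test_dE = [-0.8, -0.4, 0.0, 0.4, 0.8]
train_Ea = [toy_barrier(v) for v in train_dE]
test_Ea = [toy_barrier(v) for v in test_dE]

alpha, beta = fit_linear(train_dE, train_Ea)
krr = fit_krr([[v] for v in train_dE], train_Ea, gamma=1.0, lam=1e-3)

print("linear MAE (eV):", mae([alpha * v + beta for v in test_dE], test_Ea))
print("KRR MAE (eV):   ", mae([krr([v]) for v in test_dE], test_Ea))
```

In practice, the descriptor vectors would hold several DFT-derived quantities rather than a single reaction energy, and an established library implementation of SVR, KRR, or GP regression with cross-validated hyperparameters would replace the hand-written solver shown here.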