One current challenge in machine learning for heterogeneous catalysis is selecting descriptors that correctly relate catalyst composition to the surface properties that directly influence activity. For a machine learning model to be predictive, every input feature must be easily accessible or calculable, whereas surface information can only be collected by measuring synthesized catalysts. This often precludes using surface information as a predictor of catalyst activity, since the goal of a catalyst discovery framework is to predict activity without needing to synthesize new catalysts. To bridge this gap, a machine learning model is created to encode Raman spectroscopy data into spectroscopic descriptors. The model is then trained to predict these descriptors from easily accessible catalyst properties, such as electronegativity or ionization energy, and from experimental synthesis conditions. A similar methodology has been demonstrated for X-ray diffraction data but has yet to be applied to surface-sensitive techniques such as Raman spectroscopy. These intermediate surface descriptors are subsequently used alongside the easily accessible catalyst properties as training data in a machine learning algorithm that predicts catalyst activity for unknown catalyst formulations.
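The two-stage workflow described above can be illustrated with a minimal sketch. It is not the model used in this work: the encoder is assumed here to be a principal component analysis, both regressors are assumed to be ridge regressions, and all data are synthetic stand-ins. All variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions for illustration, not data from this work).
n_catalysts, n_props, n_wavenumbers, n_descriptors = 60, 5, 200, 3
# Accessible properties, e.g. electronegativity, ionization energy, synthesis conditions.
X_props = rng.normal(size=(n_catalysts, n_props))
latent = X_props @ rng.normal(size=(n_props, n_descriptors))
spectra = (latent @ rng.normal(size=(n_descriptors, n_wavenumbers))
           + 0.05 * rng.normal(size=(n_catalysts, n_wavenumbers)))
activity = latent @ rng.normal(size=n_descriptors) + 0.05 * rng.normal(size=n_catalysts)


def fit_pca(data, k):
    """Return (mean, components) of a k-component PCA computed via SVD."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]


def fit_ridge(X, Y, alpha=1e-2):
    """Closed-form ridge regression; the intercept is also regularized for simplicity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ Y)


def predict_ridge(X, W):
    """Apply ridge weights fitted by fit_ridge (appends the intercept column)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W


# Synthesized catalysts (spectra available) vs. new formulations (properties only).
train, new = slice(0, 45), slice(45, None)

# Step 1: encode measured spectra into a few spectroscopic descriptors.
mean, comps = fit_pca(spectra[train], n_descriptors)
desc_train = (spectra[train] - mean) @ comps.T

# Step 2: learn to predict those descriptors from accessible properties.
W_desc = fit_ridge(X_props[train], desc_train)

# Step 3: learn activity from properties plus descriptors
# (measured descriptors are used for training here, which is an assumption).
W_act = fit_ridge(np.hstack([X_props[train], desc_train]), activity[train])

# Prediction for new formulations requires no synthesis or spectra.
desc_new = predict_ridge(X_props[new], W_desc)
activity_pred = predict_ridge(np.hstack([X_props[new], desc_new]), W_act)
```

The point of the sketch is the data flow: spectra enter only during training, while prediction for an unknown formulation uses accessible properties alone, with the descriptors supplied by the intermediate model.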
A. Corma, J. M. Serra, P. Serna, M. Moliner, Integrating high-throughput characterization into combinatorial heterogeneous catalysis: Unsupervised construction of quantitative structure/property relationship models. J. Catal. 232, 335–341 (2005).