We present a machine learning framework to explore the predictability limits of catalytic activity from experimental descriptor data that characterize catalyst formulations and reaction conditions. Artificial neural networks are used to fuse descriptor data and predict activity, and principal component analysis (PCA) and sparse PCA are used to project the experimental data into an information space, in which we identify regions of low and high predictability. Our framework also incorporates a constrained-PCA optimization formulation that identifies new experimental points while excluding regions of the experimental space ruled out by constraints on technology, economics, and expert knowledge. This allows us to navigate the experimental space in a more targeted manner. We apply the framework to a comprehensive water-gas shift reaction data set containing 2,228 experimental data points collected from the literature. Neural network analysis reveals strong predictability of activity across reaction conditions (e.g., varying temperature) but also important gaps in predictability across catalyst formulations (e.g., varying metal, support, and promoter). PCA reveals that these gaps arise because most experiments reported in the literature lie within narrow regions of the information space. We demonstrate that our framework can systematically guide experiments and the selection of descriptors in order to improve predictability and identify promising new formulations.
Jolliffe, Ian. Principal component analysis. Springer Berlin Heidelberg, 2011.
Zou, Hui, Trevor Hastie, and Robert Tibshirani. "Sparse principal component analysis." Journal of Computational and Graphical Statistics 15.2 (2006): 265-286.
Odabaşı, Çağla, M. Erdem Günay, and Ramazan Yıldırım. "Knowledge extraction for water gas shift reaction over noble metal catalysts from publications in the literature between 2002 and 2012." International Journal of Hydrogen Energy 39.11 (2014): 5733-5746.