(477b) Leveraging Experimental Transition Metal Complex Information to Improve Generalizability of Machine Learning Models | AIChE

Authors 

Taylor, M. - Presenter, Massachusetts Institute of Technology
Arunachalam, N., Massachusetts Institute of Technology
Nandy, A., Massachusetts Institute of Technology
Harper, D., University of Pittsburgh
Creating computationally inexpensive yet accurate models for materials and molecular properties is a primary driver for data science in chemical engineering. However, extending these models to target systems beyond the training data often requires aggregating large quantities of information from different chemistries. Our group has previously generated accurate artificial neural network (ANN) models trained on density functional theory (DFT) properties for mononuclear octahedral transition metal (TM) complexes primarily through exhaustive enumeration. Here, to expand the applicability of our ANN models to more diverse chemical space, we first mined all experimental, structurally identified mononuclear octahedral TM complexes in the Cambridge Structural Database (CSD). By featurizing these complexes with revised autocorrelation functions, a class of graph-based heuristic descriptors developed in our group, we obtain a bird's-eye view of their similarity to each other and to prior data in our group's database. Next, we compare several approaches for selecting which CSD complexes to evaluate next by DFT, with the aim of more rapidly improving the generalizability of retrained ANN models to these diverse chemistries. Approaches include physically motivated sampling, such as most-diverse sampling of ligand symmetries, and ANN-derived uncertainty sampling, i.e., selecting the complexes for which model confidence is lowest. We anticipate that our benchmarked methods for selecting complexes for evaluation will be useful across domains where rapid expansion to unseen materials is desired.
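The two families of selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each candidate complex already has a numeric descriptor vector and a scalar uncertainty score (for example, a spread across an ensemble of models); how the uncertainty is derived is not specified in the abstract. The diversity step here is a generic greedy max-min (farthest-point) selection in descriptor space, swapped in as a stand-in for the ligand-symmetry sampling the abstract mentions. The function names `select_most_uncertain` and `select_most_diverse` are hypothetical.

```python
import math


def select_most_uncertain(uncertainties, k):
    """Return indices of the k candidates with the highest uncertainty.

    `uncertainties` is assumed to hold one non-negative score per candidate;
    a larger score means lower model confidence.
    """
    ranked = sorted(range(len(uncertainties)),
                    key=lambda i: uncertainties[i],
                    reverse=True)
    return ranked[:k]


def select_most_diverse(candidates, k, reference=()):
    """Greedy max-min (farthest-point) selection in descriptor space.

    `candidates` are descriptor vectors of unlabeled complexes and
    `reference` are descriptor vectors of complexes that already have
    DFT labels. Each step picks the candidate farthest (Euclidean) from
    everything labeled or already chosen.
    """
    # Distance from each candidate to its nearest reference point
    # (infinite when there is no reference data yet).
    min_dist = [
        min((math.dist(c, r) for r in reference), default=math.inf)
        for c in candidates
    ]
    chosen = []
    for _ in range(min(k, len(candidates))):
        remaining = (i for i in range(len(candidates)) if i not in chosen)
        best = max(remaining, key=lambda i: min_dist[i])
        chosen.append(best)
        # The new pick now counts as covered chemical space.
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], math.dist(c, candidates[best]))
    return chosen


# Toy data (made up for illustration only).
toy_uncertainty = [0.05, 0.40, 0.10, 0.35, 0.02]
toy_descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (9.0, 0.0)]
toy_labeled = [(0.0, 0.0)]

uncertain_picks = select_most_uncertain(toy_uncertainty, 2)
diverse_picks = select_most_diverse(toy_descriptors, 2, toy_labeled)
```

In a benchmark of the kind described, each strategy would pick a batch, the picked complexes would be evaluated by DFT and added to the training set, and the retrained model would be scored on held-out complexes. That loop is omitted here because the abstract does not give its details.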