(477b) Leveraging Experimental Transition Metal Complex Information to Improve Generalizability of Machine Learning Models
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Innovations in Methods of Data Science
Wednesday, November 18, 2020 - 8:15am to 8:30am
Creating computationally inexpensive, yet accurate, models for materials and molecular properties is a primary driver for data science in chemical engineering. However, extending these models to target systems beyond the training data often requires them to aggregate large quantities of information from different chemistries. Our group has previously generated accurate artificial neural network (ANN) models trained on density functional theory (DFT) properties of mononuclear octahedral transition metal (TM) complexes, with training data generated primarily through exhaustive enumeration. Here, to expand the applicability of our ANN models to a more diverse chemical space, we first mined all experimental, structurally identified mononuclear octahedral TM complexes in the Cambridge Structural Database (CSD). By featurizing these complexes with revised autocorrelation functions, a class of graph-based heuristic descriptors developed in our group, we develop a bird's-eye view of their similarity to each other and to prior data in our group's database. Next, we compare a number of approaches for selecting the next CSD complexes to evaluate by DFT, with the aim of accelerating improvements in the generalizability of retrained ANN models to these diverse chemistries. Approaches include physically motivated sampling, such as most-diverse sampling of ligand symmetries, and ANN-derived uncertainty sampling, i.e., selecting complexes with the lowest model confidence. We anticipate that our benchmarked methods for selecting complexes for evaluation will be useful across domains where rapid expansion to unseen materials is desired.
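The two families of selection strategies named above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how diversity or model confidence is measured, so the sketch assumes greedy farthest-point sampling on Euclidean distance in a generic feature space as a stand-in for most-diverse sampling, and the spread of predictions across a hypothetical model ensemble as a stand-in for ANN-derived uncertainty. The function names and inputs are illustrative only.

```python
import math
from statistics import pstdev


def farthest_point_sample(features, n_select, start=0):
    """Greedy most-diverse selection (assumed stand-in for diversity sampling).

    Repeatedly picks the unselected candidate that is farthest, in Euclidean
    distance, from its nearest already-selected candidate.
    """
    n_select = min(n_select, len(features))
    selected = [start]
    chosen = {start}
    # Distance from every candidate to its nearest selected candidate.
    min_dist = [math.dist(f, features[start]) for f in features]
    while len(selected) < n_select:
        remaining = (i for i in range(len(features)) if i not in chosen)
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return selected


def uncertainty_sample(ensemble_predictions, n_select):
    """Select candidates with the lowest model confidence.

    Assumption: confidence is approximated by ensemble disagreement, so
    ensemble_predictions[i] lists each model's prediction for candidate i
    and a larger standard deviation means lower confidence.
    """
    spread = [pstdev(preds) for preds in ensemble_predictions]
    ranked = sorted(range(len(spread)), key=lambda i: spread[i], reverse=True)
    return ranked[:n_select]


if __name__ == "__main__":
    # Toy descriptor vectors (hypothetical; real inputs would be RAC features).
    features = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (0.0, 5.0)]
    print(farthest_point_sample(features, 3))

    # Toy predictions from a three-model ensemble for three candidates.
    predictions = [[1.0, 1.0, 1.0], [0.0, 2.0, 4.0], [1.0, 1.2, 0.8]]
    print(uncertainty_sample(predictions, 2))
```

In this sketch, the diversity-based selector needs only descriptors, so it can run before any model exists, whereas the uncertainty-based selector depends on a trained model and will track whatever that model currently finds unfamiliar.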