(246e) Machine Learning and Transition Metal Chemistry: Data-Driven Comparisons of First and Second Row Complexes

Janet, J. P., Massachusetts Institute of Technology
Harper, D., Massachusetts Institute of Technology
Arunachalam, N., Massachusetts Institute of Technology
Nandy, A., Massachusetts Institute of Technology
Duan, C., Massachusetts Institute of Technology
Kulik, H. J., Massachusetts Institute of Technology
The design of molecular transition metal (TM) complexes with targeted properties has great potential for homogeneous catalysis, molecular electronics, and functional materials. However, the space of possible TM complexes is enormous and poorly understood, especially compared to organic chemistry, where notions of chemical similarity are well established. High-throughput virtual screening of open-shell TM complexes with density functional theory (DFT) can estimate the properties of novel complexes, but it is complicated by spin- and oxidation-state-dependent behavior, long computation times, and difficulty in reliably converging to the intended geometries and electronic states. To address these challenges, we have developed a unified strategy for the data-driven analysis and design of TM complexes, combining autonomous simulation, specifically engineered geometry-free representations, and fast property prediction with neural networks, uncertainty estimation, and machine-learning-informed workflow control. Using a few thousand DFT calculations on first-row TM complexes, we trained neural networks and used them to design new complexes with targeted spin-state ordering and frontier orbital properties. By directly incorporating measures of model confidence, we are able to validate most leads at the DFT level, providing at least an order-of-magnitude increase in design efficiency compared to DFT alone. Additionally, we use live analysis of the wavefunction to predict the likelihood that a calculation will succeed, which avoids unproductive simulations, greatly increases the success rate of screening, and reduces wasted computational resources.
We now apply our automated framework to investigate the properties of second-row TM complexes with metal centers Mo-Rh and compare their electronic and geometric properties to those of their first-row equivalents. We assess the transferability of models trained on first-row data to this space and find that our outcome prediction model is transferable and can effectively reduce the number of unproductive simulations on these systems. Model transferability for property prediction is a strong function of the representation choice: representations that directly encode periodic information can improve both learning between rows and out-of-sample errors for each row when retraining on the combined data. This demonstrates how data-driven methods can be used as a lens to examine chemical similarity and identify chemical trends in design spaces that are otherwise relatively poorly understood.
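To make the point about representation choice concrete, the following is a minimal sketch and not the representation used in this work. It contrasts a metal-identity (one-hot) encoding with an encoding built from periodic-table group and period. The choice of Cr-Co as the first-row set, the two-number feature vector, and all function names are assumptions made for illustration; only the second-row centers Mo-Rh come from the abstract.

```python
# Illustrative sketch only: the encodings and names below are hypothetical,
# not the geometry-free representations described in the abstract.

# (group, period) for each metal; these are standard periodic-table values.
PERIODIC = {
    "Cr": (6, 4), "Mn": (7, 4), "Fe": (8, 4), "Co": (9, 4),
    "Mo": (6, 5), "Tc": (7, 5), "Ru": (8, 5), "Rh": (9, 5),
}

# Assumed first-row training set (the same-group counterparts of Mo-Rh).
FIRST_ROW = ["Cr", "Mn", "Fe", "Co"]


def one_hot_metal(symbol, vocabulary=FIRST_ROW):
    """Identity encoding: one slot per metal seen in training."""
    return [1.0 if symbol == m else 0.0 for m in vocabulary]


def periodic_metal(symbol):
    """Periodic encoding: the metal is described by its group and period."""
    group, period = PERIODIC[symbol]
    return [float(group), float(period)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_first_row(symbol, encode):
    """Return the first-row metal closest to `symbol` under `encode`."""
    target = encode(symbol)
    return min(FIRST_ROW, key=lambda m: squared_distance(encode(m), target))


if __name__ == "__main__":
    for metal in ["Mo", "Tc", "Ru", "Rh"]:
        print(metal,
              "one-hot:", one_hot_metal(metal),
              "nearest by periodic encoding:",
              nearest_first_row(metal, periodic_metal))
```

Under the one-hot encoding, every second-row metal maps to the same all-zero vector, so a model trained on first-row data has no basis for distinguishing Mo from Rh. Under the periodic encoding, each second-row metal lies closest to the first-row metal in its own group, which is one simple way a representation can carry information across rows.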