Thermochemical data fusion using graph representation learning | AIChE

Title: Thermochemical data fusion using graph representation learning
Publication Type: Journal Article
Year of Publication: 2020
Authors: Bhattacharjee, H; Vlachos, DG
Journal: Journal of Chemical Information and Modeling
Date Published: October
Keywords: Modeling and Simulation, Project 9.5

Large databases are required for “Big Data” applications in catalysis and materials science. Thermochemical databases can be created by combining data from various sources and by correcting low-fidelity data sets to higher accuracy with minimal computation. To achieve this “data fusion”, thermochemical quantities of interest, calculated at various levels of density functional theory (DFT), need to be mapped to the same, high levels of theory. In this work, a graph theoretical, statistical framework is proposed for such tasks. Subgraph frequencies are shown to provide a natural representation for learning these fusion maps. The maps are linear and are learnt with automated descriptor selection. Using a data set of as few as ∼1% from the QM9 database of 133 885 molecules, these models can predict multiple thermochemical quantities at a higher level of theory with an accuracy of 1 kcal/mol. The method is explainable, generalizable, and provides a diagnostic tool for outlier identification.
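
The idea of using subgraph frequencies as descriptors for a linear fusion map can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `subgraph_frequencies` and `fuse`, the restriction to one-atom and two-atom subgraphs, the additive-correction form of the map, and the weight values are all assumptions made for illustration. In the paper the weights are learnt from data with automated descriptor selection, a step this sketch omits.

```python
from collections import Counter


def subgraph_frequencies(atoms, bonds):
    """Count 1-node (atom) and 2-node (bond) subgraphs of a molecular graph.

    atoms: list of element symbols, one per node.
    bonds: list of (i, j) index pairs into atoms, one per edge.
    """
    counts = Counter()
    for element in atoms:
        counts[element] += 1
    for i, j in bonds:
        # Sort the pair so that C-H and H-C map to the same descriptor.
        a, b = sorted((atoms[i], atoms[j]))
        counts[f"{a}-{b}"] += 1
    return counts


def fuse(low_fidelity_value, counts, weights, intercept=0.0):
    """Apply an assumed linear map: high ~ low + intercept + sum_k w_k * n_k."""
    correction = intercept + sum(
        weights.get(name, 0.0) * n for name, n in counts.items()
    )
    return low_fidelity_value + correction


# Methane with explicit hydrogens: one carbon bonded to four hydrogens.
atoms = ["C", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
counts = subgraph_frequencies(atoms, bonds)

# Hypothetical placeholder weights (NOT fitted values from the paper).
weights = {"C": 0.5, "C-H": -0.1}
corrected = fuse(10.0, counts, weights)
```

Because the map is a weighted sum of subgraph counts, each weight can be read as the correction contributed by one structural motif, which is one way to understand the abstract's claim that the method is explainable.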

PubMed ID: 32966072