(411d) Transfer Learning with Autoencoder Latent Spaces: A Novel Framework for Improved Chemical Prediction from Scarce Datasets

Savoie, B. - Presenter, Purdue University
Modern machine learning offers promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas, experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a transfer learning approach that addresses this challenge through chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to improve prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and DFT-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two datasets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and organize the latent space advantageously. These results demonstrate that disparate chemical data sources can be usefully combined in an autoencoder framework, with potential general application to data-scarce chemical learning tasks.
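
The abstract does not specify the training objective, so the following is only an illustrative sketch of how a joint loss over two partially labeled data sources might be written. It assumes a mean-squared-error reconstruction term plus one regression term per data source, with missing labels (None) skipped so that a molecule carrying only a DFT label still contributes to the shared latent space. All names (`masked_mse`, `joint_loss`, `w_exp`, `w_dft`) are hypothetical and are not taken from the work itself.

```python
from typing import Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def masked_mse(preds: Sequence[float],
               labels: Sequence[Optional[float]]) -> float:
    """MSE over only the samples that carry a label; 0.0 if none do."""
    pairs = [(p, y) for p, y in zip(preds, labels) if y is not None]
    if not pairs:
        return 0.0
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def joint_loss(inputs: Sequence[Sequence[float]],
               reconstructions: Sequence[Sequence[float]],
               pka_pred: Sequence[float],
               pka_exp: Sequence[Optional[float]],
               dg_pred: Sequence[float],
               dg_dft: Sequence[Optional[float]],
               w_exp: float = 1.0,
               w_dft: float = 1.0) -> float:
    """Assumed joint objective: reconstruction + weighted property terms.

    pka_exp holds experimental pKa labels (scarce, often None);
    dg_dft holds DFT (de)protonation free energies (more plentiful).
    The additive, weighted form is an assumption made for illustration.
    """
    # Average reconstruction error across all molecules in the batch.
    recon = sum(mse(r, x) for r, x in zip(reconstructions, inputs)) / len(inputs)
    # Each property term only sees molecules that have that label.
    exp_term = masked_mse(pka_pred, pka_exp)
    dft_term = masked_mse(dg_pred, dg_dft)
    return recon + w_exp * exp_term + w_dft * dft_term


if __name__ == "__main__":
    # Toy batch of two molecules; only the first has an experimental pKa.
    x = [[1.0, 0.0], [0.0, 1.0]]
    x_hat = [[1.0, 0.0], [0.0, 0.0]]
    print(joint_loss(x, x_hat, [4.0, 9.0], [5.0, None], [10.0, 12.0], [10.0, 14.0]))
```

In a sketch like this, setting `w_exp` or `w_dft` to zero would correspond to a single training task, while nonzero values for both would correspond to joint training; how the weights are actually chosen in the reported framework is not stated in the abstract.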