(203d) Giving Attention to Generative Models for De Novo Molecular Design
AIChE Annual Meeting
2021 Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Innovations in Methods of Data Science
Monday, November 8, 2021 - 4:15pm to 4:30pm
Here we explore the impact of adding self-attention layers to generative β-VAE models and show that models with attention learn a complex “molecular grammar” while improving performance on downstream tasks such as accurately sampling from the latent space (“model memory”) or exploring novel chemistries not present in the training data. There is a notable relationship between a model’s architecture, the structure of its latent memory, and its performance during inference. For instance, we find an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory; however, novel sampling schemes can be used to optimize this tradeoff.
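As a purely illustrative sketch of the kind of architecture under discussion, the PyTorch snippet below adds a self-attention layer to a string-based β-VAE encoder and pairs it with a β-weighted evidence lower bound. All class names, layer choices, dimensions, and the value of β here are our own assumptions for illustration, not the actual model presented in the talk.

```python
# A minimal sketch (assumed architecture, not the authors' model) of a
# character-level beta-VAE encoder with one self-attention layer.
import torch
import torch.nn as nn

class AttentiveVAEEncoder(nn.Module):
    def __init__(self, vocab_size=64, embed_dim=128, latent_dim=32, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Self-attention lets every token attend to every other token, which
        # is one plausible route to long-range "grammar" in molecular strings
        # (e.g., matching ring-opening and ring-closing tokens).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.to_mu = nn.Linear(embed_dim, latent_dim)
        self.to_logvar = nn.Linear(embed_dim, latent_dim)

    def forward(self, tokens):
        x = self.embed(tokens)            # (batch, seq, embed_dim)
        x, _ = self.attn(x, x, x)         # self-attention over the sequence
        x = x.mean(dim=1)                 # pool to one vector per molecule
        return self.to_mu(x), self.to_logvar(x)

def beta_vae_loss(recon_logits, targets, mu, logvar, beta=4.0):
    # Reconstruction term plus a beta-weighted KL divergence to the unit
    # Gaussian prior; beta > 1 pressures the latent "memory" toward a
    # smoother structure, at some cost to reconstruction fidelity.
    recon = nn.functional.cross_entropy(
        recon_logits.transpose(1, 2), targets, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Usage with random token data, just to show the shapes involved.
enc = AttentiveVAEEncoder()
tokens = torch.randint(0, 64, (8, 40))               # batch of 8 sequences
mu, logvar = enc(tokens)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
```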
We also demonstrate the ability of the transformer VAE to construct a set of complex, human-interpretable molecular substructural features in an unsupervised fashion. We compare these learned features across different input representations, including SMILES and SELFIES strings (Krenn et al., below), as well as against features extracted from traditional cheminformatics software packages. Finally, we discuss how these models may eventually be used in tandem with natural language models, high-throughput molecular dynamics simulations, and reinforcement learning algorithms to form a unified AI-based framework for molecular discovery and optimization.
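As a brief illustration of the two string representations compared above, the snippet below encodes one molecule both ways using the open-source selfies package of Krenn et al. (cited in the references). The molecule chosen is an arbitrary example; the defining property of SELFIES is that every syntactically valid string decodes to a valid molecule.

```python
# SMILES vs. SELFIES for a single example molecule, via the `selfies` package.
import selfies as sf

smiles = "C1=CC=CC=C1O"            # phenol, written as a kekulized SMILES string
selfies_str = sf.encoder(smiles)   # translate SMILES -> SELFIES
print(selfies_str)                 # a sequence of bracketed SELFIES tokens
print(sf.decoder(selfies_str))     # round-trips back to a valid SMILES
```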
- Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 [cs.CL] (2014).
- Brown, T. B. et al. Language Models are Few-Shot Learners. in 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada (arXiv, 2020).
- Service, R. F. “The game has changed.” AI triumphs at protein folding. Science 370, 1144–1145 (2020).
- Payne, J., Srouji, M., Yap, D. A. & Kosaraju, V. BERT Learns (and Teaches) Chemistry. arXiv:2007.16012 [q-bio.BM] (2020).
- Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances 7, eabe4166 (2021).
- Krenn, M., Häse, F., AkshatKumar, N., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology 1 (2020).