(416f) Graph Hysteria – Comparing the Generative Performance of Graph and String-Based Translation VAEs for Molecular Design

Authors 

Joshi, N., University of Washington
Beck, D., University of Washington
Pfaendtner, J., University of Washington
Graph representations have grown steadily more popular as inputs for de novo molecular design models [1–3]. The natural correspondence between the nodes and edges of a graph and the atoms and bonds of a molecule makes graph-based architectures well suited to tasks that require a machine-learned representation of molecular structure. Graphs are also flexible enough to carry additional physicochemical features, which have been shown to substantially improve property prediction models [4]. However, their added complexity relative to 1D string representations brings drawbacks, including reduced computational efficiency [5] and weaker performance on standard generative benchmarks [6].
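To make the node/edge correspondence concrete, the short sketch below converts a SMILES string into a simple node and edge list with a few per-atom features using RDKit. The helper name and the particular feature set are illustrative choices, not the featurization used in this work.

# Illustrative only: SMILES -> (node features, edge list) with RDKit.
from rdkit import Chem

def mol_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # Node features: atomic number, formal charge, aromaticity flag, degree.
    nodes = [
        (a.GetAtomicNum(), a.GetFormalCharge(), int(a.GetIsAromatic()), a.GetDegree())
        for a in mol.GetAtoms()
    ]
    # Edges: (begin atom index, end atom index, bond order).
    edges = [
        (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
        for b in mol.GetBonds()
    ]
    return nodes, edges

nodes, edges = mol_to_graph("c1ccccc1O")  # phenol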

Here we systematically compare the generative and predictive properties of graph- and string-based encoders and decoders by framing the VAE task as a set of machine-translation problems: graph-to-graph, string-to-string, graph-to-string, and string-to-graph. In doing so, we isolate the impact of the input representation on the quality of the learned molecular embeddings, as well as the impact of the output representation on the novelty, diversity, and validity of machine-generated structures. We find that the choice of encoder has a tangible effect on the model’s ability to explore molecular phase space, while the choice of decoder strongly influences the practical viability of the model. Finally, we compare the effect of input representation on property prediction and model interpretability, and discuss the scenarios in which each architecture is likely to be optimal.
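The translation framing amounts to a VAE in which any encoder can be paired with any decoder, so the four tasks differ only in which modules are plugged in. The PyTorch sketch below is a minimal, hypothetical illustration of that pattern; the module names, dimensions, and the omission of a graph decoder are simplifications of ours, not the architecture used in this work.

# Minimal sketch: an encoder-agnostic / decoder-agnostic translation VAE.
import torch
import torch.nn as nn

class StringEncoder(nn.Module):
    """Encodes a SMILES token sequence into a latent Gaussian."""
    def __init__(self, vocab_size, hidden=128, z_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, z_dim)
        self.to_logvar = nn.Linear(hidden, z_dim)

    def forward(self, tokens):                 # tokens: (B, T) long tensor
        _, h = self.gru(self.emb(tokens))      # h: (1, B, hidden)
        h = h.squeeze(0)
        return self.to_mu(h), self.to_logvar(h)

class GraphEncoder(nn.Module):
    """Encodes node features and a dense adjacency with simple message passing."""
    def __init__(self, node_dim, hidden=128, z_dim=64, layers=3):
        super().__init__()
        self.proj = nn.Linear(node_dim, hidden)
        self.msg = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(layers))
        self.to_mu = nn.Linear(hidden, z_dim)
        self.to_logvar = nn.Linear(hidden, z_dim)

    def forward(self, x, adj):                 # x: (B, N, node_dim), adj: (B, N, N)
        h = torch.relu(self.proj(x))
        for lin in self.msg:
            h = torch.relu(lin(adj @ h)) + h   # aggregate neighbours, residual update
        g = h.mean(dim=1)                      # mean-pool nodes into a graph embedding
        return self.to_mu(g), self.to_logvar(g)

class StringDecoder(nn.Module):
    """Decodes a latent vector into per-position SMILES token logits."""
    def __init__(self, vocab_size, hidden=128, z_dim=64, max_len=64):
        super().__init__()
        self.max_len = max_len
        self.expand = nn.Linear(z_dim, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, z):
        h = torch.tanh(self.expand(z)).unsqueeze(1).repeat(1, self.max_len, 1)
        out, _ = self.gru(h)
        return self.out(out)                   # (B, max_len, vocab_size)

class TranslationVAE(nn.Module):
    """Pairs any encoder with any decoder; the reconstruction loss is task-specific."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, *enc_inputs):
        mu, logvar = self.encoder(*enc_inputs)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation
        recon = self.decoder(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

# Each translation task is just a different pairing, e.g. graph-to-string:
model = TranslationVAE(GraphEncoder(node_dim=16), StringDecoder(vocab_size=40))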

  1. Jin, W., Barzilay, R. & Jaakkola, T. Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv:1802.04364 [cs.LG] (2018).
  2. Jin, W., Barzilay, R. & Jaakkola, T. Hierarchical Generation of Molecular Graphs using Structural Motifs. in Proceedings of the 37th International Conference on Machine Learning 4839–4848 (2020).
  3. Mahmood, O., Mansimov, E., Bonneau, R. & Cho, K. Masked graph modeling for molecule generation. Nature Communications 12 (2021).
  4. Gasteiger, J., Groß, J. & Günnemann, S. Directional Message Passing for Molecular Graphs. arXiv:2003.03123 (2020).
  5. Mercado, R. et al. Graph networks for molecular design. Machine Learning: Science and Technology 2, 025023 (2021).
  6. Flam-Shepherd, D., Zhu, K. & Aspuru-Guzik, A. Keeping it Simple: Language Models can learn Complex Molecular Distributions. arXiv:2112.03041 (2021).