(152f) Capturing Molecular Interactions in Graph Neural Networks: A Case Study in Multi-Component Phase Equilibrium | AIChE


Qin, S. - Presenter, University of Wisconsin-Madison
Jiang, S., University of Wisconsin-Madison
Balaprakash, P., Argonne National Laboratory
Van Lehn, R., University of Wisconsin-Madison
Zavala, V., University of Wisconsin-Madison
Machine learning has been widely used to predict diverse molecular properties such as water solubility, toxicity, and lipophilicity [1]. In these approaches, molecular descriptors or fingerprints serve as the input data for quantitative structure-property relationship (QSPR) models [2,3]. More recently, deep learning architectures have increasingly been applied to more complex chemical systems with multiple components, such as alloys [4], copolymers [5,6], chemical reactions [7,8], and gas [9] and liquid [10,11,12] mixtures. Among these deep learning techniques, graph neural networks (GNNs) [13] have become especially popular because they operate directly on molecular graph representations, avoiding the need to pre-calculate or pre-define molecular descriptors.

In a typical GNN-based approach to molecular property prediction, atom and bond features are propagated along the molecular structure of a single input molecule. The embedded features are then passed to fully connected layers to build a predictive model [14]. Several strategies have been proposed for systems with multiple components. The most common is to average or concatenate the embedded features of the individual molecules and use the result as system-level features for property inference with fully connected or attentive layers [6,7,8]. Previous studies have also used weighted sums or concatenation to account for composition information when needed [6]. However, these approaches do not explicitly capture intra- and intermolecular interactions.
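
The pooling strategies described above can be illustrated with a minimal sketch. It assumes the per-molecule embeddings have already been produced by some GNN encoder; the function name `pool_mixture`, its arguments, and the toy numbers are hypothetical and are not taken from the cited works.

```python
import numpy as np

def pool_mixture(embeddings, fractions=None, mode="mean"):
    """Combine per-molecule embeddings, shape (n_molecules, d), into one
    system-level feature vector. Illustrative only (hypothetical helper)."""
    h = np.asarray(embeddings, dtype=float)
    if mode == "mean":
        # Plain average: ignores composition entirely.
        return h.mean(axis=0)
    if mode == "concat":
        # Concatenation: result depends on the order of the molecules.
        return h.reshape(-1)
    if mode == "weighted":
        # Composition-weighted sum, using mole fractions as weights.
        x = np.asarray(fractions, dtype=float)
        return x @ h
    raise ValueError(f"unknown mode: {mode}")

# Toy binary mixture with 2-dimensional embeddings (made-up values).
h = np.array([[1.0, 0.0], [0.0, 2.0]])
x = np.array([0.25, 0.75])
system_features = pool_mixture(h, x, mode="weighted")
```

In all three modes the molecules are combined only after each has been embedded on its own, so no term in the result depends on a specific pair of molecules. That is the limitation the abstract points to.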

In this work, we present a GNN architecture that incorporates both intra- and intermolecular interactions by combining atomic-level (local) graph convolution with molecular-level (global) message passing for property prediction of multi-component chemical systems. To connect local features with global features, we construct a molecular interaction network as an intermediate step. The molecular interaction network is a complete graph in which each composition-weighted node represents a molecule and each edge represents a hypothetical intermolecular interaction, carrying information such as hydrogen bonding. It serves as a physics-informed topological prior that aids feature extraction from multi-component systems. We test the proposed GNN architecture in a case study on activity coefficient prediction for multi-component systems. We also provide a framework that takes a given mixture (binary or ternary) and generates the corresponding phase diagrams (P-x-y) using the trained GNN together with thermodynamic calculations. Finally, we perform counterfactual analysis [15] of the trained model to identify the impact of functional groups on activity coefficients and to obtain physical insights.
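
As a rough illustration of the two ideas in this paragraph, the sketch below shows (a) one round of message passing over a complete molecular interaction graph with composition-weighted nodes, and (b) a bubble-point calculation that turns activity coefficients into one point of a P-x-y diagram. Everything here is an assumption made for illustration: the abstract does not specify the update rule, the edge featurization, or the thermodynamic relations used. The weight matrices `W_self` and `W_msg`, the scalar edge features, and the choice of modified Raoult's law (ideal vapor phase, low pressure) are placeholders, and in the actual workflow the activity coefficients would come from the trained GNN rather than being supplied by hand.

```python
import numpy as np

def complete_graph_edges(n):
    """Directed edges (i, j), i != j, of a complete graph on n nodes."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]

def global_message_pass(mol_emb, fractions, edge_feat, W_self, W_msg):
    """One hypothetical round of molecular-level message passing.

    mol_emb   : (n, d) molecule embeddings from a local (atomic-level) GNN
    fractions : (n,)   mole fractions used to weight the nodes
    edge_feat : (n, n) scalar interaction feature per molecule pair
                (assumed here; e.g. some hydrogen-bonding descriptor)
    W_self, W_msg : (d, d) weight matrices (learned in a real model)
    """
    mol_emb = np.asarray(mol_emb, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    h = fractions[:, None] * mol_emb           # composition-weighted nodes
    out = h @ W_self                           # self contribution
    for i, j in complete_graph_edges(len(h)):
        out[j] += edge_feat[i, j] * (h[i] @ W_msg)  # message from i to j
    return np.tanh(out)

def bubble_point(x, gamma, p_sat):
    """Bubble pressure and vapor composition from modified Raoult's law:
    P = sum_i x_i * gamma_i * Psat_i,  y_i = x_i * gamma_i * Psat_i / P."""
    x, gamma, p_sat = (np.asarray(a, dtype=float) for a in (x, gamma, p_sat))
    partial = x * gamma * p_sat
    pressure = partial.sum()
    return pressure, partial / pressure
```

Sweeping the liquid mole fraction `x` from 0 to 1 at fixed temperature and calling `bubble_point` at each step, with activity coefficients evaluated at that composition, would trace the bubble and dew curves of a binary P-x-y diagram under these assumptions.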


[1] Sanchez-Lengeling, B., & Aspuru-Guzik, A. (2018). Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400), 360-365.

[2] Rogers, D., & Hahn, M. (2010). Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5), 742-754.

[3] Karelson, M., Lobanov, V. S., & Katritzky, A. R. (1996). Quantum-chemical descriptors in QSAR/QSPR studies. Chemical reviews, 96(3), 1027-1044.

[4] Natarajan, A. R., & Van der Ven, A. (2018). Machine-learning the configurational energy of multicomponent crystalline solids. npj Computational Materials, 4(1), 1-7.

[5] Wilbraham, L., Sprick, R. S., Jelfs, K. E., & Zwijnenburg, M. A. (2019). Mapping binary copolymer property space with neural networks. Chemical science, 10(19), 4973-4984.

[6] Hanaoka, K. (2020). Deep neural networks for multicomponent molecular systems. ACS omega, 5(33), 21042-21053.

[7] Wei, J. N., Duvenaud, D., & Aspuru-Guzik, A. (2016). Neural networks for the prediction of organic chemistry reactions. ACS central science, 2(10), 725-732.

[8] Coley, C. W., Jin, W., Rogers, L., Jamison, T. F., Jaakkola, T. S., Green, W. H., ... & Jensen, K. F. (2019). A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical science, 10(2), 370-377.

[9] Pan, Y., Ji, X., Ding, L., & Jiang, J. (2019). Prediction of lower flammability limits for binary hydrocarbon gases by quantitative structure-property relationship approach. Molecules, 24(4), 748.

[10] Wang, T., Tang, L., Luan, F., & Cordeiro, M. N. D. (2018). Prediction of the toxicity of binary mixtures by QSAR approach using the hypothetical descriptors. International journal of molecular sciences, 19(11), 3423.

[11] Chinta, S., & Rengaswamy, R. (2019). Machine learning derived quantitative structure property relationship (QSPR) to predict drug solubility in binary solvent systems. Industrial & Engineering Chemistry Research, 58(8), 3082-3092.

[12] Jirasek, F., Alves, R. A., Damay, J., Vandermeulen, R. A., Bamler, R., Bortz, M., ... & Hasse, H. (2020). Machine learning in thermodynamics: Prediction of activity coefficients by matrix completion. The Journal of Physical Chemistry Letters, 11(3), 981-985.

[13] Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., ... & Pande, V. (2018). MoleculeNet: a benchmark for molecular machine learning. Chemical science, 9(2), 513-530.

[14] Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57-81.

[15] Wellawatte, G. P., Seshadri, A., & White, A. D. (2022). Model agnostic generation of counterfactual explanations for molecules. Chemical Science.