(219h) Characterizing Complex Solvent Environments in Acid-Catalyzed Reactions Using Molecular Dynamics Simulations and 3D Convolutional Neural Nets

Authors: 
Jiang, S., University of Wisconsin-Madison
Chew, A. K., University of Wisconsin
Zhang, W., University of Wisconsin-Madison
Van Lehn, R. C., University of Wisconsin-Madison
Zavala, V. M., University of Wisconsin-Madison
The catalytic conversion of lignocellulosic biomass is a promising strategy to obtain transportation fuels and high-value chemicals from renewable feedstocks [1]. The conversion of biomass-derived molecules is typically facilitated by liquid-phase, acid-catalyzed reactions, which suffer from low reactivity in aqueous solution. One method to increase acid-catalyzed reaction rates is to mix polar aprotic organic cosolvents with water to create mixed-solvent environments [2, 3]. As an alternative to trial-and-error experimentation, computational tools have been applied to understand solvent effects on chemical reactivity and to guide the design of solvent mixtures for efficient, low-cost biomass conversion processes [4]. Molecular dynamics (MD) simulations can be used to understand and predict solvent effects on experimental reaction rates for the conversion of biomass-derived model compounds in aqueous mixtures of 1,4-dioxane (DIO), γ-valerolactone (GVL), and tetrahydrofuran (THF) [3]. We previously developed an MD model consisting of only reactant, water, and cosolvent molecules and calculated three simulation-derived descriptors as inputs to a linear regression model of experimental reaction rates, which showed good agreement in DIO-water mixtures [3]. The regression model was less accurate for GVL- and THF-water mixtures, indicating either that descriptors computed with classical MD cannot quantify reaction rates in these systems or that more complex descriptors must be defined to capture reactivity trends. However, designing new descriptors of reaction kinetics based on human intuition is challenging and often requires complex, time-consuming analyses (e.g., three-dimensional solvent mapping [5] or solvation free energies [6]) that cannot be readily generalized across a range of solvent compositions.
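As a rough illustration of the descriptor-based approach described above, the sketch below fits an ordinary least-squares model with an intercept to three descriptors per system. The function name fit_linear, the descriptor values, and the target values are placeholders invented for illustration; they are not the descriptors or data of [3].

```python
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    # Prepend a constant column so the first coefficient is the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    n = len(rows[0])

    # Normal equations: (R^T R) beta = R^T y, stored as an augmented matrix.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        residual = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = residual / M[i][i]
    return beta


# Synthetic placeholder data: three hypothetical descriptors per system and a
# target generated from known coefficients so the fit can be checked.
descriptors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
targets = [1.0 + 2.0 * a - 1.0 * b + 0.5 * c for a, b, c in descriptors]
coefficients = fit_linear(descriptors, targets)
```

A model of this form has only four free parameters, so its accuracy depends entirely on whether the chosen descriptors capture the relevant solvent effects.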

As an alternative to designing descriptors via human intuition, machine learning methods have been increasingly used to infer molecular properties by automatically extracting features from complex sources of data [7-13]. For example, convolutional neural networks (CNNs) can be used to identify and quantify patterns within two-dimensional (2D) spatial datasets such as images [14]. By training on a suitable set of labeled image data, CNNs extract spatial features without requiring human supervision and can then utilize these features to classify image contents. CNNs can be further generalized to extract features from three-dimensional (3D) volumetric data [15], which can facilitate the analysis of 3D molecular structures. For example, 3D CNNs have recently been used to detect protein functional sites [16], evaluate protein-ligand binding sites [17], and quantify protein-ligand binding affinities [18] by training on protein database structures. Based on these examples and our prior success using classical MD simulations to predict acid-catalyzed reaction outcomes [3], we hypothesize that 3D CNNs can exploit the output of classical MD simulations to more accurately predict solvent effects on acid-catalyzed reaction rates.
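To make the idea of extracting features from volumetric data concrete, the sketch below applies a single 3D convolution filter (computed as a valid cross-correlation, as is conventional in CNNs) followed by a ReLU activation. It is a minimal pure-Python illustration under assumed names (conv3d_valid, relu3d) and a toy input, not the architecture of any network cited here; a real 3D CNN stacks many learned filters and would use a deep learning library.

```python
from typing import List

Volume = List[List[List[float]]]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Slide a 3D kernel over a 3D volume without padding (stride 1)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = [
        [[0.0] * (width - kw + 1) for _ in range(height - kh + 1)]
        for _ in range(depth - kd + 1)
    ]
    for i in range(depth - kd + 1):
        for j in range(height - kh + 1):
            for k in range(width - kw + 1):
                total = 0.0
                # Weighted sum of the voxels currently under the kernel.
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            total += volume[i + a][j + b][k + c] * kernel[a][b][c]
                out[i][j][k] = total
    return out


def relu3d(volume: Volume) -> Volume:
    """Element-wise ReLU: negative responses are set to zero."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]


# Toy example: a 3x3x3 volume of ones and a 2x2x2 summing kernel.
toy_volume = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
toy_kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
feature_map = relu3d(conv3d_valid(toy_volume, toy_kernel))
```

Each output voxel summarizes a local 2x2x2 neighborhood of the input, which is how a convolutional layer detects local spatial patterns regardless of where they occur in the grid.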

In this work, we developed 3D CNNs that use atomic positions obtained from classical MD simulation trajectories to predict the rates of liquid-phase, acid-catalyzed biomass conversion reactions in mixed-solvent environments. We constructed 3D grids of voxels (the 3D analogs of 2D pixels) that represent atomic positions sampled in the corresponding MD simulations. We find that our 3D CNN model, which we call SolventNet, predicts experimental reaction rates more accurately than models based on human-selected, MD-derived descriptors [3] and than previously developed 3D CNNs (ORION [19] and VoxNet [20]). Surprisingly, reaction rate predictions with SolventNet require as little as 2 ns of classical MD trajectory data, a roughly 100-fold reduction from the 205 ns of MD data used in models based on human-selected descriptors [3]. This result indicates that the 3D atomic positions themselves encode substantial information about solvent effects on reactivity. We further show that SolventNet generalizes to new system compositions using leave-one-out cross-validation in which all data for one cosolvent-water mixture or one reactant were treated as the test set and excluded from model training. Finally, we tested the predictive power of SolventNet for reactants in three additional polar aprotic cosolvents not included in model training: dimethyl sulfoxide, acetonitrile, and acetone. SolventNet still accurately predicts experimentally measured reaction rates in solvent mixtures containing these cosolvents despite their distinct properties (e.g., functional groups, basicity, and polarizability). To our knowledge, this work is the first to integrate 3D CNNs and classical MD simulations for the prediction of acid-catalyzed reaction rates. We envision that the computational efficiency of combining 3D CNNs with classical MD simulations will enable the integration of these tools with process models to screen solvents and optimize reactor conditions for biomass conversion processes [21].
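The voxel representation can be sketched as a 3D histogram of atomic positions averaged over trajectory frames. The example below is a minimal illustration under several assumptions that are not stated in this abstract: a cubic grid centered on the origin (for instance, on the reactant), a single occupancy channel, and hypothetical function names (voxelize, average_occupancy). The actual grid size, channels, and normalization used by SolventNet may differ.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]
Grid = List[List[List[float]]]


def voxelize(positions: Sequence[Position], box_length: float, n_bins: int) -> Grid:
    """Count atoms in each voxel of a cubic grid centered on the origin.

    Atoms outside the cube are ignored.
    """
    grid = [[[0.0] * n_bins for _ in range(n_bins)] for _ in range(n_bins)]
    voxel_size = box_length / n_bins
    half = box_length / 2.0
    for position in positions:
        # Shift coordinates so the cube spans [0, box_length) and bin them.
        index = [math.floor((c + half) / voxel_size) for c in position]
        if all(0 <= i < n_bins for i in index):
            grid[index[0]][index[1]][index[2]] += 1.0
    return grid


def average_occupancy(
    frames: Sequence[Sequence[Position]], box_length: float, n_bins: int
) -> Grid:
    """Average per-voxel atom counts over all frames of a trajectory."""
    total = [[[0.0] * n_bins for _ in range(n_bins)] for _ in range(n_bins)]
    for frame in frames:
        counts = voxelize(frame, box_length, n_bins)
        for i in range(n_bins):
            for j in range(n_bins):
                for k in range(n_bins):
                    total[i][j][k] += counts[i][j][k]
    n_frames = len(frames)
    return [[[v / n_frames for v in row] for row in plane] for plane in total]


# Toy trajectory with two frames (coordinates in arbitrary length units).
toy_frames = [
    [(0.1, 0.1, 0.1)],
    [(0.1, 0.1, 0.1), (-0.9, -0.9, -0.9), (5.0, 5.0, 5.0)],
]
occupancy = average_occupancy(toy_frames, box_length=2.0, n_bins=2)
```

In a multi-component system, one such grid could be built per species (for example, water, cosolvent, and reactant) and the grids stacked as input channels, analogous to the color channels of an image; that choice is likewise an assumption of this sketch.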

[1] L. Shuai and J. Luterbacher, ChemSusChem, 2016, 9, 133-155.

[2] M. A. Mellmer, C. Sener, J. M. R. Gallo, J. S. Luterbacher, D. M. Alonso and J. A. Dumesic, Angew Chem Int Edit, 2014, 53, 11872-11875.

[3] T. W. Walker, A. K. Chew, H. X. Li, B. Demir, Z. C. Zhang, G. W. Huber, R. C. Van Lehn and J. A. Dumesic, Energy & Environmental Science, 2018, 11, 617-628.

[4] J. J. Varghese and S. H. Mushrif, Reaction Chemistry & Engineering, 2019, 4, 165-206.

[5] S. H. Mushrif, S. Caratzoulas and D. G. Vlachos, PCCP, 2012, 14, 2637-2644.

[6] A. K. Chew and R. C. Van Lehn, Front Chem, 2019, 7, 439.

[7] C. W. Coley, W. Jin, L. Rogers, T. F. Jamison, T. S. Jaakkola, W. H. Green, R. Barzilay and K. F. Jensen, Chem Sci, 2019, 10, 370-377.

[8] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik and R. P. Adams, 2015.

[9] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams and A. Aspuru-Guzik, ACS Central Science, 2018, 4, 268-276.

[10] N. E. Jackson, A. S. Bowen, L. W. Antony, M. A. Webb, V. Vishwanath and J. J. de Pablo, Sci Adv, 2019, 5, eaav1190.

[11] E. Y. Lee, B. M. Fulan, G. C. L. Wong and A. L. Ferguson, Proceedings of the National Academy of Sciences, 2016, 113, 13588-13593.

[12] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing and V. Pande, Chem Sci, 2018, 9, 513-530.

[13] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt and K.-R. Müller, Sci Adv, 2017, 3, e1603015.

[14] W. Rawat and Z. H. Wang, Neural Comput, 2017, 29, 2352-2449.

[15] R. D. Singh, A. Mittal and R. K. Bhatia, Multimedia Tools and Applications, 2019, 78, 15951-15995.

[16] W. Torng and R. B. Altman, Bioinformatics, 2018, 35, 1503-1512.

[17] J. Jiménez, S. Doerr, G. Martínez-Rosell, A. S. Rose and G. De Fabritiis, Bioinformatics, 2017, 33, 3036-3042.

[18] J. Jiménez, M. Škalič, G. Martínez-Rosell and G. De Fabritiis, J Chem Inf Model, 2018, 58, 287-296.

[19] N. Sedaghat, M. Zolfaghari, E. Amiri and T. Brox, arXiv preprint arXiv:1604.03351, 2016.

[20] D. Maturana and S. Scherer, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, 922-928.

[21] D. M. Alonso, S. H. Hakim, S. Zhou, W. Won, O. Hosseinaei, J. Tao, V. Garcia-Negron, A. H. Motagamwala, M. A. Mellmer, K. Huang, C. J. Houtman, N. Labbé, D. P. Harper, C. T. Maravelias, T. Runge and J. A. Dumesic, Sci Adv, 2017, 3, e1603301.