(543d) Computer-Aided Fuel Design with Generative Graph Machine Learning | AIChE

Rittig, J. G. - Presenter, RWTH Aachen University
Ritzert, M., RWTH Aachen University
Schweidtmann, A. M., Delft University of Technology
Winkler, S., RWTH Aachen University
Weber, J. M., University of Cambridge
Morsch, P., RWTH Aachen University
Heufer, K. A., RWTH Aachen University
Grohe, M., RWTH Aachen University
Mitsos, A., RWTH Aachen University
Dahmen, M., FZ Jülich
Fuel components that enable higher engine efficiency are crucial for reducing carbon emissions in the transportation sector, and their identification is actively pursued by means of computer-aided molecular design (CAMD), e.g., in [1-3]. CAMD employs computational methods to generate candidate molecules and to predict application-relevant properties from molecular structure. For the latter, quantitative structure-property relationships (QSPRs) are typically employed. These first decompose the molecular structure into structural groups or compute molecular descriptors, and then correlate the number of occurrences of these groups or the molecular descriptor values with the physicochemical properties of interest. The generation of candidate structures in CAMD usually builds on the concept that structural groups can be combined into chemically feasible molecules with desired properties, e.g., by using concepts from evolutionary theory [4] or by formulating and solving a mathematical program [5,6].
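The group-based QSPR idea described above can be illustrated with a minimal sketch, assuming a linear group-contribution model. The group names, the contribution values, and the intercept are hypothetical placeholders that are not fitted to any data; the molecules are decomposed by hand rather than by an automated fragmentation tool.

```python
from collections import Counter

# Hypothetical group contributions: placeholder numbers, NOT fitted to any data.
CONTRIBUTIONS = {"CH3": 1.0, "CH2": -0.5, "OH": 2.0}
INTERCEPT = 10.0  # hypothetical model intercept


def count_groups(groups):
    """Count how often each structural group occurs in a molecule."""
    return Counter(groups)


def predict_property(groups):
    """Linear estimate: intercept + sum over groups of (count * contribution)."""
    counts = count_groups(groups)
    return INTERCEPT + sum(CONTRIBUTIONS[g] * n for g, n in counts.items())


# Molecules decomposed by hand into structural groups.
ethanol = ["CH3", "CH2", "OH"]          # CH3-CH2-OH
butane = ["CH3", "CH2", "CH2", "CH3"]   # CH3-CH2-CH2-CH3

ethanol_estimate = predict_property(ethanol)
butane_estimate = predict_property(butane)
```

In an actual QSPR, the contributions would be regressed against measured property data, and the choice of groups itself is a modeling decision.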

Recently, novel machine learning (ML) methods in the form of generative graph-ML models and graph neural networks have been applied to CAMD [7,8]. These ML methods operate directly on a representation of molecules as graphs and thus circumvent the need to select meaningful structural groups or molecular descriptors. Instead, generative graph-ML models are trained on a data set of molecules in an unsupervised manner to generate new molecules not included in the data set. Specifically, they learn to generate molecules from a continuous latent space, which overcomes the discrete structural-group representations used in well-established CAMD methods and enables the application of continuous optimization approaches [7,8]. Furthermore, graph neural networks can learn molecular properties directly from the molecular graph [9,10] and thus allow for end-to-end property prediction. However, applications of graph-ML-based CAMD have so far focused mainly on drug discovery; chemical engineering applications, including model-based fuel design, are largely missing.
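To make the notion of operating directly on the molecular graph more concrete, the following is a minimal sketch of a single message-passing step followed by a sum-pooling readout. It assumes fixed, untrained scalar weights (`w_self`, `w_neigh`) and one-hot atom features, both chosen here for illustration only; graph neural networks such as those in [9,10] use learned update functions and richer atom and bond features.

```python
def message_passing_step(features, bonds, w_self=1.0, w_neigh=0.5):
    """One round of message passing on an undirected molecular graph.

    New atom state = w_self * own state + w_neigh * sum of neighbor states.
    The scalar weights are illustrative stand-ins for learned parameters.
    """
    n, dim = len(features), len(features[0])
    aggregated = [[0.0] * dim for _ in range(n)]
    for i, j in bonds:  # each bond passes messages in both directions
        for k in range(dim):
            aggregated[i][k] += features[j][k]
            aggregated[j][k] += features[i][k]
    return [
        [w_self * features[i][k] + w_neigh * aggregated[i][k] for k in range(dim)]
        for i in range(n)
    ]


def readout(features):
    """Sum-pool all atom states into one molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Ethanol heavy-atom graph C-C-O with one-hot features [is_carbon, is_oxygen].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]

hidden = message_passing_step(atoms, bonds)
fingerprint = readout(hidden)
```

After one step, the middle carbon already carries information about its oxygen neighbor, which is how structural context enters the representation without hand-picked groups or descriptors.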

We present a graph-ML CAMD framework for designing fuel components with desirable research octane number (RON) and octane sensitivity (OS) for increased engine efficiency. Our framework utilizes generative graph-ML models, graph neural networks, and optimization for strategic sampling of candidate structures from a continuous latent molecular space. We analyze the effect of different generative graph-ML models, including variational autoencoders and generative adversarial networks, cf. [11-13], on the identified candidate molecules. Further, we explore different strategies for optimization within the continuous molecular space, i.e., Bayesian optimization and genetic algorithms. The results show that our framework can find well-established fuel components as well as new candidate molecules that require further investigation. By experimentally assessing one novel candidate, we demonstrate the importance of experimental validation in model-based fuel design and highlight the need for additional RON/OS training data for a broader range of molecular classes. Extension of our graph-ML CAMD framework to other chemical engineering applications is straightforward. Publication of the models and code is currently in preparation.
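As an illustration of optimization within a continuous latent space, the following is a minimal sketch of a genetic algorithm with elitism. It is not the framework's implementation: `surrogate_score` is a hypothetical toy objective standing in for the chain "decode a latent vector into a molecule, then predict its properties", and the latent dimension, population size, and mutation width are arbitrary choices.

```python
import random


def surrogate_score(z):
    """Hypothetical stand-in for 'decode z, then predict the fuel properties'.

    A toy quadratic with its maximum at z = (1, -2); higher is better.
    """
    return -((z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2)


def genetic_search(score, dim=2, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Maximize `score` over a continuous latent vector with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best score at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]  # elitism: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover by averaging, mutation by Gaussian noise.
            children.append(
                [(x + y) / 2.0 + rng.gauss(0.0, sigma) for x, y in zip(a, b)]
            )
        population = parents + children
    return max(population, key=score), history


best_z, history = genetic_search(surrogate_score)
```

Because the better half of each generation is carried over unchanged, the best score never decreases. In a latent-space setting, each evaluated vector would additionally need to decode to a chemically valid molecule, which this toy objective does not model.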

[1] Dahmen, M., & Marquardt, W. (2016). Model-based design of tailor-made biofuels. Energy & Fuels, 30(2), 1109-1134.

[2] König, A., Siska, M., Schweidtmann, A. M., Rittig, J. G., Viell, J., Mitsos, A., & Dahmen, M. (2021). Designing production-optimal alternative fuels for conventional, flexible-fuel, and ultra-high efficiency engines. Chemical Engineering Science, 237, 116562.

[3] Li, R., Herreros, J. M., Tsolakis, A., & Yang, W. (2022). Integrated machine learning-quantitative structure property relationship (ML-QSPR) and chemical kinetics for high throughput fuel screening toward internal combustion engine. Fuel, 307, 121908.

[4] Douguet, D., Munier-Lehmann, H., Labesse, G., & Pochet, S. (2005). LEA3D: a computer-aided ligand design for structure-based drug design. Journal of Medicinal Chemistry, 48(7), 2457-2468.

[5] Zhang, L., Cignitti, S., & Gani, R. (2015). Generic mathematical programming formulation and solution for computer-aided molecular design. Computers & Chemical Engineering, 78, 79-84.

[6] Austin, N. D., Sahinidis, N. V., & Trahan, D. W. (2016). Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques. Chemical Engineering Research and Design, 116, 2-26.

[7] Elton, D. C., Boukouvalas, Z., Fuge, M. D., & Chung, P. W. (2019). Deep learning for molecular design—a review of the state of the art. Molecular Systems Design & Engineering, 4(4), 828-849.

[8] Alshehri, A. S., Gani, R., & You, F. (2020). Deep learning and knowledge-based methods for computer-aided molecular design—toward a unified approach: State-of-the-art and future directions. Computers & Chemical Engineering, 141, 107005.

[9] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, 70, 1263-1272.

[10] Schweidtmann, A. M., Rittig, J. G., König, A., Grohe, M., Mitsos, A., & Dahmen, M. (2020). Graph neural networks for prediction of fuel ignition quality. Energy & Fuels, 34(9), 11395-11407.

[11] Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, 80, 3632-3648.

[12] Kajino, K. (2019). Molecular hypergraph grammar with its application to molecular optimization. Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, 97, 3183-3191.

[13] De Cao, N., & Kipf, T. (2018). MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973.