(197aw) ML-SAFT: A Machine Learning Framework for PCP-SAFT Parameter Prediction | AIChE

Lapkin, A. A., University of Cambridge
Raßpe-Lange, L., RWTH Aachen University
Leonhard, K., RWTH Aachen University
Mitsos, A., RWTH Aachen University
Fast and accurate prediction of fluid-phase thermodynamics is an important aspect of molecular discovery and process development. For example, solubility is an essential molecular descriptor in many drug discovery programs, and vapor-liquid equilibrium is often a key consideration during the development of separation processes. Over the last fifty years, a variety of predictive thermodynamic methods have been developed, ranging from group contribution methods[1, 2, 3] to quantum chemical simulations[4] to machine learning methods.[5, 6, 7, 8] However, there is still a need for methods that extend to a wide range of compounds without significant tuning by the end user. Currently, group contribution methods require careful and often manual identification of predefined functional groups, while existing quantum mechanical (QM) methods often require significant expertise and computational cost. Machine learning methods have shown promising results in predicting thermodynamic quantities such as activity coefficients, yet many lack the thermodynamic consistency of classical thermodynamic models.[9] Recently, it has been shown that using a machine learning model to predict the parameters of a Gibbs excess energy model can overcome the challenge of building thermodynamic consistency into neural networks.[10] We sought to apply the same concept of parameter prediction to an established equation of state (EoS).

In this work, we develop ML-SAFT, a framework for predicting parameters of the PCP-SAFT EoS using machine learning.[11, 12] We focus on the PCP-SAFT EoS because it can be used for a wide variety of thermodynamic prediction tasks, including vapor-liquid equilibrium,[11] solubility,[13] and surface tension,[14] yet each new molecule must be parametrized by regression to experimental data. To enable training of machine learning models, ML-SAFT includes the largest database of regressed PCP-SAFT parameters published in the literature (969 molecules) and a set of machine learning models trained on this dataset. We extract data from the Dortmund Databank[15] and develop a robust regression method to determine pure-component PCP-SAFT parameters from experimental vapor pressure and liquid density data. Within ML-SAFT, we train random forests,[16] feed-forward neural networks, and message passing neural networks (MPNNs)[17] to predict the regressed PCP-SAFT parameters.
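The abstract does not spell out the regression procedure. As a rough illustration only, a common way to fit pure-component parameters is to minimize squared relative deviations between model and experimental vapor pressures and liquid densities. The sketch below assumes two hypothetical callables, `vapor_pressure(params, T)` and `liquid_density(params, T)`, standing in for an equation-of-state implementation; they are placeholders, not the API of any PCP-SAFT package, and the optional `density_weight` is an assumed weighting, not the authors' robust regression method. The parameter names (segment number `m`, segment diameter `sigma`, dispersion energy `epsilon_k`) follow the usual PC-SAFT convention; association and dipole parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PCSAFTParams:
    """Non-associating, non-polar PC-SAFT pure-component parameters."""
    m: float          # segment number (dimensionless)
    sigma: float      # segment diameter, Angstrom
    epsilon_k: float  # dispersion energy divided by Boltzmann's constant, K


@dataclass
class Measurement:
    temperature: float  # K
    value: float        # experimental vapor pressure or liquid density


def relative_residuals(
    params: PCSAFTParams,
    psat_data: Sequence[Measurement],
    rho_data: Sequence[Measurement],
    vapor_pressure: Callable[[PCSAFTParams, float], float],  # hypothetical EoS call
    liquid_density: Callable[[PCSAFTParams, float], float],  # hypothetical EoS call
    density_weight: float = 1.0,  # assumed relative weight of density data
) -> List[float]:
    """Relative deviations (model - experiment) / experiment for both data types."""
    res = [
        (vapor_pressure(params, d.temperature) - d.value) / d.value
        for d in psat_data
    ]
    res += [
        density_weight * (liquid_density(params, d.temperature) - d.value) / d.value
        for d in rho_data
    ]
    return res


def objective(residuals: Sequence[float]) -> float:
    """Sum of squared relative residuals, to be minimized over the parameters."""
    return sum(r * r for r in residuals)
```

To fit parameters, the residual vector could be wrapped in a function of a flat parameter array and passed to a least-squares solver such as `scipy.optimize.least_squares`. Using relative rather than absolute residuals keeps vapor pressures and densities, which differ by orders of magnitude, on a comparable scale.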

Our results show that random forests give the most accurate predictions of the regressed PCP-SAFT parameters. Random forests also give the best vapor pressure predictions on unseen molecules, measured as the percentage average absolute deviation (%AAD). However, the best density predictions are obtained with parameters predicted by an MPNN. We attribute this difference to the greater representational capacity of the MPNN for polar molecules, which we find to be important for density predictions. We also compare ML-SAFT models to two existing predictive PCP-SAFT models: SEPP[4] and group contribution PC-SAFT.[2] We find that ML-SAFT makes accurate predictions for a wider range of molecules than both methods while maintaining computational efficiency.
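For reference, %AAD is conventionally the mean absolute relative deviation expressed in percent, %AAD = (100/N) Σ |x_pred − x_exp| / |x_exp|. A minimal Python version, assuming predicted and experimental values are paired element-wise (the function name `aad_percent` is illustrative), is:

```python
from typing import Sequence


def aad_percent(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Average absolute deviation in percent between paired predictions and data."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(predicted)


# Example: two values over- and under-predicted by 10 % each
print(aad_percent([110.0, 90.0], [100.0, 100.0]))  # 10.0
```

Because each deviation is normalized by its experimental value, %AAD can be compared across properties with very different magnitudes, such as vapor pressure and liquid density.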

Overall, our work demonstrates that machine learning is a powerful tool for PCP-SAFT parameter prediction. We foresee that the results shown in this work can form a baseline for future work that explores multi-component mixture predictions using PCP-SAFT.

[1] A. Fredenslund, R. L. Jones, J. M. Prausnitz, Group-contribution estimation of activity coefficients in nonideal liquid mixtures, AIChE Journal 21 (6) (1975) 1086–1099. doi:10.1002/aic.690210607.
URL https://doi.org/10.1002/aic.690210607
[2] E. Sauer, M. Stavrou, J. Gross, Comparison between a homo- and a heterosegmented group contribution approach based on the perturbed-chain polar statistical associating fluid theory equation of state, Industrial and Engineering Chemistry Research 53 (38) (2014) 14854–14864. doi:10.1021/ie502203w.
URL https://doi.org/10.1021/ie502203w
[3] D. Constantinescu, J. Gmehling, Further development of modified UNIFAC (Dortmund): Revision and extension 6, Journal of Chemical and Engineering Data 61 (8) (2016) 2738–2748. doi:10.1021/acs.jced.6b00136.
URL https://doi.org/10.1021/acs.jced.6b00136
[4] S. Kaminski, K. Leonhard, SEPP: Segment-based equation of state parameter prediction, Journal of Chemical and Engineering Data 65 (12) (2020) 5830–5843. doi:10.1021/acs.jced.0c00733.
URL https://doi.org/10.1021/acs.jced.0c00733
[5] J. Habicht, C. Brandenbusch, G. Sadowski, Predicting PC-SAFT pure-component parameters by machine learning using a molecular fingerprint as key input, Fluid Phase Equilibria 565 (2023) 113657. doi:10.1016/j.fluid.2022.113657.
URL https://doi.org/10.1016/j.fluid.2022.113657
[6] F. Jirasek, R. A. S. Alves, J. Damay, R. A. Vandermeulen, R. Bamler, M. Bortz, S. Mandt, M. Kloft, H. Hasse, Machine learning in thermodynamics: Prediction of activity coefficients by matrix completion, The Journal of Physical Chemistry Letters 11 (3) (2020) 981–985. doi:10.1021/acs.jpclett.9b03657.
URL https://doi.org/10.1021/acs.jpclett.9b03657
[7] E. I. S. Medina, S. Linke, M. Stoll, K. Sundmacher, Graph neural networks for the prediction of infinite dilution activity coefficients, Digital Discovery 1 (3) (2022) 216–225. doi:10.1039/d1dd00037c.
URL https://doi.org/10.1039/d1dd00037c
[8] J. G. Rittig, K. Ben Hicham, A. M. Schweidtmann, M. Dahmen, A. Mitsos, Graph neural networks for temperature-dependent activity coefficient prediction of solutes in ionic liquids, Computers & Chemical Engineering 171 (2023) 108153. doi:10.1016/j.compchemeng.2023.108153.
URL https://doi.org/10.1016/j.compchemeng.2023.108153
[9] K. C. Felton, H. Ben-Safar, A. Lapkin, DeepGamma: A deep learning model for activity coefficient prediction (2022).
[10] B. Winter, C. Winter, T. Esper, J. Schilling, A. Bardow, SPT-NRTL: A physics-guided machine learning model to predict thermodynamically consistent activity coefficients, Fluid Phase Equilibria 568 (2023) 113731. doi:10.1016/j.fluid.2023.113731.
URL https://doi.org/10.1016/j.fluid.2023.113731
[11] J. Gross, G. Sadowski, Perturbed-chain SAFT: an equation of state based on a perturbation theory for chain molecules, Industrial and Engineering Chemistry Research 40 (4) (2001) 1244–1260. doi:10.1021/ie0003887.
URL https://doi.org/10.1021/ie0003887
[12] J. Gross, J. Vrabec, An equation-of-state contribution for polar components: Dipolar molecules, AIChE Journal 52 (3) (2006) 1194–1204. doi:10.1002/aic.10683.
URL https://doi.org/10.1002/aic.10683
[13] M. Klajmon, Investigating various parametrization strategies for pharmaceuticals within the PC-SAFT equation of state, Journal of Chemical and Engineering Data 65 (12) (2020) 5753–5767. doi:10.1021/acs.jced.0c00707.
URL https://doi.org/10.1021/acs.jced.0c00707
[14] P. Rehner, J. Gross, Multiobjective optimization of PCP-SAFT parameters for water and alcohols using surface tension data, Journal of Chemical and Engineering Data 65 (12) (2020) 5698–5707. doi:10.1021/acs.jced.0c00684.
URL https://doi.org/10.1021/acs.jced.0c00684
[15] Dortmund databank (2022).
URL www.ddbst.com
[16] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
[17] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message passing for quantum chemistry, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1263–1272.
URL https://proceedings.mlr.press/v70/gilmer17a.html