(47h) AI-Based Hybrid Modelling for Chemicals-Based Product Design | AIChE


Venkatasubramanian, V. - Presenter, Columbia University
Gani, R., Technical University of Denmark
In chemicals-based product design, model-based techniques such as computer-aided molecular and/or mixture design (CAMD) are now in common use [1]. Although these techniques have advanced and the models have become more versatile, the range of products they can design is still severely limited by two factors: (i) the available data, and (ii) the application range of the models used to predict product behavior and/or function from a set of identified thermo-physical properties of molecules and mixtures. As a result, these techniques can design and/or analyze only a small fraction of the products in current use.

The most common class of property models in product design is the group contribution (GC)-based model, because such models are simple, easy to use, computationally inexpensive, acceptably accurate, and predictive in nature. However, their well-known limitations, such as the inability to handle complex molecular structures or to distinguish isomers, also limit the application range of the product design techniques that use them. Because product synthesis and/or design is iterative, models that are computationally expensive but have a wider application range are practically infeasible to use. Model (including data) interpretability is a continuum: GC-based models are at one end, and very complex models with millions of parameters (such as state-of-the-art deep learning models) are at the other. Solving a product design problem also requires finding alternatives that match a desired target defined in terms of a set of properties. That is, in addition to data and models, efficient pattern matching techniques are needed to identify the promising alternatives. Note also that while the computational steps are similar across classes of products, the molecule (or mixture) types, the properties that define their function in the product, the available data (knowledge), and the application ranges of the models used to estimate the needed properties all differ.
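The additive structure of a first-order GC model, and the isomer limitation noted above, can be sketched in a few lines of Python. This is a minimal illustration only: the base value, the group names, and every contribution value below are hypothetical placeholders, not parameters from any published GC method.

```python
# Minimal sketch of a first-order group-contribution (GC) estimate.
# BASE_K and all contribution values are hypothetical placeholders,
# chosen only to illustrate the additive form of the model.

BASE_K = 200.0
GROUP_CONTRIB_K = {"CH3": 23.5, "CH2": 23.0, "CH": 21.5, "OH": 93.0}


def estimate_property(groups):
    """Return base value + sum(count * contribution) over all groups."""
    return BASE_K + sum(GROUP_CONTRIB_K[g] * n for g, n in groups.items())


# Group counts for three C6H14 molecules.
hexane = {"CH3": 2, "CH2": 4}
two_methylpentane = {"CH3": 3, "CH2": 2, "CH": 1}
three_methylpentane = {"CH3": 3, "CH2": 2, "CH": 1}

for name, groups in [
    ("hexane", hexane),
    ("2-methylpentane", two_methylpentane),
    ("3-methylpentane", three_methylpentane),
]:
    print(name, estimate_property(groups))
```

Because 2-methylpentane and 3-methylpentane decompose into the same first-order groups, the sketch returns the same estimate for both, which is the isomer limitation in miniature. Each evaluation is a handful of multiplications and additions, which is why such models fit inside an iterative design loop.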

Recently, the use of deep learning and knowledge-based methods for molecular design has been reviewed [2]. However, it has been suggested that the most useful data-driven methods are those that combine domain knowledge (in the form of symbolic information) with numeric machine learning [3]. In the modelling area, more accurate and multi-dimensional models that are computationally expensive or suited to black-box use have also been reported [4, 5].

In this work, we propose a hybrid scheme that takes advantage of the latest developments in modelling and data analysis techniques to extend the application range of product design methods and associated tools. For problems that do not require searching large amounts of data and/or do not involve too many dimensions (or parameters), we propose to use the latest models and updated databases through the already available tools for CAMD (molecule and/or mixture). For problems that require a wider search space and are potentially multi-dimensional, we propose to create very large databases of a priori generated molecular structures, with properties estimated by the computationally expensive and/or ML-based black-box models. Solving the problem then requires efficient and reliable pattern matching within very large databases of measured and generated, but reliable and consistent, data. The perspectives of this hybrid scheme will be illustrated through several conceptual as well as practical product design problems.
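The database branch of the scheme can be pictured as a screening step: property values are computed once, offline, and the design problem reduces to matching stored records against target ranges. The sketch below assumes a simple in-memory list and range-based targets. The names `Candidate`, `matches`, and `screen`, the property keys, and all values are hypothetical, and a database of the size envisaged here would need indexing rather than a linear scan.

```python
# Minimal sketch of target matching against a precomputed property database.
# Record names, property keys, and values are hypothetical examples.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    properties: dict  # property name -> precomputed (measured or estimated) value


def matches(candidate, targets):
    """True if every target property is present and lies within [low, high]."""
    for prop, (low, high) in targets.items():
        value = candidate.properties.get(prop)
        if value is None or not (low <= value <= high):
            return False
    return True


def screen(database, targets):
    """Return the names of all candidates that satisfy every target range."""
    return [c.name for c in database if matches(c, targets)]


DATABASE = [
    Candidate("mol_A", {"Tb_K": 350.0, "logP": 1.2}),
    Candidate("mol_B", {"Tb_K": 420.0, "logP": 0.4}),  # Tb_K outside range
    Candidate("mol_C", {"Tb_K": 365.0, "logP": 2.9}),  # logP outside range
    Candidate("mol_D", {"Tb_K": 340.0}),               # logP missing
]

TARGETS = {"Tb_K": (330.0, 380.0), "logP": (0.0, 2.0)}

hits = screen(DATABASE, TARGETS)
print(hits)
```

Treating a missing property as a failed match is one possible design choice: it keeps candidates with incomplete data out of the result set, in line with the emphasis on reliable and consistent data.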

Keywords: Product design, group-contribution, machine learning, artificial intelligence, data-analysis


  1. EN Pistikopoulos, A Barbosa-Povoa, JH Lee, R Misener, A Mitsos, GV Reklaitis, V Venkatasubramanian, F You, R Gani, 2021, Process Systems Engineering–The Generation Next?, Computers & Chemical Engineering, 147, 107252.
  2. AS Alshehri, R Gani, F You, 2020, Deep Learning and Knowledge-Based Methods for Computer Aided Molecular Design--Toward a Unified Approach: State-of-the-Art and Future Directions, Computers & Chemical Engineering, 141, 107005.
  3. V Venkatasubramanian, 2019, The promise of artificial intelligence in chemical engineering: Is it here, finally?, AIChE Journal, 65 (2), 466-478.
  4. AS Alshehri, AK Tula, F You, R Gani, 2021, Next generation pure component property estimation models: With and without machine learning techniques, AIChE Journal, e17469 (in press).
  5. V Mann, K Brito, R Gani, V Venkatasubramanian, 2022, Hybrid, Interpretable Machine Learning for Thermodynamic Property Estimation using Grammar2vec for Molecular Representation, Fluid Phase Equilibria (submitted).