(415e) A Hybrid Model Feature Relevance Analysis for Mechanistic Model Refinement Suggestions | AIChE

Authors 

Deng, Y. - Presenter, Auburn University
Cremaschi, S., Auburn University
Eden, M., Auburn University
Cheng, S., Chevron Energy Technology Company
Gao, H., Chevron Energy Technology Company
Hybrid models combine first-principles knowledge with data, using the data to infer information that is missing because the mechanistic details are insufficiently understood [1]. Mechanistic models rely on process knowledge, whereas data-driven models depend on information obtained from process data. Hybrid models combine the physical meaning and generalization capability of mechanistic models with the ability of data-driven models to capture behavior too complex to be described by the mechanistic model [1]. Hybrid modeling has numerous applications in chemical engineering. For example, Zahedi et al. [2] developed an ethylene oxide fixed-bed reactor model whose reaction kinetics were estimated by an artificial neural network (ANN). Tsen et al. [3] used a hybrid ANN model for predictive quality control in a batch polymerization process. Mahalec and Sanchez [4] modeled a distillation tower by combining partial least squares (PLS) regression with fundamental conservation balances. Bangi and Kwon [5] integrated first-principles models with a deep neural network and applied the resulting hybrid model to a hydraulic fracturing process to predict unobserved process parameters.

Hybrid models can be constructed in different ways. One option is the serial structure, in which the data-driven model supplements the mechanistic model where the latter lacks a description of the process mechanism [1]. Given experimental measurements, the difference between the measurements and the mechanistic model predictions, called the model discrepancy [6], is modeled by the data-driven model. We hypothesize that, in a serial-structure hybrid model, the data-driven model and the studies used to develop it can serve not only to improve prediction accuracy but also to refine the mechanistic model. In a serial structure, the information the data-driven model captures from the data represents the information that the mechanistic model fails to capture. Suppose a feature is important for estimating the model discrepancy but makes no significant contribution to the mechanistic model predictions. Such an outcome suggests that the mechanistic model does not correctly incorporate the impact of that feature. Hence, the sensitivity of the data-driven model output to its input features contains evidence about information missing from the mechanistic model. Identifying these features gives mechanistic model developers potential avenues for refinement by pointing to the equations and parameters related to the identified features.
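The serial structure can be illustrated with a minimal sketch, assuming NumPy. The discrepancy is computed as δ(x) = y_meas(x) − y_mech(x), a Gaussian process is fit to δ, and the hybrid prediction is y_mech(x) + δ̂(x). The functions `mechanistic_model` and `true_process`, the synthetic data, and the fixed kernel hyperparameters are hypothetical illustrations, not the models or settings used in this study.

```python
import numpy as np

def mechanistic_model(x):
    # Hypothetical first-principles model: captures only the x[:, 0] effect.
    return 2.0 * x[:, 0]

def true_process(x):
    # Hypothetical "plant" used to generate synthetic measurements; the
    # sin(3 * x[:, 1]) term is the mechanism the model above is missing.
    return 2.0 * x[:, 0] + 0.5 * np.sin(3.0 * x[:, 1])

def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    # Squared-exponential kernel with one shared (assumed) length scale.
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / length_scale**2)

def fit_discrepancy_gp(x, delta, noise=1e-4):
    # GP posterior-mean weights: alpha = (K + noise * I)^{-1} delta.
    k = rbf_kernel(x, x) + noise * np.eye(len(x))
    alpha = np.linalg.solve(k, delta)
    return lambda x_new: rbf_kernel(x_new, x) @ alpha

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(40, 2))
y_meas = true_process(x_train) + rng.normal(0.0, 0.01, size=40)

# Model discrepancy: measurement minus mechanistic prediction.
delta = y_meas - mechanistic_model(x_train)
gp_mean = fit_discrepancy_gp(x_train, delta)

def hybrid_model(x):
    # Serial hybrid: mechanistic prediction corrected by the learned discrepancy.
    return mechanistic_model(x) + gp_mean(x)

x_test = rng.uniform(0.0, 1.0, size=(200, 2))
rmse_mech = np.sqrt(np.mean((true_process(x_test) - mechanistic_model(x_test)) ** 2))
rmse_hyb = np.sqrt(np.mean((true_process(x_test) - hybrid_model(x_test)) ** 2))
print(f"mechanistic RMSE: {rmse_mech:.3f}, hybrid RMSE: {rmse_hyb:.3f}")
```

Because the discrepancy here depends only on the second input while the mechanistic model depends only on the first, this toy setup is exactly the situation the hypothesis above targets.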

Feature selection is commonly used when building data-driven models to select the subset of input variables that best represents the predictive information. Most machine learning techniques have an embedded feature selection approach [7]; the other two widely used families are wrapper methods and filter methods [8]. All of these approaches provide the relative importance of the inputs in determining the outputs. These importance values, combined with the sensitivity of the mechanistic model predictions to the input variables, can be used for mechanistic model refinement.
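As an illustration of how an embedded approach yields importance values, the sketch below fits a Gaussian process with an automatic relevance determination (ARD) squared-exponential kernel (one length scale per input) to a synthetic discrepancy and reads relevance off the inverse length scales. It assumes NumPy, replaces gradient-based hyperparameter optimization with a coarse grid search over the log marginal likelihood, and omits any additional sensitivity analysis; the data and all function names are hypothetical.

```python
import itertools
import numpy as np

def ard_kernel(a, b, length_scales, variance=1.0):
    # ARD squared-exponential kernel: one length scale per input feature.
    diff = (a[:, None, :] - b[None, :, :]) / length_scales
    return variance * np.exp(-0.5 * np.sum(diff**2, axis=-1))

def log_marginal_likelihood(x, y, length_scales, noise=1e-2):
    # Standard GP log evidence, computed through a Cholesky factorization.
    k = ard_kernel(x, x, length_scales) + noise * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(chol)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def ard_relevance(x, y, grid=np.logspace(-1, 2, 7)):
    # Grid search over per-feature length scales (a stand-in for gradient-based
    # optimization); relevance is taken as the normalized inverse length scale.
    best = max(itertools.product(grid, repeat=x.shape[1]),
               key=lambda ls: log_marginal_likelihood(x, y, np.array(ls)))
    inv = 1.0 / np.array(best)
    return inv / inv.sum()

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(50, 3))
# Hypothetical discrepancy: strong on x[:, 0], moderate on x[:, 1], none on x[:, 2].
delta = np.sin(6.0 * x[:, 0]) + 1.0 * x[:, 1] + rng.normal(0.0, 0.05, size=50)
print(np.round(ard_relevance(x, delta), 3))
```

A feature that does not influence the target is pushed toward a long length scale, since a short one adds model complexity without improving the fit, so its relevance shrinks toward zero.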

This study introduces a framework that uses a feature relevance study of a hybrid model to suggest refinements to a mechanistic model from an input-variable perspective. The framework begins by constructing a serial-structure hybrid model. The relevance of the input features to the mechanistic model predictions and to the data-driven model predictions is then obtained with two different approaches. For the mechanistic model, a Sobol sensitivity analysis [10] is performed. Gaussian Process Regression (GPR) [9] is used as the data-driven model to estimate the model discrepancy, and a Gaussian process embedded feature selection approach, automatic relevance determination (ARD) [11] combined with sensitivity analysis, is applied to determine the features relevant to the model discrepancy. The framework compares the Sobol sensitivity analysis and ARD results to determine from which features the mechanistic model fails to capture information. Finally, suggestions about which equations and parameters of the mechanistic model deserve closer attention are inferred from the comparison results.
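The comparison step can be sketched as follows, assuming NumPy and inputs uniformly distributed on [0, 1]. Sobol first-order indices are estimated with the Saltelli et al. (2010) Monte Carlo estimator and total-effect indices with Jansen's estimator. The toy `mechanistic_model`, the discrepancy relevance values, and the thresholds in the hypothetical `flag_features` rule are illustrative assumptions, not values or criteria taken from the study.

```python
import numpy as np

def sobol_indices(model, d, n=100_000, seed=0):
    # Monte Carlo estimates on U(0,1)^d inputs: first-order indices via the
    # Saltelli (2010) estimator, total-effect indices via Jansen's estimator.
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n, d))
    b = rng.uniform(size=(n, d))
    f_a, f_b = model(a), model(b)
    var = np.var(np.concatenate([f_a, f_b]))
    first, total = np.empty(d), np.empty(d)
    for i in range(d):
        ab = a.copy()
        ab[:, i] = b[:, i]  # matrix A with column i taken from B
        f_ab = model(ab)
        first[i] = np.mean(f_b * (f_ab - f_a)) / var
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / var
    return first, total

def flag_features(mech_total, disc_relevance, names, hi=0.2, lo=0.05):
    # Hypothetical decision rule: flag a feature that matters for the
    # discrepancy but barely influences the mechanistic prediction.
    return [nm for nm, s, r in zip(names, mech_total, disc_relevance)
            if r >= hi and s < lo]

def mechanistic_model(x):
    # Hypothetical mechanistic model that ignores the third input entirely.
    return 4.0 * x[:, 0] + 2.0 * x[:, 1]

first, total = sobol_indices(mechanistic_model, d=3)
print(np.round(first, 2), np.round(total, 2))  # analytical values: [0.8, 0.2, 0.0]

# Hypothetical normalized ARD relevances from the discrepancy model.
disc_relevance = np.array([0.15, 0.10, 0.75])
print(flag_features(total, disc_relevance, ["x1", "x2", "x3"]))  # → ['x3']
```

In this toy case the mechanistic model ignores x3 while the assumed discrepancy relevance is concentrated on x3, so x3 is flagged and the equations and parameters involving it become the refinement candidates.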

The proposed framework is applied to nine models that predict the liquid entrainment fraction in two-phase flow. The results reveal that, for vertical pipeline orientation models, the equations and parameters related to gas properties should be studied further, whereas for horizontal pipeline orientation models, those related to liquid properties need to be refined. For inclined pipeline orientation models, liquid density is a feature that deserves closer attention for improving model predictions.

References

  1. Zendehboudi, S.; Rezaei, N.; Lohi, A. Applications of hybrid models in chemical, petroleum, and energy systems: A systematic review. Appl. Energy 2018, 228, 2539–2566.
  2. Zahedi, G.; Lohi, A.; Mahdi, K.A. Hybrid modeling of ethylene to ethylene oxide heterogeneous reactor. Fuel Process. Technol. 2011, 92, 1725–1732.
  3. Tsen, A.Y.; Jang, S.S.; Wong, D.S.H.; Joseph, B. Predictive control of quality in batch polymerization using hybrid ANN models. AIChE J. 1996, 42, 455–465.
  4. Mahalec, V.; Sanchez, Y. Inferential monitoring and optimization of crude separation units via hybrid models. Comput. Chem. Eng. 2012, 45, 15–26.
  5. Bangi, M.S.F.; Kwon, J.S.-I. Deep hybrid modeling of chemical process: Application to hydraulic fracturing. Comput. Chem. Eng. 2020, 134.
  6. Jiang, Z.; Chen, W.; Fu, Y.; Yang, R.J. Reliability-based design optimization with model bias and data uncertainty. SAE Int. J. Mater. Manuf. 2013, 6.
  7. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
  8. Rodriguez-Galiano, V.F.; Luque-Espinar, J.A.; Chica-Olmo, M.; Mendes, M.P. Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Sci. Total Environ. 2018, 624, 661–672.
  9. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, 2006; ISBN 026218253X.
  10. Homma, T.; Saltelli, A. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 1996, 52, 1–17.
  11. Blix, K.; Eltoft, T. Evaluation of Feature Ranking and Regression Methods for Oceanic Chlorophyll-a Estimation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1403–1418.