(433f) Next-Generation Hybrid Models: Combining Attention Mechanisms and LSTM for Improved Predictions and Process Control in the Chemical Industry | AIChE


Authors 

Shah, P. - Presenter, Texas A&M University
Sitapure, N., Texas A&M University
Kwon, J., Texas A&M University
The chemical industry is undergoing a significant transformation, driven by recent advancements in artificial intelligence (AI) and machine learning (ML) techniques that harness the large quantities of process data generated by chemical plants. At the forefront of this revolution are hybrid models, which have gained substantial popularity over purely data-driven approaches such as deep and recurrent neural networks (DNNs and RNNs) and long short-term memory (LSTM) networks [1, 2]. Hybrid models combine a first-principles model with a suitable data-driven approach, benefiting from both a priori system information and feature-rich process data to synergistically provide better predictions than purely data-driven ML models [3, 4]. Although the literature is rife with DNN-based hybrid models for fermentation, pulp and paper processes, hydraulic fracturing, and other chemical systems that show good predictive performance [5, 6], these models have limitations. Most chemical systems exhibit process uncertainties, such as sensor noise, feed and temperature fluctuations, and changing kinetics, which can distort the measurements fed to the hybrid model and result in a noticeable plant-model mismatch. For example, in a fermentation process, the substrate feed slurry varies in solute content, and bacterial activity changes in response to state evolution, leading to time-varying kinetics. As a result, a DNN-based hybrid model that does not accurately capture these temporal dependencies will display a plant-model mismatch during process operation. Therefore, next-generation hybrid models are needed that can (a) account for these process uncertainties and (b) accurately predict these time-varying parameters.

Recently, attention-based ML models have been in the spotlight due to their remarkable ability to establish strong correlations between inputs and outputs, even in the presence of system noise or uncertainties. These models adeptly focus on short- and long-term dependencies in the evolution of system states [7, 8]. In essence, the attention mechanism performs a scaled dot-product calculation between various input vectors, enabling it to selectively attend to significant long-term (e.g., concentration evolution) and short-term (e.g., sudden change in temperature due to control actions) process alterations by assigning higher attention scores to such instances. As a result, the attention mechanism serves as a filter that dynamically handles process uncertainties and data noise by dampening weak correlations and amplifying strong interactions between the system states. On the other hand, LSTM-based sequential time-series models have shown superior predictive performance compared to DNNs due to their ability to explicitly consider the time evolution of system states (e.g., battery dynamics, stock market estimates, and energy forecasting). This is because an LSTM utilizes several internal gates to dynamically update, forget, and store relevant changes in the state dynamics, whereas DNNs tend to assign roughly equal weight to all input channels [9]. Thus, a combination of the attention mechanism and LSTM offers a highly effective solution that (a) accounts for process uncertainties by selectively attending to changes in system dynamics, and (b) accurately predicts time-varying parameters.
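To make the two building blocks concrete, the following is a minimal NumPy sketch of (a) scaled dot-product attention over a window of past state measurements and (b) a single gated LSTM step. All dimensions, weights, and the toy data are hypothetical placeholders, not the authors' trained model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    The row-wise softmax amplifies strongly correlated time steps and
    dampens weak correlations toward zero."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # context-enriched states

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input, forget, and output gates decide what to
    store in, discard from, and emit out of the cell state c."""
    z = W @ x + U @ h + b                           # stacked gate pre-activations
    H = h.size
    i = 1 / (1 + np.exp(-z[0:H]))                   # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))                 # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))               # output gate
    g = np.tanh(z[3*H:4*H])                         # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy usage: T past measurements of d states, self-attended then fed to an LSTM.
rng = np.random.default_rng(0)
T, d, H = 8, 4, 6                                   # hypothetical sizes
X = rng.normal(size=(T, d))                         # past state measurements
ctx = scaled_dot_product_attention(X, X, X)         # self-attention over time
W = rng.normal(size=(4 * H, d))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(ctx[t], h, c, W, U, b)
print(h.shape)  # (6,)
```

In a trained model the final hidden state `h` would be decoded into the time-varying parameter estimates; here it simply illustrates how attention-filtered states flow through the LSTM gates.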

To this end, we propose a novel attention-LSTM-based hybrid model for a complex, non-trivial fed-batch fermentation process. Specifically, the input to the data-driven module of the hybrid model consists of state measurements for the previous time steps. This input is sent through an encoder module to lift the states into higher dimensions, and then an attention mechanism with a subsequent LSTM layer is applied to obtain time-series predictions of uncertain parameters for the next steps. The uncertain parameters are a lumped representation of different process variations, such as varying bacterial kinetics and feed and temperature fluctuations, and are represented by the most sensitive kinetic parameters determined through global sensitivity analysis [6]. The predicted uncertain parameters are then fed to the first-principles model, which includes mass and energy balance equations, concentration dynamics, and kinetic equations, to obtain state predictions for the next time steps. The training and validation dataset is generated by simulating a high-fidelity (HF) model of a fermenter system for over 100 different arbitrarily initialized operating conditions, such as temperature, substrate flow rate, and catalyst rate. Additionally, the prediction results (i.e., biomass, substrate, oxygen, and product concentrations) are compared against an existing DNN-based hybrid model to highlight the superior performance of the proposed attention-LSTM-based hybrid model. Finally, the developed hybrid model is incorporated within a model predictive controller (MPC) to achieve set-point targets for product amount and operating cost by determining optimal input profiles for feed flow rate and temperature. In a nutshell, the combined benefits of the attention mechanism and LSTM-based sequential modeling give rise to a next generation of hybrid models that can regulate process uncertainties while providing accurate process predictions.
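The hybrid rollout described above can be sketched as follows: at each step, the data-driven module estimates the uncertain kinetic parameter, which then closes a first-principles balance model that is integrated forward. This is an illustrative sketch only; the simplified Monod-type fed-batch balances, the parameter values, and the constant-output stand-in for the learned attention-LSTM predictor are all assumptions, not the authors' actual model.

```python
import numpy as np

def fermenter_rhs(state, mu_max, params):
    """Simplified first-principles fed-batch balances (illustrative Monod
    kinetics; the full model also includes oxygen and energy balances)."""
    X, S, P, V = state                        # biomass, substrate, product, volume
    F, S_in, Ks, Yxs, Ypx = params
    mu = mu_max * S / (Ks + S)                # Monod growth rate (uncertain mu_max)
    dX = mu * X - (F / V) * X                 # biomass balance with dilution
    dS = -(mu / Yxs) * X + (F / V) * (S_in - S)
    dP = Ypx * mu * X - (F / V) * P
    dV = F                                    # fed-batch volume increase
    return np.array([dX, dS, dP, dV])

def hybrid_predict(state, past_states, predict_mu_max, params, dt, n_steps):
    """Hybrid rollout: the data-driven module estimates the uncertain
    parameter from recent measurements; the first-principles model
    propagates the states (explicit Euler for simplicity)."""
    traj = [state]
    for _ in range(n_steps):
        mu_max = predict_mu_max(past_states)  # attention-LSTM stand-in
        state = state + dt * fermenter_rhs(state, mu_max, params)
        past_states = np.vstack([past_states[1:], state])  # slide the window
        traj.append(state)
    return np.array(traj)

# Toy usage with hypothetical parameter values and a constant predictor.
params = (0.05, 10.0, 0.5, 0.4, 0.3)          # F, S_in, Ks, Yxs, Ypx
state0 = np.array([0.1, 5.0, 0.0, 1.0])       # X, S, P, V at t = 0
history = np.tile(state0, (8, 1))             # 8-step measurement window
traj = hybrid_predict(state0, history, lambda hist: 0.3, params,
                      dt=0.1, n_steps=50)
print(traj.shape)  # (51, 4)
```

Swapping the `lambda hist: 0.3` stand-in for a trained attention-LSTM module is what closes the loop between the data-driven and first-principles halves; an MPC would then call this rollout repeatedly while optimizing the feed and temperature profiles.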
Moreover, the current work lays the groundwork for developing attention-based hybrid models for more complex chemical processes and for gaining further insights into unknown process uncertainties, leading to more accurate predictions and intelligent control.


References:

  1. Sansana, J., Joswiak, M. N., Castillo, I., Wang, Z., Rendall, R., Chiang, L. H., & Reis, M. S. (2021). Recent trends on hybrid modeling for Industry 4.0. Computers & Chemical Engineering, 151, 107365.
  2. Chen, Y., & Ierapetritou, M. (2020). A framework of hybrid model development with identification of plant-model mismatch. AIChE Journal, 66(10), e16996.
  3. Thompson, M. L., & Kramer, M. A. (1994). Modeling chemical processes using prior knowledge and neural networks. AIChE Journal, 40(8), 1328-1340.
  4. Sharma, N., & Liu, Y. A. (2022). A hybrid science-guided machine learning approach for modeling chemical processes: A review. AIChE Journal, 68(5), e17609.
  5. Shah, P., Sheriff, M. Z., Bangi, M. S. F., Kravaris, C., Kwon, J. S.-I., Botre, C., & Hirota, J. (2022). Deep neural network-based hybrid modeling and experimental validation for an industry-scale fermentation process: Identification of time-varying dependencies among parameters. Chemical Engineering Journal, 135643.
  6. Bangi, M. S. F., & Kwon, J. S.-I. (2020). Deep hybrid modeling of a chemical process: Application to hydraulic fracturing. Computers & Chemical Engineering, 134, 106696.
  7. Devlin, J., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  8. Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
  9. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.