
(327d) CrystalGPT: Revolutionizing Crystallization Process Prediction and Control with a Multivariate Time-Series Approach Leveraging Transformer Networks

Authors 

Kwon, J. - Presenter, Texas A&M University
Sitapure, N., Texas A&M University
The past decade has seen a significant increase in the use of various machine learning (ML) techniques (e.g., deep and recurrent neural networks (DNNs and RNNs), and long short-term memory (LSTM) networks) for creating surrogate models of different chemical systems [1,2]. Although these models show good performance, they face two key challenges: (a) accurately predicting the long-term dependencies often encountered in systems with slow and complex dynamics, and (b) their system-specific nature, which makes it difficult to directly implement or repurpose a model built for one system on another [3,4]. This has major implications for plant-wide process control and monitoring, as the bespoke nature of different surrogate models makes them incompatible with one another due to differences in calibration and model architecture. To resolve this issue, a unified surrogate model is required that leverages transfer learning to achieve good predictive performance across a variety of systems. Very recently, transformer-based large language models (LLMs) and vision transformers have paved the way for groundbreaking applications like ChatGPT, CodeGPT, DALL-E, and others, providing a unified model within the natural language processing (NLP) and computer vision space [5,6]. This revolution is driven by two key factors: positional encoding (PE) and the multiheaded attention mechanism (MAM), which together enhance transfer learning capabilities [7]. Specifically, PE and the MAM architecture enable easy parallelization of model training on large datasets, and the MAM learns the underlying syntax, grammar, and context of language [8]. Unfortunately, despite their remarkable predictive and transfer learning capabilities, the application of transformers to time-series prediction of complex chemical systems has not yet been realized in practice.
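
As a concrete (hedged) illustration of these two ingredients, the minimal PyTorch sketch below combines sinusoidal positional encoding [8] with multi-headed attention over a window of embedded process states; all dimensions and settings here are assumptions for exposition, not CrystalGPT's actual configuration.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # Standard sinusoidal PE from Vaswani et al. [8]: injects time-step
    # order into the otherwise order-agnostic attention layers.
    position = torch.arange(seq_len).unsqueeze(1)                       # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Multi-headed attention over a window of embedded process states;
# each head can attend to a different state/time-step relationship.
mam = torch.nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(4, 50, 64) + sinusoidal_positional_encoding(50, 64)     # (batch, window, d_model)
out, attn_weights = mam(x, x, x)   # attn_weights: (4, 50, 50), averaged over heads
```

In a trained model, attention weights of this kind are the scores one would visualize to probe which state couplings the model has learned.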

To address this knowledge gap, we developed a first-of-a-kind time-series transformer (TST) for accurate time-series prediction of complex multivariate chemical systems. More specifically, the TST model consists of N encoder blocks and M decoder blocks, each with n attention heads and an internal dimension d_model. We considered a non-trivial case study involving the batch crystallization of various sugar and protein systems prevalent in the food and pharmaceutical industries. Training data was generated by simulating 20+ such crystal systems with varying nucleation (B) and growth (G) rates using a population balance model (PBM) [9]. This large corpus of time-series data was used to train a TST model (named 'CrystalGPT'), which was then tested on an unseen crystallization system in two ways: with and without fine-tuning. Visualization of the attention scores indicates that the TST learns the underlying relationships between different states, mimicking the coupling of system states observed in dynamic models (i.e., mass and energy balance equations, growth kinetics, etc.). Thus, CrystalGPT can provide baseline predictions for an unseen crystal system without fine-tuning [9]. Additionally, fine-tuning CrystalGPT with a small amount of data from the unseen system boosts its predictive capabilities, resulting in a low normalized mean-squared error (NMSE) and an R² value of 0.95+. Next, we also developed a first-of-a-kind TST-based set-point-tracking model predictive controller (MPC) to further demonstrate the practical applicability of CrystalGPT for process monitoring and control applications. Also, CrystalGPT was benchmarked against existing state-of-the-art (SOTA) DNN and LSTM models, showcasing the superior transfer learning performance of transformer-based models. Finally, the current work provides a concrete foundation for developing unified surrogate models for other chemical systems (i.e., ReactorGPT, DistillationGPT, etc.) that exhibit remarkable transfer learning capabilities, whether for direct application to chemical processes or for next-level integration with first-principles models to develop TST-based hybrid models [10]. Given these exciting prospects, we are eager to see what the future holds as we continue to push the boundaries of what is possible with these 'transformer-tive' technologies.
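
For concreteness, a minimal sketch of such an encoder-decoder TST in PyTorch is shown below; the layer counts (N, M), head count, internal dimension, and the six process states are placeholder assumptions rather than CrystalGPT's actual hyperparameters, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    # Illustrative encoder-decoder TST: N encoder blocks, M decoder blocks,
    # n_heads attention heads, and internal dimension d_model.
    def __init__(self, n_states: int, d_model: int = 64,
                 N: int = 4, M: int = 4, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(n_states, d_model)   # project measured states to d_model
        self.transformer = nn.Transformer(d_model=d_model, nhead=n_heads,
                                          num_encoder_layers=N, num_decoder_layers=M,
                                          batch_first=True)
        self.head = nn.Linear(d_model, n_states)    # map back to predicted states

    def forward(self, past: torch.Tensor, future_inputs: torch.Tensor) -> torch.Tensor:
        # past: (batch, W, n_states) window of past states (e.g., concentration,
        # temperature, mean crystal size); future_inputs: (batch, H, n_states)
        # decoder inputs over the prediction horizon H.
        memory = self.transformer.encoder(self.embed(past))
        return self.head(self.transformer.decoder(self.embed(future_inputs), memory))

model = TimeSeriesTransformer(n_states=6)
pred = model(torch.randn(8, 50, 6), torch.randn(8, 10, 6))   # -> (8, 10, 6)
```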

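Continuing the sketch above, the two evaluation modes on an unseen crystal system can be outlined as follows; the dataset size, horizon, and optimizer settings are illustrative assumptions, not the settings used for CrystalGPT.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for a small dataset from the unseen system (in practice,
# PBM-simulated state trajectories with the new system's B and G kinetics).
past   = torch.randn(32, 50, 6)
future = torch.randn(32, 10, 6)
target = torch.randn(32, 10, 6)
loader = DataLoader(TensorDataset(past, future, target), batch_size=8)

# Without fine-tuning: the pretrained model gives baseline predictions as-is.
model.eval()
with torch.no_grad():
    baseline = model(past, future)

# With fine-tuning: a few passes over the small dataset adapt the model.
model.train()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # illustrative settings
for epoch in range(5):
    for p, f, t in loader:
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(p, f), t)
        loss.backward()
        opt.step()
```
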
Literature Cited:

  1. Wu, Zhe, David Rincon, and Panagiotis D. Christofides. "Process structure-based recurrent neural network modeling for model predictive control of nonlinear processes." Journal of Process Control 89 (2020): 74-84.
  2. Lima, Fernando Arrais RD, et al. "Development of a recurrent neural networks-based NMPC for controlling the concentration of a crystallization process." Digital Chemical Engineering 5 (2022): 100052.
  3. Shah, Parth, Hyun-Kyu Choi, and Joseph Sang-Il Kwon. "Achieving Optimal Paper Properties: A Layered Multiscale kMC and LSTM-ANN-Based Control Approach for Kraft Pulping." Processes 11.3 (2023): 809.
  4. Bhadriraju, Bhavana, et al. "Operable adaptive sparse identification of systems: Application to chemical processes." AIChE Journal 66.11 (2020): e16980.
  5. Brown, Tom, et al. "Language models are few-shot learners." Advances in neural information processing systems 33 (2020): 1877-1901.
  6. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
  7. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
  8. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
  9. Kit, Yuheng, and Musa Mohd Mokji. "Sentiment Analysis Using Pre-Trained Language Model With No Fine-Tuning and Less Resource." IEEE Access 10 (2022): 107056-107065.
  10. Lin, Fan, et al. "Kernel-based Hybrid Interpretable Transformer for High-frequency Stock Movement Prediction." 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 2022.