
(199c) Self-Optimization of Chemical Reaction Under Dynamic Flow Conditions Using Reinforcement Learning

Authors 

Benyahia, B., Massachusetts Institute of Technology
Rielly, C. D., Loughborough University
The development of robust and cost-effective synthetic pathways is critical in many industries, such as the pharmaceutical and chemical sectors. Continuous manufacturing and flow chemistry have gained wider adoption within these industries over the last decade due to their significant advantages over batch processing, which include cost efficiency, reduced environmental footprint, and resilience (Mascia et al. 2013). Continuous reactors are particularly effective when integrated with real-time process analytical tools, advanced optimization, and flow automation, an approach also known as self-optimization. Based on real-time monitoring of the reaction's key performance indicators and manipulation of the reaction conditions, a self-optimization algorithm can drive the reaction towards the optimal conditions through a sequence of well-designed experiments. Self-optimization may be designed to address steady-state and/or transient performance pertaining to startup, system transitions, and shutdown (Wyvratt et al. 2019, McMullen and Wyvratt 2023).
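As a minimal sketch of such a closed loop (not the authors' implementation), the fragment below lets a gradient-free optimizer propose the next set of flow conditions, receives a yield "measurement" from the experiment, and iterates towards the optimum. The reactor response here is a hypothetical surrogate standing in for real-time analytics; the function names and the optimum location are assumptions for illustration only.

```python
# Hedged sketch of a self-optimization loop: optimizer proposes conditions,
# the "experiment" returns a yield, and the loop converges on the optimum.
import numpy as np
from scipy.optimize import minimize

def run_flow_experiment(conditions):
    """Toy surrogate for one flow experiment; returns reaction yield in [0, 1]."""
    temperature_C, residence_time_min, reactant_ratio = conditions
    # Hypothetical smooth response surface with an optimum near
    # (100 degC, 10 min, ratio 1.2); a real system would query the reactor.
    return float(np.exp(-((temperature_C - 100) / 40) ** 2
                        - ((residence_time_min - 10) / 8) ** 2
                        - ((reactant_ratio - 1.2) / 0.5) ** 2))

def negative_yield(conditions):
    # The optimizer minimizes, so return the negative of the measured yield.
    return -run_flow_experiment(conditions)

initial_conditions = np.array([60.0, 5.0, 1.0])   # T (degC), tau (min), ratio
result = minimize(negative_yield, initial_conditions, method="Nelder-Mead",
                  options={"xatol": 1e-2, "fatol": 1e-3, "maxiter": 100})
print("Optimal conditions:", result.x, "yield:", -result.fun)
```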

The advent of Industry 4.0 and the associated holistic digitalization has opened unprecedented opportunities to develop resilient and smart manufacturing platforms across industrial sectors. Achieving these objectives requires an effective ecosystem comprising product and process digital twins, connected plants, advanced process analytical technologies, and automation. Most importantly, it requires self-optimization capabilities to optimize and reoptimize plant performance in real time and deliver seamless, reliable operation with built-in product quality assurance. Self-optimization may also play a crucial role at the early development stage, enabling smarter designs of experiments that generate information-rich data and reliably drive the process towards optimal performance. Several algorithms and methodologies have been developed to achieve self- and real-time optimization. More recently, artificial intelligence and machine learning techniques, particularly reinforcement learning (RL), have received increased attention for addressing the sequential decision-making problems commonly encountered in dynamic optimization and tracking control. RL has been successfully implemented in various research areas and, more recently, in reaction optimization (Zhou et al. 2017, Neumann et al. 2021).

In this research, a model-based reinforcement learning (RL) strategy is proposed to self-optimize the performance of a flow reaction system, with objectives including the maximization of yield, purity, and selectivity. The synthesis of N-benzylidenebenzylamine in a tubular reactor is used as a case study. First, a mathematical model was used to train the RL agent, with several manipulated variables considered, namely temperature, residence time, and reactant ratio. Deep Deterministic Policy Gradient (DDPG) and Deep Q-learning were implemented to identify the best actions at each reaction stage, and the performance of the two agents was compared. To improve learning and reduce the computational cost of training, different reward shaping and transfer learning strategies were used (Benyahia et al. 2021). The performance of the designed RL strategies was also compared with gradient-based and gradient-free methods.
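To make the RL formulation concrete, the sketch below sets up a Gymnasium-style environment in which the agent adjusts temperature, residence time, and reactant ratio, and receives a shaped reward equal to the improvement in simulated yield. The kinetics, bounds, and reward shaping here are assumptions for illustration, not the abstract's reactor model, and an off-the-shelf DDPG implementation (Stable-Baselines3) is used purely as an example of training an agent against such a model.

```python
# Hedged sketch (assumed surrogate model, not the authors' code): a toy
# flow-reactor environment with normalized manipulated variables and a
# yield-improvement reward, trained with a library DDPG agent.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class FlowReactorEnv(gym.Env):
    """Toy flow-reactor environment; the yield surface is a hypothetical surrogate."""
    def __init__(self):
        # Normalized manipulated variables: [temperature, residence time, ratio]
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)
        # Actions are bounded adjustments to the current conditions.
        self.action_space = spaces.Box(low=-0.1, high=0.1, shape=(3,), dtype=np.float32)
        self.max_steps = 50

    def _yield(self, x):
        # Hypothetical smooth yield surface with an optimum at x = (0.7, 0.5, 0.6).
        target = np.array([0.7, 0.5, 0.6])
        return float(np.exp(-8.0 * np.sum((x - target) ** 2)))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.np_random.uniform(0.2, 0.8, size=3).astype(np.float32)
        self.steps = 0
        self.prev_yield = self._yield(self.state)
        return self.state.copy(), {}

    def step(self, action):
        self.state = np.clip(self.state + action, 0.0, 1.0).astype(np.float32)
        new_yield = self._yield(self.state)
        reward = new_yield - self.prev_yield   # shaped reward: gain in yield
        self.prev_yield = new_yield
        self.steps += 1
        terminated = new_yield > 0.98
        truncated = self.steps >= self.max_steps
        return self.state.copy(), reward, terminated, truncated, {}

# Illustrative training call with an off-the-shelf DDPG implementation.
from stable_baselines3 import DDPG
model = DDPG("MlpPolicy", FlowReactorEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```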

Abstract keywords: Deep Q-learning, Deep Deterministic Policy Gradient, Reinforcement Learning, Flow Chemistry, Reaction Optimization, Dynamic Flow Optimization