(138b) Applying Reinforcement Learning for Batch Trajectory Optimization in an Industrial Chemical Process
AIChE Spring Meeting and Global Congress on Process Safety
Thursday, April 22, 2021 - 1:55pm to 2:20pm
There are multiple challenges when applying RL in an industrial setting, but the main one concerns the training of the agent. In the learning phase, the agent estimates and improves its policy and value functions through a large number of trial-and-error iterations. This requires many input-output experiments, which are not feasible in an industrial chemical plant. Instead, a model of the plant can be used to train the agent by supplying the input-output data. Both first-principles and data-driven models are suitable, and both options are explored in this work.
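As an illustration of training against a plant model rather than the plant itself, here is a minimal sketch of a first-principles batch reactor that an agent could interact with. Everything in it is hypothetical: the reaction scheme (A to desired product B, B to by-product C), the Arrhenius parameters, the names `BatchReactorEnv` and `random_rollout`, and the choice of reactor temperature as the only action are illustrative assumptions, not the model used in this study. The `reset`/`step` methods loosely follow the pattern of Gym-style environments but do not implement that library's API.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class BatchState:
    """Hypothetical reactor state: concentrations (mol/L) and elapsed time (h)."""
    c_a: float
    c_b: float
    c_c: float
    t: float


class BatchReactorEnv:
    """Hypothetical first-principles batch reactor used as a training environment.

    Assumed kinetics (illustrative only): A -> B (desired product) and
    B -> C (by-product), both first order with Arrhenius rate constants.
    The action is the reactor temperature (K), assumed to be held constant
    over each control interval.
    """

    def __init__(self, dt=0.1, horizon=5.0, substeps=20):
        self.dt = dt                  # control interval (h)
        self.horizon = horizon        # batch duration (h)
        self.substeps = substeps      # explicit Euler substeps per interval
        self.k0 = (4.0e6, 2.0e9)      # assumed pre-exponential factors (1/h)
        self.ea_r = (6000.0, 9000.0)  # assumed activation energy / R (K)
        self.state = self.reset()

    def reset(self):
        """Start a new batch charged with pure A."""
        self.state = BatchState(c_a=1.0, c_b=0.0, c_c=0.0, t=0.0)
        return self.state

    def _rate_constants(self, temp_k):
        k1 = self.k0[0] * math.exp(-self.ea_r[0] / temp_k)
        k2 = self.k0[1] * math.exp(-self.ea_r[1] / temp_k)
        return k1, k2

    def step(self, temp_k):
        """Integrate the component mass balances over one control interval.

        Returns the new state and a flag that is True at the end of the batch.
        """
        k1, k2 = self._rate_constants(temp_k)
        h = self.dt / self.substeps
        c_a, c_b, c_c = self.state.c_a, self.state.c_b, self.state.c_c
        for _ in range(self.substeps):
            r1, r2 = k1 * c_a, k2 * c_b
            c_a, c_b, c_c = c_a - h * r1, c_b + h * (r1 - r2), c_c + h * r2
        self.state = BatchState(c_a, c_b, c_c, self.state.t + self.dt)
        done = self.state.t >= self.horizon - 1e-9
        return self.state, done


def random_rollout(env, seed=0, t_min=340.0, t_max=400.0):
    """Trial and error in its crudest form: one batch under random temperatures."""
    rng = random.Random(seed)
    state, done = env.reset(), False
    trajectory = []
    while not done:
        action = rng.uniform(t_min, t_max)
        state, done = env.step(action)
        trajectory.append((action, state))
    return trajectory
```

With these assumed parameters, higher temperatures speed up both reactions but favor the by-product reaction more strongly, which creates the kind of trade-off an RL agent would have to learn. A random policy such as `random_rollout` only generates exploratory input-output data; methods like PPO, SAC, or A2C would replace it with a policy that is improved from that data.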
In this work, we test three state-of-the-art RL approaches to optimize an industrial batch case study: Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Advantage Actor-Critic (A2C). These RL methods optimize the batch process by manipulating the reaction conditions to maximize the total reward, which is defined as the profit margin subject to certain process and safety constraints. The optimal batch trajectories are compared in two scenarios. In the first scenario, the agent is trained using a first-principles model of the plant as the environment. In the second scenario, the environment is a surrogate Long Short-Term Memory (LSTM) model that combines historical data from the reactor's operation with the first-principles model estimates. The LSTM is motivated by its ability to mitigate accuracy issues in the first-principles model by relying on the relationships found in the plant data. Compared with the current operating trajectories, the RL-optimized batch profiles show a 3% increase in product profit.
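One common way to express "profit margin subject to process and safety constraints" as a single scalar reward is to subtract a penalty proportional to any constraint violation. The sketch below assumes that soft-penalty formulation and a reward evaluated once at the end of each batch; the abstract does not say how the constraints were enforced, and the `BatchOutcome` fields, prices, limits, and penalty weight are placeholders rather than values from the study.

```python
from dataclasses import dataclass


@dataclass
class BatchOutcome:
    """Hypothetical end-of-batch summary used to score one episode."""
    product_kg: float
    feed_kg: float
    energy_kwh: float
    max_temp_k: float
    max_pressure_bar: float


# Placeholder economics and limits (assumptions, not plant data).
PRODUCT_PRICE = 2.50      # $/kg of product
FEED_COST = 1.20          # $/kg of feed
ENERGY_COST = 0.08        # $/kWh
MAX_TEMP_K = 400.0        # assumed safety limit on reactor temperature
MAX_PRESSURE_BAR = 5.0    # assumed process limit on reactor pressure
PENALTY_WEIGHT = 1000.0   # $ per unit of normalized constraint violation


def profit_margin(o):
    """Revenue from product minus feed and energy costs for one batch."""
    return PRODUCT_PRICE * o.product_kg - FEED_COST * o.feed_kg - ENERGY_COST * o.energy_kwh


def constraint_violation(o):
    """Sum of normalized limit exceedances; zero when every constraint holds."""
    temp_excess = max(0.0, o.max_temp_k - MAX_TEMP_K) / MAX_TEMP_K
    pressure_excess = max(0.0, o.max_pressure_bar - MAX_PRESSURE_BAR) / MAX_PRESSURE_BAR
    return temp_excess + pressure_excess


def reward(o):
    """Episode reward: profit margin minus a penalty for constraint violations."""
    return profit_margin(o) - PENALTY_WEIGHT * constraint_violation(o)
```

For example, a batch producing 800 kg of product from 1000 kg of feed with 500 kWh of energy has a margin of 2000 - 1200 - 40 = 760 $; exceeding the assumed temperature limit by 10 K costs 1000 × 10/400 = 25 $, giving a reward of 735. A large penalty weight steers the agent toward feasible trajectories while still allowing exploration; an alternative design is to terminate the episode as soon as a safety limit is crossed.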