2021 AIChE Virtual Spring Meeting and 17th Global Congress on Process Safety

(138b) Applying Reinforcement Learning for Batch Trajectory Optimization in an Industrial Chemical Process

Authors

Ricardo Rendall - Presenter, University of Coimbra

Yan Ma, Louisiana State University

Ivan Castillo, Dow Inc.

Zhenyu Wang, Dow Inc.

Leo Chiang, Dow Inc.

David Bentley, The Dow Inc.

You Peng, The Dow Chemical Co

Reinforcement Learning (RL) is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning. RL focuses on training an agent to learn an optimal policy that maximizes the cumulative reward obtained from the environment of interest [1]. Recent developments in model-free RL have achieved remarkable success in various process optimization and control tasks, with applications reported in the literature including parameter tuning for existing single PID control loops [2], supply chain management [3], and robotics operations [4].

There are multiple challenges when applying RL in an industrial setting, but the main one concerns training the agent. In the learning phase, the agent estimates and improves its policy and value functions through a large number of trial-and-error iterations. This requires many input-output experiments, which is not feasible in an industrial chemical plant. As an alternative, a model of the plant can be used to train the agent and provide the input-output data. Both first-principles and data-driven models are suitable, and both options are explored in this work.
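As a minimal sketch of what training against a plant model means in practice, the Python snippet below wraps a toy simulated reactor in a reset/step interface that an agent can query as many times as needed. It is not the model used in this work: the class `ToyBatchReactorEnv`, the single reaction A -> B, the Arrhenius constants, and the placeholder reward are all hypothetical choices made for illustration.

```python
import math
import random


class ToyBatchReactorEnv:
    """Hypothetical simulated batch reactor (A -> B, first-order kinetics).

    Every constant here is an illustrative assumption, not plant data.
    """

    def __init__(self, n_steps=20, dt=0.5):
        self.n_steps = n_steps  # control intervals per batch
        self.dt = dt            # hours per interval (assumed)
        self.reset()

    def reset(self):
        self.conc_a = 1.0       # mol/L of reactant A at batch start
        self.conc_b = 0.0       # mol/L of product B
        self.step_idx = 0
        return (self.conc_a, self.conc_b, self.step_idx)

    def step(self, temperature_k):
        # Arrhenius rate constant with assumed pre-factor and activation term.
        k = 1.0e4 * math.exp(-4000.0 / temperature_k)
        # Exact first-order conversion of A over one interval.
        converted = self.conc_a * (1.0 - math.exp(-k * self.dt))
        self.conc_a -= converted
        self.conc_b += converted
        self.step_idx += 1
        done = self.step_idx >= self.n_steps
        reward = converted      # placeholder: product formed this interval
        return (self.conc_a, self.conc_b, self.step_idx), reward, done


def run_episode(env, policy):
    """Roll out one simulated batch and return the cumulative reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


if __name__ == "__main__":
    random.seed(0)
    env = ToyBatchReactorEnv()
    # Trial-and-error interaction costs nothing here, unlike on a real plant.
    print(run_episode(env, lambda s: random.uniform(320.0, 360.0)))
```

An RL algorithm would replace the random policy and call `run_episode` many thousands of times, which is the kind of experimentation that cannot be carried out on the physical reactor.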

In this work, we test three state-of-the-art RL approaches on an industrial batch case study: Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Advantage Actor-Critic (A2C). These RL methods optimize the batch process by controlling the reaction conditions to maximize the total reward, defined as the profit margin subject to certain process and safety constraints. The optimal batch trajectories are compared in two scenarios. In the first scenario, a first-principles model serves as the environment for training the agent. In the second scenario, a surrogate Long Short-Term Memory (LSTM) model is used, which combines historical data from the reactor's operation with the first-principles model estimates. The LSTM is motivated by its ability to mitigate accuracy issues in the first-principles model by relying on the relationships found in the plant data. The optimized trajectories were compared to the current trajectories, and the RL-optimized batch profiles show a 3% increase in product profit.
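The abstract does not state how the constraints enter the reward, so the following Python sketch only illustrates one common option: a profit margin with a soft penalty for exceeding a temperature limit. The function names, prices, limit, and penalty weight are all hypothetical and are not taken from this work.

```python
def batch_reward(product_kg, energy_kwh, peak_temp_k,
                 product_price=2.0, energy_cost=0.1,
                 temp_limit_k=360.0, penalty_per_k=50.0):
    """Profit margin minus a soft penalty for violating a temperature limit.

    All default values are illustrative assumptions, not plant economics.
    """
    margin = product_price * product_kg - energy_cost * energy_kwh
    # Penalize only the amount by which the limit is exceeded.
    violation = max(0.0, peak_temp_k - temp_limit_k)
    return margin - penalty_per_k * violation


def relative_improvement(baseline, optimized):
    """Fractional gain of an optimized trajectory over the current one."""
    return (optimized - baseline) / abs(baseline)


if __name__ == "__main__":
    current = batch_reward(product_kg=100.0, energy_kwh=200.0, peak_temp_k=350.0)
    candidate = batch_reward(product_kg=103.0, energy_kwh=206.0, peak_temp_k=355.0)
    print(current, candidate, relative_improvement(current, candidate))
```

A penalty term keeps the reward defined everywhere, which suits gradient-based policy updates, but it does not guarantee that the constraint is respected. Hard constraints would need a different treatment, such as clipping actions or terminating the episode.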

References

[1] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[2] Badgwell, T. A., Liu, K. H., Subrahmanya, N. A., & Kovalski, M. H. (2019). U.S. Patent Application No. 16/218,650.
[3] Gokhale, A., Trasikar, C., Shah, A., Hegde, A., & Naik, S. R. (2021). A Reinforcement Learning Approach to Inventory Management. In Advances in Artificial Intelligence and Data Engineering (pp. 281-297). Springer, Singapore.
[4] Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290.
