(176a) Development of Algorithms for Reinforcement Learning Augmented Model Predictive Control

Authors 

Hedrick, E. - Presenter, West Virginia University
Reynolds, K., West Virginia University
Bhattacharyya, D., West Virginia University
Zitney, S., National Energy Technology Laboratory
Omell, B. P., National Energy Technology Laboratory
Reinforcement learning (RL) is a powerful machine learning approach in which an agent learns through direct interaction with a system. While RL is finding application in many areas of practice, considerable opportunities remain for exploiting RL in process control. One difficulty in applying RL to process control is that the states and actions of process systems are typically continuous, rather than discrete as in many of the applications where RL has been successfully deployed. Some efforts have been made to apply RL directly to process control, but these approaches often suffer computationally for large systems [1], [2]. One promising direction is to augment model predictive control (MPC) with RL by exploiting the similarities and compatibilities between the two [3], [4]. To this end, the main focus of the existing literature on applying RL to MPC has been on updating the control model with RL [5]–[7]. However, there are considerable opportunities in augmenting MPC with RL not only by updating the model but also through cooperative and synergistic implementation of RL with MPC in multiple ways, which is the focus of this work.

In this work, two novel RL-based MPC algorithms are presented. The first directly combines the RL action-value function with MPC by using it as the MPC objective, thus maximizing the expected reward across the prediction horizon of the controller. This approach is attractive because it combines the adaptability of RL with the explicit constraint handling of MPC, although it does require traditional optimization methods to be used online. In this controller, the state–action–reward–state–action algorithm with eligibility traces, SARSA(λ), is used to update the action-value function based on the temporal-difference error. To ensure exploration, the proposed policy is ε-MPC: the control move provided by the MPC is taken with probability ε; otherwise, a random control move is selected. To ensure stability under exploration, the explored control moves are drawn from a stable set of action trajectories constructed a priori.
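To illustrate the structure of this first controller, the following minimal Python sketch (not the authors' implementation) shows a SARSA(λ) update of a linear action-value function combined with ε-MPC exploration; the environment interface, the feature map phi, the mpc_solve routine, and the precomputed stable_actions set are hypothetical placeholders.

    # Minimal sketch (not the authors' code) of SARSA(lambda) with epsilon-MPC
    # exploration, assuming a linear action-value function Q(s, a) = w^T phi(s, a)
    # and a hypothetical mpc_solve() that maximizes the learned Q over the
    # prediction horizon subject to the process constraints.
    import numpy as np

    def phi(state, action):
        """Hypothetical feature map for the state-action pair."""
        return np.concatenate([state, action, np.outer(state, action).ravel()])

    def q_value(w, state, action):
        return w @ phi(state, action)

    def sarsa_lambda_epsilon_mpc(env, mpc_solve, stable_actions, n_steps,
                                 alpha=0.01, gamma=0.99, lam=0.9, eps=0.9):
        """
        env            : object with reset() -> state and step(a) -> (state, reward)
        mpc_solve      : returns the first control move of the MPC whose objective
                         is the current action-value estimate (assumed available)
        stable_actions : precomputed set of stabilizing control moves for exploration
        """
        state = env.reset()
        action = mpc_solve(w=None, state=state)      # initial move from nominal MPC
        w = np.zeros(phi(state, action).size)        # action-value weights
        z = np.zeros_like(w)                         # eligibility trace

        for _ in range(n_steps):
            next_state, reward = env.step(action)

            # epsilon-MPC exploration: take the MPC move with probability eps,
            # otherwise pick a move from the a priori stable action set
            if np.random.rand() < eps:
                next_action = mpc_solve(w=w, state=next_state)
            else:
                next_action = stable_actions[np.random.randint(len(stable_actions))]

            # SARSA(lambda) temporal-difference update of the action-value weights
            td_error = (reward + gamma * q_value(w, next_state, next_action)
                        - q_value(w, state, action))
            z = gamma * lam * z + phi(state, action)
            w = w + alpha * td_error * z

            state, action = next_state, next_action
        return w

Here mpc_solve stands in for an MPC whose objective is the current action-value estimate; in the episodic case, the outer loop would simply be restarted at the beginning of each episode.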

The second controller applies actor-critic RL informed by MPC. While the actor-critic structure does not need a predetermined policy, MPC can be leveraged to improve learning performance. First, the agent's value function and the parameterized policy are treated as two (optionally deep) recurrent neural networks. Random initialization of the policy would clearly be unacceptable for application to process systems. However, similar to explicit MPC, the optimal control moves of the MPC can be computed offline and used to initialize the policy via supervised learning. The algorithm also uses a process model, as in MPC, to perform policy rollouts over a given horizon, improving the rate of convergence of the policy and state-value approximators.
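A minimal sketch (again, not the authors' code) of the second controller's offline policy initialization and model-based rollout is given below; the recurrent network sizes, the behavior-cloning dataset of offline MPC moves, and the one-step plant_model are assumptions for illustration.

    # Minimal sketch (not the authors' code) of initializing an actor-critic policy
    # from offline MPC solutions, in the spirit of explicit MPC.  The dataset of
    # (state, MPC move) pairs, the network sizes, and the plant model used for the
    # rollouts are all illustrative assumptions.
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Recurrent policy (actor); the critic would have an analogous structure."""
        def __init__(self, n_state, n_action, n_hidden=64):
            super().__init__()
            self.rnn = nn.GRU(n_state, n_hidden, batch_first=True)
            self.head = nn.Linear(n_hidden, n_action)

        def forward(self, state_seq):
            h, _ = self.rnn(state_seq)           # state_seq: (batch, time, n_state)
            return self.head(h)                  # control moves over the horizon

    def pretrain_policy_from_mpc(policy, states, mpc_moves, n_epochs=200, lr=1e-3):
        """Supervised (behavior-cloning) initialization from offline MPC solutions.

        states    : tensor (batch, time, n_state) of sampled state trajectories
        mpc_moves : tensor (batch, time, n_action) of MPC moves computed offline
        """
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(n_epochs):
            opt.zero_grad()
            loss = loss_fn(policy(states), mpc_moves)
            loss.backward()
            opt.step()
        return policy

    def model_rollout(policy, plant_model, x0, horizon):
        """Roll the current policy out over a prediction horizon using a process
        model (as MPC does); the returned trajectory can be used to update the
        critic and the policy gradient."""
        x = x0.unsqueeze(0).unsqueeze(0)         # shape (1, 1, n_state)
        traj = []
        for _ in range(horizon):
            u = policy(x)[:, -1, :]              # most recent control move
            x_next = plant_model(x[:, -1, :], u) # assumed one-step process model
            traj.append((x[:, -1, :], u, x_next))
            x = torch.cat([x, x_next.unsqueeze(1)], dim=1)
        return traj

In practice, the (states, mpc_moves) dataset would be generated offline in the spirit of explicit MPC, and the rollout trajectories would be used to update both the critic and the policy before the agent interacts with the actual process.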

These RL-augmented MPC algorithms are applied to a classic nonlinear chemical reactor as well as to the challenging control of load and main steam temperature and pressure for a supercritical pulverized coal power plant. Application is demonstrated for both episodic and continuing cases, showing the flexibility of the algorithms under simple modifications. The results show that, compared with traditional linear and nonlinear MPC, the RL-MPC algorithms improve control performance, especially when the system repeatedly faces similar control tasks. The study also identifies where improvements in computational time would be needed for real-life application of these algorithms.

[1] P. Slade, Z. N. Sunberg, and M. J. Kochenderfer, “Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning,” Jul. 2018, [Online]. Available: http://arxiv.org/abs/1808.00888.

[2] Y. Kim and J. M. Lee, “Model-based reinforcement learning for nonlinear optimal control with practical asymptotic stability guarantees,” AIChE J., vol. 66, no. 10, Oct. 2020, doi: 10.1002/aic.16544.

[3] J. Shin, T. A. Badgwell, K. H. Liu, and J. H. Lee, “Reinforcement Learning – Overview of recent progress and implications for process control,” Comput. Chem. Eng., vol. 127, pp. 282–294, Aug. 2019, doi: 10.1016/j.compchemeng.2019.05.029.

[4] D. Görges, “Relations between Model Predictive Control and Reinforcement Learning,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 4920–4928, Jul. 2017, doi: 10.1016/j.ifacol.2017.08.747.

[5] J. E. Morinelly and B. E. Ydstie, “Dual MPC with Reinforcement Learning,” 2016.

[6] M. Zanon, S. Gros, and A. Bemporad, “Practical reinforcement learning of stabilizing economic MPC,” in 2019 18th European Control Conference, ECC 2019, Jun. 2019, pp. 2258–2263, doi: 10.23919/ECC.2019.8795816.

[7] S. Gros and M. Zanon, “Data-driven economic NMPC using reinforcement learning,” IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.