(176a) Development of Algorithms for Reinforcement Learning Augmented Model Predictive Control

Authors 

Hedrick, E. - Presenter, West Virginia University
Reynolds, K., West Virginia University
Bhattacharyya, D., West Virginia University
Zitney, S., National Energy Technology Laboratory
Omell, B. P., National Energy Technology Laboratory
Reinforcement learning (RL) is a powerful machine learning approach in which an agent learns through direct interaction with a system. While RL is finding application in many areas of practice, considerable opportunities remain for exploiting RL in process control. One difficulty in applying RL to process control is that the states and actions of process systems are typically continuous, rather than discrete as in many of the applications where RL has been successfully deployed. Some efforts have been made to apply RL directly to process control, but these approaches often suffer computationally for large systems [1], [2]. One promising direction is to augment model predictive control (MPC) with RL by exploiting the similarities and compatibilities between the two [3], [4]. To this end, the main focus of the existing literature on applying RL to MPC has been on updating the control model with RL [5]–[7]. However, there are considerable opportunities in augmenting MPC with RL not only by updating the model but also through cooperative and synergistic implementation of RL with MPC in multiple ways, which is the focus of this work.

In this work, two novel RL-based MPC algorithms are presented. The first directly combines the RL action-value function with MPC by using it as the MPC objective, thus maximizing the expected reward across the prediction horizon of the controller. This approach is attractive because it combines the adaptability of RL with the explicit constraint handling of MPC, although it does require traditional optimization methods to be used online. In this controller, the state–action–reward–state–action algorithm with eligibility traces, SARSA(λ), is used to update the action-value function based on the temporal-difference error. To ensure exploration, the proposed policy is ε-MPC: the control move provided by the MPC is taken with probability ε; otherwise, a random control move is selected. To ensure stability under exploration, the explored control moves are drawn from a stable set of action trajectories constructed a priori.
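To illustrate the structure of this first controller, the following minimal Python sketch (not the authors' implementation) shows a SARSA(λ) update of a linear action-value function combined with ε-MPC exploration; the environment interface, the feature map phi, the mpc_solve routine, and the precomputed stable_actions set are hypothetical placeholders.

    # Minimal sketch (not the authors' code) of SARSA(lambda) with epsilon-MPC
    # exploration, assuming a linear action-value function Q(s, a) = w^T phi(s, a)
    # and a hypothetical mpc_solve() that maximizes the learned Q over the
    # prediction horizon subject to the process constraints.
    import numpy as np

    def phi(state, action):
        """Hypothetical feature map for the state-action pair."""
        return np.concatenate([state, action, np.outer(state, action).ravel()])

    def q_value(w, state, action):
        return w @ phi(state, action)

    def sarsa_lambda_epsilon_mpc(env, mpc_solve, stable_actions, n_steps,
                                 alpha=0.01, gamma=0.99, lam=0.9, eps=0.9):
        """
        env            : object with reset() -> state and step(a) -> (state, reward)
        mpc_solve      : returns the first control move of the MPC whose objective
                         is the current action-value estimate (assumed available)
        stable_actions : precomputed set of stabilizing control moves for exploration
        """
        state = env.reset()
        action = mpc_solve(w=None, state=state)      # initial move from nominal MPC
        w = np.zeros(phi(state, action).size)        # action-value weights
        z = np.zeros_like(w)                         # eligibility trace

        for _ in range(n_steps):
            next_state, reward = env.step(action)

            # epsilon-MPC exploration: take the MPC move with probability eps,
            # otherwise pick a move from the a priori stable action set
            if np.random.rand() < eps:
                next_action = mpc_solve(w=w, state=next_state)
            else:
                next_action = stable_actions[np.random.randint(len(stable_actions))]

            # SARSA(lambda) temporal-difference update of the action-value weights
            td_error = (reward + gamma * q_value(w, next_state, next_action)
                        - q_value(w, state, action))
            z = gamma * lam * z + phi(state, action)
            w = w + alpha * td_error * z

            state, action = next_state, next_action
        return w

Here mpc_solve stands in for an MPC whose objective is the current action-value estimate; in the episodic case, the outer loop would simply be restarted at the beginning of each episode.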

The second controller applies actor-critic RL informed by MPC. While the actor-critic structure does not need a predetermined policy, MPC can be leveraged to improve learning performance. First, the agent's value function and the parameterized policy are treated as two (optionally deep) recurrent neural networks. Random initialization of the policy would clearly be unacceptable for application to process systems. However, similar to explicit MPC, the optimal control moves of the MPC can be computed offline and used to initialize the policy via supervised learning. The algorithm also uses a process model, as in MPC, to perform policy rollouts over a given horizon, improving the rate of convergence of the policy and state-value approximators.
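A minimal sketch (again, not the authors' code) of the second controller's offline policy initialization and model-based rollout is given below; the recurrent network sizes, the behavior-cloning dataset of offline MPC moves, and the one-step plant_model are assumptions for illustration.

    # Minimal sketch (not the authors' code) of initializing an actor-critic policy
    # from offline MPC solutions, in the spirit of explicit MPC.  The dataset of
    # (state, MPC move) pairs, the network sizes, and the plant model used for the
    # rollouts are all illustrative assumptions.
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Recurrent policy (actor); the critic would have an analogous structure."""
        def __init__(self, n_state, n_action, n_hidden=64):
            super().__init__()
            self.rnn = nn.GRU(n_state, n_hidden, batch_first=True)
            self.head = nn.Linear(n_hidden, n_action)

        def forward(self, state_seq):
            h, _ = self.rnn(state_seq)           # state_seq: (batch, time, n_state)
            return self.head(h)                  # control moves over the horizon

    def pretrain_policy_from_mpc(policy, states, mpc_moves, n_epochs=200, lr=1e-3):
        """Supervised (behavior-cloning) initialization from offline MPC solutions.

        states    : tensor (batch, time, n_state) of sampled state trajectories
        mpc_moves : tensor (batch, time, n_action) of MPC moves computed offline
        """
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(n_epochs):
            opt.zero_grad()
            loss = loss_fn(policy(states), mpc_moves)
            loss.backward()
            opt.step()
        return policy

    def model_rollout(policy, plant_model, x0, horizon):
        """Roll the current policy out over a prediction horizon using a process
        model (as MPC does); the returned trajectory can be used to update the
        critic and the policy gradient."""
        x = x0.unsqueeze(0).unsqueeze(0)         # shape (1, 1, n_state)
        traj = []
        for _ in range(horizon):
            u = policy(x)[:, -1, :]              # most recent control move
            x_next = plant_model(x[:, -1, :], u) # assumed one-step process model
            traj.append((x[:, -1, :], u, x_next))
            x = torch.cat([x, x_next.unsqueeze(1)], dim=1)
        return traj

In practice, the (states, mpc_moves) dataset would be generated offline in the spirit of explicit MPC, and the rollout trajectories would be used to update both the critic and the policy before the agent interacts with the actual process.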

These RL-augmented MPC algorithms are applied to a classic nonlinear chemical reactor as well as to the challenging control of load and main steam temperature and pressure for a supercritical pulverized coal power plant. Application is demonstrated for both episodic and continuing cases, showing the flexibility of the algorithms under simple modifications. The results show that, compared with traditional linear and nonlinear MPC, the RL-MPC algorithms improve control performance, especially when the system repeatedly faces similar control tasks. The study also identifies where improvements in computational time would be needed for real-life application of these algorithms.

[1] P. Slade, Z. N. Sunberg, and M. J. Kochenderfer, “Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning,” Jul. 2018, [Online]. Available: http://arxiv.org/abs/1808.00888.

[2] Y. Kim and J. M. Lee, “Model-based reinforcement learning for nonlinear optimal control with practical asymptotic stability guarantees,” AIChE J., vol. 66, no. 10, Oct. 2020, doi: 10.1002/aic.16544.

[3] J. Shin, T. A. Badgwell, K. H. Liu, and J. H. Lee, “Reinforcement Learning – Overview of recent progress and implications for process control,” Comput. Chem. Eng., vol. 127, pp. 282–294, Aug. 2019, doi: 10.1016/j.compchemeng.2019.05.029.

[4] D. Görges, “Relations between Model Predictive Control and Reinforcement Learning,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 4920–4928, Jul. 2017, doi: 10.1016/j.ifacol.2017.08.747.

[5] J. E. Morinelly and B. E. Ydstie, “Dual MPC with Reinforcement Learning,” 2016.

[6] M. Zanon, S. Gros, and A. Bemporad, “Practical reinforcement learning of stabilizing economic MPC,” in 2019 18th European Control Conference, ECC 2019, Jun. 2019, pp. 2258–2263, doi: 10.23919/ECC.2019.8795816.

[7] S. Gros and M. Zanon, “Data-driven economic NMPC using reinforcement learning,” IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.