2019 AIChE Annual Meeting

(243d) Learning and Adapting Model Predictive Controllers with Reinforcement Learning for Time-Varying Systems

Authors

Elijah Hedrick - Presenter, West Virginia University

Katherine Reynolds, West Virginia University

Parikshit Sarda, West Virginia University

Debangsu Bhattacharyya, West Virginia University

Stephen Zitney, National Energy Technology Laboratory

Benjamin P. Omell, National Energy Technology Laboratory

While model predictive control (MPC) has been a primary tool of the process control community¹, automatically learning models and transitioning to previously developed models, when available, is challenging for time-varying systems. The main challenge in such transitions lies in identifying how similar the current control problem is to one or more previous control problems. Here we propose a novel MPC augmented with reinforcement learning (RL) for learning and adapting.

RL is formulated as a Markov decision process (MDP) in which an actor applies an input to a system and a critic determines a reward based on the new state². The use of RL with MPC in the presence of parametric uncertainty has been addressed for a linear system³. To exploit evolving information about future uncertainty, an RL approach has been proposed that uses the learned cost-to-go as the terminal penalty in an MPC⁴. However, one of the critical issues that can lead to poor MPC performance is model discrepancy. In this work, we propose a novel RL algorithm for learning as well as adapting the MPC.
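
As a rough sketch of the terminal-penalty idea attributed to [4] above, a finite-horizon MPC problem with a learned cost-to-go can be written as follows. All symbols here (stage cost \ell, prediction model f, horizon N, learned cost-to-go approximation \hat{J}, constraint sets \mathcal{X} and \mathcal{U}) are notation assumed for illustration and are not taken from the abstract.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \hat{J}(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0,\dots,N-1, \\
& x_0 = x(t), \quad x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}.
\end{aligned}
```

In this form, a mismatch between the prediction model f and the true plant enters every predicted state x_k, which is one way to see why model discrepancy can degrade performance even when \hat{J} is learned well.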

Q-learning, one of the RL methods, can be used to learn the state dynamics and the value function⁵. However, adapting the MPC model based on the Q-function is impractical for time-varying systems, since the space of the Q-function is infinite-dimensional for such systems. Here we use a Bayesian smoothing spline analysis of variance (BSS-ANOVA) Gaussian process (GP) to model the discrepancy function, where the eigenfunctions in the Karhunen-Loève (KL) expansion are used as the orthogonal basis functions. One of the key advantages of using a KL expansion with the GP model for the discrepancy function is that the stochasticity is represented by the discrepancy parameters, since the basis functions for each functional component do not change with the covariance function parameters. This translates to reduced computational costs. Residual analysis of Bellman's optimality equation, along with policy gradient and actor-critic methods, is then used for model adaptation. The value functions and the policy, as a map of control actions, are stored in compact clusters by using a subtractive clustering technique for unsupervised learning of unique, or core, control features. Cores are automatically updated as new information is gathered. The algorithm also includes directed exploration methods that add an intrinsic reward to the original reward, ensuring that the infinite-horizon cost function converges to the exact cost-to-go function as the discrepancy vanishes. Feasibility and optimality conditions of the proposed algorithm are also analyzed.
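
The abstract does not give details of the subtractive clustering step, so the following is only a minimal sketch of the generic technique, assuming a standard potential-based formulation with a simplified single-threshold stopping rule. The function name `subtractive_clustering` and the parameters `radius`, `squash`, and `stop_ratio` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def subtractive_clustering(points, radius=1.0, squash=1.5, stop_ratio=0.15):
    """Pick cluster centers ("cores") from the data points themselves.

    Assumed, illustrative parameters (not from the abstract):
      radius     - neighborhood radius used to compute each point's potential
      squash     - factor widening the radius used when subtracting potential
      stop_ratio - stop when the best remaining potential falls below this
                   fraction of the first peak
    """
    points = np.asarray(points, dtype=float)
    alpha = 4.0 / radius**2
    beta = 4.0 / (squash * radius) ** 2

    # Pairwise squared Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff**2).sum(axis=-1)

    # A point's potential is high when many other points lie close to it.
    potential = np.exp(-alpha * sq_dist).sum(axis=1)
    first_peak = potential.max()

    centers = []
    while True:
        k = int(np.argmax(potential))
        peak = potential[k]
        if peak < stop_ratio * first_peak:
            break
        centers.append(points[k].copy())
        # Reduce the potential near the new center so the next center
        # is found in a different region of the data.
        potential = potential - peak * np.exp(-beta * sq_dist[:, k])
    return np.array(centers)


# Usage: two well-separated groups of hypothetical feature vectors.
group_a = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
group_b = [(5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
cores = subtractive_clustering(group_a + group_b, radius=1.0)
print(cores)
```

Because the centers are selected from existing data points and the number of clusters is not fixed in advance, a scheme of this kind can add a new core when newly gathered data fall far from all existing ones, which matches the automatic updating described above.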

The algorithm developed here is applied to the load-following problem in the operation of a supercritical pulverized coal (SCPC) power plant. Here, one of the critical control problems is main steam temperature control under load changes. Because of sliding-pressure operation, the significant nonlinearity of steam properties in the operating domain, and evolving ash buildup on the tubes, this system is time-varying and nonlinear. The proposed RL-augmented MPC algorithm is evaluated using a high-fidelity dynamic model of the SCPC plant⁶.

Bibliography

[1] S. J. Qin and T. A. Badgwell, “A survey of industrial model predictive control technology,” Control Eng. Pract., vol. 11, no. 7, pp. 733–764, Jul. 2003.

[2] T. A. Badgwell, J. H. Lee, and K.-H. Liu, “Reinforcement Learning – Overview of Recent Progress and Implications for Process Control,” in Computer Aided Chemical Engineering, vol. 44, M. R. Eden, M. G. Ierapetritou, and G. P. Towler, Eds. Elsevier, 2018, pp. 71–85.

[3] J. E. Morinelly and B. E. Ydstie, “Dual MPC with Reinforcement Learning,” 11th IFAC Symp. Dyn. Control Process Syst. Biosyst. DYCOPS-CAB 2016, vol. 49, no. 7, pp. 266–271, Jan. 2016.

[4] J. Lee and W. Wong, “Approximate Dynamic Programming Approach for Process Control,” Journal of Process Control, vol. 20, pp. 1038–1048, 2010.

[5] C. Watkins, “Learning from Delayed Rewards,” Dissertation, Cambridge, London, 1989.

[6] P. Sarda, E. Hedrick, K. Reynolds, D. Bhattacharyya, E. S. Zitney, and B. Omell, “Development of a Dynamic Model and Control System for Load-Following Studies of Supercritical Pulverized Coal Power Plants,” Processes, vol. 6, no. 11, 2018.
