(59ao) Augmented Control Using Reinforcement Learning and Conventional Process Control | AIChE

Beahr, D. - Presenter, West Virginia University
Bhattacharyya, D., West Virginia University
Allan, D. A., University of Wisconsin Madison
Zitney, S., National Energy Technology Laboratory
Reinforcement learning (RL) is one of several branches of machine learning; specifically, it allows control to be refined online through policy updates. Using a bootstrapping approach, RL does not need labelled datasets and instead uses online measurements to provide a model-free mode of control. Applications of RL to process control often embed RL within the structure of an existing control method [1]. Combining RL with model predictive control (MPC), and using RL to augment MPC, have been evaluated in several ways [2]–[5]. Coupling of RL with conventional proportional-integral-derivative (PID) control has been evaluated both for tuning and for augmentation to improve transient performance [6], [7]. In both approaches, RL is used only to improve the performance of an existing PID controller. This work focuses on developing algorithms for augmenting, and eventually replacing, PID control with RL.

In this work, we propose a control structure that augments existing conventional process control (CPC) methods with an RL agent. The RL agent uses an actor-critic structure trained with the deep deterministic policy gradient (DDPG) algorithm. Because RL generally learns slowly and requires extensive exploration, the existing conventional controllers (PID, MPC) continue to compute their own control actions, which speeds up the learning of the RL agent. The control actions of the RL agent and the CPC are combined in a weighted sum that is applied to the plant; the resulting states and actions are then used to supplement the RL agent’s learning. This avoids applying the actions of a naive RL agent directly, which may give unacceptable performance or even be unsafe in worst-case scenarios. Algorithms for the weighting function are developed based on a measure of instantaneous and historical performance, as well as on a game-theoretic approach that can incorporate expert advice, if available. The performance of both the RL agent and the CPC method is assessed over a moving horizon that also decays with time, so that recent actions are treated as more relevant than older ones. In this way, the RL agent can take over control as and when its control performance exceeds that of the conventional controller. If the RL agent’s performance begins to deteriorate, the conventional controller again assumes full control. An advantage of DDPG is that it handles continuous action spaces, so its control actions can be compared one-to-one with those of the conventional controller.
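The weighting scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not say how each controller's per-step cost is obtained, so we assume one is available for both the RL agent and the CPC (for example, a tracking-error cost). The names `DecayingHorizonScore`, `rl_weight`, and `blended_action`, and the parameters `horizon`, `decay`, and `temperature`, are hypothetical. The weight on the RL action is a softmax over the two discounted costs, which has the same form as the exponentially weighted (Hedge) forecaster from prediction with expert advice; whether the authors' game-theoretic variant takes this form is not stated.

```python
import math
from collections import deque


class DecayingHorizonScore:
    """Discounted cost over a moving horizon (hypothetical helper).

    Keeps the last `horizon` per-step costs; a cost that is k steps old is
    weighted by decay**k, so recent behaviour counts more than older behaviour.
    """

    def __init__(self, horizon: int = 50, decay: float = 0.95):
        self.costs = deque(maxlen=horizon)  # oldest entries drop off automatically
        self.decay = decay

    def update(self, cost: float) -> None:
        self.costs.append(cost)

    def value(self) -> float:
        n = len(self.costs)
        # The newest cost (index n - 1) has age 0 and weight 1.
        return sum(self.decay ** (n - 1 - i) * c for i, c in enumerate(self.costs))


def rl_weight(rl_cost: float, cpc_cost: float, temperature: float = 1.0) -> float:
    """Weight in [0, 1] on the RL action: softmax over negative discounted costs."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = min(rl_cost, cpc_cost)  # shift by the minimum to avoid overflow in exp
    a = math.exp(-(rl_cost - m) / temperature)
    b = math.exp(-(cpc_cost - m) / temperature)
    return a / (a + b)


def blended_action(u_rl: float, u_cpc: float, w: float,
                   u_min: float, u_max: float) -> float:
    """Weighted sum of the RL and CPC actions, clipped to assumed actuator limits."""
    u = w * u_rl + (1.0 - w) * u_cpc
    return min(max(u, u_min), u_max)


if __name__ == "__main__":
    rl_score, cpc_score = DecayingHorizonScore(), DecayingHorizonScore()
    # Synthetic costs: the RL agent starts worse than the CPC and then improves.
    for rl_c, cpc_c in [(4.0, 1.0), (2.0, 1.0), (0.5, 1.0), (0.2, 1.0)]:
        rl_score.update(rl_c)
        cpc_score.update(cpc_c)
    w = rl_weight(rl_score.value(), cpc_score.value())
    print(f"RL weight: {w:.3f}, applied action: {blended_action(0.8, 0.4, w, 0.0, 1.0):.3f}")
```

As `temperature` shrinks, the weight approaches a hard switch to whichever controller has the lower discounted cost, which mirrors the handover described above; a larger temperature blends the two actions more smoothly. Clipping the blended action to actuator limits is an added assumption.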

The above approach is applied to a solid oxide fuel cell (SOFC) flowsheet [8]. The existing conventional control system consists of a series of PID controllers, some arranged in cascade loops. Because of the mode-switching nature and complex dynamics of the SOFC, PID performance is often poor, whereas the actor-critic structure of the RL algorithm facilitates capturing these dynamics accurately. In this case study, the RL agent is proposed to augment, and eventually phase out, the feed-side cascade loop. Other arrangements of the PID-RL structure that incorporate additional PID loops are also considered. Each learning episode for the RL-PID arrangement consists of a series of mode switches: from maximum hydrogen production to maximum power production and back to maximum hydrogen production. Although learning is episodic, the plant states are continuous across episodes rather than being reset, which provides a consistent measure of performance improvement.
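The episode structure described above, in which the plant is not reset between episodes, can be illustrated with a toy loop. This is a minimal sketch, not the SOFC flowsheet: the first-order plant, the mode setpoints in `MODE_SETPOINTS`, the PI gains, `steps_per_mode`, the fixed blending weight `w_rl`, and the placeholder `rl_policy` are all hypothetical, and a single PI loop stands in for the feed-side cascade loop. The DDPG learning update is omitted.

```python
from dataclasses import dataclass

# Hypothetical normalized setpoints for the controlled variable in each mode.
MODE_SETPOINTS = {"max_h2": 0.3, "max_power": 0.8}
# One episode: maximum H2 -> maximum power -> maximum H2, as in the text.
EPISODE_MODES = ("max_h2", "max_power", "max_h2")


def episode_setpoints(steps_per_mode: int) -> list:
    """Setpoint trajectory for one mode-switching episode."""
    trajectory = []
    for mode in EPISODE_MODES:
        trajectory.extend([MODE_SETPOINTS[mode]] * steps_per_mode)
    return trajectory


@dataclass
class FirstOrderPlant:
    """Toy stand-in for the SOFC loop (hypothetical): x[k+1] = a*x[k] + b*u[k]."""
    a: float = 0.9
    b: float = 0.1
    x: float = 0.0

    def step(self, u: float) -> float:
        self.x = self.a * self.x + self.b * u
        return self.x


@dataclass
class PIController:
    """Discrete PI controller standing in for the existing PID loop."""
    kp: float = 2.0
    ki: float = 0.2
    integral: float = 0.0

    def __call__(self, error: float) -> float:
        self.integral += error
        return self.kp * error + self.ki * self.integral


def run_episodes(n_episodes: int, steps_per_mode: int = 30,
                 w_rl: float = 0.0, rl_policy=lambda x, sp: 0.0) -> list:
    """Run episodes without resetting the plant; return each episode's initial state."""
    plant = FirstOrderPlant()  # created once, so its state carries over between episodes
    pid = PIController()
    initial_states = []
    for _ in range(n_episodes):
        initial_states.append(plant.x)
        for sp in episode_setpoints(steps_per_mode):
            u_pid = pid(sp - plant.x)
            # Blend with a placeholder RL action; the agent's learning update is omitted.
            u = w_rl * rl_policy(plant.x, sp) + (1.0 - w_rl) * u_pid
            plant.step(u)
    return initial_states


if __name__ == "__main__":
    print(run_episodes(3))
```

Because `FirstOrderPlant` is instantiated once, outside the episode loop, each episode starts from wherever the previous one ended; with the default settings the second and third episodes start near the maximum-hydrogen setpoint rather than at zero.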

[1] S. Gros and M. Zanon, “Data-driven economic NMPC using reinforcement learning,” IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.

[2] E. Hedrick, K. Hedrick, D. Bhattacharyya, S. E. Zitney, and B. Omell, “Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit,” Comput. Chem. Eng., vol. 160, p. 107727, 2022, doi: 10.1016/j.compchemeng.2022.107727.

[3] X. Pan, X. Chen, Q. Zhang, and N. Li, “Model Predictive Control: A Reinforcement Learning-based Approach,” J. Phys. Conf. Ser., vol. 2203, no. 1, p. 012058, 2022, doi: 10.1088/1742-6596/2203/1/012058.

[4] Y. Yang and S. Lucia, “Multi-step greedy reinforcement learning based on model predictive control,” IFAC-PapersOnLine, vol. 54, no. 3, pp. 699–705, 2021, doi: 10.1016/j.ifacol.2021.08.323.

[5] M. Zanon and S. Gros, “Safe Reinforcement Learning Using Robust MPC,” IEEE Trans. Automat. Contr., vol. 66, no. 8, pp. 3638–3652, Aug. 2021, doi: 10.1109/TAC.2020.3024161.

[6] Y. Wu, L. Xing, F. Guo, and X. Liu, “On the Combination of PID control and Reinforcement Learning: A Case Study with Water Tank System,” in Proceedings of the 16th IEEE Conference on Industrial Electronics and Applications, ICIEA 2021, Aug. 2021, pp. 1877–1882, doi: 10.1109/ICIEA51954.2021.9516140.

[7] N. P. Lawrence, M. G. Forbes, P. D. Loewen, D. G. McClement, J. U. Backström, and R. B. Gopaluni, “Deep reinforcement learning with shallow controllers: An experimental application to PID tuning,” Control Eng. Pract., vol. 121, p. 105046, 2022, doi: 10.1016/j.conengprac.2021.105046.

[8] D. Bhattacharyya and R. Rengaswamy, “A review of solid oxide fuel cell (SOFC) dynamic models,” Ind. Eng. Chem. Res., vol. 48, no. 13, pp. 6068–6086, Jul. 2009, doi: 10.1021/ie801664j.