(106g) Development of Algorithms for Augmenting and Replacing Conventional Process Control Using Reinforcement Learning

Authors 

Alastanos, M., West Virginia University
Hedrick, E., West Virginia University
Bhattacharyya, D., West Virginia University
Reinforcement learning (RL) is a machine learning approach that can be used to learn policies for automatic control. Because RL can learn online, it can learn from process signals rather than from labeled data sets. Applications of RL to the control of process systems are being actively investigated in the literature [1]. Combining RL with MPC, and augmenting MPC with RL, have been evaluated in several ways [4]–[7]. Coupling RL with conventional PID control has been evaluated both for tuning and for augmentation to improve transient performance [2], [3]. These approaches use RL only to improve the performance of an existing PID controller. This work focuses on developing algorithms for augmenting, and eventually replacing, PID control with RL.

In this work, a novel approach for introducing RL controllers alongside existing process controllers is developed. Because of the significant exploration requirements, learning and deploying RL agents that directly generate input moves can significantly degrade control performance. To address this problem, it is assumed that a process controller of a standard form (PID, MPC, etc.) already regulates the plant. The RL controller then calculates its own control move, and a weighted sum of the two inputs is applied to the plant. The weighting factor determines when to bring the RL controller online: the RL controller starts by learning from the input profile of the standard controller, and as its performance improves, its input move receives greater weight while the standard controller is phased out. The weighting is calculated as a function of the expected return (the action-value function) approximated by the RL agent relative to the maximum expected return (i.e., zero for a quadratic reward function), quantifying when the RL controller is performing well enough to regulate the plant. For the formulation of the RL controller, multiple function approximators are considered.
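As a rough illustration of this blending scheme (not the authors' exact formulation), the following Python sketch computes a weight from the agent's action-value estimate relative to the maximum expected return and forms the weighted sum of the two control moves. The function names, the normalization constant `q_worst`, and the numerical values are illustrative assumptions.

```python
import numpy as np

def blend_weight(q_estimate, q_max=0.0, q_worst=-100.0):
    """Map the agent's expected-return estimate onto [0, 1].

    With a quadratic (negative) reward the maximum expected return is zero,
    so the closer q_estimate is to q_max, the more weight the RL move gets.
    q_worst is an assumed normalization constant, not from the source.
    """
    w = (q_estimate - q_worst) / (q_max - q_worst)
    return float(np.clip(w, 0.0, 1.0))

def blended_input(u_std, u_rl, q_estimate):
    """Weighted sum of the standard and RL control moves applied to the plant."""
    w = blend_weight(q_estimate)
    return (1.0 - w) * u_std + w * u_rl

# Early in learning the agent's value estimate is poor, so the standard
# controller's move dominates; as q_estimate approaches zero the RL move
# is phased in.
u = blended_input(u_std=1.2, u_rl=0.8, q_estimate=-60.0)
```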

The approach detailed above is demonstrated on a benchmark nonlinear CSTR. A PID controller initially regulates the plant, and the RL agent, learning from a quadratic reward, is implemented alongside it. Learning is carried out in episodes, each starting from a random state and followed by randomly sequenced disturbance injections and setpoint changes. Under this structure, the performance of the controller is first presented in terms of episodic return, standardized over multiple runs. Different thresholds for bringing the RL controller online are evaluated, and control performance in the first and last episodes is compared (the PID controller is fully active in the first episode, establishing the baseline). Finally, the concept of "re-learning" is introduced: slow changes in the plant model are used to evaluate thresholds at which the RL controller must be updated online, in an average-reward setting after implementation rather than the episodic setting used for initial training. Because learning continues, the RL-based control eventually exceeds the performance of the PID or other controller it learns from. The algorithm is generic, since learning is based on the continuous signal, and can therefore be readily extended to other control approaches such as MPC.
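As a rough sketch of this training structure, the following Python outlines one possible episodic loop and a simple re-learning trigger. The `agent`, `pid`, and `plant_step` interfaces, the episode length, the event probabilities, and the thresholds are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
Q_WORST = -100.0   # assumed normalization constant for the blending weight

def quadratic_reward(y, y_sp):
    """Quadratic penalty on the tracking error (maximum return is zero)."""
    return -float((y - y_sp) ** 2)

def run_episode(agent, pid, plant_step, n_steps=200):
    """One episode: random initial state, then randomly sequenced setpoint
    changes and disturbance injections, with the blended input applied."""
    x = rng.uniform(0.2, 0.8)            # random initial state
    y_sp, d = 0.5, 0.0                   # setpoint and disturbance
    episodic_return = 0.0
    for _ in range(n_steps):
        if rng.random() < 0.02:          # occasional setpoint change
            y_sp = rng.uniform(0.2, 0.8)
        if rng.random() < 0.02:          # occasional disturbance injection
            d = rng.normal(0.0, 0.05)
        u_pid = pid(y_sp - x)                        # standard control move
        u_rl = agent.act(x, y_sp)                    # RL control move
        q_hat = agent.q(x, y_sp, u_rl)               # action-value estimate
        w = float(np.clip((q_hat - Q_WORST) / (0.0 - Q_WORST), 0.0, 1.0))
        u = (1.0 - w) * u_pid + w * u_rl             # weighted sum of inputs
        x = plant_step(x, u, d)                      # advance the plant
        r = quadratic_reward(x, y_sp)
        agent.update(x, y_sp, u_rl, r)               # online learning
        episodic_return += r
    return episodic_return

def needs_relearning(reward_history, threshold=-0.01, window=500):
    """After deployment, trigger re-learning when the average reward drifts
    below a threshold as the plant slowly changes (average-reward setting)."""
    return float(np.mean(reward_history[-window:])) < threshold
```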

[1] S. Gros and M. Zanon, “Data-driven economic NMPC using reinforcement learning,” IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.

[2] Y. Wu, L. Xing, F. Guo, and X. Liu, “On the Combination of PID control and Reinforcement Learning: A Case Study with Water Tank System,” in Proceedings of the 16th IEEE Conference on Industrial Electronics and Applications, ICIEA 2021, Aug. 2021, pp. 1877–1882, doi: 10.1109/ICIEA51954.2021.9516140.

[3] N. P. Lawrence, M. G. Forbes, P. D. Loewen, D. G. McClement, J. U. Backström, and R. B. Gopaluni, “Deep reinforcement learning with shallow controllers: An experimental application to PID tuning,” Control Eng. Pract., vol. 121, 2022, doi: 10.1016/j.conengprac.2021.105046.

[4] E. Hedrick, K. Hedrick, D. Bhattacharyya, S. E. Zitney, and B. Omell, “Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit,” Comput. Chem. Eng., vol. 160, p. 107727, 2022, doi: 10.1016/j.compchemeng.2022.107727.

[5] X. Pan, X. Chen, Q. Zhang, and N. Li, “Model Predictive Control: A Reinforcement Learning-based Approach,” J. Phys. Conf. Ser., vol. 2203, no. 1, p. 012058, 2022, doi: 10.1088/1742-6596/2203/1/012058.

[6] Y. Yang and S. Lucia, “Multi-step greedy reinforcement learning based on model predictive control,” IFAC-PapersOnLine, vol. 54, no. 3, pp. 699–705, 2021, doi: 10.1016/j.ifacol.2021.08.323.

[7] M. Zanon and S. Gros, “Safe Reinforcement Learning Using Robust MPC,” IEEE Trans. Automat. Contr., vol. 66, no. 8, pp. 3638–3652, Aug. 2021, doi: 10.1109/TAC.2020.3024161.