(386d) End-to-End Reinforcement Learning of Koopman Models for Economic Model Predictive Control

Authors 

Mitsos, A., RWTH Aachen University
Dahmen, M., FZ Jülich

Data-driven surrogate models of dynamic process models are a promising way to make economic nonlinear model predictive control (eNMPC) computationally viable by reducing the burden of the underlying optimal control problems [1]. System identification (SI) is the most common approach to training such data-driven dynamic surrogate models, but it focuses narrowly on maximizing the average prediction accuracy on a set of simulation samples. In contrast, dynamic surrogate models trained directly for optimal performance in a control application using reinforcement learning (RL) were recently shown to outperform SI-trained models [2-6]. These findings, however, were restricted to the learning of linear models [2], to applications without state constraints [2-5], or to approaches that primarily adapt bounds and the cost function rather than the dynamic MPC model itself [6].

The vast majority of RL research focuses on learning model-free control policies, i.e., policies that do not use predictions of the system states to determine the control actions. In contrast, learning a dynamic model and using its predictions to obtain a control law, as in eNMPC, offers several advantages: First, no retraining is required if constraints or objective functions change, as long as the system dynamics remain unchanged. Second, learning the system dynamics may be more sample-efficient than learning a sensible control policy directly [2,3]. Third, MPC has a rich theory regarding performance and stability guarantees, especially for linear models, and recent publications aim to extend this established theory to (learned) eNMPC [6,7].

We present a framework for end-to-end learning of nonlinear dynamic surrogate models for optimal performance in eNMPC applications with hard constraints on states. Specifically, we use applied Koopman theory and its extension to controlled systems [8] to obtain a model structure that can capture nonlinear dynamics yet gives rise to a convex MPC problem. We use post-optimal sensitivity analysis [9,10] to construct MPC policies whose control outputs can be differentiated with respect to the parameters of the surrogate models. This enables us to use state-of-the-art model-free RL algorithms such as Proximal Policy Optimization (PPO) [11] to train the dynamic surrogate models for optimal performance in eNMPC. We test our approach on two case studies derived from a well-studied continuous stirred-tank reactor model [12,13]: (i) an NMPC case study in which the controller must stabilize the product concentration given a fluctuating product flow rate, and (ii) a demand response case study in which an eNMPC must minimize electricity cost subject to hard constraints on state variables. We compare the resulting control performance to that of dynamic models trained solely using SI and of model-free policies trained using RL. We show that end-to-end trained dynamic surrogate models, like model-free policies, consistently outperform models trained by SI. Additionally, we find that, unlike model-free policies, MPCs employing an end-to-end trained dynamic surrogate model can successfully adapt to constraint changes without retraining.
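
The core of such a framework can be sketched as a Koopman surrogate with a learned lifting, combined with a convex MPC problem built as a differentiable optimization layer (here via cvxpylayers, following the differentiable convex optimization layers of [10]), so that gradients of the controller outputs with respect to the surrogate parameters become available to the RL algorithm. This is a minimal illustrative sketch: all names, dimensions, bounds, and the economic cost structure below are assumptions, not the exact formulation used in the work.

```python
import cvxpy as cp
import torch
import torch.nn as nn
from cvxpylayers.torch import CvxpyLayer

NX, NU, NZ, HORIZON = 2, 1, 8, 10  # assumed state/input/lifted dimensions and horizon


class KoopmanSurrogate(nn.Module):
    """Koopman surrogate with control: z_{k+1} = A z_k + B u_k, x_hat = C z."""

    def __init__(self):
        super().__init__()
        # learned lifting psi(x) -> z (architecture is an illustrative choice)
        self.encoder = nn.Sequential(nn.Linear(NX, 32), nn.Tanh(), nn.Linear(32, NZ))
        self.A = nn.Parameter(0.1 * torch.randn(NZ, NZ))
        self.B = nn.Parameter(0.1 * torch.randn(NZ, NU))
        self.C = nn.Parameter(0.1 * torch.randn(NX, NZ))


def build_mpc_layer():
    """Convex MPC over the lifted linear dynamics; A, B, C enter as parameters,
    so the optimal inputs are differentiable w.r.t. the surrogate model."""
    A = cp.Parameter((NZ, NZ))
    B = cp.Parameter((NZ, NU))
    C = cp.Parameter((NX, NZ))
    z0 = cp.Parameter(NZ)          # lifted initial state psi(x_t)
    price = cp.Parameter(HORIZON)  # electricity price forecast (economic objective)
    x_lb = cp.Parameter(NX)        # state bounds (can be changed without retraining)
    x_ub = cp.Parameter(NX)

    z = cp.Variable((HORIZON + 1, NZ))
    u = cp.Variable((HORIZON, NU))
    s = cp.Variable((HORIZON, NX), nonneg=True)  # slacks keep the layer feasible

    constraints = [z[0] == z0]
    cost = 0
    for k in range(HORIZON):
        constraints += [z[k + 1] == A @ z[k] + B @ u[k]]        # lifted dynamics
        constraints += [C @ z[k + 1] >= x_lb - s[k],
                        C @ z[k + 1] <= x_ub + s[k]]            # state bounds
        constraints += [u[k] >= 0.0, u[k] <= 1.0]               # normalized inputs
        cost += price[k] * cp.sum(u[k]) + 1e3 * cp.sum(s[k])    # cost + slack penalty

    problem = cp.Problem(cp.Minimize(cost), constraints)
    return CvxpyLayer(problem,
                      parameters=[A, B, C, z0, price, x_lb, x_ub],
                      variables=[u, z, s])
```

In this sketch, a control step would evaluate the layer at the current lifted state, e.g. mpc_layer(model.A, model.B, model.C, model.encoder(x), price, x_lb, x_ub), and apply the first element of the returned input trajectory to the plant; because the layer is differentiable, a policy-gradient update such as PPO can backpropagate through it into A, B, C, and the encoder weights.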

[1] McBride, K., & Sundmacher, K. (2019). Overview of surrogate modeling in chemical process engineering. Chemie Ingenieur Technik, 91(3), 228-239.

[2] Chen, B., Cai, Z., & Bergés, M. (2019). GNU-RL: A precocial reinforcement learning solution for building HVAC control using a differentiable MPC policy. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (pp. 316-325).

[3] Amos, B., Jimenez, I., Sacks, J., Boots, B., & Kolter, J. Z. (2018). Differentiable MPC for end-to-end planning and control. Advances in Neural Information Processing Systems, 31.

[4] Yin, H., Welle, M. C., & Kragic, D. (2022). Embedding Koopman Optimal Control in Robot Policy Learning. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 13392-13399).

[5] Iwata, T., & Kawahara, Y. (2022). Data-driven End-to-end Learning of Pole Placement Control for Nonlinear Dynamics via Koopman Invariant Subspaces. arXiv preprint arXiv:2208.08883.

[6] Gros, S., & Zanon, M. (2020). Data-driven economic NMPC using reinforcement learning. IEEE Transactions on Automatic Control, 65(2), 636-648.

[7] Angeli, D., Amrit, R., & Rawlings, J. B. (2012). On average performance and stability of economic model predictive control. IEEE Transactions on Automatic Control, 57(7), 1615-1626.

[8] Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica, 93, 149-160.

[9] Fiacco, A. V., & Ishizuka, Y. (1990). Sensitivity and stability analysis for nonlinear programming. Annals of Operations Research, 27(1), 215-235.

[10] Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., & Kolter, J. Z. (2019). Differentiable convex optimization layers. Advances in Neural Information Processing Systems, 32.

[11] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

[12] Petersen, D., Beal, L. D., Prestwich, D., Warnick, S., & Hedengren, J. D. (2017). Combined noncyclic scheduling and advanced control for continuous chemical processes. Processes, 5(4), 83.

[13] Baader, F. J., Bardow, A., & Dahmen, M. (2022). Simultaneous mixed-integer dynamic scheduling of processes and their energy systems. AIChE Journal, 68(8), e17741.