(371g) Stabilization-Oriented Learning Algorithm for Optimal Control of Nonlinear Control-Affine System
AIChE Annual Meeting
Tuesday, November 12, 2019 - 3:30pm to 5:00pm
The conventional methods, excluding the third approach, focus more on optimality than on stability. For example, to guarantee the stability of MPC, a terminal constraint or a terminal cost must be included in the formulation. In the second method, in order for the LgV-type optimal formula of the control-affine system to yield an asymptotically stabilizing policy, the value function must be the solution of the Lyapunov equation associated with an asymptotically stabilizing policy. However, the Lyapunov equation is also a PDE, which is difficult to solve. Even when a neural network is used to approximate the solution of the HJB equation, it is difficult to guarantee closed-loop stability under the LgV-type optimal formula while the network weights are being updated, especially in the early stages of learning.
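For a control-affine system dx/dt = f(x) + g(x)u with cost integrand q(x) + uᵀRu, the LgV-type optimal formula referred to above is u*(x) = -½ R⁻¹ g(x)ᵀ ∇V(x). The following is a minimal illustrative sketch; the system f, g, the quadratic value-function approximation V(x) = xᵀPx, and the weight R are all hypothetical choices, not taken from the abstract:

```python
import numpy as np

# Hypothetical 2-D control-affine system: xdot = f(x) + g(x) u
def f(x):
    return np.array([x[1], -x[0] + x[1] ** 3])

def g(x):
    return np.array([[0.0], [1.0]])

# Assumed quadratic approximation of the value function, V(x) = x^T P x
P = np.array([[2.0, 0.5], [0.5, 1.0]])

def grad_V(x):
    return 2.0 * P @ x

R = np.array([[1.0]])  # input weight in the cost integrand q(x) + u^T R u

def lgv_optimal_control(x):
    """LgV-type optimal formula: u = -1/2 R^{-1} g(x)^T dV/dx."""
    return -0.5 * np.linalg.solve(R, g(x).T @ grad_V(x))

x = np.array([1.0, 0.5])
u = lgv_optimal_control(x)  # -> array([-1.])
```

As the abstract notes, this formula is only optimal (and stabilizing) when V is the true value function; with an approximate V it carries no stability guarantee.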
Although the conventional inverse optimal control approach focuses on stability rather than optimality, the CLF must have the same level sets as the value function in order for Sontag's formula to yield the optimal controller for a user-defined cost function. However, only a few studies have developed algorithms to find such a CLF. In prior work, the CLF is learned by adjusting the cost function for which the CLF is the optimal value function, so that the resulting controller performs similarly to the optimal controller for the user-specified cost function. In our study, we modify the algorithm developed in the second approach to obtain a new algorithm that learns a CLF with the same level sets as the optimal value function for the user-specified cost function, unlike that method.
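Given a CLF V, Sontag's universal formula constructs a stabilizing feedback from the Lie derivatives a = L_f V and b = L_g V. A minimal sketch, with the scalar toy system and CLF chosen purely for illustration:

```python
import numpy as np

def sontag_control(a, b):
    """Sontag's universal formula for a control-affine system.

    a = L_f V(x): Lie derivative of the CLF along the drift f
    b = L_g V(x): Lie derivatives along the input channels g
    Returns u(x); u = 0 where b = 0 (the CLF condition then requires a < 0).
    """
    b = np.atleast_1d(np.asarray(b, dtype=float))
    bb = float(b @ b)            # ||b||^2
    if bb < 1e-12:               # b(x) = 0: the formula takes u = 0
        return np.zeros_like(b)
    return -((a + np.sqrt(a**2 + bb**2)) / bb) * b

# Toy check: scalar system xdot = x + u with CLF V = x^2 / 2, at x = 1
a = 1.0   # L_f V = x * f(x) = x^2
b = 1.0   # L_g V = x * g(x) = x
u = sontag_control(a, b)
vdot = a + b * u[0]   # closed-loop Vdot = -sqrt(a^2 + ||b||^4) < 0
```

By construction, Vdot = a + b·u = -sqrt(a² + ‖b‖⁴) < 0 whenever b ≠ 0, which is why the formula stabilizes for any valid CLF, optimal or not.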
In this study, we propose a new PI-based algorithm for learning both the optimal value function and the optimal controller for nonlinear control-affine systems while guaranteeing stability. We prove the stability of the closed-loop system and the convergence to the optimal controller when Lyapunov equations are solved for policy evaluation and Sontag's formula is used for policy improvement. Even when function approximation and the gradient descent method are used for policy evaluation, closed-loop stability is guaranteed throughout the learning process. Since Sontag's formula provides an asymptotically stabilizing controller for any neural network constrained to be a CLF, our controller can asymptotically stabilize the system even in the presence of approximation errors.
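The structure of such a PI loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: a scalar system, a one-parameter value candidate V(x) = w·x², policy evaluation by gradient descent on the squared Lyapunov-equation residual over sampled states, and policy improvement by Sontag's formula; all numerical choices are hypothetical.

```python
import numpy as np

# Illustrative scalar control-affine system: xdot = f(x) + g(x) u,
# cost integrand q(x) + r u^2 (all choices hypothetical)
f = lambda x: x - x**3
g = lambda x: 1.0
q = lambda x: x**2
r = 1.0

def sontag(a, b):
    """Scalar Sontag formula; u = 0 where b = 0."""
    return 0.0 if abs(b) < 1e-12 else -(a + np.sqrt(a**2 + b**4)) / b

def policy(x, w):
    """Policy improvement: Sontag's formula for V(x) = w x^2."""
    a = 2 * w * x * f(x)   # L_f V
    b = 2 * w * x * g(x)   # L_g V
    return sontag(a, b)

# Policy evaluation: gradient descent on the squared residual of the
# Lyapunov equation  dV/dx (f + g u) + q + r u^2 = 0  over sampled states
w = 1.0
xs = np.linspace(-1.0, 1.0, 41)
for _ in range(100):
    grad = 0.0
    for x in xs:
        u = policy(x, w)
        xdot = f(x) + g(x) * u
        res = 2 * w * x * xdot + q(x) + r * u**2
        grad += res * 2 * x * xdot        # d(res)/dw with the policy frozen
    w = max(w - 1e-4 * grad / len(xs), 1e-3)  # projection keeps V a CLF (w > 0)
```

Because the improved policy is always Sontag's formula for the current (CLF-constrained) V, the closed loop stays stabilizing at every iteration, which is the key difference from plugging the same approximate V into the LgV-type formula.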
However, the LgV-type optimal formula cannot guarantee stability, because the approximate function is not exactly the same as the optimal value function. Therefore, the optimal formula cannot yield the optimal controller, and being a CLF is not a sufficient condition for the optimal formula to produce a stabilizing controller. Through simulation, we show several cases in which the system becomes unstable during training when the LgV-type optimal formula is used for policy improvement.
Rohrweck, H., Schwarzgruber, T., del Re, L. (2015). Approximate optimal control by inverse CLF approach. IFAC-PapersOnLine, 48(11), 286-291.
Vamvoudakis, K. G., Lewis, F. L. (2010). Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5), 878-888.
 Kamalapurkar, R., Rosenfeld, J. A., Dixon, W. E. (2016). Efficient model-based reinforcement learning for approximate online optimal control. Automatica, 74, 247-258.
 Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.L. (2009). Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2), 477-484.
 Freeman, R. A., Kokotovic, P. V. (1996). Inverse optimality in robust stabilization. SIAM Journal on Control and Optimization, 34(4), 1365-1391.
Primbs, J. A., Nevistić, V., Doyle, J. C. (1999). Nonlinear optimal control: a control Lyapunov function and receding horizon perspective. Asian Journal of Control, 1(1), 14-24.