
(434e) Machine Learning and Adaptive Model Predictive Control: Conflict or Conflux

Authors 

Ydstie, E. - Presenter, Carnegie Mellon University
Cai, Z., Carnegie Mellon University
Adaptive and machine learning controllers perform tracking and regulation tasks by tuning control parameters. In the case of RL/Q-learning, the control function is adapted directly so that performance improves over time. In the case of adaptive model predictive control (aMPC), a model is updated first and the MPC controller is then recomputed by optimization.
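
As an illustration of the direct route, the following sketch adapts a quadratic Q-function for a scalar linear-quadratic problem by policy iteration, in the spirit of [4]. The scalar plant x[k+1] = a·x[k] + b·u[k], the cost weights, and all numerical values are assumptions chosen only to make the sketch self-contained; they are not the system studied in the paper.

```python
import numpy as np

# Direct adaptation: RL/Q-learning via policy iteration for a scalar LQ
# problem, in the spirit of [4]. The plant x[k+1] = a*x[k] + b*u[k] and
# all numbers below are illustrative assumptions.

def features(x, u):
    # Quadratic Q-function: Q(x, u) = h1*x^2 + 2*h2*x*u + h3*u^2
    return np.array([x * x, 2.0 * x * u, u * u])

rng = np.random.default_rng(2)
a, b, q, r = 0.9, 0.5, 1.0, 1.0        # assumed (open-loop stable) plant and cost weights
K, x = 0.0, 1.0                        # initial stabilizing policy u = -K*x
for sweep in range(10):                # policy-iteration sweeps
    h, P = np.zeros(3), 100.0 * np.eye(3)
    for k in range(200):               # policy evaluation: RLS on the Bellman equation
        u = -K * x + rng.normal()      # exploratory dither supplies excitation
        x_next = a * x + b * u
        phi = features(x, u) - features(x_next, -K * x_next)
        y = q * x * x + r * u * u      # one-step cost
        g = P @ phi / (1.0 + phi @ P @ phi)
        h, P = h + g * (y - phi @ h), P - np.outer(g, phi @ P)
        x = x_next
    K = h[1] / h[2]                    # policy improvement: argmin_u Q(x, u)
print(K)                               # approaches the optimal LQ gain for the assumed plant
```

Note that the control law is improved directly from the Q-function parameters; no explicit model of (a, b) is ever formed.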

The purpose of the current paper is to compare and contrast the direct and indirect control methods using a linear system with a quadratic performance objective as an example. The system to be controlled and the control objectives can then be made equivalent, so that the two approaches can be compared in simulation and with Lyapunov-type stability theory. In either case we apply the same recursive least squares (RLS) algorithm to tune control parameters. The control design methods are quite different, however, and the performance and stability properties of the two approaches also turn out to be quite different. To compare the methods, we say that a conflux exists if the parameter tuning and control algorithms have coinciding and unique fixed points. In that case we can guarantee optimality of the algorithm if it is stable and converges. Otherwise, tuning and control are said to be in conflict, and optimality cannot be guaranteed even when the algorithm converges.
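
The shared estimation step is summarized below as a standard RLS update for a scalar ARX-type model. The model structure x[k+1] = a·x[k] + b·u[k], the forgetting factor, and the simulated data are assumptions used only to make the update concrete.

```python
import numpy as np

# Recursive least squares (RLS) for the scalar model x[k+1] = a*x[k] + b*u[k] + noise.
# theta = [a, b] is the parameter vector tuned by both the direct and the
# indirect scheme; the numbers below are illustrative assumptions.

def rls_update(theta, P, phi, y, lam=1.0):
    """One RLS step with regressor phi, new measurement y, forgetting factor lam."""
    g = P @ phi / (lam + phi @ P @ phi)     # estimator gain
    theta = theta + g * (y - phi @ theta)   # parameter correction
    P = (P - np.outer(g, phi @ P)) / lam    # covariance update
    return theta, P

# Illustrative use: recover a = 0.9, b = 0.5 from simulated data.
rng = np.random.default_rng(0)
theta, P, x = np.zeros(2), 100.0 * np.eye(2), 0.0
for k in range(200):
    u = rng.normal()                        # white-noise input: persistently exciting
    x_next = 0.9 * x + 0.5 * u + 0.01 * rng.normal()
    theta, P = rls_update(theta, P, np.array([x, u]), x_next)
    x = x_next
print(theta)                                # close to [0.9, 0.5] under this excitation
```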

Certainty-equivalence aMPC treats control and learning as separate tasks. This approach does not generate input signals rich enough for good parameter estimation. It is known that stability can be guaranteed, while optimality cannot be guaranteed unless additional excitation is provided. In this paper we show that the RL/Q-learning approach provides neither stability nor optimality unless the signals are excited. Moreover, the excitation requirements are stringent and, to make matters worse, RL/Q-learning has to use slow learning to be stable. This requirement has no parallel in aMPC. Thus, the same asymptotic optimality properties hold for direct and indirect learning as long as excitation holds, but the convergence rate of RL/Q-learning is much slower.
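
For contrast with the direct scheme, the sketch below shows a certainty-equivalence adaptive LQ loop of the kind the indirect approach reduces to in this setting: RLS supplies the model estimates, a scalar Riccati iteration supplies the gain, and an optional dither term supplies the excitation discussed above. The scalar plant, weights, initial guesses, and white-noise dither are illustrative assumptions.

```python
import numpy as np

# Certainty-equivalence adaptive LQ control of the scalar plant
#   x[k+1] = a*x[k] + b*u[k],  cost sum(q*x^2 + r*u^2).
# Estimates are used as if they were true; the dither term provides
# the excitation discussed above. All numbers are illustrative assumptions.

def lq_gain(a, b, q=1.0, r=1.0, iters=200):
    """Solve the scalar discrete-time Riccati equation by fixed-point iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)      # certainty-equivalence gain, u = -K*x

rng = np.random.default_rng(1)
a_true, b_true = 1.2, 1.0                   # assumed open-loop unstable plant
theta, P = np.array([1.0, 0.8]), 100.0 * np.eye(2)   # initial guesses (assumed)
x, dither = 1.0, 0.05
for k in range(300):
    K = lq_gain(theta[0], theta[1])         # redesign the controller from the estimates
    u = -K * x + dither * rng.normal()      # set dither = 0 for pure CE control
    x_next = a_true * x + b_true * u + 0.01 * rng.normal()
    phi = np.array([x, u])                  # same RLS update as in the sketch above
    g = P @ phi / (1.0 + phi @ P @ phi)
    theta, P = theta + g * (x_next - phi @ theta), P - np.outer(g, phi @ P)
    x = x_next
print(theta, K)                             # with dither > 0 the estimates approach (1.2, 1.0)
```

With the dither set to zero the loop typically remains stable, but the closed-loop inputs need not be rich enough to identify (a, b), which is the source of the sub-optimality discussed above.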

Dual control theory addresses the trade-off between two tasks: trajectory convergence and parameter convergence. A dual adaptive controller is designed to optimally improve learning while maintaining robust control performance. The ideas go back to the original work of Feldbaum [1]; however, algorithms and theory for application in non-trivial examples have been developed only recently [2].
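
One simple way to make this trade-off concrete, given here as an illustrative assumption rather than the formulation of [2], is to augment the one-step control cost with a term that rewards the predicted reduction in parameter uncertainty, measured by the trace of the RLS covariance:

```python
import numpy as np

# Illustrative dual-control input selection for the scalar model used in the
# sketches above: trade one-step LQ cost against predicted parameter
# uncertainty. The weight rho and the candidate grid are assumptions.

def dual_input(x, theta, P, K, q=1.0, r=1.0, rho=0.1):
    """Pick u near the certainty-equivalence input -K*x, scoring each candidate
    by predicted cost plus the trace of the predicted RLS covariance."""
    best_u, best_score = 0.0, np.inf
    for u in -K * x + np.linspace(-0.5, 0.5, 21):
        phi = np.array([x, u])
        x_pred = phi @ theta                               # predicted next state
        P_pred = P - np.outer(P @ phi, phi @ P) / (1.0 + phi @ P @ phi)
        score = q * x_pred ** 2 + r * u ** 2 + rho * np.trace(P_pred)
        if score < best_score:
            best_u, best_score = u, score
    return best_u

# Example: probing is strongest when the covariance P is large.
print(dual_input(x=1.0, theta=np.array([0.9, 0.5]), P=np.eye(2), K=0.6))
```

A larger probing component reduces trace(P) faster but raises the immediate control cost; the weight rho balances the two objectives, which is precisely the trade-off dual control formalizes.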

In the current paper we explore the application of dual control as a vehicle to improve the performance of aMPC and RL/Q-learning. In particular, we want to address the issue Polderman [3] described as the “conflict or conflux” between estimation and control, as introduced above. In his seminal work he showed that a conflux exists between estimation and control for minimum variance and pole-placement control, whereas it does not exist for infinite-horizon aMPC; instead there is conflict, resulting in stable but sub-optimal controls. Bradtke et al. [4] demonstrated a similar conflict between estimation and control in RL/Q-learning. The purpose of our presentation is to review these results briefly and to explore the possibility of using dual control to stabilize learning controllers and provide asymptotic optimality, thus resolving the above-mentioned conflicts that exist in both the direct and indirect approaches to learning control.

References

  1. Feldbaum, A. (1960). Dual control theory. I. Avtomatika i Telemekhanika, 21(9), 1240–1249.
  2. Heirung, T. A. N., Ydstie, B. E., & Foss, B. (2017). Dual adaptive model predictive control. Automatica, 80, 340–348.
  3. Polderman, J. W. (1989). Adaptive control & identification: Conflict or conflux? CWI Tracts, 67.
  4. Bradtke, S. J., Ydstie, B. E., & Barto, A. G. (1994). Adaptive linear quadratic control using policy iteration. In Proceedings of the 1994 American Control Conference (ACC '94) (Vol. 3, pp. 3475–3479). IEEE.