(393e) Distributed Approximate Dynamic Programming (dADP) for Data-Driven Optimal Control of Nonlinear Systems

Tang, W., University of Minnesota, Twin Cities
Daoutidis, P., University of Minnesota, Twin Cities
Data-driven techniques have been developed in chemical engineering for multiple purposes, including process monitoring and control. Data-driven control offers an alternative to model-based control strategies, one that does not rely on an accurate control-oriented process model. Among the wide spectrum of data-driven control methods [1], approximate dynamic programming (ADP) is a class of algorithms that has been used successfully for discrete-state systems in artificial intelligence applications and has been discussed for process control (see, e.g., [2]).

ADP is a data-driven optimal control strategy in which historical datasets are exploited to train the control policy and value function iteratively toward the optimal solution, which satisfies Bellman's principle of optimality. For systems with continuous (infinitely many) states, the optimality principle takes the specific form of the Hamilton-Jacobi-Bellman (HJB) equations. For nonlinear input-affine systems, the HJB equations can be transformed so that the model functions are replaced by information from data; by choosing suitable basis functions for the optimal control policy and the value function, the policy and value iterations can then be solved approximately as a regression problem [3].
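For orientation, the standard textbook form of the HJB equation for an input-affine system can be written as follows. This is a sketch under assumptions that the abstract does not state: the symbols f, g, q, R and V* are illustrative notation, the cost is taken to be an infinite-horizon integral with a quadratic input penalty, and no input constraints are included.

```latex
% Assumed setting (not specified in the abstract):
%   dynamics:  \dot{x} = f(x) + g(x) u
%   cost:      J = \int_0^\infty \left( q(x) + u^\top R u \right) dt, \quad R \succ 0
\begin{align}
  0 &= \min_{u} \left[ q(x) + u^\top R u
       + \nabla V^*(x)^\top \bigl( f(x) + g(x) u \bigr) \right], \\
  u^*(x) &= -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V^*(x), \\
  0 &= q(x) + \nabla V^*(x)^\top f(x)
       - \tfrac{1}{4} \nabla V^*(x)^\top g(x) R^{-1} g(x)^\top \nabla V^*(x).
\end{align}
```

The second line follows from setting the gradient of the bracketed term with respect to u to zero, and the third from substituting u* back into the first. Parameterizing V* and u* with basis functions turns the residual of such an equation, evaluated on data, into a regression target.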

In this work, we propose a novel approach that, unlike that of [3], directly formulates the HJB equations as a nonlinear regression problem, so that the approximate control policy and value function are obtained directly. The framework is also extended to cases where input constraints are present. This formulation is suitable for solving ADP in a big-data setting where a centralized optimization exploiting all the data in the regression procedures is infeasible. Specifically, we employ the alternating direction method of multipliers (ADMM) [4], which is the most widely used distributed optimization algorithm, as well as its accelerated version [5], to regress the parameters of the optimal control policy and value function. We call the resulting framework distributed adaptive dynamic programming (dADP), as it adaptively updates the parameters to approach the optimum throughout the distributed optimization iterations, and we illustrate the method with a chemical reactor example.
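To illustrate how ADMM can distribute a regression over data blocks, the following is a minimal sketch of global-consensus ADMM in the form given in [4]. It is not the authors' formulation: it swaps the nonlinear HJB regression for ordinary linear least squares, and the function names, the synthetic data, and the values of `rho` and `iters` are hypothetical choices made only for illustration.

```python
import numpy as np


def consensus_admm_lstsq(blocks, rho=1.0, iters=200):
    """Consensus ADMM for least squares over data blocks [(A_i, b_i), ...].

    Each block keeps a local estimate x_i and a scaled dual u_i; z is the
    shared (consensus) parameter vector that all blocks must agree on.
    """
    n = blocks[0][0].shape[1]
    z = np.zeros(n)
    us = [np.zeros(n) for _ in blocks]
    # Local normal-equation terms never change, so compute them once.
    lhs = [A.T @ A + rho * np.eye(n) for A, _ in blocks]
    rhs = [A.T @ b for A, b in blocks]
    for _ in range(iters):
        # Local step: each block solves its own regularized least squares.
        xs = [np.linalg.solve(M, r + rho * (z - u))
              for M, r, u in zip(lhs, rhs, us)]
        # Consensus step: average the local estimates plus scaled duals.
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual step: accumulate each block's disagreement with z.
        us = [u + x - z for x, u in zip(xs, us)]
    return z


def demo():
    """Compare the ADMM result with a centralized fit on synthetic data."""
    rng = np.random.default_rng(0)
    theta_true = np.array([1.0, -2.0, 0.5])  # hypothetical parameters
    blocks = []
    for _ in range(4):
        A = rng.normal(size=(50, 3))
        b = A @ theta_true + 0.01 * rng.normal(size=50)
        blocks.append((A, b))
    theta_admm = consensus_admm_lstsq(blocks, rho=10.0, iters=500)
    A_all = np.vstack([A for A, _ in blocks])
    b_all = np.concatenate([b for _, b in blocks])
    theta_central, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
    return theta_admm, theta_central
```

Calling `demo()` returns the consensus estimate alongside the centralized least-squares solution, which should agree closely. Only the local step touches raw data, so each block could run on a separate machine and exchange nothing but its parameter vectors.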


[1] Hou, Z. S., & Wang, Z. (2013). From model-based control to data-driven control: Survey, classification and perspective. Inf. Sci., 235, 3-35.

[2] Lee, J. H., & Wong, W. (2010). Approximate dynamic programming approach for process control. J. Process Control, 20(9), 1038-1048.

[3] Luo, B., et al. (2014). Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica, 50(12), 3281-3290.

[4] Boyd, S., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1), 1-122.

[5] Goldstein, T., et al. (2014). Fast alternating direction optimization methods. SIAM J. Imaging Sci., 7(3), 1588-1623.



