(105d) Deep Kernel Distributionally Robust Joint Chance-Constrained Process Optimization

Authors 

Yang, S. B. - Presenter, University of Alberta
Li, Z., University of Alberta
Uncertainty is ubiquitous in real-world process optimization problems. Among the different approaches proposed to handle it, distributionally robust optimization (DRO) has emerged as an intermediate between stochastic optimization and robust optimization. DRO assumes that the true distribution of the uncertainty lies in an ambiguity set and makes decisions that hedge against the worst-case distribution. This work focuses on the distributionally robust joint chance-constrained process optimization problem.

As a popular approach to handling uncertainty in constraints, chance-constrained programming enforces the uncertain constraints to be satisfied with a user-defined probability. There are two types of chance constraints: individual chance constraints and joint chance constraints (JCCs) [1]. A joint chance constraint is more general since it requires multiple uncertain constraints to be satisfied simultaneously at a prescribed confidence level. However, the major difficulty in solving a JCC problem is that the exact true distribution of the uncertainty is usually not completely known. In many cases, only partial information about the uncertainty can be obtained from historical data. In such situations, distributionally robust chance-constrained programming (DRCCP) [2] is an effective and powerful approach to the chance-constrained problem.
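For illustration, with decision vector x, uncertain parameter \xi, constraint functions g_i, and risk levels \epsilon_i and \epsilon (generic notation chosen here for exposition rather than taken from a specific model in this work), the two types of chance constraints can be written as

\mathbb{P}\{ g_i(x,\xi) \le 0 \} \ge 1 - \epsilon_i, \quad i = 1,\dots,m \qquad \text{(individual)}

\mathbb{P}\{ g_i(x,\xi) \le 0,\ i = 1,\dots,m \} \ge 1 - \epsilon \qquad \text{(joint)}

The joint form couples all m constraints through a single probability, which is what makes it both more general and harder to reformulate.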

DRCCP can be intuitively viewed as optimizing an objective function while enforcing an acceptable worst-case constraint satisfaction probability over a set of candidate data-generating distributions. The candidate distributions are learned from historical data and characterized through known distributional information about the unknown data-generating distribution. The set of candidate distributions is typically referred to as the ambiguity set. The ambiguity set should contain the true distribution of the uncertainty with high confidence, yet it should also be small enough to exclude distributions that may cause overly conservative solutions. In addition, the ambiguity set should be easy to construct from data, and it should enable a tractable reformulation of the DRCCP as a mathematical program that can be solved using an off-the-shelf optimization solver [3].
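In generic form (again using the illustrative notation above), the distributionally robust joint chance constraint replaces the single distribution with the worst case over an ambiguity set \mathcal{D}:

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P}\{ g_i(x,\xi) \le 0,\ i = 1,\dots,m \} \ge 1 - \epsilon

so that the required satisfaction probability holds for every candidate distribution in \mathcal{D}, including the worst one.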

Among the different types of ambiguity sets, the moment-based, phi-divergence, and Wasserstein ambiguity sets are popular and widely used. A moment-based ambiguity set contains all distributions satisfying certain moment constraints, so a certain level of moment information must be available in advance to establish it. However, the moment constraints in a moment-based ambiguity set might be too loose to exclude distributions that lead to overly conservative solutions. Furthermore, distributionally robust chance constraints based on moment-based ambiguity sets cannot tightly approximate the original chance constraints even when sufficient data is available [4]. The phi-divergence and Wasserstein ambiguity sets are metric-based ambiguity sets that can overcome these disadvantages of the moment-based ambiguity sets. All candidate distributions in a metric-based ambiguity set are centered around the data-generating nominal distribution within a radius determined by the prescribed probability metric (phi-divergence, Wasserstein metric, etc.). Although the phi-divergence and Wasserstein ambiguity sets are attractive alternatives to the moment-based ambiguity sets, they still have drawbacks that limit their practical application. A phi-divergence ambiguity set only contains distributions with the same support as the nominal distribution, which means that it does not necessarily contain the true distribution. Although Wasserstein ambiguity sets do not suffer from this issue, they rely on nontrivial assumptions to enable tractable reformulations of the underlying DRCCP problems. For instance, the constraints in the joint chance constraint of a Wasserstein DRCCP must be affine in the uncertain parameters [5].
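As a rough sketch of the two families (the exact sets used in the cited works differ in their details), a moment-based set constrains the first two moments, while a metric-based set is a ball around the nominal distribution \hat{\mathbb{P}}_N constructed from N data points:

\mathcal{D}_{\mathrm{moment}} = \{ \mathbb{P} : \mathbb{E}_{\mathbb{P}}[\xi] = \mu,\ \mathbb{E}_{\mathbb{P}}[(\xi-\mu)(\xi-\mu)^{\top}] \preceq \Sigma \}

\mathcal{D}_{\mathrm{metric}} = \{ \mathbb{P} : d(\mathbb{P}, \hat{\mathbb{P}}_N) \le \theta \}

where d is the chosen probability metric (a phi-divergence or the Wasserstein distance) and \theta is the radius.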

A novel kernel-ambiguity-set-based DRCCP approach is proposed in this research to overcome the above-mentioned limitations of the existing DRCCP methods. The kernel ambiguity set is constructed using the kernel mean embedding (KME) and the maximum mean discrepancy (MMD) [6]. More specifically, the kernel ambiguity set embeds distributions into a reproducing kernel Hilbert space (RKHS) norm-ball through the KME. In the RKHS norm-ball, the KME representations of the candidate distributions are centered around the KME representation of the nominal distribution within a radius (the RKHS norm-ball radius) determined by the MMD. Unlike phi-divergence ambiguity sets, the kernel ambiguity set contains the true distribution if the RKHS norm-ball radius is large enough [5]. Additionally, unlike Wasserstein DRCCPs, the uncertain constraints are not restricted to be affine in the uncertain parameters in the kernel DRCCP, as demonstrated in this study.
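A minimal sketch of the ingredients, following the standard KME/MMD definitions in [6] (the kernel k and the radius \varepsilon are design choices): with a reproducing kernel k and its RKHS \mathcal{H}, the KME of a distribution \mathbb{P} is \mu_{\mathbb{P}} = \mathbb{E}_{\xi \sim \mathbb{P}}[k(\xi,\cdot)], the MMD is the RKHS distance between two embeddings, and the kernel ambiguity set is the corresponding norm-ball around the embedding of the nominal distribution \hat{\mathbb{P}}_N:

\mathrm{MMD}(\mathbb{P},\mathbb{Q}) = \| \mu_{\mathbb{P}} - \mu_{\mathbb{Q}} \|_{\mathcal{H}}

\mathcal{D}_{\mathrm{kernel}} = \{ \mathbb{P} : \| \mu_{\mathbb{P}} - \mu_{\hat{\mathbb{P}}_N} \|_{\mathcal{H}} \le \varepsilon \}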

We present three formulations based on this new ambiguity set. The first is a mixed-integer model that handles the joint chance constraint through the indicator function. The second is a mixed-integer model that uses the Conditional Value-at-Risk (CVaR) approximation; here the worst-case CVaR approximation [7] is employed to approximate the distributionally robust joint chance constraint (DRJCC) and reduce the computational burden of problem-solving. The third is the relaxed version of the second model, which is a continuous optimization model. The performance of all three formulations is compared with the popular Wasserstein-ambiguity-set-based DRCCP method.
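As a sketch of the second ingredient (the standard worst-case CVaR bound; the specific mixed-integer reformulations of this work are not reproduced here), writing the joint constraint through the maximum L(x,\xi) = \max_i g_i(x,\xi), the DRJCC is implied by a worst-case CVaR condition:

\mathrm{CVaR}_{1-\epsilon}^{\mathbb{P}}(L) = \inf_{t \in \mathbb{R}} \left\{ t + \tfrac{1}{\epsilon}\, \mathbb{E}_{\mathbb{P}}[(L - t)_+] \right\}

\sup_{\mathbb{P} \in \mathcal{D}} \mathrm{CVaR}_{1-\epsilon}^{\mathbb{P}}\big( \max_i g_i(x,\xi) \big) \le 0 \;\Longrightarrow\; \inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P}\{ g_i(x,\xi) \le 0,\ i = 1,\dots,m \} \ge 1 - \epsilon

The CVaR condition avoids the indicator function at the price of some conservatism.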

We also investigate the impact of kernel selection in the proposed approach. Specifically, we propose using the multi-layer arc-cosine kernel (MLACK) [8], which possesses a deep architecture, to strengthen the performance of the presented method. The MLACK has been shown to outperform shallow kernels such as the linear, polynomial, and Gaussian kernels by extracting more complex structure and generating more effective representations from raw features [9]. The MLACK mimics the computation of a multi-layer deep neural network with infinitely many units in its hidden and output layers. In this sense, this research not only proposes an innovative DRCCP approach, but also bridges the fields of deep learning and distributionally robust chance-constrained optimization.
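For concreteness, a minimal NumPy sketch of a multi-layer arc-cosine kernel in the spirit of Cho and Saul [8] is given below (degree-0 and degree-1 activations only, with the number of layers as a free parameter; this is an illustrative implementation, not the exact kernel configuration used in this work):

import numpy as np

def arc_cosine_kernel(X, Y, degree=1, layers=3):
    """Multi-layer arc-cosine kernel matrix between the rows of X (n, d) and Y (m, d).

    Illustrative sketch following the recursive construction of Cho and Saul (2009).
    """
    def J(theta, n):
        # Angular dependence J_n(theta) for degrees n = 0 and n = 1.
        if n == 0:
            return np.pi - theta
        if n == 1:
            return np.sin(theta) + (np.pi - theta) * np.cos(theta)
        raise ValueError("only degrees 0 and 1 are sketched here")

    # Layer 0: linear (base) kernel values.
    K_xy = X @ Y.T                     # cross Gram matrix, shape (n, m)
    K_xx = np.sum(X * X, axis=1)       # k(x, x) for each row of X, shape (n,)
    K_yy = np.sum(Y * Y, axis=1)       # k(y, y) for each row of Y, shape (m,)

    for _ in range(layers):
        norms = np.sqrt(np.outer(K_xx, K_yy))                           # sqrt(k(x,x) k(y,y))
        cos_theta = np.clip(K_xy / np.maximum(norms, 1e-12), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # One composition step: k_{l+1}(x, y) = (1/pi) * (k_l(x,x) k_l(y,y))^(n/2) * J_n(theta).
        K_xy = (norms ** degree) * J(theta, degree) / np.pi
        # Propagate the diagonal entries through the same recursion (theta = 0 on the diagonal).
        K_xx = (K_xx ** degree) * J(np.zeros_like(K_xx), degree) / np.pi
        K_yy = (K_yy ** degree) * J(np.zeros_like(K_yy), degree) / np.pi
    return K_xy

Each pass of the loop replaces the previous Gram matrix by the degree-n arc-cosine kernel evaluated on it, which mimics feeding the inputs through one more infinitely wide layer; for example, arc_cosine_kernel(samples, samples, degree=1, layers=3) returns the Gram matrix of uncertainty samples that could then enter the KME/MMD computations sketched above.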

The performance of the different model formulations is compared through a numerical example and a nonlinear process optimization problem. In the numerical example, we demonstrate that the presented method can significantly outperform the Wasserstein DRCCP approach on a JCC problem. In the case study, the presented method is applied to a nonlinear alkylation process optimization problem, and we show that the performance of the proposed approach can be enhanced by using the MLACK.

References

[1] P. Li, M. Wendt, and G. Wozny, "A probabilistically constrained model predictive controller," Automatica, vol. 38, no. 7, pp. 1171-1176, 2002.

[2] W. Xie, "On distributionally robust chance constrained programs with Wasserstein distance," Mathematical Programming, vol. 186, no. 1, pp. 115-155, 2021.

[3] P. Mohajerin Esfahani and D. Kuhn, "Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations," Mathematical Programming, vol. 171, no. 1, pp. 115-166, 2018.

[4] Z. Chen, D. Kuhn, and W. Wiesemann, "Data-driven chance constrained programs over Wasserstein balls," arXiv preprint arXiv:1809.00210, 2018.

[5] M. Staib and S. Jegelka, "Distributionally robust optimization and generalization in kernel methods," Advances in Neural Information Processing Systems, vol. 32, 2019.

[6] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," The Journal of Machine Learning Research, vol. 13, no. 1, pp. 723-773, 2012.

[7] B. Liu, Q. Zhang, X. Ge, and Z. Yuan, "CVaR-based approximations of Wasserstein distributionally robust chance constraints with application to process scheduling," Industrial & Engineering Chemistry Research, vol. 59, no. 20, pp. 9562-9574, 2020.

[8] Y. Cho and L. Saul, "Kernel methods for deep learning," Advances in Neural Information Processing Systems, vol. 22, 2009.

[9] A. Afzal, N. K. Nair, and S. Asharaf, "Deep kernel learning in extreme learning machines," Pattern Analysis and Applications, vol. 24, no. 1, pp. 11-19, 2021.