(119g) Entropic Regularization for Wasserstein Distributionally Robust Chance-Constrained Process Optimization | AIChE

(119g) Entropic Regularization for Wasserstein Distributionally Robust Chance-Constrained Process Optimization

Authors 

Yang, S. B. - Presenter, University of Alberta
Li, Z., University of Alberta
Uncertainty is a common issue in real-world optimization problems, and finding optimal solutions robust to uncertainty is challenging due to the possibility of unexpected random effects leading to constraint violations. To address this problem, chance-constrained programming (CCP) is a commonly employed method, which guarantees that the optimal decision for an optimization problem is feasible with a defined level of confidence. There are two types of CCP, namely individual CCP (ICCP) and joint CCP (JCCP). The ICCP enforces a constraint satisfaction probability for each uncertain constraint, while the JCCP ensures that multiple constraints are jointly satisfied to a given probability, making it a more general approach than the ICCP. However, solving a JCCP problem is complicated because it requires handling multidimensional distribution. Therefore, approximation methods including analytical and sampling-based approaches, are usually used to solve JCCP problems [1].

One major difficulty of solving a JCCP problem is that the true distribution of uncertainty should be known beforehand. However, in practical scenarios, obtaining the true distribution of uncertainty can be challenging. As a solution, the distributionally robust chance-constrained programming (DRCCP) approach is proposed, which solves the JCCP problem by considering the worst-case distribution among all possible data-generating distributions. The DRCCP method aims to optimize the objective with the worst-case joint constraint satisfaction probability above the acceptable level, over a set of candidate distributions. This set is named as the ambiguity set. The ambiguity set may include certain distributional information about the unknown true distribution, such as moments, structural properties, and domain knowledge obtained from collected data. To ensure that the DRCCP approach is practical and robust, a well-designed ambiguity set should contain the true distribution with high confidence and should exclude irrelevant distributions that could result in overly conservative decisions. Additionally, the ambiguity set must be designed in a way that enables the DRCCP problem to be formulated as a tractable mathematical program that can be solved using off-the-shelf solvers [2].

In the DRCCP field, moment-based ambiguity sets and metric-based ambiguity sets are commonly used. Moment-based ambiguity sets consist of distributions that satisfy specific moment constraints. However, moment-based ambiguity sets may not be tight enough to eliminate irrelevant distributions, as different distributions may share the same moments. Additionally, even with sufficient data, chance constraints based on moment-based ambiguity sets cannot tightly approximate original chance constraints [3]. To overcome these limitations, metric-based ambiguity sets are more effective. These sets can be seen as balls in the space of probability distributions, and all the candidate distributions in a metric-based ambiguity set are centered around a nominal distribution established using collected data. The radius size of the ambiguity set determined by a probability metric, is a user-defined hyper-parameter that controls the solution's conservatism.

The Wasserstein distance is a popular metric for constructing metric-based ambiguity sets, which is more efficient than the f-divergence metric at eliminating irrelevant distributions and avoiding overly conservative decisions [4]. However, existing studies require nontrivial assumptions on uncertain constraints to obtain tractable DRCCP models based on Wasserstein ambiguity sets (Wasserstein DRCCPs). For example, uncertain constraints must be assumed to be affine in uncertainty, quadratic convex in uncertainty, or either concave or convex in uncertainty [5]. Furthermore, a limitation of Wasserstein ambiguity sets is that the worst-case distribution they produce is restricted to a discrete distribution supported on at most M+1 samples, where M represents the number of collected real samples [6]. This limitation arises because the optimal transportation plan from the Wasserstein distance is always solved at a vertex of the transportation plan's polyhedral set, resulting in an extreme and sparse transportation plan. As the worst-case distribution of the Wasserstein ambiguity set is derived from this sparse optimal transportation plan, it results in a worst-case distribution that is also sparse and discrete [4]. This restriction could lead to the Wasserstein DRCCP making overly conservative decisions and hedging against an incorrect family of distributions, which may cause the decision to deviate significantly from the true optimal decision if the true distribution is continuous [7].

This work proposes a novel DRCCP approach, which aims to overcome the limitations of the traditional Wasserstein DRCCP. The proposed approach is based on the ambiguity set constructed by the Sinkhorn distance [8] (Sinkhorn ambiguity set). The Sinkhorn distance is a variant of the Wasserstein distance that includes entropic regularization to smooth out the transportation plan [9]. The smooth transportation plan enables the Sinkhorn ambiguity set to be capable of producing a smoother worst-case distribution which is not limited to a discrete distribution. Thus, the proposed DRCCP based on the Sinkhorn ambiguity set (Sinkhorn DRCCP), is based on a more general worst-case distribution than the Wasserstein DRCCP. The presented approach is formulated as a tractable conic optimization model based on the Conditional value-at-risk (CVaR) approximation [10] and the discretized kernel distribution relaxation. This model does not require any assumptions on uncertain constraints. The above advantages enable the Sinkhorn DRCCP to be more practical for real-world problems than the popular Wasserstein DRCCP. On the other hand, unlike existing studies which only employ the Sinkhorn ambiguity set for distributionally robust optimization (DRO) [7] containing uncertainty in the objective function, this work applies the Sinkhorn ambiguity set to DRCCP containing uncertainty in constraints.

The performances of the presented Sinkhorn DRCCP and the popular Wasserstein DRCCP are compared through a numerical example and a nonlinear alkylation process optimization problem. As shown by the results of the numerical example and case study in this work, the Sinkhorn DRCCP outperforms the Wasserstein DRCCP by obtaining less conservative solutions with lower variability.

References

[1] Y. Yuan, Z. Li, and B. Huang, "Robust optimization approximation for joint chance constrained optimization problem," Journal of Global Optimization, vol. 67, pp. 805-827, 2017.

[2] P. Mohajerin Esfahani and D. Kuhn, "Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations," Mathematical Programming, vol. 171, no. 1-2, pp. 115-166, 2018.

[3] Z. Chen, D. Kuhn, and W. Wiesemann, "Data-driven chance constrained programs over Wasserstein balls," Operations Research, 2022.

[4] R. Gao and A. Kleywegt, "Distributionally robust stochastic optimization with Wasserstein distance," Mathematics of Operations Research, 2022.

[5] A. R. Hota, A. Cherukuri, and J. Lygeros, "Data-driven chance constrained optimization under Wasserstein ambiguity sets," in 2019 American Control Conference (ACC), 2019, pp. 1501-1506: IEEE.

[6] M.-C. Yue, D. Kuhn, and W. Wiesemann, "On linear optimization over Wasserstein balls," Mathematical Programming, vol. 195, no. 1-2, pp. 1107-1122, 2022.

[7] J. Wang, R. Gao, and Y. Xie, "Sinkhorn distributionally robust optimization," arXiv preprint arXiv:2109.11926, 2021.

[8] M. Cuturi, "Sinkhorn distances: Lightspeed computation of optimal transport," Advances in neural information processing systems, vol. 26, 2013.

[9] S. Kammammettu and Z. Li, "Scenario reduction and scenario tree generation for stochastic programming using Sinkhorn distance," Computers & Chemical Engineering, vol. 170, p. 108122, 2023.

[10] R. T. Rockafellar and S. Uryasev, "Optimization of conditional value-at-risk," Journal of risk, vol. 2, pp. 21-42, 2000.