(417b) Rare-Event Sampling for Efficient Scenario Generation for Stochastic Programs | AIChE

(417b) Rare-Event Sampling for Efficient Scenario Generation for Stochastic Programs


Young, D. - Presenter, Auburn University
Cremaschi, S., Auburn University
Carpenter, M., Auburn University
Stochastic programming (SP) is a common approach to solve optimization problems under uncertainty (Birge and Louveaux, 2011). Such problems arise when the decision-maker incorporates aspects of a problem that are not fully known or cannot be controlled, i.e., the weather, demand, or estimated process parameters. This uncertain information is generally modeled as a probability distribution based on previous data or expert opinion. To solve these problems with a SP framework, samples from the distribution, or scenarios, are used to approximate the distribution. Naturally, the better the set of scenarios are at depicting the underlying distribution, the more accurate the solution is. However, as the number of the scenarios grows, so does the problem size, leading to the model tractability issues. Balancing the model size and the accuracy of the distribution estimation leads to the need for efficient scenario generation (Henrion and Römisch, 2018; Park et al., 2019; Shapiro et al., 2014).

In many stochastic programs, the most probable scenarios are enough to estimate the distribution for a high-quality solution. There are, however, some problems, such as those in safety and reliability engineering, supply chain resilience, and health services, in which the least likely events have the most impact on the optimal decisions. These events occur in the tail(s) of the distribution and are likely to be not accurately represented within the scenarios. To fully describe the tail(s), it is necessary to either create a large number of samples or use a rare-event sampling technique. In these types of problems, the need to properly represent the tail(s) of the distribution while maintaining a computationally tractable model becomes even more of an issue.

In this work, we evaluate the efficacy of two different implementations of importance sampling (Papavasiliou and Oren, 2013; Parpas et al., 2014) and two clustering approaches to generate scenarios for SP models emphasizing the rare-event space efficiently. Importance sampling is a variance reduction technique in which samples are taken from an auxiliary distribution, the importance distribution (ID), which emphasizes the region of the distribution that has the greatest impact on a measure of interest. The samples are then tied back to the original distribution through an assigned weight, the likelihood ratio, to maintain an unbiased estimation of that measure of interest. The determination of an ID is a non-trivial task and is where the two tested implementations differ. The first implementation (Parpas et al., 2014) uses a Markov Chain Monte Carlo approach to generate samples from a theoretical ID where the normalization constant is unknown, and the ID is approximated using kernel density estimation. The approximation of the ID is then used to generate samples with known probabilities. The second approach (Papavasiliou and Oren, 2013) utilizes a large number of samples from the original distribution to accurately estimate the expectation of the measure of interest. A subset of the initial samples is then chosen using an approximated ID, constructed using the estimated expectation of the measure of interest. Clustering is a data analysis technique that takes in a large dataset and groups the data into subsets, or clusters, of points that share similarities, originally used as a classification technique (Driver and Kroeber, 1932). This technique can be used for scenario reduction by clustering a large dataset and generating representative samples from each of the clusters as the scenarios for an optimization problem (Li et al., 2020). The clustering techniques used are the k-means and affinity propagation algorithms. The k-means is a general-purpose algorithm that separates the data set into k different clusters based on the distance between the mean vectors of the data (Yadav and Sharma, 2013). The affinity propagation algorithm utilizes a similarity measure to tie data points to a representative point, known as an exemplar, to form the clusters (Dueck, 2009). We also include a crude Monte Carlo generation approach as a baseline in the framework.

We test the scenario reduction techniques on a healthcare-related problem, in which the objective is to maximize the expected life-years gained by screening for Colorectal cancer (CRC). The solution identifies at what ages screening tests should occur within a population to maximize the benefit. For this problem, only a small percentage of the population, 4-4.5 % (American Cancer Society, 2020), develops CRC within their lifetime. Hence, for the majority of the population, screening for CRC is a burden. To properly maximize the expected benefit of screening, it is necessary to accurately portray the impact of the lower probability scenarios on the objective function. After the scenario sets are generated, we solved the optimization problem and compare the resulting objective function to a rigorous simulation to determine how effective the approximation is for the various techniques. This presentation will discuss the effectiveness in estimating the true objective value, the quality of the solutions obtained, and the computation budget needed for each scenario generation method.


American Cancer Society, 2020. Colorectal Cancer Facts & Figures [WWW Document]. URL https://www.cancer.org/research/cancer-facts-statistics/colorectal-cance... (accessed 2.27.19).

Birge, J.R., Louveaux, F., 2011. Introduction to stochastic programming. Springer Science & Business Media.

Driver, H.E., Kroeber, A.L., 1932. Quantitative expression of cultural relationships. Berkeley: University of California Press.

Dueck, D., 2009. Affinity propagation: clustering data by passing messages. (Thesis) Citeseer.

Henrion, R., Römisch, W., 2018. Problem-based optimal scenario generation and reduction in stochastic programming. Math. Program. 1–23.

Li, B., Sedzro, K., Fang, X., Hodge, B.M., Zhang, J., 2020. A clustering-based scenario generation framework for power market simulation with wind integration. J. Renew. Sustain. Energy 12. https://doi.org/10.1063/5.0006480

Papavasiliou, A., Oren, S.S., 2013. Multiarea stochastic unit commitment for high wind penetration in a transmission constrained network. Oper. Res. 61, 578–592. https://doi.org/10.1287/opre.2013.1174

Park, S., Xu, Q., Hobbs, B.F., 2019. Comparing scenario reduction methods for stochastic transmission planning. IET Gener. Transm. Distrib. 13, 1005–1013.

Parpas, P., Ustun, B., Webster, M., Tran, Q.K., 2014. Importance Sampling in Stochastic Programming: A Markov Chain Monte Carlo Approach. INFORMS J. Comput. 27, 358–377. https://doi.org/10.1287/ijoc.2014.0630

Shapiro, A., Dentcheva, D., Ruszczyński, A., 2014. Lectures on Stochastic Programming: Modeling and Theory, Second Edition. Lect. Stoch. Program. Model. Theory, Second Ed. https://doi.org/10.1137/1.9781611973433

Yadav, J., Sharma, M., 2013. A Review of K-mean Algorithm. Int. J. Eng. trends Technol. 4, 2972–2976.