(198f) A Kriging Optimization Algorithm Incorporating Efficient Sampling for Problems Containing Black-Box Models | AIChE

(198f) A Kriging Optimization Algorithm Incorporating Efficient Sampling for Problems Containing Black-Box Models

Authors 

Davis, E. - Presenter, Rutgers University


When accurate closed-form descriptions of chemical processes are unavailable, gradient-based optimization methods can fail due to the lack of reliable derivative information for these black-box (BB) systems. To overcome this problem, direct search techniques can be used, but convergence to an optimum can be slow. Convergence can be accelerated using surrogate models, but it is possible for misleading search directions to be identified and/or for premature termination to occur, as can happen when the input-output data are noisy. Furthermore, the value of the information obtained must not be outweighed by model building costs for the nonlinear program (NLP). This last consideration is very important if integer variables are also present, as the solution of many relaxed NLP subproblems may be required. In addition, a global optimum must be obtained as the solution to the corresponding NLP in order to guarantee a lower or upper bound; early termination at a local optimum can lead to longer search and/or termination of the original problem at a suboptimal solution. Local methods ensure a global optimal solution only under convexity conditions, and deterministic surrogate models may be inaccurate when noisy data are present. To overcome these problems, an iteratively refined stochastic data-driven global model generated using kriging can instead be used to: 1) describe noisy system behavior [1,2], and 2) identify the set of local optima [5-7]. Each local optimum is then refined by sequentially applying response surface methodology and direct search in order to optimize the continuous and integer variables, respectively, appearing inside the black-box system [3-5,7]. When a problem formulation contains integer variables appearing outside a black-box system, as in the case of process synthesis problems, Branch-and-Bound is applied in order to efficiently find the optimal solution for this second class of integer variables [6]. Given a relaxed NLP at any node of a Branch-and-Bound tree, a kriging predictor is constructed in terms of the input variables with respect to the corresponding objective function. The NLP solution is identified after confirming the best model optima using sampling, and improved models can be generated using additional sampling information. However, accurate model generation, especially in the case of high-dimension problems, may necessitate a high sampling cost. Sampling is required for the initial model-building, and the iterative model refinement. In our previous work [4-7], random sampling has been employed for initial model construction; however inaccurate models may be generated from a poorly chosen sampling set. Heuristic sampling is employed for iterative model refinement whereby a new sampling set is comprised of test points proportionally distributed among regions of high model variance, prediction optima, and where model predictions significantly change between iterations. However, a method for determining the optimal number of additional sampling experiments to perform at each iteration, i.e. the optimal sampling set size, has not yet been developed. In our previous work [4-7], the sampling size has also remained an arbitrarily chosen user-determined parameter. If the number of additional sampling points is low, model improvement may only be incremental. Conversely, extensive sampling may be similarly undesirable, not only because the cost is high, but also because model improvement may be similarly small if the information value obtained from a poorly chosen sampling set fails to yield new global system information. The limitations of these two techniques thereby motivate the subject of this work, which is the development and implementation of a new sampling strategy employed for both the initial and iterative stages of model-building. The strategy is comprised of five steps: 1) subdivision of the feasible region into nonoverlapping convex hulls, 2) construction of local kriging models based on sampling at convex hull vertices, 3) sampling at optima residing inside each convex hull subregion, 4) subdivision of each of convex hull subregions into smaller convex hulls whose vertices now include the new sampling points, and 5) iteration of steps 2) ? 4), until no better optima are found. In step 1, the initial vertices of the feasible region are defined by low/high operating ranges for the process variables. Sampling at the feasible region vertices ensures that the subsequent local models are built using data covering the entire feasible region. This sampling method can be contrasted with random sampling, in which a sampling set may fail to contain vectors located at low/high operating ranges. Extrapolation of the kriging predictor to the feasible region extremes may result in the generation of correspondingly inaccurate models. Nonoverlapping convex hulls are employed in order to minimize the number of localized kriging predictors needed. In order for optima to be identified in Step 3 using analytical kriging predictors, the solution of a set of complex nonlinear gradient equations is needed. In step 2, local models are generated by making predictions at a test point set within each subregion. Each prediction is generated as a sum of weighted sampling data, where a unique weight is generated for each sampled function value at a given convex hull vertex. The weights are easily obtained for an arbitrary test point based on the solution of a small linear program. In step 3, optima are identified either by determining the best predictions from a corresponding set obtained over a grid of test points, or applying gradient methods to an analytical form of the kriging model. In step 4, the purpose of convex hull splitting is to enable the generation of more accurate predictors using increasingly localized sampling information. In step 5, the repetition of steps 2 ? 4 enables more accurate optima to be identified due to the higher number of more localized models being employed. The number of NLP problems solved at each iteration, to find model optima, is higher than the corresponding number at the previous iteration. However, the additional computational expense is balanced against the possible discovery of improved optima. The procedure terminates when further improvement in sampled optima cannot be accomplished, and at this point response surface methodology, direct search, and branch-and-bound are used to further refine the best kriging solutions. Numerical and industrial case study examples are both presented to provide proof of concept.

References:

[1] Goovaaerts, P., 1997, Geostatistics for Natural Resources Evolution.

[2] Isaaks, E., Srivistava, R., 1989, An Introduction to Applied Geostatistics.

[3] Myers, R. and D. Montgomery, 2002, Response Surface Methodology.

[4] Davis, E. and M. Ierapetritou (2007) Adaptive Optimization of noisy black-box functions inherent in microscopic models, Comp. Chem. Eng. 31, 466-476.

[5] Davis, E. and M. Ierapetritou (2007) A Kriging Method for the Solution of Nonlinear Programs with Black-Box Functions, AIChE J., 53, 2001-2012.

[6] Davis, E. and M. Ierapetritou (2007) A Kriging Based Method for the Solution of Mixed-Integer Nonlinear Programs Containing Black-Box Functions, JOGO, DOI 10.1007/s10898-007-9217-2.

[7] Davis, E. and M. Ierapetritou (2008) A Kriging-Based Approach to MINLP containing Black-Box Models and Noise, Ind. Eng. Chem. Res., Accepted for Publication.