(558a) A Novel Constrained Bayesian Optimization Method for Computationally Expensive Grey-Box Models with Composite Objective and Constraint Functions | AIChE

(558a) A Novel Constrained Bayesian Optimization Method for Computationally Expensive Grey-Box Models with Composite Objective and Constraint Functions

Authors 

Paulson, J. - Presenter, The Ohio State University
Lu, C., Ohio State University, Department of Chemical and
Modern nonlinear programming (NLP) solvers can efficiently solve very large-scale optimization problems when derivatives can be exactly computed [1]–[3]. However, the development and formulation of multi-scale process models for which derivative information is readily available remains a significant challenge in many real-world applications. A particularly challenging class of problems is when at least one component of the model is very expensive or time-consuming to evaluate including high-fidelity computer simulations (e.g., molecular dynamics calculations) or laboratory experiments (e.g., measurement of critical material, chemical, or biological properties). In such cases, one must often resort to “black-box” optimization methods (as opposed to rigorous “white-box” NLP approaches) as they assume very little about the structure of the objective and/or the constraint functions. Bayesian optimization (BO) is one of the most widely applied approaches for data-driven / simulation-based optimization in recent years due to its data efficiency [4]–[7]. BO relies on the construction of Gaussian process (GP) surrogate models to represent the posterior distribution for the objective and/or constraints [8]. By combining the posterior distributions with an acquisition function, BO selects the next point at which to evaluate the unknown functions in a way that tradeoffs the exploration of regions where the surrogate model is uncertain and exploitation of the model’s confidence in good (feasible and low-cost) locations.

Although BO has been empirically shown to perform very well in a variety of complex engineering applications in which the problem dimensions is fairly small (on the order of 10 or less), its sample efficiency tends to decrease as problem size increases due to exponential growth in the size of the search space [9]. This is mostly due to the fact that the developed GP model(s) ignore any structure in the underlying objective or constraints. In many practical engineering problems, however, only a portion of the model is not known, so that a more realistic representation is a “grey-box” problem where the objective and constraints can be thought of as having a composite structure, e.g., f(x) = g(h(x)) where g and h are white- and black-box functions, respectively. For example, composite objectives appear in the calibration of expensive process simulators to data wherein h(x) denotes the simulator output data for a given set of parameters x, which should match measured data ymeas as closely as possible. In this case, the objective to be minimized is often of the form g(h(x)) = ||h(x)-ymeas|| where ||.|| is some norm or monotonic transformation of the likelihood of the measurement errors. Recent work in [10] showed significant improvements in convergence rate over traditional BO when the acquisition function is modified to account for the composite structure. Extending this approach to consider composite equality and/or inequality constraints is not trivial, as it relies on a stochastic gradient descent algorithm, whose convergence is not easily guaranteed in the presence of nonlinear and non-convex constraints. One way to overcome these challenges is to resort to a trust region filter algorithm, similar to that introduced in [11], [12], that provides strong convergence guarantees to a local optimum. A disadvantage of such an approach, however, is the lack of a global model for h that may result in convergence to a highly suboptimal local solution. Another possibility is to fit a deterministic surrogate model (that ignores any relevant uncertainty information) and solve the resulting optimization to global optimality [13],[14]. However, this type of adaptive sampling method can be suboptimal in that it is purely exploitative approach that does not systematically address the exploitation/exploration tradeoff.

In this work, we propose a novel algorithm that addresses the aforementioned challenges in the context of constrained grey-box optimization problems; we refer to this algorithm as COBALT (COnstrained Bayesian optimizAtion of computationaLly expensive grey-box models exploiting derivaTive information), which is based on several improvements relative to previous approaches. First, building on [10], we extend the expected improvement (EI) acquisition function to composite functions using a sample average approximation (SAA). We then show how to further modify EI to reduce its degree of multimodality through proper choices of scale and shifting factors that are related to the predicted mean objective, which makes it easier to locate the global maximum of the subproblems. To account for grey-box constraints, we then extend the upper trust bound (UTB) framework presented in [15] to composite functions. The UTB method, as opposed to alternative constraint handling methods such as [16], directly promotes exploration of the feasible region by incorporating variance information into the definition of the constraints. Since the variance cannot be analytically computed in the case of composite GP functions, we propose a moment-based approximation of the constraints that can be efficiently optimized using state-of-the-art NLP solvers. We demonstrate the effectiveness of COBALT, relative to traditional BO and other readily available black-box optimizers, on several benchmark global optimization problems as well as a challenging, realistic parameter estimation problem derived from a genome-scale bioreactor model in [17]. We observed that COBALT provided multiple orders of magnitude improvement in terms of the average regret (i.e., difference between best found solution and true global optimum) versus number of black-box function evaluations in all considered cases.

References

[1] L. T. Biegler and V. M. Zavala, “Large-scale nonlinear programming using {IPOPT}: An integrating framework for enterprise-wide dynamic optimization,” Comput. Chem. Eng., vol. 33, pp. 575–582, 2009.

[2] N. V. Sahinidis, “BARON: A general purpose global optimization software package,” J. Glob. Optim., vol. 8, no. 2, pp. 201–205, 1996, doi: 10.1007/bf00138693.

[3] R. Misener and C. A. Floudas, “ANTIGONE: Algorithms for coNTinuous / Integer Global Optimization of Nonlinear Equations,” J. Glob. Optim., vol. 59, no. 2–3, pp. 503–526, 2014, doi: 10.1007/s10898-014-0166-2.

[4] F. Sorourifar, G. Makrygirgos, and A. Mesbah, “A Data-driven automatic tuning method for MPC under uncertainty using constrained Bayesian optimization,” in IFAC Symposium on Advanced Control of Chemical Processes, 2021 (accepted).

[5] D. C. Manheim and R. L. Detwiler, “Accurate and reliable estimation of kinetic parameters for environmental engineering applications: A global, multi objective, Bayesian optimization approach,” MethodsX, vol. 6, pp. 1398–1414, 2019, doi: 10.1016/j.mex.2019.05.035.

[6] R. R. Griffiths and J. M. Hernández-Lobato, “Constrained bayesian optimization for automatic chemical design,” arXiv Prepr. arXiv1709.05501, 2017.

[7] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1. Institute of Electrical and Electronics Engineers Inc., pp. 148–175, Jan. 01, 2016, doi: 10.1109/JPROC.2015.2494218.

[8] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient Global Optimization of Expensive Black-Box Functions," , vol. 13, no. 4, pp. 455-492, 1998.,” J. Glob. Optim., vol. 13, pp. 455–492, 1998.

[9] S. Flaxman, A. Gelman, D. Neill, and A. G. Wilson, “Fast hierarchical Gaussian processes,” Manuscr. Prep., pp. 1–18, 2015.

[10] R. Astudillo and P. I. Frazier, “Bayesian Optimization of Composite Functions,” in Proceedings of the 36th International Conference on Machine Learning, May 2019, pp. 354–363, Accessed: Apr. 12, 2021. [Online]. Available: http://proceedings.mlr.press/v97/astudillo19a.html.

[11] J. P. Eason and L. T. Biegler, “A trust region filter method for glass box/black box optimization,” AIChE J., vol. 62, no. 9, pp. 3124–3136, Sep. 2016, doi: 10.1002/aic.15325.

[12] J. P. Eason and L. T. Biegler, “Advanced trust region optimization strategies for glass box/black box models,” AIChE J., vol. 64, no. 11, pp. 3934–3943, 2018, doi: 10.1002/aic.16364.

[13] F. Boukouvala and C. A. Floudas, “ARGONAUT: AlgoRithms for Global Optimization of coNstrAined grey-box compUTational problems,” Optim. Lett., vol. 11, no. 5, pp. 895–913, 2017, doi: 10.1007/s11590-016-1028-2.

[14] S. H. Kim and F. Boukouvala, “Surrogate-based optimization for mixed-integer nonlinear problems,” Comput. Chem. Eng., vol. 140, p. 106847, 2020, doi: 10.1016/j.compchemeng.2020.106847.

[15] R. Priem, N. Bartoli, Y. Diouane, and A. Sgueglia, “Upper Trust Bound Feasibility Criterion for Mixed Constrained Bayesian Optimization with Application to Aircraft Design,” arXiv, 2020.

[16] J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and J. P. Cunningham, “Bayesian Optimization with Inequality Constraints.,” in ICML, 2014, vol. 2014, pp. 937–945.

[17] J. A. Paulson, M. Martin-Casas, and A. Mesbah, Fast uncertainty quantification for dynamic flux balance analysis using non-smooth polynomial chaos expansions, vol. 15, no. 8. 2019, pp. 1–35.