Designing new plants and processes often requires solving large-scale optimization problems. Data-driven models therefore find various uses in process optimization, e.g., to substitute implicit equations, as surrogate models of complex units in superstructure optimization, or to bridge scales for model order reduction in large-scale simulation. Most research in surrogate modeling for optimization focuses on improving computational efficiency by leveraging synergies between optimization algorithms and surrogate models [4,5,6], with the tacit assumption that surrogate model accuracy can be achieved through sufficient data or model fidelity. When modelers seek surrogate models that are as compact as possible, however, they inevitably reach a regime where compactness must be traded off against accuracy. In such cases, the surrogate models may exhibit a few isolated but pronounced errors. Regular training of artificial neural networks (ANNs) and other surrogate models does not guard against these errors in function approximation; it dismisses them as outliers and thus sacrifices local accuracy in order to minimize the average loss. While such behavior is desirable in conventional applications of ANNs, where data are noisy and prone to errors, it needs to be avoided in engineering design problems. It is also not justified in model order reduction, where data are generated by evaluating a simulation and are thus considered error-free. One possibility to address this challenge is adaptive sampling [7,8], which attempts to increase the density of training samples at points where the function fit is subpar. Cozad et al. have used bilevel optimization to enforce various non-negativity constraints on surrogate models. However, there is presently no method to train models to a guaranteed minimal worst-case error.
Recently, Schweidtmann et al. have demonstrated that deterministic global optimization with the solver MAiNGO can efficiently compute a guaranteed worst-case error between a neural network surrogate model and the original model, using a reduced-space formulation for finding global optima of problems with ANNs embedded. We extend this idea and propose a bilevel optimization algorithm for training surrogate models with minimal worst-case error. The basic procedure of our implementation follows the cutting-plane algorithm of Blankenship and Falk. The upper level trains the surrogate model using the uniform norm as the loss metric, whereas the lower level finds the worst-case model error by means of global optimization with MAiNGO. When applied to surrogate models with convex training problems, our proposed algorithm is guaranteed to find the model representation with minimal worst-case error. For ANNs, whose training problem is generally non-convex, we demonstrate the efficacy of our algorithm empirically. We show that in some cases our approach can even be used to fine-tune regularly pretrained ANNs. Finally, we compare the bilevel training to non-deterministic approaches, e.g., adaptive sampling, that can be employed to improve the worst-case accuracy of a surrogate model.
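The cutting-plane idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work, and it rests on several assumptions: the surrogate is a one-dimensional polynomial, so that the upper-level training problem in the uniform norm becomes a linear program (a convex training problem); the lower level uses a dense grid search as a stand-in for deterministic global optimization with MAiNGO, so the sketch carries no global guarantee; and all function names (`features`, `fit_minimax`, `worst_case`, `train_minimax`) as well as the test function `np.exp` are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def features(x, degree):
    """Polynomial basis 1, x, ..., x^degree for a 1-D input array."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)


def fit_minimax(x_pts, y_pts, degree):
    """Upper level: minimize the largest absolute error on the current points (LP)."""
    phi = features(x_pts, degree)
    n_par = phi.shape[1]
    ones = np.ones((len(x_pts), 1))
    # Variables are (theta, t); the rows encode |y - phi @ theta| <= t.
    a_ub = np.vstack([np.hstack([phi, -ones]), np.hstack([-phi, -ones])])
    b_ub = np.concatenate([y_pts, -y_pts])
    cost = np.zeros(n_par + 1)
    cost[-1] = 1.0  # minimize t only
    bounds = [(None, None)] * n_par + [(0, None)]
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:-1], res.x[-1]


def worst_case(f, theta, degree, lo, hi, n_grid=20001):
    """Lower-level stand-in: dense grid search for the largest absolute error.

    Assumption: a grid replaces the deterministic global solver, so the
    returned error is only as reliable as the grid resolution.
    """
    grid = np.linspace(lo, hi, n_grid)
    err = np.abs(f(grid) - features(grid, degree) @ theta)
    k = int(np.argmax(err))
    return grid[k], err[k]


def train_minimax(f, lo, hi, degree=3, tol=1e-6, max_iter=50):
    """Cutting-plane loop: add the worst-case point until both levels agree."""
    x_pts = np.linspace(lo, hi, degree + 2)  # initial discretization
    for _ in range(max_iter):
        theta, lower_bound = fit_minimax(x_pts, f(x_pts), degree)
        x_star, upper_bound = worst_case(f, theta, degree, lo, hi)
        if upper_bound - lower_bound <= tol:
            break
        x_pts = np.append(x_pts, x_star)  # new cut at the worst-case point
    return theta, lower_bound, upper_bound


theta, lb, ub = train_minimax(np.exp, -1.0, 1.0, degree=3)
print(f"worst-case error: {ub:.5f} (upper-level bound {lb:.5f})")
```

In this sketch the upper-level optimum on the discretized point set serves as a lower bound on the achievable worst-case error, while the lower-level result is the actual worst-case error of the current model on the grid; the loop stops once the two agree within `tol`. For a non-convex surrogate such as an ANN, the linear program would have to be replaced by a local or global training step, and the convergence guarantee of the convex case would no longer apply.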
Schweidtmann, A. M.; Huster, W. R.; Lüthje, J. T.; Mitsos, A. (2019): Deterministic global process optimization: Accurate (single-species) properties via artificial neural networks. In Computers & Chemical Engineering 121, 67–74.
Henao, C. A.; Maravelias, C. T. (2011): Surrogate-based superstructure optimization framework. In AIChE Journal 57 (5), 1216–1232.
Tsay, C.; Baldea, M. (2019): 110th Anniversary: Using Data to Bridge the Time and Length Scales of Process Systems. In Industrial & Engineering Chemistry Research 58 (36), 16696–16708.
Cozad, A.; Sahinidis, N. V.; Miller, D. C. (2014): Learning surrogate models for simulation-based optimization. In AIChE Journal 60 (6), 2211–2227.
Schweidtmann, A. M.; Mitsos, A. (2018): Deterministic Global Optimization with Artificial Neural Networks Embedded. In Journal of Optimization Theory and Applications 180 (3), 925–948.
Mistry, M.; Letsios, D.; Krennrich, G.; Lee, R. M.; Misener, R. (2020): Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded. In INFORMS Journal on Computing.
Eason, J.; Cremaschi, S. (2014): Adaptive sequential sampling for surrogate model generation with artificial neural networks. In Computers & Chemical Engineering 68, 220–232.
Wilson, Z. T.; Sahinidis, N. V. (2017): The ALAMO approach to machine learning. In Computers & Chemical Engineering 106, 785–795.
Cozad, A.; Sahinidis, N. V.; Miller, D. C. (2015): A combined first-principles and data-driven approach to model building. In Computers & Chemical Engineering 73, 116–127.
Schweidtmann, A. M.; Bongartz, D.; Huster, W. R.; Mitsos, A. (2019): Deterministic Global Process Optimization: Flash Calculations via Artificial Neural Networks. In 29th European Symposium on Computer Aided Process Engineering 46, 937–942.
Bongartz, D.; Najman, J.; Sass, S.; Mitsos, A. (2018): MAiNGO: McCormick-based Algorithm for mixed-integer Nonlinear Global Optimization. Technical Report, Process Systems Engineering (AVT.SVT), RWTH Aachen University. https://git.rwth-aachen.de/avt.svt/public/maingo
Bongartz, D.; Mitsos, A. (2017): Deterministic global optimization of process flowsheets in a reduced space using McCormick relaxations. In Journal of Global Optimization 69 (4), 761–796.