(193e) A New Tool for Selecting Surrogate Modeling Techniques for Design Space Approximation and Surrogate-Based Optimization | AIChE



Williams, B. - Presenter, Auburn University
Cremaschi, S., Auburn University

Session: Advances in Machine Learning and Intelligent Systems

Bianca Williams, Selen Cremaschi

Surrogate models map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate (Han and Zhang, 2012). They can be constructed for design space approximation, which aims to model the overall behavior of the data. They can also be constructed for surrogate-based optimization, which enables optimization when the input-output relationship has no closed analytical form or is not amenable to traditional gradient-based optimization methods. Recent examples include process synthesis applications, such as optimizing the energy consumption of a carbon fiber production plant (Golkaranarenji et al., 2018), and process control applications in pharmaceutical manufacturing (Icten et al., 2015).

Several machine learning and regression techniques have been developed for surrogate model construction. However, there has been little work on how to select the most appropriate technique for a particular application, whether for design space approximation or for surrogate-based optimization. In studies that apply surrogate modeling to process design and optimization, models are mostly selected based on user-specific expertise and familiarity with particular techniques. Previous work on this topic has shown that approximation performance depends on data characteristics such as the input dimension and the shape of the underlying function (Davis et al., 2017; Williams and Cremaschi, 2018; Williams and Cremaschi, 2019). Recent works by Cui et al. (2016) and Garud et al. (2018) have made progress in generalizing surrogate model selection for design space approximation by using meta-learning approaches to build selection frameworks. These frameworks extract information from the data being modeled and use it to recommend which surrogate modeling technique(s) would be appropriate. The information takes the form of attributes calculated from the input and output values. These include common statistical measures such as the mean and standard deviation, gradient-based attributes, and attributes related to the extrema of the output values. Each framework recommends the surrogate model form predicted to perform best based on the calculated attributes. The framework of Garud et al. additionally ranks all the considered surrogate models by their predicted accuracies. Such a strict ranking, which singles out one technique as the best, may be restrictive because multiple models can approximate the design space with similar accuracy.
Furthermore, neither framework takes model complexity into account, which can lead to the selection of overfit models. Selecting appropriate surrogate modeling techniques for surrogate-based optimization also remains an open challenge.
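
The three attribute families (statistical, gradient-based and extrema-related) can be sketched for a sampled dataset with inputs X and outputs y. This is a minimal sketch under stated assumptions. The attributes chosen, the nearest-neighbour slope used as a gradient estimate, and the 5% band around the minimum are illustrative only. They are not the attribute definitions of Cui et al. (2016), Garud et al. (2018) or this work, and `dataset_attributes` is a hypothetical name.

```python
import numpy as np


def dataset_attributes(X, y):
    """Compute a few illustrative (hypothetical) dataset attributes.

    Assumes all input points are distinct, so nearest-neighbour distances are > 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input array as a single input dimension
    n, d = X.shape

    # Statistical attributes of the outputs.
    attrs = {"n_samples": n, "input_dim": d, "y_mean": y.mean(), "y_std": y.std()}

    # Gradient-based attributes: estimate a local slope at each point from its
    # nearest neighbour in input space, |y_i - y_j| / ||x_i - x_j||.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    nn = dists.argmin(axis=1)
    slopes = np.abs(y - y[nn]) / dists[np.arange(n), nn]
    attrs["grad_mean"] = slopes.mean()
    attrs["grad_max"] = slopes.max()

    # Extrema-related attributes: output range and the fraction of samples
    # within 5% of that range above the smallest observed output.
    y_range = y.max() - y.min()
    attrs["y_range"] = y_range
    attrs["frac_near_min"] = np.mean(y <= y.min() + 0.05 * y_range)
    return attrs
```

For example, `dataset_attributes(np.linspace(0, 1, 5), np.linspace(0, 1, 5) ** 2)` returns a dictionary whose `y_mean` is 0.375 and whose `grad_max` is 1.75.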

This work addresses this knowledge gap with a recommendation tool. The tool uses data attributes as inputs to predict the performance of surrogate modeling techniques. Based on those predictions, it recommends which techniques to use for both design space approximation and surrogate-based optimization. Forty-seven optimization test functions from the Virtual Library of Simulation Experiments (Surjanovic and Bingham, 2013) were used to generate 101 datasets, from which the information for constructing the tool was extracted. The tool considers all attributes suggested by Cui et al. (2016) and Garud et al. (2018). In addition, we define 12 new attributes related to the estimated gradients of the datasets and the extreme values of the outputs. Each surrogate modeling technique is evaluated with one performance metric per task. For design space approximation, the metric is the adjusted R-squared value (Miles, 2014), which accounts for both model accuracy and complexity. For surrogate-based optimization, it is the distance between the extreme point(s) estimated by the model and the actual extrema of the true function. The attributes with the strongest relationships to the performance metrics are identified using feature reduction methods, including the ReliefF algorithm (Kira and Rendell, 1992) and principal component analysis (Hotelling, 1933). For each surrogate modeling technique considered, the tool uses a random forest model (Breiman, 2001) to link the identified attributes to the performance metrics (Miles, 2014). The current tool considers the following surrogate modeling techniques: Artificial Neural Networks, Automated Learning of Algebraic Models using Optimization (ALAMO), Radial Basis Networks, Extreme Learning Machines, Gaussian Process Regression, Random Forests, Support Vector Machines, and Multivariate Adaptive Regression Splines.
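
The two performance metrics can be sketched as follows. Adjusted R-squared uses the standard definition, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). Here n is the number of data points and p is the number of model terms or parameters. How p is counted for each surrogate type is an assumption, since the abstract does not specify it. For the optimization metric, this sketch assumes Euclidean distance in input space from the surrogate's estimated optimum to the nearest true optimum, because a test function may have several global optima. The function names `adjusted_r2` and `extremum_distance` are hypothetical.

```python
import numpy as np


def adjusted_r2(y_true, y_pred, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_params (p) is assumed to count the surrogate's fitted terms/parameters.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    if n - n_params - 1 <= 0:
        raise ValueError("need more data points than parameters + 1")
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


def extremum_distance(x_est, true_optima):
    """Euclidean distance from the estimated optimum to the nearest true optimum.

    x_est: shape (d,); true_optima: shape (m, d), one row per true optimum
    (for a 1-D function with two optima pass [[a], [b]], not [a, b]).
    """
    x_est = np.atleast_1d(np.asarray(x_est, dtype=float))
    true_optima = np.atleast_2d(np.asarray(true_optima, dtype=float))
    return float(np.min(np.linalg.norm(true_optima - x_est, axis=1)))
```

Penalizing p means that, for equal raw R², a surrogate with more terms scores lower. This is how the metric folds complexity into the comparison.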
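
ReliefF builds on the Relief algorithm of Kira and Rendell (1992). The sketch below shows only a basic two-class Relief-style weighting, not full ReliefF. ReliefF averages over k nearest hits and misses and handles multiple classes. The sketch assumes each dataset has been given a binary label, such as recommended or not recommended for one technique. This is an assumption: the abstract does not say how ReliefF was applied to the continuous performance metrics, and the regression variant RReliefF is another option. It uses absolute per-attribute differences for both the neighbour search and the weight update. `relief_weights` is a hypothetical name.

```python
import numpy as np


def relief_weights(X, labels):
    """Basic two-class Relief-style attribute weights (not full ReliefF).

    X: (n_datasets, n_attributes) attribute matrix.
    labels: binary class per dataset; each class needs at least two members.
    Larger weights indicate attributes that better separate the two classes.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    # Scale each attribute to [0, 1] so per-attribute differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attributes then contribute zero difference
    Xs = (X - lo) / span

    w = np.zeros(d)
    for i in range(n):  # visit every instance instead of a random sample
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all others
        dist[i] = np.inf  # never pick the instance itself
        same = labels == labels[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        # Reward differences across classes, penalize differences within a class.
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n
    return w
```

Attributes could then be ranked by weight and the lowest-weighted ones dropped before the random forest models are trained.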
For a given dataset, after the random forest model predicts the value of the performance metric for each surrogate modeling technique considered, the tool classifies each technique as recommended or not recommended. To evaluate the quality of these classifications, we performed five-fold cross-validation with the 101 generated datasets. For each fold, the actual performance metric values of each technique on the validation datasets were used to assign its actual classification of recommended or not recommended. The tool's predicted classifications were then compared against these actual classifications. For design space approximation, the tool correctly identified which surrogates should be recommended with an average accuracy of 98% and a precision of 93%. Here precision is the probability that a surrogate modeling technique predicted to be recommended should actually be recommended.
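
The evaluation step can be sketched as follows. The sketch assumes a simple threshold rule for "recommended", because the abstract does not state the rule the tool uses. It also takes the predicted and actual metric values as given; in the tool, the predictions come from the per-technique random forest models. The function names and the threshold rule are hypothetical.

```python
import random


def five_fold_indices(n, k=5, seed=0):
    """Shuffle dataset indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def recommend(metric_values, threshold, higher_is_better=True):
    """Hypothetical rule: recommend a technique when its metric meets a threshold.

    Use higher_is_better=True for adjusted R-squared and False for the
    distance-to-optimum metric, where smaller is better.
    """
    if higher_is_better:
        return [m >= threshold for m in metric_values]
    return [m <= threshold for m in metric_values]


def accuracy_precision(actual, predicted):
    """Accuracy over all classifications and precision = TP / (TP + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # correctly recommended
    fp = sum(1 for a, p in pairs if p and not a)    # wrongly recommended
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision
```

With 101 datasets, `five_fold_indices(101)` yields folds of 21, 20, 20, 20 and 20 datasets. In each fold, the random forests would be trained on the other four folds. `recommend` would then be applied to both the predicted and the actual metric values of the held-out datasets before calling `accuracy_precision`.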


Breiman, L. (2001). Random Forests. Machine Learning, 45, 5-32.

Davis, S., Cremaschi, S., Eden, M. (2017). Efficient Surrogate Model Development: Optimum Model Form Based on Input Function Characteristics. Computer Aided Chemical Engineering, 70.1, 457-462.

Garud, S. et al. (2018). Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Computers and Chemical Engineering, 119, 352.

Golkaranarenji, G. et al. (2018). Support vector regression modelling and optimization of energy consumption in carbon fiber production line. Computers and Chemical Engineering, 109, 276.

Han, Z. and Zhang, K. (2012). Surrogate-Based Optimization. Real-World Applications of Genetic Algorithms. InTech Europe, Rijeka, Croatia.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441 and 498-520.

Icten, E., Nagy, Z., Reklaitis, G. (2015). Process control of a dropwise additive manufacturing system for pharmaceuticals using polynomial chaos expansion based surrogate model. Computers and Chemical Engineering, 83.1, 221-231.

Kira, K. and Rendell, L. (1992). A Practical Approach to Feature Selection. Proceedings of the Ninth International Workshop on Machine Learning. 249-256.

Miles, J. (2014). R Squared, Adjusted R Squared. Encyclopedia of Statistics in Behavioral Science.

Surjanovic, S., Bingham, D. (2013). Virtual Library of Simulation Experiments: Test Functions and Datasets. http://www.sfu.ca/~ssurjano.

Williams, B., Cremaschi, S. (2018). Comparison of Surrogate Modeling Techniques for Surrogate-Based Optimization. AIChE Annual Meeting, October 28 – November 2, 2018, Pittsburgh, PA.

Williams, B., Cremaschi, S. (2019). Surrogate Model Selection for Design Space Approximation and Surrogate-Based Optimization. FOCAPD 2019, July 14 – 18, 2019, Copper Mountain Resort, CO.