(126b) Towards Developing a Learning Based Evolutionary Assistive Paradigm for Surrogate Selection (LEAPS2) | AIChE

Authors 

Garud, S. S. - Presenter, National University of Singapore
Karimi, I. A., National University of Singapore
Kraft, M., University of Cambridge
Complex systems are typically studied via physical experiments, computer experiments, or a combination of the two. Physical experiments involve conducting laboratory trials, making field observations, etc., and are usually time- and resource-intensive. Moreover, some experiments may not even be feasible in practice. In such cases, computer experiments may be preferred. These involve experimenting on a rigorous first-principles or physics-based model instead of the real system. Revolutionary advances in algorithms and computing technologies over the last two decades have enabled researchers to incorporate greater detail and accuracy into such models. However, this comes at the expense of larger model sizes and a greater computational burden. Repeated evaluations of such high-fidelity models for tasks such as sensitivity analysis and optimization often prove uneconomical. Furthermore, these models are often developed and simulated in commercial simulators, which are essentially black boxes. The development and implementation of such models also demand deep domain expertise, which makes them inaccessible to users from other disciplines. It is therefore beneficial to replace high-fidelity models (in the case of computer experiments) or real physical systems (in the case of physical experiments) with computationally cheaper surrogate models that offer a simpler overall picture of the underlying system.

A surrogate model, also known as a meta-model or a response surface, is an empirical expression quantifying the relationships among the most important or relevant inputs and outputs of a system. It is a computationally much cheaper alternative to a physical system or its high-fidelity model, and it is easily comprehensible to users with little domain knowledge. Its construction requires one to obtain the system response at several sample points via experiments or simulations. Then, one needs to choose a form and a technique for developing the surrogate model. The form is a mathematical relationship between the inputs and outputs, while the technique is a procedure to derive the best-fitting parameters of that form for the sampled input-output data. Together, these two choices yield the final model, which we call the surrogate model. Henceforth, we use “modelling technique” to mean the surrogate modelling technique and “surrogate” to mean the final surrogate model.
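As a concrete illustration of form versus technique, the minimal sketch below uses a full quadratic polynomial as the form and ordinary least squares as the technique. The function `expensive_model` is a hypothetical stand-in for a high-fidelity simulation, and the uniform random sample plan is an assumption; this is not a description of how LEAPS2 builds its surrogates.

```python
import numpy as np

def quadratic_features(X):
    """Form: full quadratic polynomial (constant, linear, square and cross terms)."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_surrogate(X, y):
    """Technique: ordinary least squares for the coefficients of the form."""
    beta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    # The surrogate is the form evaluated with the fitted coefficients
    return lambda X_new: quadratic_features(np.atleast_2d(X_new)) @ beta

def expensive_model(X):
    # Hypothetical stand-in for a costly simulation of the real system
    return 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))   # sample points (assumed plan)
y = expensive_model(X)                     # "experiments" at the sample points
surrogate = fit_surrogate(X, y)
```

Because the hypothetical response happens to be quadratic, this surrogate reproduces it exactly at unseen points; for a real system the fit would only be approximate.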

The literature offers a variety of modelling techniques, such as polynomial response surface models, support vector regression, kriging, radial basis functions (RBF), multivariate adaptive regression splines, and artificial neural networks. Many techniques (e.g. kriging and RBF) offer a variety of functional forms and hence can yield several surrogates. Selecting the best modelling technique and the final surrogate are therefore non-trivial tasks, which have largely been performed by trial and error or intuition. In this work, we focus on the choice of the modelling technique, its corresponding functional form, and the factors that affect these decisions. To this end, our work addresses the following two key questions that a typical user may ask while developing a surrogate from a given set of input-output data.

  1. Which is the best surrogate for approximating my system/data set?
  2. Is there a systematic, automated, user-friendly, efficient, and reliable procedure for selecting a few surrogates that are likely to be the best?
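The trial-and-error route that these questions seek to replace can be sketched as follows: a single technique (RBF) with several kernels yields several candidate surrogates, each of which is fitted and scored on held-out points. The sketch uses SciPy's `RBFInterpolator` (SciPy 1.7 or later); the test response `true_system`, the kernel list, `epsilon=1.0`, and the sample sizes are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def true_system(X):
    # Hypothetical system response standing in for a user's data
    return np.sin(3.0 * X[:, 0]) + np.cos(2.0 * X[:, 1])

rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(40, 2))
X_valid = rng.uniform(0.0, 1.0, size=(200, 2))
y_train, y_valid = true_system(X_train), true_system(X_valid)

# One technique (RBF), several functional forms (kernels) -> several surrogates
KERNELS = ["linear", "thin_plate_spline", "cubic", "quintic", "multiquadric"]

def validation_rmse(kernel):
    """Fit one candidate surrogate and return its RMSE on the held-out points."""
    model = RBFInterpolator(X_train, y_train, kernel=kernel, epsilon=1.0)
    resid = model(X_valid) - y_valid
    return float(np.sqrt(np.mean(resid ** 2)))

# Enumerate every candidate and keep the one with the lowest validation error
errors = {kernel: validation_rmse(kernel) for kernel in KERNELS}
best = min(errors, key=errors.get)
```

Every candidate must be built and evaluated before a choice can be made, which is the cost that grows quickly with the number of techniques and forms.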

Although several previous works have attempted to answer the above questions, most are straightforward, enumerative benchmarking studies that numerically compare and rank surrogates on some well-known test functions. Inferences from such studies are obviously limited, and their extrapolation to unknown systems may not be reliable. Naturally, this has inspired many researchers to explore alternative approaches [1, 2]. Their key idea is to start from a set of simple basis functions (e.g. sin, cos, exp) and iteratively evolve their best mix by solving a series of regression problems. This approach is thus more adaptive and more system-targeted than the earlier benchmarking studies. Some researchers have enhanced it further by employing a variety of more complex surrogates in place of simple analytical functions [3]. Despite their generalized appeal, all these approaches treat surrogate selection as a stand-alone task: for every system, a user needs to solve an optimization problem that guides surrogate selection. Solving such optimization problems is not only arduous but also compute-intensive, which defeats the very incentive for using surrogates.
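The idea of evolving a mix of simple basis functions through a series of regression problems can be illustrated with greedy forward selection, a deliberately simplified stand-in for the genetic-algorithm and genetic-programming searches of [1, 2]. The basis library, the stopping tolerance, and the target function are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical library of simple analytical basis functions
LIBRARY = {
    "x": lambda x: x,
    "x^2": lambda x: x ** 2,
    "sin(x)": np.sin,
    "cos(x)": np.cos,
    "exp(x)": np.exp,
}

def fit(terms, x, y):
    """One regression problem: least squares on an intercept plus the chosen terms."""
    A = np.column_stack([np.ones_like(x)] + [LIBRARY[t](x) for t in terms])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.sum((A @ coef - y) ** 2))

def forward_select(x, y, max_terms=3, tol=1e-10):
    """Greedily add the basis term that most reduces the sum of squared errors."""
    chosen = []
    _, best_sse = fit(chosen, x, y)
    while len(chosen) < max_terms:
        trials = {t: fit(chosen + [t], x, y)[1] for t in LIBRARY if t not in chosen}
        t_best = min(trials, key=trials.get)
        # Stop when the best candidate no longer improves the fit noticeably
        if best_sse - trials[t_best] <= tol * max(best_sse, 1.0):
            break
        chosen.append(t_best)
        best_sse = trials[t_best]
    return chosen, best_sse

x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * np.sin(x) + 0.2 * x ** 2        # hypothetical system response
chosen, sse = forward_select(x, y)
```

Each iteration solves several least-squares problems, and the whole search must be repeated for every new system, which is the stand-alone cost discussed above.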

In this work, we develop an evolving, knowledge-based framework that defines and computes prior information from the sampled data of several systems to aid surrogate selection for other systems. We call it the Learning based Evolutionary Assistive Paradigm for Surrogate Selection (LEAPS2); its key features are as follows:

  • It relies on the most commonly employed modelling techniques and their corresponding functional forms.
  • It defines several system attributes (based on the response, gradient, and extrema) and performance metrics to characterize and quantify information about surrogate selections for various systems.
  • It utilizes this prior information to derive knowledge that drives surrogate selection for unknown systems with minimal surrogate construction effort.
  • It can perform surrogate selection over wide ranges of dimensions and sample sizes, which makes it practically useful.
  • It enables users to add new modelling techniques as well as system attributes without disturbing the existing architecture of the paradigm.
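The extensibility described in the last feature can be pictured as a plug-in registry, in which new modelling techniques and system attributes are added without editing existing code. This is a minimal sketch of that architectural idea only; the registry names, decorators, and the example attributes and techniques are hypothetical and are not LEAPS2's actual interface.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical plug-in registries (illustrative names, not LEAPS2's actual API)
TECHNIQUES: Dict[str, Callable] = {}
ATTRIBUTES: Dict[str, Callable[[Sequence[float]], float]] = {}

def _register(registry: Dict[str, Callable], name: str):
    """Return a decorator that adds a function to `registry` under `name`."""
    def decorator(fn: Callable) -> Callable:
        if name in registry:
            raise ValueError(f"{name!r} is already registered")
        registry[name] = fn
        return fn
    return decorator

def register_technique(name: str):
    return _register(TECHNIQUES, name)

def register_attribute(name: str):
    return _register(ATTRIBUTES, name)

@register_attribute("response_mean")
def response_mean(y: Sequence[float]) -> float:
    return sum(y) / len(y)

@register_attribute("response_range")
def response_range(y: Sequence[float]) -> float:
    return max(y) - min(y)

@register_technique("constant")
def fit_constant(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    c = sum(ys) / len(ys)
    return lambda x: c

def attribute_vector(y: Sequence[float]) -> List[float]:
    # Sorted names give a stable column order even as new attributes are added
    return [ATTRIBUTES[name](y) for name in sorted(ATTRIBUTES)]

# A user can later plug in a new technique without editing anything above
@register_technique("linear_1d")
def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)
```

Downstream code iterates over the registries, so a newly registered entry is picked up automatically; the duplicate-name check prevents silently overwriting an existing technique or attribute.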

LEAPS2 relies on the concept of a knowledge pyramid to select surrogates based on the input-output data alone and with minimal surrogate-fitting effort. The knowledge pyramid involves three phases: (1) data collection, (2) information extraction, and (3) knowledge derivation. The development of LEAPS2 begins with the data collection phase, which involves the following steps:

(D1) Select 66 test functions as representative systems.

(D2) Select 25 surrogates resulting from the six modelling techniques mentioned earlier.

(D3) Use Sobol quasi-random sampling (QS) to generate four input-output data sets for each test function. This amounts to 264 data sets in total (66 × 4).

(D4) Construct the 25 surrogates for each input-output data set, yielding 6600 surrogates in total (264 × 25).
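Steps (D1) to (D3) can be sketched with SciPy's `scipy.stats.qmc.Sobol` generator. The two test functions, their bounds, and the choice of four sample sizes (8, 16, 32, and 64 points) are assumptions for illustration; the abstract does not state which sizes LEAPS2 uses. The last lines only reproduce the bookkeeping for (D4).

```python
import numpy as np
from scipy.stats import qmc

def sphere(X):
    return np.sum(X ** 2, axis=1)

def rosenbrock(X):
    return 100.0 * (X[:, 1] - X[:, 0] ** 2) ** 2 + (1.0 - X[:, 0]) ** 2

# (D1) hypothetical representative systems: name -> (function, lower, upper bounds)
TEST_FUNCTIONS = {
    "sphere": (sphere, [-5.0] * 3, [5.0] * 3),
    "rosenbrock": (rosenbrock, [-2.0, -2.0], [2.0, 2.0]),
}
SAMPLE_EXPONENTS = (3, 4, 5, 6)   # assumption: four data sets of 2**m points each
N_SURROGATES = 25                 # (D2) number of candidate surrogates

def make_datasets(test_functions, exponents=SAMPLE_EXPONENTS):
    """(D3) Sobol samples scaled to each function's bounds, one data set per size."""
    datasets = []
    for name, (f, lo, hi) in test_functions.items():
        for m in exponents:
            sampler = qmc.Sobol(d=len(lo), scramble=False)
            X = qmc.scale(sampler.random_base2(m=m), lo, hi)
            datasets.append({"function": name, "X": X, "y": f(X)})
    return datasets

datasets = make_datasets(TEST_FUNCTIONS)
# (D4) one fit per (data set, surrogate) pair
n_surrogate_fits = len(datasets) * N_SURROGATES
```

With the abstract's 66 test functions, the same loops would give 66 × 4 = 264 data sets and 264 × 25 = 6600 surrogate fits.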

This yields a rich set of system-surrogate data which can be updated dynamically as and when more data become available. We then process these data to extract information.

(I1) Compute system attributes to extract the essence of the data sets from (D3).

(I2) Compute performance metrics to quantify the quality of the surrogates.
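Steps (I1) and (I2) can be sketched as two functions. The abstract says only that the attributes are based on the response, gradient, and extrema; the three proxies below (coefficient of variation, a nearest-neighbour slope, and the fraction of samples that are extreme among their neighbours) and the neighbourhood size `k` are hypothetical choices. The performance metrics are standard error measures, assumed here rather than taken from LEAPS2.

```python
import numpy as np

def system_attributes(X, y, k=4):
    """(I1) Hypothetical response-, gradient- and extrema-based attributes."""
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(D, np.inf)                 # a sample is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]         # k nearest neighbours of each sample
    nn = nbrs[:, 0]
    # Gradient proxy: absolute slope between each sample and its nearest neighbour
    slopes = np.abs(y[nn] - y) / D[np.arange(len(y)), nn]
    # Extrema proxy: sample strictly above or below all of its k neighbours
    is_max = np.all(y[:, None] > y[nbrs], axis=1)
    is_min = np.all(y[:, None] < y[nbrs], axis=1)
    y_std = float(y.std())
    return {
        "response_cv": y_std / (abs(float(y.mean())) + 1e-12),
        "mean_nn_slope": float(slopes.mean()) / (y_std + 1e-12),
        "extrema_fraction": float(np.mean(is_max | is_min)),
    }

def performance_metrics(y_true, y_pred):
    """(I2) Common accuracy metrics of a surrogate on test points."""
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "rmse": rmse,
        "nrmse": rmse / float(y_true.max() - y_true.min()),
        "r2": 1.0 - float(np.sum(resid ** 2)) / ss_tot,
        "max_abs_error": float(np.max(np.abs(resid))),
    }
```

Normalizing the slope by the response spread keeps the attribute comparable across systems with very different output scales.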

From the information computed in steps (I1) and (I2), we derive knowledge and embed it into LEAPS2 using a regression tree ensemble:

(K1) Discard correlated attributes via the RReliefF attribute selection algorithm.

(K2) Correlate the selected system attributes with the surrogate performance metrics using a regression tree ensemble. This ensemble serves as the brain of LEAPS2 for future surrogate selections: once fully grown, it can be used to select the best possible surrogate out of the 25 surrogates for any given data set.
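Steps (K1) and (K2) can be sketched together. RReliefF estimates how relevant each attribute is to a continuous target; the version below is simplified (every instance is used once, with equal weights 1/k for the k nearest neighbours). It is averaged over the surrogate-error targets, and attributes above half the maximum weight are kept; that threshold is an assumption, as is the use of scikit-learn's `RandomForestRegressor` as the regression tree ensemble, since the abstract does not name a specific ensemble. The meta-data set of attributes and surrogate errors is synthetic and hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rrelieff(X, y, k=10):
    """Simplified RReliefF weights of each attribute for a continuous target y."""
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    yn = (y - y.min()) / (y.max() - y.min())   # assumes a non-constant target
    k = min(k, n - 1)
    n_dc, n_da, n_dcda = 0.0, np.zeros(p), np.zeros(p)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance, normalized
        dist[i] = np.inf                       # exclude the instance itself
        nbrs = np.argsort(dist)[:k]
        dc = np.abs(yn[nbrs] - yn[i])          # target differences, shape (k,)
        da = np.abs(Xn[nbrs] - Xn[i])          # attribute differences, shape (k, p)
        n_dc += dc.sum() / k
        n_da += da.sum(axis=0) / k
        n_dcda += (dc[:, None] * da).sum(axis=0) / k
    return n_dcda / n_dc - (n_da - n_dcda) / (n - n_dc)

# Hypothetical meta-data: one row per data set, three system attributes (only
# attribute 0 matters) and the error of each of three hypothetical surrogates
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(200, 3))
E = np.column_stack([A[:, 0], 1.0 - A[:, 0], 0.4 + 0.1 * A[:, 0]])

# (K1) average RReliefF weights over the error targets, keep the strong attributes
weights = np.mean([rrelieff(A, E[:, s]) for s in range(E.shape[1])], axis=0)
selected = np.flatnonzero(weights > 0.5 * weights.max())

# (K2) multi-output tree ensemble mapping attributes to every surrogate's error
ensemble = RandomForestRegressor(n_estimators=200, random_state=0)
ensemble.fit(A[:, selected], E)

def recommend(attributes):
    """Index of the surrogate with the lowest predicted error for each data set."""
    pred = ensemble.predict(np.atleast_2d(attributes)[:, selected])
    return np.argmin(pred, axis=1)
```

Recommending a surrogate for a new data set then requires only computing its attributes and one ensemble prediction, rather than fitting all candidate surrogates.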

The learning and evolution of LEAPS2 into a powerful system modelling tool can occur along three dimensions, viz. data sets, attributes, and surrogates. Our progressive development of LEAPS2 in this work shows its evolution with respect to data sets: by progressively adding data sets, we demonstrate that LEAPS2 learns to improve its computational efficiency and functional accuracy. Apart from data sets, the architecture of LEAPS2 enables its evolution via more attributes and surrogates. New attributes will equip LEAPS2 for better system identification, while new surrogates will provide more options for system approximation. Overall, such learning will make LEAPS2 more versatile and extend its ability to handle more complex systems.

Besides its learning ability, we demonstrate the practicality of LEAPS2 by employing it to recommend surrogates for estimating the bubble point temperature (BPT) and dew point temperature (DPT) of LNG. Interestingly, our assistive tool suggests a different surrogate for each temperature and hints that DPT may be harder to approximate than BPT. Finally, we compare the performance of LEAPS2 with the metalearning-based recommendation scheme (CRS) of Cui et al. [4]. In contrast to LEAPS2, CRS recommends only a surrogate modelling technique rather than a surrogate model, so a direct comparison is not entirely fair. However, we can still compare CRS and LEAPS2 based on their success in identifying the best modelling technique. To this end, we use LEAPS2 to recommend the best modelling technique for all 264 data sets. LEAPS2 identifies the true best technique for 236 data sets and thus achieves a coefficient of success (CoS) of 236/264 ≈ 0.89, whereas CRS achieves an average CoS of 0.84 [4]. LEAPS2 therefore performs better than CRS in recommending modelling techniques, despite its more general philosophy.
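The coefficient of success can be read as the fraction of data sets for which the recommended technique is the true best, that is, the technique with the lowest actual error; this reading is consistent with the 236 of 264 data sets quoted above. The technique names and error values in this sketch are hypothetical.

```python
from typing import Dict, Sequence

def true_best(errors: Dict[str, float]) -> str:
    """Technique with the lowest actual error on one data set."""
    return min(errors, key=errors.get)

def coefficient_of_success(recommended: Sequence[str],
                           actual_errors: Sequence[Dict[str, float]]) -> float:
    """Fraction of data sets whose recommendation matches the true best technique."""
    if not recommended or len(recommended) != len(actual_errors):
        raise ValueError("need exactly one recommendation per data set")
    hits = sum(rec == true_best(errs) for rec, errs in zip(recommended, actual_errors))
    return hits / len(recommended)

# Hypothetical actual errors of three techniques on three data sets
actual = [
    {"kriging": 0.02, "RBF": 0.05, "SVR": 0.09},
    {"kriging": 0.07, "RBF": 0.06, "SVR": 0.03},
    {"kriging": 0.01, "RBF": 0.04, "SVR": 0.02},
]
cos_example = coefficient_of_success(["kriging", "RBF", "kriging"], actual)
```

Here the second recommendation misses (SVR is the true best), so two of three recommendations succeed; with 236 hits out of 264 data sets the same ratio gives approximately 0.894.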

In brief, LEAPS2 is a comprehensive, practical, and assistive data-driven surrogate selection paradigm that holds much promise.

References:

[1] Lessmann, S., Stahlbock, R., & Crone, S. F. (2006). Genetic algorithms for support vector machine model selection. In Proceedings of the 2006 IEEE International Joint Conference on Neural Networks (IJCNN’06) (pp. 3063–3069). IEEE.

[2] Yeun, Y.-S., Ruy, W.-S., Yang, Y.-S., & Kim, N.-J. (2004). Implementing linear models in genetic programming. IEEE Transactions on Evolutionary Computation, 8, 542–566.

[3] Goel, T., Haftka, R. T., Shyy, W., & Queipo, N. V. (2007). Ensemble of surrogates. Structural and Multidisciplinary Optimization, 33, 199–216.

[4] Cui, C., Hu, M., Weir, J. D., & Wu, T. (2016). A recommendation system for metamodeling: A meta-learning based approach. Expert Systems with Applications, 46, 33–44.