(402b) Updated LEAPS2 for Surrogate Recommendation | AIChE

Authors 

Ahmad, M. - Presenter, National University of Singapore
Karimi, I. A., National University of Singapore
The growing need to model complex physical systems accurately is increasing the complexity of high-fidelity models. Simpler, often analytical, computationally inexpensive surrogates or meta-models offer an attractive alternative. Surrogates are data-driven models that mimic the input–output patterns in data by means of response surfaces. Selecting the surrogate that approximates a complex system most accurately is therefore critical. A straightforward approach to this selection problem is to “try” several surrogates and pick the best. Several other approaches exist in the literature. Genetic Programming (GP) has been used (Koza 1994; Streeter and Becker 2003; Lessmann, Stahlbock, and Crone 2006) to derive an optimal combination of operators and simple basis functions defining a surrogate model. Cozad, Sahinidis, and Miller (2014) developed ALAMO, which uses MILP optimization to identify the best mix of basis functions for a low-complexity, accurate model. MINLP formulations (Cozad and Sahinidis 2018) and extended GP (Kaizen Programming) (Rad, Feng, and Iba 2018) have also been used to good effect. All these approaches must be applied exhaustively to every new data set to determine the best surrogate.

A smarter, faster, learning-based alternative is to first unearth patterns that match the meta-features or attributes of a data set with surrogate model performance. Such meta-learning can then help select the surrogate for a future data set. Cui et al. (2016) and Garud, Karimi, and Kraft (2018) developed this basic idea into the CRS and LEAPS2 frameworks, respectively. Later, Davis, Cremaschi, and Eden (2018) studied the performance of surrogates with respect to sample sizes, input dimensions, and shapes of input functions. Although LEAPS2 addressed the limitations of CRS with respect to sample sizes, dimensionality, and surrogates, it has its own shortcomings.
It was trained only on noise-free synthetic data, so it may recommend an over-fitting model for real-world, noisy data. Moreover, LEAPS2 used an error-based metric that requires splitting the data into train/test sets. Furthermore, LEAPS2 mixed data-distribution-based attributes, such as local and global fluctuations and dimensionality, with statistical attributes of the data itself, such as mean, standard deviation, and gradient. Intuitively, only the distribution-based attributes should matter, as it is the underlying trends or features in the data that determine surrogate performance.

In this work, we modified and broadened the scope of LEAPS2 in several significant ways. First, we incorporated noisy and real-world data sets to address a key challenge in surrogate modeling. Second, we added a complexity-based metric for surrogate selection, namely the AIC weight. This metric provides an alternative when splitting the data set into train/test sets is not feasible. Third, we revamped the attribute set of LEAPS2 to use only attributes that quantify the underlying features of the data distribution, rather than the data itself. Some of these new attributes quantify the degree and variation of non-linearity in the data, and the asymmetry and flatness of the response relative to a standard distribution. Thus, LEAPS2 now has fewer (11 vs 14) but intuitively more appealing attributes. Fourth, we improved the surrogate recommendation strategy by developing simple heuristics. Finally, we updated the surrogate pool by adding 10 new surrogates to LEAPS2. Our improved framework was evaluated with respect to the two metrics of Garud et al. (2018): the “Total Degree of Success” (TDoS), which quantifies the success in recommending the best surrogates, and the “Total Coefficient of Reward” (TCoR), which combines success and computational savings in a single score. The new framework gives TDoS = 91% and TCoR = 42% for the error-based metric, and TDoS = 83% but a much higher TCoR = 63% for the AIC weight on test data; both metrics improved during the learning process. We also tested the new framework on two case studies with real data, one on a compressor and the other on COVID-19 data. In both cases, our improved LEAPS2 achieved a TDoS of 100%. This framework acts as a smart tool for surrogate selection in modeling complex physical systems.
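The AIC-weight metric trades off goodness of fit against model complexity without needing a held-out split. A minimal sketch, assuming Gaussian residuals so that AIC = n ln(RSS/n) + 2k (one common convention; the abstract does not state the exact form used inside LEAPS2, and the candidate models below are hypothetical):

```python
import numpy as np

def aic(rss, n, k):
    # AIC under Gaussian residuals: n*ln(RSS/n) + 2k, where k is the
    # number of fitted parameters (an assumed convention, see lead-in)
    return n * np.log(rss / n) + 2 * k

def aic_weights(aics):
    # Akaike weights: the relative likelihood of each candidate,
    # normalized so the weights sum to 1
    d = np.asarray(aics, dtype=float) - np.min(aics)
    w = np.exp(-0.5 * d)
    return w / w.sum()

# Hypothetical candidate surrogates: (residual sum of squares, #parameters)
n = 50  # sample size
candidates = {"linear": (4.0, 2), "cubic": (0.8, 4), "quintic": (0.75, 6)}
aics = {name: aic(rss, n, k) for name, (rss, k) in candidates.items()}
weights = aic_weights(list(aics.values()))
recommended = list(aics)[int(np.argmax(weights))]
```

Here the quintic fits marginally better than the cubic, but its two extra parameters outweigh the gain, so the weights favor the cubic: the complexity penalty is what guards against recommending an over-fitting surrogate on noisy data.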

References:

Cozad, Alison, and Nikolaos V. Sahinidis. 2018. “A Global MINLP Approach to Symbolic Regression.” Mathematical Programming 170 (1): 97–119. https://doi.org/10.1007/s10107-018-1289-x.

Cozad, Alison, Nikolaos V. Sahinidis, and David C. Miller. 2014. “Learning Surrogate Models for Simulation-Based Optimization.” AIChE Journal 60 (6): 2211–27. https://doi.org/10.1002/aic.14418.

Cui, Can, Mengqi Hu, Jeffery D. Weir, and Teresa Wu. 2016. “A Recommendation System for Meta-Modeling: A Meta-Learning Based Approach.” Expert Systems with Applications 46 (March): 33–44. https://doi.org/10.1016/j.eswa.2015.10.021.

Davis, Sarah E., Selen Cremaschi, and Mario R. Eden. 2018. “Efficient Surrogate Model Development: Impact of Sample Size and Underlying Model Dimensions.” In Computer Aided Chemical Engineering, 44:979–84. Elsevier. https://doi.org/10.1016/B978-0-444-64241-7.50158-0.

Garud, Sushant S., Iftekhar A. Karimi, and Markus Kraft. 2018. “LEAPS2: Learning Based Evolutionary Assistive Paradigm for Surrogate Selection.” Computers & Chemical Engineering 119 (November): 352–70. https://doi.org/10.1016/j.compchemeng.2018.09.008.

Koza, John R. 1994. “Genetic Programming as a Means for Programming Computers by Natural Selection.” Statistics and Computing 4 (2). https://doi.org/10.1007/BF00175355.

Lessmann, Stefan, Robert Stahlbock, and Sven F Crone. 2006. “Genetic Algorithms for Support Vector Machine Model Selection,” 7.

Rad, Hossein Izadi, Ji Feng, and Hitoshi Iba. 2018. “GP-RVM: Genetic Programing-Based Symbolic Regression Using Relevance Vector Machine.” ArXiv:1806.02502 [Cs], August. http://arxiv.org/abs/1806.02502.

Streeter, Matthew, and Lee A Becker. 2003. “Automated Discovery of Numerical Approximation Formulae via Genetic Programming,” 32.