(104h) Families of Data-Driven Surrogates Based on Accuracy and Complexity

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Computing and Systems Technology Division

Session

Advances in Machine Learning and Intelligent Systems I

Time

Monday, November 8, 2021 - 2:15pm to 2:30pm

Authors

Ahmad, M. - Presenter, National University of Singapore

Karimi, I., National University of Singapore

With rapid advances in computational technologies and the swift progress of Industry 4.0, one can incorporate finer details in digital twins to model intricate processes better. However, this comes at the expense of the large computational burden associated with high-fidelity models. Data-driven surrogates, or meta-models, offer computationally cheaper alternatives to complex digital twins. They build an approximate response surface by learning correlations between process inputs and outputs. Any surrogate model comprises two components: a modeling technique and a surrogate form (Garud et al., 2018). While the former indicates the underlying algorithm used to build a model, the latter constitutes the analytical form and functional details of the model. Different combinations of modeling techniques and surrogate forms can produce a long list of unique surrogate models. The performance of any surrogate model depends on the system nonlinearities, the quality and quantity of the sampled data, and the flexibility of the surrogate model. Previous works have studied the effect of certain data-specific features on the performance of surrogates. Davis et al. (2017) observed that a surrogate's predictive performance is influenced by the number of input dimensions, sample size, sampling technique, and shape of the data-generating analytical function. They analyzed the impact of these characteristics on the training times and predictive performances of eight surrogates. Recently, Williams and Cremaschi (2021) extended this idea to provide rules of thumb to aid surrogate selection for modeling and optimization tasks. They drew conclusions by evaluating the predictive performance of eight surrogate models, based on normalized root mean squared error and adjusted R², over various data sets with different features. Bhosekar and Ierapetritou (2018) analyzed the effect of sample size and sampling technique on the performance of nine variants of the kriging modeling technique. They also highlighted the similar performance of some kriging variants. This observation was also made in our previous work (Garud et al., 2018; Ahmad and Karimi, 2021) on the meta-learning-based surrogate selection paradigm LEAPS2. Certain surrogates showed close performance over most noisy or non-noisy data sets. While such observations are mentioned only in passing in the literature, it would be interesting to identify and extensively report similar-performing surrogates across various modeling techniques.

Therefore, in this work, we aim to identify sets, or families, of similar surrogates from a pool of 50 surrogates. The surrogate performances were evaluated over diverse data sets using two performance metrics. The coefficient of determination (R²) measures the predictive accuracy of a surrogate, while the Surrogate Quality Score (SQS) (Ahmad and Karimi, 2021) takes into account model complexity in addition to accuracy. We used the correlation coefficient to quantify the extent of agreement, or similarity, between the performances of any two surrogate models. This enabled us to identify pairs of similar surrogates and hence build families containing mutually similar surrogates. Our results revealed separate and very different families for non-noisy and noisy data sets, based on either performance metric. For non-noisy data sets, we obtained nine families based on each of the R² and SQS metrics. Although the families were almost alike for both performance metrics, they were not identical. Certain complex surrogates, especially those belonging to the support vector regression technique, are penalized heavily by SQS. Hence, they belonged to different families based on R² and SQS for non-noisy data. While most families comprised surrogates with the same modeling technique, two families had many surrogates from different modeling techniques, for both performance metrics. For noisy data, surrogates belonging to the kriging and radial basis function techniques do not belong to any family since they overfit. Naturally, these techniques are unsuitable for modeling noisy data. Hence, we obtained fewer families than for non-noisy data. Furthermore, the families based on R² and SQS differed markedly. Seven families were identified based on R², while only three were obtained based on the SQS metric for noisy data. While some families based on R² comprised surrogates using different techniques, each family based on SQS consisted of surrogates with an identical modeling technique. Our families for noisy and non-noisy data sets have been validated by checking that the surrogates within each family perform similarly on several new data sets not used for deriving the original families. Our proposed classification of surrogates into families opens up a computationally efficient way for surrogate selection without the need for an exhaustive search across all surrogates.
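As a rough illustration of the grouping idea described above, the following Python sketch correlates per-data-set performance scores (for example, coefficient of determination values) between surrogates and greedily groups those that are mutually correlated. It is only a sketch under stated assumptions, not the procedure used in this work: the function name `build_families`, the greedy grouping rule, the Pearson correlation, and the threshold of 0.9 are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def build_families(scores, names, threshold=0.9):
    """Greedily group surrogates whose performance scores are mutually correlated.

    scores    : array of shape (n_datasets, n_surrogates); one score per
                data set and surrogate.
    names     : list of surrogate labels, one per column of `scores`.
    threshold : hypothetical cut-off on the Pearson correlation coefficient
                (an assumption, not a value taken from the abstract).
    """
    # Surrogate-by-surrogate correlation matrix (columns are the variables).
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    families = []
    for j in range(len(names)):
        for family in families:
            # Join a family only if similar to every current member,
            # so that all members stay mutually similar.
            if all(corr[j, k] >= threshold for k in family):
                family.append(j)
                break
        else:
            # No existing family fits: start a new one.
            families.append([j])
    # Report only groups with at least two members as families.
    return [[names[k] for k in fam] for fam in families if len(fam) > 1]


# Made-up scores for five hypothetical surrogates over six data sets.
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
c = np.array([0.6, 0.1, 0.5, 0.2, 0.4, 0.3])
e = np.array([0.3, 0.5, 0.1, 0.6, 0.2, 0.4])
demo_scores = np.column_stack([a, 0.5 * a + 0.3, c, 2.0 * c - 0.1, e])
demo_families = build_families(demo_scores, ["A", "B", "C", "D", "E"])
print(demo_families)
```

In this made-up example, surrogates A and B track each other across data sets, as do C and D, while E correlates with neither pair and is left outside any family. Because the grouping is greedy, the result can depend on the order in which surrogates are listed; a clique-based or hierarchical clustering approach would be an alternative.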

References:

Bhosekar, A., Ierapetritou, M., 2018. Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Computers & Chemical Engineering 108, 250–267. https://doi.org/10.1016/j.compchemeng.2017.09.017

Davis, S.E., Cremaschi, S., Eden, M.R., 2017. Efficient Surrogate Model Development: Optimum Model Form Based on Input Function Characteristics, in: Computer Aided Chemical Engineering. Elsevier, pp. 457–462. https://doi.org/10.1016/B978-0-444-63965-3.50078-7

Garud, S.S., Karimi, I.A., Kraft, M., 2018. LEAPS2: Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Computers & Chemical Engineering 119, 352–370. https://doi.org/10.1016/j.compchemeng.2018.09.008

Williams, B., Cremaschi, S., 2021. Selection of Surrogate Modeling Techniques for Surface Approximation and Surrogate-Based Optimization. Chemical Engineering Research and Design. https://doi.org/10.1016/j.cherd.2021.03.028

Topics

Computing and Systems Engineering
