(646c) An Information Entropy Based Criterion for Variable Selection Performance Assessment

Conference

AIChE Annual Meeting

Year

2017

Proceeding

2017 Annual Meeting

Group

Computing and Systems Technology Division

Session

Big Data in Process Modeling, Estimation and Control

Time

Thursday, November 2, 2017 - 8:34am to 8:51am

Authors

He, Q. P. - Presenter, Auburn University

Suthar, K., Auburn University

Lee, J., Auburn University

With the ever-accelerating advancement of information, communication, sensing and characterization technologies, a tremendous amount of data is generated and stored every day. These so-called "Big Data" are often extremely high-dimensional, contaminated by noise, and interspersed with a large number of irrelevant or redundant features, making it challenging to retrieve useful information from them [1], [2]. Variable selection is one of the practical approaches to reducing data dimensionality prior to data interpretation or modeling. Even for projection-based dimension reduction methods such as principal component analysis (PCA) and partial least squares (PLS), variable selection is often applied as a pre-processing step to further improve modeling performance [3], [4]. In the last few years, many different variable selection methods have been reported. However, how to evaluate the performance of variable selection methods, and in particular how to evaluate it directly, has received limited attention. The commonly applied criteria for assessing variable selection performance either measure the effects of variable selection indirectly, such as through the prediction performance of a model, or require ground truth of variable relevancy, which is not available in practical applications.

To address this limitation, this paper presents an information entropy based consistency index (Ic) to directly evaluate the performance of a variable selection method. The proposed method is based on the hypothesis that the same set of relevant variables would be selected when different training data sets are used to build a model. The proposed Ic index therefore examines the consistency among the variables selected using different training data. Ic does not require any ground truth of variable relevancy, but can still make use of such information should it be available. Both simulated (with ground truth) and industrial (without ground truth) case studies are provided to demonstrate how Ic performs in comparison with commonly used criteria. It is shown that the proposed index overcomes some of the limitations of existing indices, and the simulated case studies in this work show that Ic gives more objective assessments than the existing indices. The industrial case study shows that Ic is highly correlated with the performance of the resulting soft sensor, validating the need for and benefits of directly assessing variable selection consistency.
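The abstract does not give the formula for Ic, so the following Python sketch only illustrates the general idea described above and should not be read as the authors' definition. It assumes a selection method has been run on several training subsets, takes the fraction of runs in which each variable was selected, and scores consistency as one minus the average binary entropy of those fractions. The names `binary_entropy` and `consistency_index` and the averaging scheme are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a variable that is always or never selected carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def consistency_index(selections, n_variables):
    """Illustrative entropy-based consistency score (assumed form, not the paper's Ic).

    selections  : list of iterables, one per training subset, each holding the
                  indices of the variables selected on that subset
    n_variables : total number of candidate variables

    Returns a value in [0, 1]; 1 means every run selected the same variables.
    """
    if not selections:
        raise ValueError("at least one selection run is required")
    n_runs = len(selections)
    counts = [0] * n_variables
    for selected in selections:
        for j in set(selected):  # set() guards against repeated indices in one run
            counts[j] += 1
    # Per-variable selection frequency across runs, converted to entropy
    entropies = [binary_entropy(c / n_runs) for c in counts]
    return 1.0 - sum(entropies) / n_variables


# Hypothetical example: three resampled training sets, five candidate variables
runs = [{0, 1, 3}, {0, 1, 3}, {0, 1, 4}]
score = consistency_index(runs, n_variables=5)
```

In this sketch no ground truth of variable relevancy is needed: the score depends only on how often each variable is picked across runs, which matches the hypothesis that a good method selects the same variables regardless of the training data used.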

References:

[1] J.-A. Ting, A. D'Souza, S. Vijayakumar, and S. Schaal, "Efficient learning and feature selection in high-dimensional regression," Neural Comput., vol. 22, pp. 831–886, 2010.

[2] L. Comminges and A. S. Dalalyan, "Tight conditions for consistent variable selection in high dimensional nonparametric regression," in COLT, 2011, pp. 187–206.

[3] Z. X. Wang, Q. He, and J. Wang, "Comparison of different variable selection methods for partial least squares soft sensor development," in 2014 American Control Conference, 2014, pp. 3116–3121.

[4] Z. X. Wang, Q. P. He, and J. Wang, "Comparison of variable selection methods for PLS-based soft sensor modeling," J. Process Control, vol. 26, pp. 56–72, 2015.

Topics

Computing and Systems Engineering

Process Automation & Control
