(646c) An Information Entropy Based Criterion for Variable Selection Performance Assessment
To address this limitation, this paper presents an information entropy based consistency index (Ic) to directly evaluate the performance of variable selection methods. The proposed method is based on the hypothesis that the same set of relevant variables would be selected when different training data sets are used to build a model. The proposed Ic index therefore examines the consistency among the variables selected using different training data. Ic does not require any ground truth of variable relevancy, but can still make use of such information should it be available. Both simulated (with ground truth) and industrial (without ground truth) case studies are provided to demonstrate how Ic performs and how it compares with commonly used criteria. The proposed index overcomes some of the limitations of existing indices: in the simulated case studies in this work, Ic gave more objective assessments than the existing indices. The industrial case study shows that Ic is highly correlated with the performance of the resulting soft sensor, validating the need for, and benefits of, directly assessing variable selection consistency.
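The abstract does not state the formula for Ic, so the following is only an illustrative sketch of how an entropy based consistency measure could be built, not the authors' definition. It assumes that each training data set yields a set of selected variable indices, that the selection frequency of each variable across the training sets is treated as a Bernoulli probability, and that the index is one minus the mean binary entropy of those frequencies. The names `binary_entropy` and `consistency_index` are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) variable (0 when p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def consistency_index(selections, n_variables):
    """Assumed form of a consistency score, NOT the paper's Ic definition.

    selections  : list of sets, one per training data set, each holding the
                  indices of the variables selected from that data set
    n_variables : total number of candidate variables

    Returns 1 - (mean binary entropy of per-variable selection frequency).
    A value of 1.0 means every variable is either always or never selected.
    """
    n_runs = len(selections)
    if n_runs == 0 or n_variables <= 0:
        raise ValueError("need at least one selection run and one variable")

    total_entropy = 0.0
    for j in range(n_variables):
        # Fraction of training sets in which variable j was selected.
        freq = sum(1 for s in selections if j in s) / n_runs
        total_entropy += binary_entropy(freq)
    return 1.0 - total_entropy / n_variables


if __name__ == "__main__":
    # Same variables picked from every training set: fully consistent.
    stable = [{0, 2}, {0, 2}, {0, 2}]
    # Each variable picked in exactly half of the runs: least consistent.
    unstable = [{0, 1}, {2, 3}]
    print(consistency_index(stable, n_variables=4))
    print(consistency_index(unstable, n_variables=4))
```

Under these assumptions, the score needs no ground truth of variable relevancy, which matches the property the abstract describes. How the authors incorporate ground truth when it is available is not specified here and is left out of the sketch.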