(446f) Fault Detection Based On Kernel Mean Discrepancy Test | AIChE

Fault Detection Based on Kernel Mean Discrepancy Test

Jiusun Zeng (1), Lei Xie (2), Jin’hui Cai (1)

(1) College of Metrology & Measurement Engineering, China Jiliang University, Hangzhou 310018, China

(2) Institute of Cyber-Systems and Control, National Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China

Advances in sensor and computer technology mean that large amounts of data are collected and stored during the operation of industrial systems, which has led to rapid development of data-driven fault detection and diagnosis (FDD) methods. Because they can handle collinear data, multivariate statistical methods have received significant attention, e.g., techniques based on principal component analysis (PCA), partial least squares (PLS) and independent component analysis (ICA). Fault detection with multivariate statistical methods generally consists of a feature extraction step followed by the construction of monitoring statistics. If the data are Gaussian distributed, the T² and squared prediction error (SPE) statistics are appropriate. For non-Gaussian distributed variables, different methods have been developed to design monitoring statistics, such as kernel density estimation, Gaussian mixture models and ICA-based methods. These methods, however, have been reported to be problematic if the data distribution is sparse or clustered [1], to face singularity problems [2], or to disturb the monitoring performance [3].
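As an illustration of the two-step procedure described above (feature extraction, then monitoring statistics), the following is a minimal sketch of PCA-based T² and SPE computation. It is not taken from the paper; the function names `fit_pca_monitor` and `t2_spe`, the autoscaling step and the choice of the number of retained components are assumptions made for the example.

```python
import numpy as np

def fit_pca_monitor(X_normal, n_components):
    """Fit a PCA model on normal operating data (rows are samples)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std                  # autoscale each variable
    # SVD of the scaled data; rows of Vt are the loading vectors
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (d, k)
    eigvals = s[:n_components] ** 2 / (len(Z) - 1)  # variances of retained scores
    return mean, std, P, eigvals

def t2_spe(X, mean, std, P, eigvals):
    """Return the T^2 and SPE statistics for each row of X."""
    Z = (X - mean) / std
    T = Z @ P                                    # scores (feature extraction step)
    t2 = np.sum(T ** 2 / eigvals, axis=1)        # variation inside the PCA subspace
    residual = Z - T @ P.T
    spe = np.sum(residual ** 2, axis=1)          # variation left in the residual
    return t2, spe
```

In practice both statistics would be compared with control limits, which in the Gaussian case are usually derived from known reference distributions; that step is omitted here.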

Provided the probability density function (PDF) of the normal data can be obtained, process monitoring reduces to checking whether the test data are generated from that PDF. However, direct estimation of the PDF from data is computationally intensive, and available methods such as kernel density estimation cannot handle high-dimensional data sets. As an alternative, various two-sample tests have been developed in machine learning and statistics to test whether two sets of samples are drawn from the same distribution, among which the kernel mean discrepancy [4] is one of the state-of-the-art approaches. The basic idea is to map the normal and test data into a reproducing kernel Hilbert space (RKHS), in which the mean discrepancy test is performed. If the test data share the distribution of the normal data, the mean discrepancy is very close to zero. Hence the mean discrepancy in the kernel space can be used as a test statistic to decide whether the normal and test data sets are sampled from the same distribution.
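To make the statistic concrete, here is a minimal sketch of a kernel mean discrepancy estimate between two sample sets. It uses the biased estimator of the squared maximum mean discrepancy (the name used for this statistic in [4]) with a Gaussian kernel. The kernel choice, the bandwidth parameter `sigma` and the function names are assumptions made for illustration; the abstract does not state which kernel or estimator the authors use.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # clip tiny negative values caused by round-off
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of the squared mean discrepancy in the RKHS.

    Equals the squared distance between the empirical kernel mean
    embeddings of X and Y, so it is non-negative and is zero when
    the two sample sets are identical.
    """
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```

For two sample sets drawn from the same distribution the value stays close to zero, while a mean shift or a change in distribution shape increases it.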

In this article, a fault detection method based on the kernel mean discrepancy test is proposed, which measures the difference between the distributions of the normal and test data sets. A moving window approach is first used to divide the test data into consecutive subsets, and the kernel mean discrepancy between each subset and the normal data set is computed. The larger the discrepancy, the more likely it is that the subset contains faulty data. The key step is therefore to determine the critical value (confidence limit) of the mean discrepancy in the RKHS. To this end, a permutation test is performed on the normal data set, and the 95th or 99th percentile of the resulting kernel mean discrepancy values is used as the confidence limit. With the confidence limit obtained, a fault detection strategy is then proposed. By rewriting the test statistic in integral form, a quantitative analysis of the sensitivity of the proposed monitoring strategy is presented for Gaussian distributed data. The proposed monitoring strategy is nonparametric and distribution-free; it is applicable to both Gaussian and non-Gaussian distributed data. A simulation study and an application to the Alstom gasifier process show that the proposed method is more sensitive to process faults than a monitoring strategy based on support vector data description (SVDD).
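The monitoring steps described above (moving window, discrepancy against the normal data, permutation-based confidence limit) can be sketched roughly as follows. This is one possible reading rather than the authors' implementation: the abstract does not specify the permutation scheme, so the sketch assumes that each permutation splits the normal data at random into a window-sized subset and a remainder. The Gaussian kernel, the biased estimator, the window step and all function and parameter names are likewise assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma):
    """Biased estimate of the squared kernel mean discrepancy."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

def permutation_limit(X_normal, window, sigma, n_perm=200, level=0.95, seed=0):
    """Confidence limit from random splits of the normal data (assumed scheme)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(X_normal))
        subset = X_normal[idx[:window]]          # plays the role of a test window
        rest = X_normal[idx[window:]]            # plays the role of the reference
        stats[b] = mmd2(rest, subset, sigma)
    return np.quantile(stats, level)             # e.g. 95th percentile

def monitor(X_normal, X_test, window, sigma, limit, step=1):
    """Slide a window over X_test and flag windows whose statistic exceeds the limit."""
    starts = range(0, len(X_test) - window + 1, step)
    stats = np.array([mmd2(X_normal, X_test[i:i + window], sigma) for i in starts])
    return stats, stats > limit
```

Note that the limit is computed against a slightly smaller reference set than the one used during monitoring; for a normal data set much larger than the window the difference is small, but a stricter implementation would keep the two sample sizes identical.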

References:

[1] Q. Chen, U. Kruger, A.Y.T. Leung, Regularised kernel density estimation for clustered process data, Control Engineering Practice, 12 (2004), pp. 267–274.

[2] J. Yu, S.J. Qin, Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models, AIChE Journal, 54 (2008), pp. 1811–1829.

[3] J.M. Lee, C.K. Yoo, I.B. Lee, Statistical process monitoring with independent component analysis, Journal of Process Control, 14 (2004), pp. 467–485.

[4] A. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf, A. Smola, A kernel method for the two-sample problem, Journal of Machine Learning Research, 2008(1): 1-10.