(370b) Probabilistic PCA for Multivariate Process Monitoring and Comparison with PCA | AIChE

Braatz, R. D., Massachusetts Institute of Technology
In the process and manufacturing industries, principal component analysis (PCA) is the most widely applied method for fault detection because it handles high-dimensional, highly correlated, and noisy data well (e.g., see reviews [1,2] and citations therein). PCA is a linear dimensionality reduction technique that decomposes the data space into a principal subspace and a residual subspace. Two monitoring statistics, T² and Q, monitor the two subspaces separately [3]. However, neither the optimality of PCA for process monitoring nor the selection of the number of principal components has been theoretically justified. In practice, when fault information is not available, practitioners select the number of principal components with heuristic methods, including the percent variance test, the scree test, parallel analysis [4], and cross-validation [5], with no single technique being dominant.
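
For readers unfamiliar with the conventional scheme, the following is a minimal sketch of PCA-based T² and Q monitoring, assuming the textbook definitions (T² as the variance-scaled squared score norm, Q as the squared reconstruction residual). The function names `fit_pca` and `t2_q`, the synthetic data, and the choice of two components are illustrative assumptions and are not taken from this work.

```python
import numpy as np

def fit_pca(X, k):
    """Fit PCA on fault-free training data X (rows are samples); keep k components."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)          # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :k], eigvals[:k]

def t2_q(X, mean, P, lam):
    """Return the T² (principal subspace) and Q (residual subspace) statistics per sample."""
    Xc = X - mean
    scores = Xc @ P                               # projection onto the principal subspace
    t2 = np.sum(scores**2 / lam, axis=1)          # squared scores scaled by their variances
    resid = Xc - scores @ P.T                     # part of each sample the model does not explain
    q = np.sum(resid**2, axis=1)                  # squared prediction error
    return t2, q

# Synthetic stand-in for process data: 2 latent factors, 6 measured variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_train = latent @ mixing + 0.1 * rng.normal(size=(500, 6))

mean, P, lam = fit_pca(X_train, k=2)
t2, q = t2_q(X_train, mean, P, lam)
```

In use, each statistic would be compared against its own control limit, which is why the conventional scheme raises two separate alarms and depends on the chosen number of components `k`.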

In this work, we show that the widely applied PCA-based T² and Q monitoring scheme is not optimal. We propose a novel formulation of fault detection based on probabilistic principal component analysis (PPCA) that uses a single monitoring index, M². The proposed probabilistic approach is fundamentally different from the PCA method. We show that the PPCA-based M² statistic is optimal, in a maximum likelihood sense, for fault detection of a linear, Gaussian, stationary process. Moreover, the PPCA framework enables the selection of the optimal number of principal components for normal data, which has been an open research problem for PCA-based fault detection. Detailed theoretical comparisons between PPCA and PCA show the merits of PPCA over PCA in terms of model accuracy, robustness, and the capability to generalize to more complex situations. We also derive a relationship between M² and a weighted combination of the T² and Q statistics. The proposed PPCA-based fault detection method is applied to the Tennessee Eastman process and compared with PCA to illustrate its effectiveness.
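
The abstract does not give the formula for M², so the sketch below rests on an assumption: that M² is the squared Mahalanobis distance of a sample under a fitted PPCA covariance C = W Wᵀ + σ²I, with W and σ² taken from the standard closed-form maximum likelihood PPCA solution. Under that assumption, M² equals T² + Q/σ², which is one instance of a weighted combination of T² and Q. The function names and synthetic data are hypothetical, and the weights shown follow from the assumption rather than from the derivation in this work.

```python
import numpy as np

def fit_ppca(X, k):
    """Closed-form maximum likelihood PPCA fit (assumed model: C = W W^T + sigma2 * I)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False, bias=True)   # ML covariance (divides by N)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[k:].mean()                       # noise variance: mean discarded eigenvalue
    U, lam = eigvecs[:, :k], eigvals[:k]
    W = U * np.sqrt(lam - sigma2)                     # loading matrix, up to an arbitrary rotation
    return mean, W, sigma2, U, lam

def m2_statistic(X, mean, W, sigma2):
    """Assumed definition of M²: squared Mahalanobis distance under the PPCA covariance."""
    Xc = X - mean
    C = W @ W.T + sigma2 * np.eye(W.shape[0])
    return np.einsum("ij,ij->i", Xc, np.linalg.solve(C, Xc.T).T)

# Synthetic stand-in for process data: 2 latent factors, 6 measured variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 6)) + 0.2 * rng.normal(size=(400, 6))

mean, W, sigma2, U, lam = fit_ppca(X, k=2)
m2 = m2_statistic(X, mean, W, sigma2)

# PCA statistics computed from the same eigen-decomposition, for comparison.
Xc = X - mean
scores = Xc @ U
t2 = np.sum(scores**2 / lam, axis=1)
q = np.sum((Xc - scores @ U.T) ** 2, axis=1)

# Under the assumptions above, the single index combines both PCA statistics.
print(np.allclose(m2, t2 + q / sigma2))
```

Because a single index is compared against a single limit, this formulation avoids reconciling two separate alarms. How the control limit and the number of components are chosen in the proposed method is not specified in the abstract and is left out of the sketch.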

[1] Chiang, L. H., Russell, E. L., & Braatz, R. D. (2000). Fault Detection and Diagnosis in Industrial Systems. Springer Science & Business Media.

[2] Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., & Yin, K. (2003). A review of process fault detection and diagnosis: Part III: Process history based methods. Computers & Chemical Engineering, 27(3), 327-346.

[3] Kresta, J. V., MacGregor, J. F., & Marlin, T. E. (1991). Multivariate statistical monitoring of process operating performance. The Canadian Journal of Chemical Engineering, 69(1), 35-47.

[4] Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T., & Fralish, J. S. (1995). Parallel analysis: A method for determining significant principal components. Journal of Vegetation Science, 6(1), 99-106.

[5] Bro, R., Kjeldahl, K., Smilde, A. K., & Kiers, H. A. L. (2008). Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry, 390(5), 1241-1251.