(268h) Expectation-Maximization and Bayesian Inference Based Probabilistic PLS Methods for Soft Sensor Estimation and Prediction of Industrial Processes With Stochastic Missing Measurements

Chiang, L. H., The Dow Chemical Company
Yu, J., McMaster University
Chen, K., McMaster University
Castillo, I., The Dow Chemical Company

Missing data are common in industrial processes because of sensor failure, multi-rate sampling, and device unreliability. Despite considerable research on multivariate statistical process monitoring and soft sensor estimation, handling random missing measurements in industrial process data remains a significant challenge. The probabilistic partial least squares (PPLS) method integrates conventional partial least squares (PLS) with probabilistic inference to deal with stochastic system uncertainty.
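
For orientation, one commonly used way to write a probabilistic PLS model is given below. The abstract does not state its model equations, so the symbols here (loadings P and C, means \mu_x and \mu_y, noise covariances \Sigma_x and \Sigma_y, latent dimension q) are assumptions made for illustration and may differ from the authors' formulation.

```latex
\begin{aligned}
x &= P\,t + \mu_x + e, & e &\sim \mathcal{N}(0, \Sigma_x),\\
y &= C\,t + \mu_y + f, & f &\sim \mathcal{N}(0, \Sigma_y),\\
t &\sim \mathcal{N}(0, I_q).
\end{aligned}
```

Under this assumed form, the process measurements x and the quality variables y share the latent scores t, which is what lets observed entries inform the estimates of missing ones.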

In this study, two PPLS frameworks for handling stochastic missing measurements are developed: expectation-maximization (EM) based PPLS (EM-PPLS) and Bayesian inference based PPLS (BI-PPLS). In the EM-PPLS method, missing measurements are initially imputed with the variable means and the EM-PPLS model parameters are set to initial guesses. The EM algorithm is then employed to update the latent scores and the PPLS model parameters through data-driven iterative learning. After each EM step, the missing measurements are re-estimated from the updated latent scores and model parameters. Consequently, PPLS modeling and missing data estimation proceed together in one iteration, and uncertainty in the process measurements is handled through a probabilistic strategy. However, selecting the number of latent variables in EM-PPLS still requires computationally expensive cross-validation. To address this issue, prior distributions are placed on the parameters of the BI-PPLS model so that the parameters, along with the optimal dimension of the latent space, can be identified through recursive variational Bayesian inference. After each parameter update in BI-PPLS, the missing measurements are likewise re-estimated using the new expectations of the latent scores and model parameters under the updated posterior distributions. The proposed BI-PPLS method therefore provides a concurrent solution for missing data imputation and latent variable selection within a Bayesian framework.
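
The EM-PPLS loop described above (mean imputation, EM update of scores and parameters, re-estimation of the missing entries) can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simplified stand-in, a probabilistic PCA-style model with a single isotropic noise variance fitted to the stacked vector [x, y], and it treats imputed values as observed within each EM step. The function name `em_impute` and all of its parameters are hypothetical.

```python
import numpy as np

def em_impute(X, Y, n_latent=2, n_iter=200, seed=0):
    """Simplified EM imputation sketch (assumed stand-in, not the authors' EM-PPLS).

    X and Y are 2-D arrays with NaN marking missing entries.
    Returns completed X, completed Y, the loading matrix W, and the noise variance.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    Z = np.hstack([X, Y])                     # stack process and quality variables
    miss = np.isnan(Z)
    Z = np.where(miss, np.nanmean(Z, axis=0), Z)  # initial imputation by variable means
    n, d = Z.shape
    W = rng.standard_normal((d, n_latent))    # initial guess for the loadings
    sigma2 = 1.0                              # initial guess for the noise variance
    for _ in range(n_iter):
        mu = Z.mean(axis=0)
        Zc = Z - mu
        # E-step: posterior means of the latent scores and summed second moments
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(n_latent))
        T = Zc @ W @ M_inv
        S = n * sigma2 * M_inv + T.T @ T
        # M-step: update loadings and isotropic noise variance
        W = Zc.T @ T @ np.linalg.inv(S)
        resid = np.sum(Zc**2) - 2.0 * np.sum((Zc @ W) * T) + np.trace(S @ W.T @ W)
        sigma2 = max(resid / (n * d), 1e-8)   # small floor keeps the variance positive
        # Re-estimate only the missing entries from the updated scores and loadings
        Z[miss] = (T @ W.T + mu)[miss]
    p = X.shape[1]
    return Z[:, :p], Z[:, p:], W, sigma2
```

Calling `em_impute(X, Y, n_latent=2)` returns completed copies of both blocks, so NaN entries in Y (unmeasured quality variables) are filled in the same pass as missing process measurements. In this sketch `n_latent` must still be chosen by the user, for example by cross-validation, which is the cost that the BI-PPLS variant is described as avoiding.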

The presented EM-PPLS and BI-PPLS methods are first applied to a simulated example with stochastic missing measurements. The soft sensor modeling results indicate that both approaches can robustly estimate missing measurements and predict quality variables with satisfactory accuracy. In comparison, BI-PPLS achieves significantly higher computational efficiency than EM-PPLS. Further, the effectiveness of the presented PPLS approaches is demonstrated through application to soft sensor modeling and prediction for an industrial chemical process with three distillation columns.