(287f) Dynamic Data Feature Engineering for Process Operation Troubleshooting | AIChE

Authors 

Qin, S. J. - Presenter, City University of Hong Kong
Dong, Y., University of Southern California
Liu, Y., University of Southern California

Data analytics and machine learning have proven powerful at mining large volumes of data to reveal useful information and knowledge. Many industries, including chemicals, petrochemicals, energy, power grids, and pharmaceuticals, have turned their attention to the massive data they own to improve efficiency and competitiveness. Owing to rich instrumentation and control in engineering and manufacturing systems and the deployment of the industrial internet of things (IIoT), data collected from industrial operations are high-dimensional and sampled at fast rates. There is therefore a great need for analytics tools geared towards massive process data.

Most process data are collected in the form of time series, which are highly cross-correlated and auto-correlated. In other words, the high-dimensional data are not dynamically excited in all dimensions of the measurement space. This situation is more pronounced with IIoT, where sensors are installed with a degree of redundancy that leads to collinearity. Because of collinearity, the dynamic variations in plant data are often concentrated in a low-dimensional subspace, while the complementary subspace contains only random variations that are independent over time. Given these characteristics, traditional modeling tools such as vector autoregressive moving average (VARMA) analysis are not suitable, since they assume full-dimensional dynamics (Tsay, 2013).
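The concentration of dynamics in a low-dimensional subspace can be illustrated with a small synthetic sketch. The Python example below is a toy under stated assumptions, not plant data: the sensor count, the two AR(1) drivers, their coefficients, the random mixing matrix, and the noise level are all hypothetical. It checks how much of the lag-1 autocovariance of ten redundant sensors is carried by just two directions.

```python
import numpy as np

# Hypothetical setup: 10 redundant sensors driven by 2 dynamic latent sources.
rng = np.random.default_rng(0)
n_samples, n_sensors, n_dynamic = 5000, 10, 2

# Two AR(1) latent drivers (coefficients are assumptions for illustration).
phi = np.array([0.95, 0.8])
latents = np.zeros((n_samples, n_dynamic))
for k in range(1, n_samples):
    latents[k] = phi * latents[k - 1] + rng.standard_normal(n_dynamic)

# Each sensor is a linear mix of the drivers plus time-independent noise.
mixing = rng.standard_normal((n_dynamic, n_sensors))
noise = 0.5 * rng.standard_normal((n_samples, n_sensors))
X = latents @ mixing + noise
X -= X.mean(axis=0)

# Lag-1 autocovariance: white noise contributes almost nothing here,
# so its singular values reveal how many directions carry dynamics.
lag1_cov = X[1:].T @ X[:-1] / (n_samples - 1)
singular_values = np.linalg.svd(lag1_cov, compute_uv=False)
dynamic_share = singular_values[:n_dynamic].sum() / singular_values.sum()

print("singular values:", np.round(singular_values, 2))
print("share carried by the first two directions:", round(dynamic_share, 3))
```

In this toy setting the remaining eight directions behave like noise that is independent over time, which is the situation where a full-dimensional dynamic model is poorly matched to the data.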

In this paper we apply dynamic inner canonical correlation analysis (DiCCA), developed in Dong and Qin (2018) and Dong et al. (2020), to extract low-dimensional latent variables. Each latent variable follows a self-dependent univariate autoregressive (AR) model. The latent variables are orthogonal, or contemporaneously independent of each other, which is convenient for visualizing latent features and troubleshooting abnormal variations in high-dimensional data. In addition, the latent variables are rank-ordered by their predictability from their own history. This objective promotes the extraction of self-dependent AR relations; for example, it favors integrating and oscillatory components.
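The idea of orthogonal latent variables ranked by predictability from their own past can be sketched in a few lines. The Python example below is a simplified stand-in and not the DiCCA algorithm of Dong and Qin (2018): it assumes an AR order of one, whitens the data, and takes eigenvectors of the symmetrized lag-1 covariance, so that the resulting scores are uncorrelated and sorted by the magnitude of their lag-1 autocorrelation. The function name `predictable_latents_ar1` and the synthetic data are hypothetical.

```python
import numpy as np

def predictable_latents_ar1(X):
    """Simplified AR(1) stand-in for a DiCCA-style decomposition.

    Returns scores T (uncorrelated, unit variance), weights W with
    T = Xc @ W, and the lag-1 autocorrelation of each score, sorted by
    decreasing magnitude. This is NOT the published DiCCA iteration.
    """
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    # Whiten the data so every direction has unit variance.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
    whitener = evecs / np.sqrt(evals)  # scales column j by 1/sqrt(evals[j])
    Z = Xc @ whitener
    # Symmetrized lag-1 covariance of the whitened data.
    c1 = Z[1:].T @ Z[:-1] / (n - 1)
    c1 = 0.5 * (c1 + c1.T)
    rho, V = np.linalg.eigh(c1)
    order = np.argsort(-np.abs(rho))  # most predictable first
    W = whitener @ V[:, order]
    return Xc @ W, W, rho[order]

# Hypothetical data: 2 AR(1) drivers mixed into 10 noisy sensors.
rng = np.random.default_rng(0)
n_samples, n_sensors = 5000, 10
phi = np.array([0.95, 0.8])
latents = np.zeros((n_samples, 2))
for k in range(1, n_samples):
    latents[k] = phi * latents[k - 1] + rng.standard_normal(2)
X = latents @ rng.standard_normal((2, n_sensors))
X += 0.5 * rng.standard_normal((n_samples, n_sensors))

T, W, rho = predictable_latents_ar1(X)
print("lag-1 autocorrelations, ranked:", np.round(rho, 2))
```

For an AR(1) score the squared lag-1 autocorrelation is the fraction of variance predictable from the previous sample, which is why it serves as the ranking criterion in this sketch. Higher AR orders, as used by DiCCA, need the iterative solution described in the cited papers.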

In this paper, we propose a dynamically engineered latent feature analysis (DELFA) procedure for plant-wide troubleshooting by applying the DiCCA algorithm to decompose high-dimensional process data into dynamic latent features. DELFA does not make use of the prediction model of DiCCA. Instead, it finds dynamic features in a segment of time series data that contains features of interest, which could be associated with anomalies. We also extend the DiCCA algorithm to deal with exogenous variables; the extension is referred to as the DiCCAX algorithm.

DELFA further identifies measured variables that are best interpreted by the latent features. The degree of interpretation is represented by the latent variable loadings. Composite loadings and weights are derived to analyze features that appear in multiple latent variables.
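How loadings point to the measured variables associated with a latent feature can be sketched as follows. The abstract does not give the loading formula, so this Python example assumes the common latent-variable convention of regressing each mean-centered measured variable on the latent score by least squares. The helper `loadings`, the synthetic feature, and the gains are hypothetical, and the composite loadings mentioned above are not reproduced here.

```python
import numpy as np

def loadings(X, t):
    """Least-squares loading of each column of X on the latent score t.

    Assumed convention: p = Xc^T tc / (tc^T tc) on mean-centered data.
    """
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    return Xc.T @ tc / (tc @ tc)

# Hypothetical example: one AR(1) latent feature seen by 2 of 6 variables.
rng = np.random.default_rng(1)
n_samples, n_vars = 2000, 6
t = np.zeros(n_samples)
for k in range(1, n_samples):
    t[k] = 0.9 * t[k - 1] + rng.standard_normal()

X = rng.standard_normal((n_samples, n_vars))  # measurement noise
X[:, 1] += 2.0 * t                            # variable 1 carries the feature
X[:, 4] += -1.5 * t                           # variable 4 carries it inverted

p = loadings(X, t)
ranked = np.argsort(-np.abs(p))  # variables most associated with t first
print("loadings:", np.round(p, 2))
print("variables ranked by |loading|:", ranked)
```

Variables with large absolute loadings are the ones a troubleshooting engineer would inspect first, and the sign indicates whether a variable moves with or against the latent feature.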

The features of interest can be intermittent in time; when they occur, their loadings on measured variables are the focus of analysis. We demonstrate the effectiveness of the DELFA troubleshooting procedure on two high-dimensional datasets from an industrial plant. One dataset is analyzed with the troubleshooting procedure to find several anomalous features. The other dataset, collected after a major anomaly was fixed, is analyzed to confirm the fix and to find other anomalies.


References

Ruey S. Tsay. Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons, 2013.

Yining Dong and S. Joe Qin. Dynamic latent variable analytics for process operations and control. Computers & Chemical Engineering, 114:69-80, 2018.

Yining Dong, Yingxiang Liu, and S. Joe Qin. Efficient dynamic latent variable analysis for high dimensional time series data. IEEE Transactions on Industrial Informatics, 16(6):4068-4076, 2020.