(255e) A Scalable Statistical Machine Learning Method: Application for Fault Detection and Fault Propagation Pattern Inference in the Tennessee Eastman Process

Soroush, M., Drexel University
Mohseni Ahooyi, T., Drexel University
Arbogast, J. E., Process Control & Logistics, Air Liquide
Joint probability distributions (JPDs) are central elements of statistical machine learning algorithms (SMLAs). The performance and convergence of SMLAs depend greatly on how JPDs are estimated from historical data. Because many existing JPD estimation methods, and the SMLAs built on them, suffer from the curse of dimensionality, their scalability and applicability to big data analytics are in serious doubt. A general method of JPD estimation is needed to develop the next generation of SMLAs that can cope with the complex, high-volume, high-dimensional nature of big data. Recently we introduced an efficient method of estimating JPDs of continuous random variables with arbitrary relationships [1, 2]. Because the backbone of the method is a monotonizing transformation that ‘rolls out’ (monotonizes) the relationships, the method was named the rolling pin (RP) method. The RP method estimates JPDs without any knowledge of the actual causal structure of the attributes. It offers many advantages over its well-known counterparts, such as the original parametric copula method, moment-based density estimation, and nonparametric techniques of joint probability estimation. An RP-estimated JPD can be incorporated effectively in a variety of big-data statistical learning tasks such as classification, regression, clustering, pattern recognition, dimensionality reduction, and probabilistic modeling and inference. In probabilistic inference, it is computationally more efficient than Bayesian networks.
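To illustrate why monotonic relationships matter for copula-based JPD estimation, the following minimal Python sketch estimates the dependence parameter of a classic Gaussian copula from rank correlation. It is not the RP method of [1, 2]. The function names, the synthetic data, and the use of |x| as a stand-in monotonizing step are all assumptions made for illustration only.

```python
import numpy as np

def pseudo_observations(x):
    """Map a sample to (0, 1) via its empirical CDF (rank transform)."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n; assumes no ties
    return ranks / (len(x) + 1.0)

def gaussian_copula_rho(x, y):
    """Gaussian-copula correlation parameter obtained from Spearman's rho."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    spearman = np.corrcoef(u, v)[0, 1]  # Pearson correlation of the ranks
    return 2.0 * np.sin(np.pi * spearman / 6.0)

# Synthetic data (hypothetical, not from the Tennessee Eastman process).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
noise = 0.02 * rng.standard_normal(x.size)

y_monotonic = np.exp(x) + noise   # monotonic in x
y_nonmonotonic = x**2 + noise     # non-monotonic (parabolic) in x

rho_mono = gaussian_copula_rho(x, y_monotonic)
rho_nonmono = gaussian_copula_rho(x, y_nonmonotonic)

# Assumed stand-in for a monotonizing step: folding x about zero makes the
# parabolic relationship monotonic. This is NOT the RP transformation.
rho_folded = gaussian_copula_rho(np.abs(x), y_nonmonotonic)

print(f"monotonic:     rho = {rho_mono:.3f}")
print(f"non-monotonic: rho = {rho_nonmono:.3f}")
print(f"after folding: rho = {rho_folded:.3f}")
```

In this sketch the copula parameter is close to zero for the parabolic relationship even though the two variables are strongly dependent, and the dependence becomes visible again once the relationship is made monotonic.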

In this paper, we present a study on the scalability of the JPD estimation method. In particular, we apply the method to the large-scale Tennessee Eastman (TE) process [3, 4]. This process has a total of 91 variables (12 manipulated, 38 state, and 41 measured variables) and five unit operations (a two-phase reactor, a condenser, a flash separator, a recycle compressor, and a product stripper). We show that the RP method scales readily, is computationally efficient and flexible, and allows for reliably estimating JPDs of large-scale, highly nonlinear processes such as the TE process. We also demonstrate that the RP method provides a computationally efficient and flexible framework for performing probabilistic inference in highly nonlinear systems with non-monotonic variable interdependencies. The advantages of this inference framework over Bayesian networks are presented.


[1] Mohseni Ahooyi, T., Arbogast, J.E., and Soroush, M. (2015b). An Efficient Copula-Based Method of Identifying Regression Models of Non-Monotonic Relationships. Chem. Eng. Sci., 136(2), 106.

[2] Mohseni Ahooyi, T., Arbogast, J.E., and Soroush, M. (2015a). Applications of the Rolling Pin Method: 1. An Efficient Alternative to Bayesian Network Modeling and Inference. Ind. & Eng. Chem. Research, 54(16), 4316.

[3] Downs, J. J., Vogel, E. F. (1993). A Plant-Wide Industrial Process Control Problem. Comput. & Chem. Eng., 17(3), 245.

[4] Yu, H., Khan, F., Garaniya, V. (2015). A Probabilistic Multivariate Method for Fault Diagnosis of Industrial Processes. Chem. Eng. Research & Design, 104, 306.