(546a) Constrained Variable Selection for Distributed Multivariate Statistical Process Monitoring

Khatib, S., University of Minnesota
Daoutidis, P., University of Minnesota, Twin Cities
Statistical monitoring methods are popular in the chemical industry because they tend to be easier to implement and do not require a first-principles mathematical model. These methods employ statistical hypothesis tests for fault detection, in which a fault is detected when a test statistic exceeds a threshold value. A statistical monitoring method may use only univariate statistical hypothesis tests, where each test statistic is a function of only one sensor’s measurement. The main problem with such a configuration is that univariate tests ignore the correlation between different sensors, and hence the detection thresholds calculated for them may be highly conservative. To address this issue, a statistical monitoring method may employ a multivariate statistical hypothesis test, in which the test statistic is a function of the measurements of all the sensors at a sampling instant. The main problem with this configuration arises when a fault affects only a small fraction of the sensors of a large-scale system: most of the remaining sensors contribute insignificantly to the test statistic, which makes the multivariate test less sensitive to such a fault.
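The effect of ignoring correlation can be illustrated with a small numerical sketch. It is not taken from the abstract: the two-sensor model, the sample `x`, the 3-sigma univariate limit, and the use of a squared Mahalanobis distance with a chi-square limit (valid here only because the covariance is assumed known and there are exactly two sensors) are all illustrative assumptions.

```python
import math
import numpy as np

def t2_statistic(x, mu, cov):
    """Multivariate statistic: squared Mahalanobis distance of x from mu."""
    diff = np.asarray(x, dtype=float) - mu
    # Solve cov @ w = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

def z_scores(x, mu, cov):
    """Univariate statistics: one standardized deviation per sensor."""
    return np.abs(np.asarray(x, dtype=float) - mu) / np.sqrt(np.diag(cov))

# Hypothetical normal-operation model: two strongly correlated sensors.
mu = np.zeros(2)
cov = np.array([[1.0, 0.95],
                [0.95, 1.0]])

# A sample that breaks the correlation while each sensor stays in range.
x = [1.5, -1.5]

z = z_scores(x, mu, cov)
t2 = t2_statistic(x, mu, cov)

z_limit = 3.0                     # illustrative 3-sigma limit (assumption)
t2_limit = -2.0 * math.log(0.01)  # 99% chi-square quantile for 2 degrees of freedom

univariate_alarm = bool(np.any(z > z_limit))
multivariate_alarm = t2 > t2_limit
```

In this toy case each sensor deviates by only 1.5 standard deviations, so neither univariate test fires, while the multivariate statistic accounts for the broken correlation and exceeds its limit.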

The distributed configuration can address the limitations of the univariate and multivariate configurations. In a distributed multivariate statistical process monitoring method, the sensors of the system are first assigned to a set of potentially overlapping subsystems. A statistical monitoring method is applied to each subsystem, and the monitoring results of the subsystems are combined using a consensus strategy. The monitoring performance of a distributed method depends strongly on the sensors selected for each subsystem. In [1] we developed a new simulation optimization method called Performance Driven Agglomerative Clustering (PDAC), which finds a decomposition of the system (i.e., selects variables for the subsystems) by minimizing the type II error generated when the distributed monitoring method is simulated using normal and faulty data, subject to the type I error being close to a user-defined value. PDAC combines the greedy search of clustering algorithms from graph theory, including agglomerative clustering, with a fine-tuning procedure. The software for PDAC is available at [2].
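A minimal sketch of how a candidate decomposition might be scored by simulation is given below. It is not the PDAC software: the per-subsystem statistic (a T²-type distance), the empirical-quantile thresholds, the "alarm if any subsystem alarms" consensus rule, and all function names and data are assumptions chosen for illustration.

```python
import numpy as np

def subsystem_t2(X, cols, mu, cov):
    """T^2-type statistic per sample, using only the sensors in `cols`."""
    d = X[:, cols] - mu[cols]
    S = cov[np.ix_(cols, cols)]
    return np.einsum("ij,ij->i", d, np.linalg.solve(S, d.T).T)

def distributed_alarms(X, subsystems, mu, cov, thresholds):
    """Assumed consensus rule: alarm if any subsystem exceeds its threshold."""
    flags = [subsystem_t2(X, cols, mu, cov) > thr
             for cols, thr in zip(subsystems, thresholds)]
    return np.any(flags, axis=0)

def error_rates(X_normal, X_faulty, subsystems, mu, cov, thresholds):
    """Type I error = false alarm rate; type II error = missed detection rate."""
    type1 = distributed_alarms(X_normal, subsystems, mu, cov, thresholds).mean()
    type2 = 1.0 - distributed_alarms(X_faulty, subsystems, mu, cov, thresholds).mean()
    return float(type1), float(type2)

# Synthetic data standing in for normal and faulty simulations.
rng = np.random.default_rng(0)
X_normal = rng.standard_normal((500, 4))
X_faulty = X_normal + np.array([0.0, 0.0, 0.0, 50.0])  # large bias on sensor 3

mu = X_normal.mean(axis=0)
cov = np.cov(X_normal, rowvar=False)

subsystems = [[0, 1], [2, 3]]  # a non-overlapping decomposition
thresholds = [np.quantile(subsystem_t2(X_normal, c, mu, cov), 0.99)
              for c in subsystems]

type1, type2 = error_rates(X_normal, X_faulty, subsystems, mu, cov, thresholds)
```

A search over decompositions would call something like `error_rates` for each candidate and prefer the one with the lowest type II error among those whose type I error is acceptably close to the target.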

One limitation of PDAC is that it can only generate a non-overlapping decomposition of the system, in which each sensor belongs to exactly one subsystem, even though a sensor could enhance the detection performance of multiple subsystems. In this work we extend the PDAC method by proposing an overlapping-decomposition algorithm that uses a greedy search strategy to further fine-tune the non-overlapping decomposition generated by PDAC and generate a set of candidate overlapping decompositions. The type II errors of these candidates are then compared to select the overlapping decomposition with the best fault detection performance. The proposed extended PDAC (EPDAC) method is found to outperform the PDAC method when both are used to find a decomposition of the benchmark Tennessee Eastman Process for monitoring with a distributed implementation of Principal Component Analysis. EPDAC also incorporates user-defined cannot-link constraints, which specify pairs of sensors that cannot be placed in the same subsystem. Placing a pair of sensors in the same subsystem may be undesirable if their measurements have different sampling rates and are difficult to synchronize, if the layout of the plant makes it difficult to transmit their measurements to a single location, or if different statistical monitoring methods are to be applied to different sets of sensors to model the differing distributions of their measurements. These constraints make EPDAC more flexible than PDAC, allowing the user to incorporate requirements beyond detection performance into the decomposition selection process. The software for EPDAC is also available at [2].
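The abstract does not give the details of the EPDAC search, so the following is only a hypothetical sketch of the general idea: starting from a non-overlapping decomposition, greedily copy one sensor at a time into an additional subsystem when doing so lowers an objective, and skip any move that would violate a cannot-link constraint. The names `greedy_overlap`, `violates`, and `toy_type2_error`, and the toy objective itself, are assumptions; in practice the objective would be a simulated type II error.

```python
from itertools import product

def violates(sensor, subsystem, cannot_link):
    """True if adding `sensor` would pair it with a forbidden partner."""
    return any(frozenset((sensor, other)) in cannot_link for other in subsystem)

def greedy_overlap(decomposition, sensors, cannot_link, objective):
    """Greedily add sensors to extra subsystems while the objective decreases."""
    current = [set(s) for s in decomposition]
    best = objective(current)
    candidates = [([set(s) for s in current], best)]
    while True:
        best_move = None
        for sensor, k in product(sensors, range(len(current))):
            if sensor in current[k] or violates(sensor, current[k], cannot_link):
                continue
            trial = [set(s) for s in current]
            trial[k].add(sensor)
            score = objective(trial)
            if score < best:  # keep the best strictly improving move
                best, best_move = score, (sensor, k)
        if best_move is None:
            return current, best, candidates
        current[best_move[1]].add(best_move[0])
        candidates.append(([set(s) for s in current], best))

# Toy stand-in for a simulated type II error: a fault counts as detected
# only if some subsystem contains every sensor the fault affects.
faults = [{0, 1}, {2, 3}, {1, 2}, {0, 3}]

def toy_type2_error(decomp):
    missed = sum(not any(f <= s for s in decomp) for f in faults)
    return missed / len(faults)

start = [{0, 1}, {2, 3}]              # non-overlapping starting point
cannot_link = {frozenset((0, 3))}     # sensors 0 and 3 must stay apart

final, score, candidates = greedy_overlap(
    start, [0, 1, 2, 3], cannot_link, toy_type2_error)
```

In this toy run the search copies sensor 1 into the second subsystem, which covers the fault on sensors 1 and 2, while the fault on sensors 0 and 3 stays undetected because the constraint forbids grouping them.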

[1] Khatib, S., Daoutidis, P., Almansoori, A. System decomposition for distributed multivariate statistical process monitoring by performance driven agglomerative clustering. Ind Eng Chem Res 2018;57(24):8283-8298.

[2] Khatib, S., Daoutidis, P. Performance driven agglomerative clustering software. 2018, Computer Software. URL http://z.umn.edu/PDAC.