# (546a) Constrained Variable Selection for Distributed Multivariate Statistical Process Monitoring

- Conference: AIChE Annual Meeting
- Year: 2019
- Proceeding: 2019 AIChE Annual Meeting
- Group: Computing and Systems Technology Division
- Session:
- Time: Wednesday, November 13, 2019 - 12:30pm-12:49pm

The distributed configuration can address the limitations of the univariate and multivariate configurations. In a distributed multivariate statistical process monitoring method, the sensors of the system are first assigned to a set of potentially overlapping subsystems. A statistical monitoring method is then applied to each subsystem, and the monitoring results of the subsystems are combined using a consensus strategy. The monitoring performance of a distributed method depends strongly on the sensors selected for each subsystem. In [1] we developed a simulation optimization method called Performance Driven Agglomerative Clustering (PDAC) that finds a decomposition of the system (i.e., selects variables for the subsystems). PDAC minimizes the type II error obtained when the distributed monitoring method is simulated on normal and faulty data, subject to the type I error being close to a user-defined value. PDAC uses the greedy search of clustering algorithms from graph theory, including agglomerative clustering, together with a fine-tuning procedure. The software for PDAC is available at [2].
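To make the workflow above concrete, the following is a minimal sketch of a distributed monitor, not the PDAC software from [2]. It assumes a PCA-based Hotelling T^2 statistic per subsystem, an empirical control limit taken as a quantile of the training statistic, and a logical-OR consensus (a sample is flagged if any subsystem flags it). All function names, the synthetic data, and the parameters `n_components` and `alpha` are illustrative assumptions.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components, alpha):
    """Fit a PCA-based T^2 monitor on normal data for one subsystem."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # Rows of Vt are principal directions; s**2 / (n - 1) are score variances.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T
    var = s[:n_components] ** 2 / (len(Z) - 1)
    t2_train = np.sum((Z @ P) ** 2 / var, axis=1)
    # Assumed control limit: the (1 - alpha) quantile of T^2 on normal data.
    limit = np.quantile(t2_train, 1.0 - alpha)
    return {"mean": mean, "std": std, "P": P, "var": var, "limit": limit}

def alarms(monitor, X):
    """Return a boolean alarm per sample for one subsystem."""
    Z = (X - monitor["mean"]) / monitor["std"]
    t2 = np.sum((Z @ monitor["P"]) ** 2 / monitor["var"], axis=1)
    return t2 > monitor["limit"]

def distributed_alarms(X_train, X, subsystems, n_components=2, alpha=0.01):
    """Assumed consensus: flag a sample if any subsystem flags it."""
    flags = np.zeros(len(X), dtype=bool)
    for sensors in subsystems:
        k = min(n_components, len(sensors))
        monitor = fit_pca_monitor(X_train[:, sensors], k, alpha)
        flags |= alarms(monitor, X[:, sensors])
    return flags

def error_rates(X_train, X_normal, X_faulty, subsystems, **kwargs):
    """Type I: false alarms on normal data. Type II: missed faulty samples."""
    type1 = distributed_alarms(X_train, X_normal, subsystems, **kwargs).mean()
    type2 = 1.0 - distributed_alarms(X_train, X_faulty, subsystems, **kwargs).mean()
    return type1, type2

# Synthetic illustration: four sensors, a large mean shift on sensor 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))
X_normal = rng.normal(size=(500, 4))
X_faulty = rng.normal(size=(500, 4)) + np.array([10.0, 0.0, 0.0, 0.0])

subsystems = [[0, 1], [2, 3]]  # a non-overlapping decomposition
type1, type2 = error_rates(X_train, X_normal, X_faulty, subsystems)
print(f"type I error: {type1:.3f}, type II error: {type2:.3f}")
```

In this sketch `alpha` sets the false alarm rate of each subsystem separately, so the type I error after consensus is generally higher. A decomposition search would have to tune the limits so that the combined type I error lands near the user-defined value.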

One limitation of PDAC is that it can only generate a non-overlapping decomposition of the system, in which each sensor belongs to exactly one subsystem, even though a sensor could enhance the detection performance of several subsystems. In this work we extend PDAC with an algorithm that generates overlapping decompositions. The algorithm uses a greedy search strategy to further fine-tune the non-overlapping decomposition generated by PDAC and produce a set of candidate overlapping decompositions. The type II errors of these candidates are then compared to select the overlapping decomposition with the best fault detection performance. The proposed extended PDAC (EPDAC) method is found to outperform PDAC when both methods are used to decompose the benchmark Tennessee Eastman Process for monitoring with a distributed implementation of Principal Component Analysis.

The EPDAC method also incorporates user-defined cannot-link constraints, which specify pairs of sensors that cannot be placed in the same subsystem. Placing a pair of sensors in the same subsystem may be undesirable if their measurements have different sampling rates and are difficult to synchronize, if the layout of the plant makes it difficult to transmit their measurements to a single location, or if different statistical monitoring methods are to be applied to different sets of sensors to model the differing distributions of their measurements. These constraints make EPDAC more flexible than PDAC, since the user can incorporate requirements other than detection performance into the decomposition selection process. The software for EPDAC is also available at [2].
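The greedy overlap search with cannot-link constraints can be illustrated with a short sketch. This is not the EPDAC implementation from [2]: the move set (adding one sensor to one subsystem at a time), the stopping rule, and every name below are assumptions made for illustration. In practice `cost` would be the simulated type II error of the distributed monitor; here `toy_cost` is a hypothetical stand-in that rewards placing certain sensor pairs together.

```python
def violates(subsystem, sensor, forbidden):
    """True if adding `sensor` would pair it with a cannot-link partner."""
    return any(frozenset((sensor, other)) in forbidden for other in subsystem)

def greedy_overlap(decomposition, n_sensors, cost, cannot_link=()):
    """Greedily add sensors to subsystems while the cost strictly decreases.

    Returns the best (decomposition, cost) pair and the list of candidates.
    """
    forbidden = {frozenset(pair) for pair in cannot_link}
    current = [set(sub) for sub in decomposition]
    best_cost = cost(current)
    candidates = [([set(sub) for sub in current], best_cost)]
    while True:
        best_move = None
        for i, sub in enumerate(current):
            for sensor in range(n_sensors):
                if sensor in sub or violates(sub, sensor, forbidden):
                    continue
                trial = [set(s) for s in current]
                trial[i].add(sensor)
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best_cost, best_move = trial_cost, (i, sensor)
        if best_move is None:
            break  # no admissible addition improves the cost
        i, sensor = best_move
        current[i].add(sensor)
        candidates.append(([set(sub) for sub in current], best_cost))
    return min(candidates, key=lambda pair: pair[1]), candidates

def toy_cost(decomposition):
    """Hypothetical stand-in for the type II error (lower is better)."""
    helpful_pairs = [{0, 2}, {1, 3}]
    covered = sum(any(pair <= sub for sub in decomposition) for pair in helpful_pairs)
    return 1.0 - 0.4 * covered

start = [[0, 1], [2, 3]]   # non-overlapping starting decomposition
constraints = [(1, 3)]     # sensors 1 and 3 may not share a subsystem
(best_decomp, best_cost), candidates = greedy_overlap(
    start, n_sensors=4, cost=toy_cost, cannot_link=constraints
)
print(best_decomp, best_cost)
```

Because every accepted move adds a sensor to a subsystem, the loop terminates after a finite number of steps. The cannot-link check runs before any cost evaluation, so infeasible decompositions are never simulated.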

[1] Khatib, S., Daoutidis, P., Almansoori, A. System decomposition for distributed multivariate statistical process monitoring by performance driven agglomerative clustering. Ind Eng Chem Res 2018;57(24):8283-8298.

[2] Khatib, S., Daoutidis, P. Performance driven agglomerative clustering software. 2018, Computer Software. URL http://z.umn.edu/PDAC.