(584a) System Decomposition for Distributed Multivariate Statistical Process Monitoring

Authors: 
Khatib, S., University of Minnesota
Daoutidis, P., University of Minnesota, Twin Cities
Almansoori, A., The Petroleum Institute
Monitoring complex, large-scale chemical plants to ensure safe and cost-effective operation is a challenging task. Multivariate Statistical Process Monitoring (MSPM) methods such as Principal Component Analysis (PCA) are popular in the chemical industry because they are easy to implement and do not require a priori knowledge of a mathematical model of the process. In MSPM methods, the magnitudes of the deviations of the measured variables from their mean values are combined into one or more test statistics, each of which is compared with a threshold value to determine whether a fault is affecting the process. MSPM methods underperform in detecting faults that affect only a small subset of a large system's measured variables, because when such faults occur most of the measured variables contribute little to the test statistic. Implementing an MSPM method in a distributed configuration, wherein the method is applied to each subsystem and the subsystems' monitoring results are combined using a consensus strategy, could in principle address this problem. A distributed configuration also makes the monitoring system more fault tolerant, and it may be necessary when the layout of a chemical plant makes it infeasible to communicate all of the plant's measurements to a central location.
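As a rough illustration of the test-statistic-and-threshold logic described above, the following Python sketch fits a PCA monitor to normal-operation data and flags a sample as faulty when either the Hotelling T² statistic or the squared prediction error (SPE) exceeds its control limit. The function names, the synthetic data, and the use of empirical 99th-percentile control limits (instead of the F- and chi-square-based limits often used in practice) are assumptions made for this example; it is not the monitoring scheme used in this work.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_components, alpha=0.99):
    """Fit a PCA monitor to normal-operation data (rows = samples, columns = variables)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std                    # autoscale each variable
    cov = Z.T @ Z / (Z.shape[0] - 1)               # covariance of the scaled data
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    keep = np.argsort(eigvals)[::-1][:n_components]
    model = {"mean": mean, "std": std, "P": eigvecs[:, keep], "lam": eigvals[keep]}
    t2, spe = pca_statistics(model, X_normal)
    # Assumption: empirical alpha-quantiles of the normal data serve as control limits.
    model["t2_lim"] = np.quantile(t2, alpha)
    model["spe_lim"] = np.quantile(spe, alpha)
    return model


def pca_statistics(model, X):
    """Hotelling T^2 (principal subspace) and SPE (residual subspace) for each sample."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                             # principal component scores
    t2 = np.sum(T**2 / model["lam"], axis=1)
    residual = Z - T @ model["P"].T
    spe = np.sum(residual**2, axis=1)
    return t2, spe


def detect(model, X):
    """Flag a sample as faulty if either statistic exceeds its control limit."""
    t2, spe = pca_statistics(model, X)
    return (t2 > model["t2_lim"]) | (spe > model["spe_lim"])


# Hypothetical data: 20 correlated variables driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 20))


def simulate(n):
    return rng.normal(size=(n, 3)) @ W + 0.5 * rng.normal(size=(n, 20))


model = fit_pca_monitor(simulate(2000), n_components=3)
false_alarm_rate = detect(model, simulate(1000)).mean()
X_fault_all = simulate(1000) + 10.0                # step fault on every variable
print(false_alarm_rate, detect(model, X_fault_all).mean())
```

With data generated this way, the false alarm rate on fresh normal data should be on the order of one to two percent, and a large step fault on every variable should be detected in almost every sample.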

The system decomposition (i.e., the partitioning of a system's measured variables into subsystems) has a significant impact on the performance of a distributed MSPM (DMSPM) method. Ideally, a DMSPM method should be implemented using the decomposition that optimizes its performance in monitoring a given set of faults, so that those faults are detected faster and more accurately. The optimal decomposition therefore depends on the set of faults that tend to affect the system. In an optimal decomposition, the measured variables within a subsystem tend to be affected by similar faults. When such a fault occurs, most of the subsystem's measured variables then contribute significantly to the subsystem's test statistic, which makes the statistical hypothesis tests of the DMSPM method more sensitive to the fault. The optimal decomposition for a DMSPM method also depends on various other factors, such as:

  1. The number of subsystems that the system is partitioned into.
  2. The MSPM method that is implemented in a distributed configuration.
  3. The consensus strategy used to combine the subsystems' monitoring results.
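The effect of the decomposition can be seen even in a highly simplified setting. The sketch below assumes independent, unit-variance measured variables, so that each subsystem's T² statistic (with all components retained) reduces to a sum of squares. It also assumes an OR consensus (a fault is declared if any subsystem detects one) with Bonferroni-adjusted limits, which is only one of several possible consensus strategies. A hypothetical fault shifts four of twenty variables, and the missed detection rate (MDR) is compared for a centralized configuration, a decomposition that groups the affected variables together, and a decomposition that scatters them across subsystems.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha = 20, 0.01
affected = [0, 1, 2, 3]   # hypothetical: the fault shifts only these variables
shift = 1.5               # fault magnitude in standard deviations


def subsystem_stat(Z, block):
    # With independent, unit-variance variables, a subsystem's T^2 statistic
    # (all components retained) reduces to a sum of squares over its variables.
    return np.sum(Z[:, block] ** 2, axis=1)


def mdr(decomposition, Z_normal, Z_fault):
    """Missed detection rate under an OR consensus with Bonferroni-adjusted limits."""
    level = 1 - alpha / len(decomposition)   # caps the overall false alarm rate near alpha
    detected = np.zeros(len(Z_fault), dtype=bool)
    for block in decomposition:
        limit = np.quantile(subsystem_stat(Z_normal, block), level)
        detected |= subsystem_stat(Z_fault, block) > limit
    return 1.0 - detected.mean()


Z_normal = rng.normal(size=(50000, m))
Z_fault = rng.normal(size=(5000, m))
Z_fault[:, affected] += shift

decompositions = {
    "centralized": [list(range(m))],
    "aligned": [[0, 1, 2, 3], list(range(4, 12)), list(range(12, 20))],
    "scattered": [[0, 4, 5, 6, 7], [1, 8, 9, 10, 11],
                  [2, 12, 13, 14, 15], [3, 16, 17, 18, 19]],
}
results = {name: mdr(d, Z_normal, Z_fault) for name, d in decompositions.items()}
print(results)
```

Under these assumptions, the aligned decomposition should give a noticeably lower MDR than both the centralized configuration and the scattered decomposition, in line with the argument above.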

Therefore, finding the optimal decomposition for a DMSPM method is a difficult task. An effective strategy to find the optimal decomposition would be to use simulation optimization wherein the performance of the DMSPM method is simulated for a set of candidate decompositions and the decomposition with the best performance is considered optimal.
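One way to see why the set of candidate decompositions must be restricted is to count it. The number of ways to partition n measured variables into non-empty subsystems is the Bell number B(n), which grows faster than exponentially. The sketch below performs brute-force simulation optimization on a small system: it enumerates every decomposition and keeps the one with the lowest objective value. `toy_mdr` is a hypothetical stand-in for a simulated missed detection rate, not a real monitoring simulation.

```python
import itertools


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty subsystems."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            # place `first` in an existing subsystem ...
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a subsystem of its own
        yield [[first]] + partition


def bell(n):
    """Number of possible decompositions of n variables, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]


def exhaustive_search(variables, objective):
    """Brute-force simulation optimization: evaluate every candidate decomposition."""
    return min(set_partitions(variables), key=objective)


# Hypothetical stand-in for a simulated MDR: penalize splitting variables affected by
# the same fault and grouping variables affected by different faults.
FAULT_OF = {0: "A", 1: "A", 2: "B", 3: "B"}


def toy_mdr(decomposition):
    subsystem_of = {v: k for k, block in enumerate(decomposition) for v in block}
    penalty = 0
    for u, v in itertools.combinations(FAULT_OF, 2):
        same_fault = FAULT_OF[u] == FAULT_OF[v]
        same_subsystem = subsystem_of[u] == subsystem_of[v]
        penalty += same_fault != same_subsystem
    return penalty


best = exhaustive_search([0, 1, 2, 3], toy_mdr)
print(best, [bell(n) for n in (4, 10, 20)])
```

Exhaustive search is easy for four variables (15 candidates), but ten variables already admit 115,975 decompositions, and each candidate requires a full simulation of the DMSPM method.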

In this work, we propose a novel simulation optimization method, called the Performance Driven Agglomerative Clustering (PDAC) method, which finds a near-optimal system decomposition for a DMSPM method. The PDAC method uses the greedy search of Ward's agglomerative clustering algorithm to generate a set of candidate decompositions (the decision variables). Normal operating data and faulty data, supplied by the user, are then used to simulate the performance of the DMSPM method for each candidate decomposition and to calculate its missed detection rate (MDR), which is the objective function. The MDRs of candidate decompositions with the same number of subsystems are compared. The agglomerative clustering procedure yields a near-optimal decomposition for every possible number of subsystems, which ranges from one to the number of measured variables in the system. A fine-tuning procedure then slightly modifies some of these decompositions to further reduce their MDRs. Finally, the monitoring performance of the decompositions output by the agglomerative clustering and fine-tuning procedures is compared to find the optimal number of subsystems, and hence the optimal system decomposition, for the DMSPM method.

The PDAC method is completely data-driven and can be automated. It can also, in principle, be applied to most DMSPM methods, since it only requires simulating the DMSPM method with process data. To illustrate its effectiveness, PDAC is used to find the decomposition of the benchmark Tennessee Eastman Process case study for which the monitoring performance of a distributed Principal Component Analysis-based monitoring scheme is optimal.
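The PDAC steps can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' implementation. At each level, every pairwise merge of the current subsystems is treated as a candidate and the one with the lowest simulated MDR is kept (the MDR replaces Ward's variance-based merge criterion). Fine tuning is assumed to move one variable at a time between subsystems while this lowers the MDR. The simulated DMSPM is an assumed, simplified sum-of-squares monitor with an OR consensus, and the faults are hypothetical synthetic data.

```python
import itertools

import numpy as np

ALPHA = 0.01  # assumed overall significance level


def mdr(decomposition, Z_normal, faults):
    """Average MDR of a simplified DMSPM over a list of faulty data sets.

    Assumed stand-in DMSPM: each subsystem's statistic is the sum of squares of its
    autoscaled variables, and a fault is declared if any subsystem exceeds its
    Bonferroni-adjusted empirical limit (OR consensus).
    """
    level = 1 - ALPHA / len(decomposition)
    limits = [np.quantile(np.sum(Z_normal[:, b] ** 2, axis=1), level) for b in decomposition]
    rates = []
    for Z_fault in faults:
        hit = np.zeros(len(Z_fault), dtype=bool)
        for block, limit in zip(decomposition, limits):
            hit |= np.sum(Z_fault[:, block] ** 2, axis=1) > limit
        rates.append(1.0 - hit.mean())
    return float(np.mean(rates))


def agglomerate(n_vars, objective):
    """Greedy merging from singletons, keeping the pairwise merge with the lowest objective."""
    current = [[v] for v in range(n_vars)]
    best_by_size = {n_vars: (objective(current), current)}
    while len(current) > 1:
        candidates = []
        for i, j in itertools.combinations(range(len(current)), 2):
            merged = [b for k, b in enumerate(current) if k not in (i, j)]
            candidates.append(merged + [current[i] + current[j]])
        best = min(((objective(c), c) for c in candidates), key=lambda s: s[0])
        current = best[1]
        best_by_size[len(current)] = best
    return best_by_size


def fine_tune(decomposition, objective):
    """Move one variable at a time between subsystems while this lowers the objective."""
    best_val, best = objective(decomposition), decomposition
    improved = True
    while improved:
        improved = False
        for src, dst in itertools.permutations(range(len(best)), 2):
            if len(best[src]) == 1:
                continue  # keep the number of subsystems unchanged
            for v in best[src]:
                trial = [list(b) for b in best]
                trial[src].remove(v)
                trial[dst].append(v)
                value = objective(trial)
                if value < best_val:
                    best_val, best, improved = value, trial, True
                    break
            if improved:
                break
    return best_val, best


# Hypothetical data: 8 autoscaled variables; two faults, each shifting 3 variables.
rng = np.random.default_rng(2)
n_vars = 8
Z_normal = rng.normal(size=(20000, n_vars))
faults = []
for group in ([0, 1, 2], [3, 4, 5]):
    Z = rng.normal(size=(2000, n_vars))
    Z[:, group] += 1.5
    faults.append(Z)


def objective(decomposition):
    return mdr(decomposition, Z_normal, faults)


best_by_size = agglomerate(n_vars, objective)
tuned = {s: fine_tune(d, objective) for s, (_, d) in best_by_size.items()}
n_best = min(tuned, key=lambda s: tuned[s][0])
print(n_best, tuned[n_best])
```

Because the search is greedy, the result is near-optimal rather than guaranteed optimal; comparing the tuned decompositions across all subsystem counts corresponds to the final selection step described above.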