(663g) A Data-Driven Multidimensional Visualization Technique for Process Fault Detection and Diagnosis | AIChE


Gajjar, S. - Presenter, University of California, Davis
Palazoglu, A. - Presenter, University of California, Davis

A data-driven multidimensional visualization technique for process fault detection and diagnosis

S. Gajjar, Ahmet Palazoglu

University of California, Davis

Background: Chemical process operations are routinely subject to process and
operational disturbances. Fault detection and diagnosis are critical to ensuring
safety and process stability and to maintaining optimal operation. Process
monitoring techniques based on first-principles models have been studied for more
than two decades, but their adoption in industrial practice has been limited by
the substantial cost and time required to develop a sufficiently accurate model
of a complex chemical plant. In a large-scale unit, on the other hand, a
Distributed Control System (DCS) collects data from sensor arrays distributed
throughout the plant and stores the data at high sampling rates. These data
contain information about the underlying process characteristics and can be
used for process monitoring. Typically, a plant operator watches several
variables on screens and, drawing on experience and domain knowledge, focuses
on critical process variables to anticipate and prevent abnormal operation.
In the absence of such experience or domain knowledge, more automated
techniques are required to inform and advise plant operators. The control chart
is one of the primary tools for statistical process monitoring of real-time
data. However, monitoring hundreds of variables simultaneously with univariate
control charts is not practical, and 2-D charts limit our ability to
visualize and interpret high-dimensional data.

Prior Work: To overcome this challenge, Inselberg established the concept of
parallel coordinates in 1985 [[1]]. In the plane, parallel coordinates induce a
point-line duality, and they allow multidimensional data to be drawn in 2-D,
which makes cluster identification and pattern recognition easier. Albazzaz and
Wang [[2]] proposed the use of parallel coordinates for multidimensional
visualization of process variables and of the independent components obtained
from independent component analysis (ICA) of the process dataset. They used the
Box-Cox transformation and a percentile approach to define the upper and lower
control limits of the independent components. The Box-Cox transformation applies
only to positive data values, so a constant must be added if the dataset
contains negative values. The percentile approach simply sorts the data vector
of an independent component from the lowest to the largest value and takes the
99.9th and 0.1th percentiles as the upper and lower limits, respectively. Dunia
et al. proposed the use of parallel coordinates along with principal component
analysis (PCA) to derive empirical control limits for fault detection [[3]];
in addition to parallel coordinates, they also used Hotelling's T² and Q
statistics to detect an out-of-control situation. However, this proposed method
appears to be fault
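
The percentile approach described above can be illustrated with a minimal sketch. It assumes linear interpolation between sorted values, which the cited work does not specify, and the helper names (`percentile`, `percentile_limits`, `out_of_control`) and the sample data are hypothetical, chosen only for illustration.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def percentile_limits(values, lower_q=0.1, upper_q=99.9):
    """Sort one component's data and return (lower, upper) control limits."""
    ordered = sorted(values)
    return percentile(ordered, lower_q), percentile(ordered, upper_q)


def out_of_control(values, limits):
    """Indices of samples that fall outside the (lower, upper) limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical normal-operation values of one independent component:
# 0.000, 0.001, ..., 1.000 (made up for this sketch, not plant data).
normal = [i / 1000.0 for i in range(1001)]
limits = percentile_limits(normal)

# New samples are flagged when they leave the band set by the two percentiles.
new_samples = [0.5, 1.2, -0.3, 0.9995]
flagged = out_of_control(new_samples, limits)
```

Because the limits come directly from the empirical distribution, no assumption of normality is needed, but enough normal-operation samples are required for the extreme percentiles to be meaningful.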

Once an abnormality in the process is detected, it is imperative to determine
the root cause so that the fault can be repaired. The most widely used method
for fault isolation is the contribution plot (Miller et al., 1998), which
depicts the contribution of each process variable to the monitored statistics.
Its effectiveness is limited to simple faults, e.g. sensor and actuator faults
(Yoon and MacGregor, 2001; Qin, 2003). Dunia and Qin (1998) proposed a fault
identification index based on the fault reconstruction square prediction error
(FRSPE); the smallest FRSPE is obtained for the reconstructed fault. Raich and
Cinar (1997) proposed distance and angle metrics to diagnose process
disturbances. Other researchers have worked on data-driven techniques such as
qualitative trend analysis (QTA), a powerful method that provides quick and
accurate fault diagnosis. Maurya et al. applied QTA to the principal components
instead of the original sensor signals and showed that the computation time was
substantially reduced [[4]].
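
As a rough illustration of the contribution-plot idea, the sketch below computes each variable's contribution to the Q statistic (squared prediction error) as its squared reconstruction residual. This is one common definition and is an assumption here, not necessarily the formulation used in the cited papers. The loading vector, the sample, and the function names are hypothetical.

```python
def reconstruct(x, loadings):
    """Project x onto the retained PCs and map back: x_hat = P P^T x.

    `loadings` is a list of orthonormal loading vectors, one per retained PC.
    """
    x_hat = [0.0] * len(x)
    for p in loadings:
        score = sum(pj * xj for pj, xj in zip(p, x))
        for j, pj in enumerate(p):
            x_hat[j] += score * pj
    return x_hat


def q_contributions(x, loadings):
    """Per-variable contribution to Q, taken as the squared residual."""
    x_hat = reconstruct(x, loadings)
    return [(xj - hj) ** 2 for xj, hj in zip(x, x_hat)]


# Hypothetical 3-variable model with one retained PC along (1, 1, 0) / sqrt(2).
s = 2 ** -0.5
loadings = [[s, s, 0.0]]

# Hypothetical scaled sample in which the third variable departs from the model.
sample = [1.0, 1.0, 3.0]
contrib = q_contributions(sample, loadings)
q_stat = sum(contrib)  # the Q statistic is the sum of the contributions
suspect = max(range(len(contrib)), key=contrib.__getitem__)  # largest bar
```

In a contribution plot the values in `contrib` would be drawn as one bar per variable, and the tallest bar points to the variable most inconsistent with the PCA model.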

Proposed method and preliminary
Motivated by limitations of the prior work, this work extends it by proposing a
detection algorithm that uses all the measurements available at the plant, thus
bypassing the need for prior knowledge of the fault or for a priori selection
of a particular set of measurements for detection. Traditionally, once the data
are projected onto the principal component (PC) subspace, the variance and the
residual errors are each lumped into a single statistic, namely Hotelling's T²
and the Q statistic, respectively. We have instead developed control limits for
each PC based on the normal operating dataset. Such control limits are not
fault specific and can be used for fault detection in real time. For visual
process monitoring, our method represents each PC in parallel coordinates along
with its control limits. We were successful in reducing fault detection time
and improving fault detection rates for data obtained from the Tennessee
Eastman benchmark process. Moreover, we have observed that each fault has a
unique signature in the parallel coordinate space and that such patterns can be
used for fault diagnosis. We have investigated machine learning methods for
classifying faulty data, which can then be used for fault diagnosis in real
time. Modern industrial processes often present a large number of highly
correlated process variables, and the process is manipulated by an intricate
network of controllers that provide feedback to the input variables. Thus the
impact of a disturbance (or a fault) propagates through to both the input and
manipulated variables. In real-time monitoring, it is important not only to
detect the fault but also to observe how it has propagated and will propagate
through the process, so that corrective actions can be taken. PC scores and
loadings are complementary and superimposable. Each variable in the original
dataset loads onto different PCs, which is also reflected in the scores on
those PCs. From the score and loading plots, our preliminary results show that,
without any domain knowledge and using only the information obtained from the
data, we can show how a fault propagates through the system.
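
To make the idea of per-PC control limits concrete, here is a minimal sketch under stated assumptions. The abstract does not say how the limits are computed, so the sketch assumes limits of mean ± 3 standard deviations of each PC score over normal operating data. PCA is done with a simple power iteration to stay within the standard library, and the synthetic data and all function names are hypothetical.

```python
import math
import random


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def pca_loadings(cov, n_pcs, iters=300):
    """Leading eigenvectors of `cov` by power iteration with deflation."""
    m = len(cov)
    loadings = []
    for _pc in range(n_pcs):
        v = [float(i + 1) for i in range(m)]  # arbitrary nonzero start vector
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))
        loadings.append(v)
        # Remove the component just found so the next PC becomes dominant.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return loadings


def scores(row, loadings):
    return [sum(p * x for p, x in zip(load, row)) for load in loadings]


def pc_limits(score_rows, k=3.0):
    """Per-PC limits as mean +/- k standard deviations (an assumed rule)."""
    n = len(score_rows)
    limits = []
    for col in zip(*score_rows):
        mean = sum(col) / n
        std = math.sqrt(sum((t - mean) ** 2 for t in col) / (n - 1))
        limits.append((mean - k * std, mean + k * std))
    return limits


def violated_pcs(sample, means, loadings, limits):
    """Indices of the PCs whose score for `sample` leaves its control band."""
    t = scores([x - m for x, m in zip(sample, means)], loadings)
    return [i for i, (ti, (lo, hi)) in enumerate(zip(t, limits))
            if not lo <= ti <= hi]


# Synthetic "normal operation" data: two correlated variables plus a quiet one.
random.seed(0)
normal = []
for _ in range(500):
    t = random.gauss(0.0, 1.0)
    normal.append([t + random.gauss(0, 0.1), t + random.gauss(0, 0.1),
                   random.gauss(0, 0.05)])

means = [sum(col) / len(normal) for col in zip(*normal)]
centered = [[x - m for x, m in zip(row, means)] for row in normal]
loadings = pca_loadings(covariance(centered), n_pcs=2)
limits = pc_limits([scores(r, loadings) for r in centered])

in_control = violated_pcs([0.1, 0.1, 0.0], means, loadings, limits)
broken_corr = violated_pcs([1.0, -1.0, 0.0], means, loadings, limits)
```

In a parallel-coordinates display, each PC would be one vertical axis carrying its own pair of limits, and each sample a polyline across the axes. The list returned by `violated_pcs` then names the axes on which the polyline leaves its band, which is the kind of per-axis pattern the text describes as a fault signature.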

In summary, this work focuses on the use of parallel coordinates for
multidimensional visualization using PCA and discusses its accuracy for fault
detection, fault diagnosis and fault propagation. The feasibility and validity
of the proposed multidimensional PCA visualization are demonstrated on the
benchmark Tennessee Eastman process (Downs and Vogel, 1993).

[[1]] Alfred Inselberg. 1985. "The plane with parallel coordinates." The Visual Computer 1: 69-91.

[[2]] Hamza Albazzaz, Xue Z. Wang. 2006. "Historical data analysis based on plots of independent and parallel coordinates and statistical control limits." Journal of Process Control 16: 103-114.

[[3]] Ricardo Dunia, Thomas F. Edgar, Mark Nixon. 2013. "Process Monitoring Using Principal Components in Parallel Coordinates." AIChE Journal 59: 445-456.

[[4]] M. R. Maurya, R. Rengaswamy, V. Venkatasubramanian. 2005. "Fault diagnosis by qualitative trend analysis of the principal components." Chemical Engineering Research and Design 83: 1122-1132.


This paper has an Extended Abstract file available; you must purchase the conference proceedings to access it.

