Outlier Detection and Analysis in Batch and Continuous Processes | AIChE


Title:  Outlier Detection and Analysis in Batch and
Continuous Processes

Author/Presenter:  Peter J. Ryan, Ph.D., P.E.

Author/Presenter email:  peter.ryan@responsepc.com

Company:  Response Process Consulting LLC

Motivation

All manufacturing sectors, whether continuous or batch, gather process data for archiving and analysis.  Often the volume and quality of the gathered data make it difficult to use this resource effectively.  Major drawbacks in the quality of the available data include:

·         large gaps in the process data

·         noise and poor signal-to-noise ratios

·         correlated data

·         poor accuracy

·         poor precision

New methods have been developed to handle these issues in archived process data and to develop process models based on this historical data.  Specifically, a new approach to handling missing data and reconstructing quality data from the observed (archived) data is presented.  Once the models are developed, outliers can be identified in the continuous or batch data, and relationships between the Key Process Indicators (KPIs) and the upstream process variables can be established.


Approach

Often, steady-state or dynamic models are available to describe a process.  However, these first-principles models often lack the granularity needed to describe product specifications such as color, turbidity, or solvent loss due to entrainment in a separator.  Machine learning can be used to examine large historical process data sets and determine the leading controlled variables of a process.  The machine learning methods first fit the data to a specified model, and then use the model to explore the data space.  Unsupervised learning gives the fitting method full freedom to determine what is different in the data.  Supervised learning requires the fitting method to consider both the archived process data and measured quality data (acquired off-line from the process).  An example of unsupervised modeling is Principal Component Analysis (PCA); an example of supervised modeling is Partial Least Squares (PLS).  Both methods can be used to model the historical process data and find outliers by plotting the two principal components (scores) that capture the most variability in the data.  The score plots are examined for clusters that classify the data as meeting or not meeting product specifications.  The clusters representing production runs that miss specification can be further examined to discover which upstream process variables caused the product to go off-spec.  While visual inspection has been described, statistical metrics such as the Squared Prediction Error (SPE) and Hotelling's T² can be calculated to find the same results in the data.
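The PCA-based workflow described above — fit a low-dimensional model, project each observation onto the scores plane, then flag outliers with Hotelling's T² and the SPE — can be sketched with numpy.  The data below are synthetic, standing in for archived process data; the dimensions and number of components are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for archived process data: 100 runs x 6 correlated variables.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(100, 6))

# Mean-center and scale each variable before fitting the PCA model.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD; keep the two leading principal components (the scores plane).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
T = Xc @ Vt[:k].T        # scores: the coordinates plotted in a score plot
X_hat = T @ Vt[:k]       # reconstruction of each run from the scores plane

# Hotelling's T^2: distance within the scores plane from the data's center,
# with each component scaled by its variance.
comp_var = (s[:k] ** 2) / (len(Xc) - 1)
T2 = np.sum(T**2 / comp_var, axis=1)

# SPE (Q statistic): squared residual distance off the scores plane.
SPE = np.sum((Xc - X_hat) ** 2, axis=1)
```

Runs with large T² lie far from the main cluster but still on the scores plane; runs with large SPE are poorly described by the model altogether, matching the distinction drawn above.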


Results

Examples of modeling both continuous and batch processes are given.  The continuous example is a commodity chemical process where color is the product specification of interest.  The batch example is a nylon process where relative viscosity is the product specification of interest.  While the analysis methods are the same, one significant difference between handling continuous and batch data is that the batch data must first be "unfolded" before a model can be developed.  Once the models are developed, clusters corresponding to in-spec and non-compliant products are found.  The non-compliant product clusters are further examined, and the relationships between the product specification (KPI) and the upstream process variables causing the non-compliance are discovered.  Figure 1 shows the scores plot for the continuous commodity chemical example; Figure 2 shows the scores plot for the batch nylon example.  In both cases, clusters of production activity containing both in-spec and non-compliant production are observed.  Focusing on the batch example, Figures 3 and 4 show the SPE and Hotelling's T² charts of the initial process data.  The outliers observed visually in Figure 2 are also detected numerically by calculating the Hotelling's T² metric, which finds points that lie on the scores plane but far from the center of mass of the primary cluster.  The SPE metric finds points that lie away from the scores plane.
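The "unfolding" step mentioned above can be illustrated with a batch-wise unfolding in numpy: a three-dimensional batch array (batches × variables × time) is flattened so that each batch becomes a single row, the layout a PCA or PLS model expects.  The array here is synthetic and its shape is illustrative, not taken from the nylon example:

```python
import numpy as np

# Hypothetical batch data set: 60 batches, 5 process variables,
# each sampled at 100 time intervals.
n_batches, n_vars, n_time = 60, 5, 100
batch_data = np.random.default_rng(1).normal(size=(n_batches, n_vars, n_time))

# Batch-wise unfolding: each row of the result holds every variable's
# full trajectory for one batch, so a standard two-way PCA/PLS model
# can be fitted to the batches.
unfolded = batch_data.reshape(n_batches, n_vars * n_time)

print(unfolded.shape)  # (60, 500)
```

With this layout, an off-spec batch shows up as an outlying row, and its deviation can be traced back to specific variables and time intervals within the trajectory.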

A "Contribution to the SPE" chart is used to discover the relationships between the KPIs and the upstream process variables, as shown in Figure 5.  Note that in Figure 3 (SPE metric), batch 49 is far from the scores plane, even though its projection falls within the cluster of points representing batches with good product specification.  The SPE contribution chart (Figure 5) reveals that deviations in two temperatures, two pressures, and a flow caused the batch to produce off-spec product.  Specifically, the batch did not meet the turning points in the prescribed trajectories when it reached the 65th time interval of its run.  An examination of the control system revealed that a control issue was responsible for missing the turning points.
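One common way to build an SPE contribution chart — assumed here; the paper does not spell out its exact formula — is to split a flagged sample's squared residual into per-variable terms, since those terms sum exactly to that sample's SPE.  A minimal numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 6))          # synthetic stand-in for (unfolded) data
Xc = X - X.mean(axis=0)

# Two-component PCA model: P holds the loadings of the scores plane.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T

# Residual: the part of each sample that lies off the scores plane.
residual = Xc - Xc @ P @ P.T

# Per-variable contribution to each sample's SPE; each row sums to that
# sample's SPE, so large entries point at the variables driving the outlier.
spe_contrib = residual ** 2
```

Ranking the columns of `spe_contrib` for an outlying batch identifies the upstream variables (e.g., the two temperatures, two pressures, and flow in the batch 49 discussion above) most responsible for its deviation.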

These examples show how a very large set of process data, of varying quality (missing data, low signal-to-noise ratio, correlation, etc.), can be reduced in dimensionality, and how the leading process variables can be identified and related to the downstream KPIs to improve product quality and consistency.  The model has the granularity that is typically missing from first-principles models.  The resource for this method of model building, historical process data, is readily available but seldom used.  It is seldom used for process optimization and analysis because, without the methods needed to reduce the data-space dimensionality and cope with missing and correlated data, this resource presents too much raw, unconditioned information.

Figure captions (plots not reproduced):

Model developed using continuous historical process data

Model developed using batch historical process data (annotation: Batch No. 49; 95% confidence limit)

Squared Prediction Error (SPE) of the batch data (nylon example)

Hotelling's T² metric of the batch data (nylon example; annotation: Batches 50-55; 95% confidence limit)

