Big Data to Support Operators in Chemical Plants



Dr. Zied Ouertani,
Dr. Benjamin Klöpper



Big data technologies enable new possibilities for analyzing
historical data generated by process plants. This contribution provides an
overview of how big data technology is applied to analyze rare events and hence
support maintenance operations by detecting anomalies that would otherwise have
gone unnoticed.

Introduction – Big Data for Chemicals

A typical chemical plant generates large amounts of data throughout its
whole life cycle: I/O and tag lists, piping and instrumentation diagrams
(P&ID), control logic, and alarm configurations during planning and commissioning;
measurement values, alarm and event logs, shift books, and laboratory results
during operation; and maintenance notifications, repair reports, and inspection
reports during maintenance. Today, the scope of analytics is usually limited to a
single data source. One common example of such an analysis is loop performance
monitoring: evaluating the control quality of single control loops based on
measurements, set point, and controller output [2]. However, the limitation to a
single data source hampers the process of learning from historical data. An
integration of all the different types of data within process plants brings up the
notorious big data attributes: high volume, high velocity, and high variety [3]. For
instance, a refinery can generate more than 300 GB of measured values per year,
produced by more than 60,000 sensors with sampling intervals between 1 and 60
seconds. Data exists in structured (sensor readings, database tables),
semi-structured (alarm and event logs), and unstructured formats (shift books,
operation manuals). Often, data has been stored for 10 years and even longer.

This contribution explores the potential of big data analytics in the
context of chemical plants, with a particular focus on operator support. It
concentrates on the specific application scenarios of event prediction and
anomaly detection.

Big Data Application Addressing Rare Events for Anomaly Detection

Monitoring a chemical plant is a demanding and cognitively
complex task. More than a thousand signals with trend displays and faceplates,
distributed over dozens of operator screens, are not a rarity. In calm situations
without alarms or specific tasks (e.g. ramp-up, load change, shutdown), operators
screen the distributed control system for abnormalities or suspicious signals.
Operators perform this task by browsing through operator screens and trend
displays in a more or less systematic fashion. This unstructured procedure
obviously depends heavily on the experience level of the operator and carries a
high risk of missing relevant abnormalities. Especially for inexperienced
operators, it is difficult to judge whether the trend of a specific signal is
actually abnormal or not.

A big data support system should address these issues, direct the
operator's attention to relevant signals, and support the diagnosis and
assessment of the abnormality. This implies a quick visual impression of the
plant status as well as support for putting suspicious signals into context and
for predicting unexpected behavior.

Based on a distance-based anomaly detection algorithm, a big data
anomaly detection system was developed for the processing framework Apache Spark.
Apache Spark enables the algorithm to scale to large data sets by adding
computational resources to the system (scale-out). Figure 1 shows the big
data architecture of the plant-wide anomaly detection. The architecture is an
adaptation of Marz's Lambda architecture [3] and can be organized into three
different layers: the stream layer (handling and processing streams of incoming
data), the batch layer (processing large amounts of stored historical data), and
the serving layer (providing access to the algorithm results).
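To make the division of labor between the three layers concrete, the following minimal Python sketch models their interaction. All class and function names here are illustrative assumptions, not the actual FEE implementation:

```python
# Toy model of a Lambda-style architecture: a slow, thorough batch layer,
# a fast incremental stream layer, and a serving layer that merges both.

class ServingLayer:
    """Stores results from the batch and stream layers for clients to poll."""
    def __init__(self):
        self.batch_views = {}    # signal name -> precomputed batch result
        self.stream_views = {}   # signal name -> latest streaming result

    def query(self, signal):
        # Merge the views: fresh streaming results extend the batch view.
        return {**self.batch_views.get(signal, {}),
                **self.stream_views.get(signal, {})}

def batch_layer(history, serving):
    """Recompute views over all stored historical data (slow, complete)."""
    for signal, values in history.items():
        serving.batch_views[signal] = {"mean": sum(values) / len(values)}

def stream_layer(event, serving):
    """Update views incrementally for each incoming event (fast, partial)."""
    signal, value = event
    serving.stream_views[signal] = {"latest": value}
```

A client polling the serving layer thus sees the precomputed historical view combined with the most recent streaming update.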

The ingestion point for new data is Apache Kafka, a high-throughput,
low-latency stream-processing platform for handling real-time data feeds. Its
ability to persist data for a long time and to leave the decision of when to read
data to the consumer makes it ideal for decoupling the fast reading process of
the stream analytics from the more batch- and bulk-oriented reading process of
the batch layer's storage.
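The decoupling property exploited here is that the broker persists an append-only log while each consumer tracks its own read position. The following stdlib-only sketch (a toy in-memory stand-in, not the Kafka API) illustrates why a fast stream reader and a slow batch reader can share one topic without interfering:

```python
# Minimal model of Kafka-style consumption: messages are persisted in an
# append-only log, and every consumer advances its own offset independently.

class Topic:
    def __init__(self):
        self.log = []            # persisted messages (never removed on read)
        self.offsets = {}        # consumer name -> next offset to read

    def produce(self, msg):
        self.log.append(msg)

    def poll(self, consumer, max_records):
        """Return up to max_records messages starting at this consumer's
        offset, then advance only that consumer's offset."""
        start = self.offsets.get(consumer, 0)
        records = self.log[start:start + max_records]
        self.offsets[consumer] = start + len(records)
        return records
```

A "stream" consumer can poll frequently in small steps while a "batch" consumer catches up later in bulk; neither affects the other's position in the log.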

Figure 1: Big Data System Architecture for Plant-Wide Anomaly Detection

The batch layer consists
of two elements: the Hadoop Distributed File System (HDFS), which stores process
data in a columnar fashion and enables easy distributed processing, and the batch
algorithm executed within Apache Spark. The implemented k-nearest-neighbor
algorithm analyzes the proximity between episodes (in our case 60 minutes) of
data from single signals (univariate approach). In order to handle the large
amount of data without extensive expert classification into normal and abnormal
behavior, the big data algorithm works in an unsupervised fashion: it considers
all available data (normal and abnormal) and applies the 'three-sigma rule of
thumb' to initially identify and score anomalies. The results of the algorithm
are the anomaly scores for the historical episodes. The results and the
historical episodes are stored in the serving layer database (HBase).

The second part
of the streaming layer is Apache Spark Streaming, a library for Spark that makes
it possible to use Spark over a stream of data. Spark Streaming uses a
micro-batch architecture, in the sense that the streaming computation is treated
as a continuous series of batch computations on small batches of data. The
duration of a batch is determined by the configuration parameter batch interval:
when a batch starts, all data arriving during the interval are added to the
batch. These micro-batching semantics fit the anomaly detection scenario very
well and make it possible to share the algorithm code between the batch and
streaming layers. While the batch algorithm compares all historical episodes
with each other and determines a proximity threshold, the stream algorithm
compares one query episode against all historical episodes and calculates an
anomaly score based on the threshold value. The results of the stream algorithm
are stored in the serving layer database Apache HBase.
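The streaming side of the shared logic might look as follows; this is a hedged sketch in plain Python, and the function name and the normalization of the score against the batch-layer threshold are assumptions for illustration:

```python
import math

def stream_score(query, historical, k, threshold):
    """Score one incoming query episode against all historical episodes,
    using the proximity threshold determined by the batch layer.
    Returns the raw k-NN distance and a normalized anomaly score
    (values above 1.0 indicate an anomaly)."""
    dists = sorted(math.sqrt(sum((x - y) ** 2 for x, y in zip(query, h)))
                   for h in historical)
    knn_dist = dists[k - 1]
    return knn_dist, knn_dist / threshold
```

Because the scoring of one query episode is independent of other queries, each micro-batch of incoming episodes can be processed with exactly this per-episode function.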

HBase is a
distributed key-value store running on top of HDFS. HBase is very scalable and
robust and offers sufficient response times for the anomaly detection scenario.
The operator UI, which is deployed via an application server, polls HBase for
the relevant data. Figure 2 shows a screenshot of the operator UI. The
heat map (1) in the upper left corner shows the most suspicious signals
according to the anomaly score calculated by the streaming algorithm. The heat
map also gives an indication of the development of the anomaly score over time.
The upper right corner (2) shows a normal episode (low anomaly score), the
lower left-hand corner (3) the current trend of the signal, and the lower
right-hand corner (4) a similar situation in the past. This information helps
the operator to perform broad monitoring of the plant and to judge whether a
signal trend is actually abnormal or not. The similar situation in the past
helps to diagnose the current situation and can be used to investigate
additional data sources such as shift books or alarm logs.

Figure 2: Anomaly Detection UI


Acknowledgements

The authors would like to thank the development partners from the KDE Group,
University of Kassel; the Department of Measurement and Control, University of
Kassel; the Chair of Process Control Systems Engineering, TU Dresden; and
RapidMiner for the great cooperation and joint work in the project. The
development team is thankful to PCK, BASF and INEOS for many valuable
discussions, crucial feedback, and for providing access to the data.


Dr. Zied Ouertani

phone: +49 (0) 6213 811012

e-mail: mohamed-zied.ouertani@de.abb.com

Dr. Benjamin Klöpper

phone: +49 (0) 6203 716211

e-mail: benjamin.kloepper@de.abb.com


[1]     Martin; Kloepper, Benjamin; Mawla, Hassan Al; Jäschke, Benjamin;
Hollender, Martin; Graube, Markus; Arnu, David; Schmidt, Andreas; Heinze,
Sebastian; Schorer, Lukas; Kroll, Andreas; Stumme, Gerd; Urbas, Leon: Big Data
Analytics for Proactive Industrial Decision Support: Approaches & First
Experiences in the Context of the FEE Project. In: atp edition, 58 (2016), Nr.

[2]     McMillan, Gregory K. (2014) Tuning and Control Loop Performance.
Momentum Press

[3]     McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil, D.J., Barton, D.
(2012) Big Data: The Management Revolution. Harvard Bus Rev 90(10), 61-67