(370u) Cognitive Autonomous Agents for Process Abnormal Behaviour Detection: An Application of Artificial Intelligence

Authors 

Srinivas, S., TCS Research
Runkana, V., TCS Research
Health monitoring, namely identifying faulty behavior, remains one of the most critical tasks for avoiding downtime and preventing mishaps in large-scale, complex industrial plants. Machine-learning-based discriminative models, in particular the self-supervised Recurrent Neural Network (RNN) autoencoder, are well suited to abnormality detection. The composite RNN autoencoder variant is composed of an encoder and a pair of decoders. The encoder maps a variable-length source sequence to a fixed-length vector, a compressed representation that learns the salient features of the source. The reconstruction decoder maps this vector representation back to a variable-length target sequence, i.e., the source, while the prediction decoder predicts the output several time steps ahead of the input. Patterns that deviate from the normal reconstructed behavior are labelled as anomalies.
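For illustration, a minimal sketch of such a composite autoencoder is given below, assuming an LSTM encoder with separate reconstruction and prediction decoders and a reconstruction-error anomaly score; the layer sizes, window length and data are illustrative placeholders, not values from this work.

import torch
import torch.nn as nn

class CompositeLSTMAutoencoder(nn.Module):
    # Encoder compresses a window into a fixed-length code; one decoder reconstructs
    # the window, the other predicts the next `horizon` time steps (assumed design).
    def __init__(self, n_features, hidden_size=64, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.recon_decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.pred_decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.recon_out = nn.Linear(hidden_size, n_features)
        self.pred_out = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        # x: (batch, time, features); the final hidden state is the fixed-length code
        _, (h, _) = self.encoder(x)
        code = h[-1]
        rec_in = code.unsqueeze(1).repeat(1, x.size(1), 1)
        prd_in = code.unsqueeze(1).repeat(1, self.horizon, 1)
        rec, _ = self.recon_decoder(rec_in)
        prd, _ = self.pred_decoder(prd_in)
        return self.recon_out(rec), self.pred_out(prd)

model = CompositeLSTMAutoencoder(n_features=8)
windows = torch.randn(32, 100, 8)                    # stand-in for sliding sensor windows
recon, pred = model(windows)
loss = nn.functional.mse_loss(recon, windows)        # training adds a prediction loss as well
scores = ((windows - recon) ** 2).mean(dim=(1, 2))   # windows with high error are flagged as anomalous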

Discriminative models are, in general, black-box models, which limits the explainability of the function modeled by the machine learning algorithm. Machine-learning-based generative approaches capture the operational behavior of the complex process system, and this generative information helps in better understanding faulty behavior in industrial plants. Three distinct categories of generative models are discussed, namely Generative Adversarial Nets (GAN), the Variational Autoencoder (VAE) and flow-based generative models. GAN, rooted in game theory, aims to reach a Nash equilibrium between the discriminator and generator networks in a two-player non-cooperative zero-sum game. This helps build a model of the source data generation process, effectively turning an unsupervised problem into a supervised one; the approach was published by Ian Goodfellow, Google Brain, 2014. Several alternatives to traditional GAN training have been implemented, namely the Wasserstein GAN (WGAN), the Bayesian GAN and the Metropolis-Hastings generative adversarial network (MH-GAN). WGAN uses the Wasserstein distance as the GAN loss function (published by Soumith Chintala, Facebook AI Research, 2017), while the Bayesian GAN uses a Bayesian formulation of semi-supervised and unsupervised learning with GANs and applies stochastic gradient Hamiltonian Monte Carlo to marginalize the weights of the generator and discriminator networks (published by Yunus Saatchi, Uber AI Labs, 2017). MH-GAN, published by Ryan Turner, Uber AI Labs, 2018, integrates aspects of the Metropolis-Hastings algorithm (which leverages Markov Chain Monte Carlo (MCMC) sampling) with GANs. The vanilla GAN samples from the distribution defined by the generator; in contrast, MH-GAN samples from the distribution implicitly defined by the discriminator-generator pair. GANs do not, in general, converge exactly to the real source data distribution, especially when the generator is imperfect; MH-GAN therefore also exploits the discriminator from GAN training to improve sampling. The VAE, rooted in Bayesian inference, embeds probabilistic modeling in a neural network and captures the underlying probability distribution of the source by optimizing the log-likelihood of the source data through explicit maximization of the evidence lower bound (published by Diederik P. Kingma, Google Brain, 2014). Both GAN and VAE have inherent limitations in learning the probability density function of the source. Flow-based generative models are constructed from a series of invertible transformations; unlike the former algorithms, such a model explicitly learns the source data distribution and uses the negative log-likelihood as its loss function. Normalizing flow models include Real-valued Non-Volume Preserving (RealNVP) and Non-linear Independent Components Estimation (NICE), published by Laurent Dinh, Google Brain, 2017. Later, Durk Kingma, Google Brain, 2018, published Generative Flow with Invertible 1 × 1 Convolutions (Glow).
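As a concrete illustration of the adversarial game described above, a minimal GAN training step on fixed-length process windows might look as follows; the network sizes, learning rates and data are assumptions for illustration, and the comments note how the losses would change under the Wasserstein (WGAN) formulation.

import torch
import torch.nn as nn

latent_dim, window_dim = 16, 100
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, window_dim))
D = nn.Sequential(nn.Linear(window_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, window_dim)          # stand-in for windows of normal operating data
fake = G(torch.randn(32, latent_dim))       # generator proposes synthetic windows

# Discriminator step: push real windows towards label 1 and generated ones towards 0.
# (WGAN instead uses d_loss = D(fake).mean() - D(real).mean() with a Lipschitz constraint.)
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: fool the discriminator (non-saturating loss).
# (WGAN instead uses g_loss = -D(fake).mean().)
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()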
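Similarly, the VAE's evidence lower bound can be sketched as a reconstruction term plus a KL divergence term, optimized with the reparameterization trick; the Gaussian encoder/decoder and layer sizes below are assumptions for illustration, not the architecture used in this work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=100, z_dim=8, h_dim=64):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def negative_elbo(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                 # Gaussian reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon + kl                                             # minimizing this maximizes the ELBO

vae = VAE()
x = torch.randn(32, 100)                  # stand-in for normalized sensor windows
x_hat, mu, logvar = vae(x)
negative_elbo(x, x_hat, mu, logvar).backward()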

Convolutional autoencoders utilize the convolution operator to exploit the fact that a signal can often be seen as a sum of other signals. Nal Kalchbrenner, Google Brain, 2016, published Neural Machine Translation in Linear Time and proposed the ByteNet encoder-decoder model. The algorithm applies dilated one-dimensional convolutions to the sequential source data, layer by layer, to obtain the source encoding. To obtain the next element of the target sequence, the decoder then applies masked one-dimensional convolutions to the target sequence, conditioned on the encoder output. The ByteNet decoder on its own is referred to as the generation model, while the combination of encoder and decoder forms the machine translation model. Jonas Gehring, Facebook AI Research, 2017, published Convolutional Sequence to Sequence Learning, which utilizes Convolutional Neural Networks (CNNs) that are highly parallelizable. The algorithm uses multi-hop attention, so the network revisits the source data multiple times while generating the output. Attention-based Neural Machine Translation, an algorithm that jointly learns to align and translate, published by Dzmitry Bahdanau, University of Montreal, 2015, overcomes the long-term dependency limitation of the RNN encoder-decoder model: rather than compressing the source sequence into a single fixed-length vector, the decoder attends over the encoded source positions at each output time step. Recurrent-neural-network-autoencoder-based detection of point and trend anomalies is, however, limited by poor parallelizability. Attention Is All You Need, published by Ashish Vaswani et al., Google Brain and Google Research, 2017, proposes the Transformer, based solely on attention mechanisms and dispensing with both recurrent and convolutional neural networks. Both the encoder and the decoder are composed of a stack of identical layers, each comprising two types of sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The algorithms mentioned above have been implemented to detect outliers, designated as point or trend anomalies. For anomaly forecasting, Teacher Forcing based vanilla Recurrent Neural Network variants are implemented. Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), peephole and tree-based variants extend Recurrent Neural Networks and overcome vanishing and exploding gradients by extending their memory, allowing anomalies to be forecast a few time steps ahead of an input. The inherent limitations of the Teacher Forcing algorithm are overcome by the Professor Forcing algorithm, published by Alex Lamb, MILA, Université de Montréal, 2016, which utilizes adversarial domain adaptation to regularize the hidden states of the RNN so that the RNN dynamics remain invariant between training and sampling for forecasting. Autonomous agents built on the above algorithms and backed by GPU-based cloud computing are investigated on real-world datasets. They deliver real-time early warnings forecasting mishaps, preventing downtime and helping maintain a smooth operating regime in large-scale industrial plants.
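For reference, the scaled dot-product self-attention at the core of the Transformer sub-layers described above can be sketched in a few lines; the tensor dimensions below are illustrative assumptions.

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, time, d_k); attention scores are scaled by sqrt(d_k) before softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))    # e.g. a causal mask in the decoder
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(4, 50, 32)                       # stand-in for embedded sensor windows
out = scaled_dot_product_attention(x, x, x)      # self-attention: queries, keys and values all from x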
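Likewise, a minimal sketch of Teacher Forcing for an LSTM-based forecaster is shown below: during training, each step receives the ground-truth previous observation rather than the model's own prediction. The model size and data are placeholders, not the configuration used in this work.

import torch
import torch.nn as nn

n_features, hidden = 8, 64
cell = nn.LSTMCell(n_features, hidden)
head = nn.Linear(hidden, n_features)

seq = torch.randn(32, 100, n_features)   # stand-in for a sequence of normal operating data
h = torch.zeros(32, hidden)
c = torch.zeros(32, hidden)
loss = 0.0
for t in range(seq.size(1) - 1):
    h, c = cell(seq[:, t], (h, c))       # teacher forcing: feed the ground-truth value at step t
    pred = head(h)                       # one-step-ahead forecast for step t + 1
    loss = loss + nn.functional.mse_loss(pred, seq[:, t + 1])
loss.backward()                          # at inference time, pred would be fed back as the next input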