(429d) Gradient-Weighted Class Activation Mapping (Grad-CAM) Based Explanations for Process Monitoring Results from Deep Neural Networks | AIChE

Srinivasan, R. - Presenter, Indian Institute of Technology Madras
In the chemical process industries, achieving high product quality, maximum efficiency, and process safety requires effective process monitoring. A fault causes the process state variables to deviate from their normal operating conditions. The effect of the fault propagates through the process and disturbs the stability of the system, so it is essential to detect faults quickly once they occur. To address this problem, we develop a methodology that offers high accuracy and novel explainability while making effective use of multivariate time series process data.

Recent developments in sensor and storage technologies have provided a significant push toward next-generation data-driven process monitoring methods [1]. As a result, deep learning methods have attracted much attention for fault detection. Methods such as Deep Belief Networks (DBN), Convolutional Neural Networks (CNN), and Hierarchical Deep Neural Networks (HDNN) have been used for fault detection [2], but they do not explain their decisions. To address this, the eXplainable Artificial Intelligence (XAI) method Integrated Gradients was used in [3] to explain the results generated by deep learning models. However, Integrated Gradients has one shortcoming: it analyses samples one at a time without accounting for the fact that the data form a time series. In this paper, we develop an XAI method that accounts for the time series nature of the process monitoring problem.

Recently, a new XAI method called Grad-CAM has been developed that explains CNN decisions in terms of activation maps [4] and can be applied to time series data. Grad-CAM explores the spatial information that is preserved inside the convolution layers. The information extracted from these layers helps to identify which features of the input data are important for the classification decision. Grad-CAM uses the k feature maps, each of height h and width u, of the last convolution layer, since that layer contains high-level semantics and detailed spatial information. For each feature map A^k, we obtain an importance weight w_k^c associated with a specific class c. The weight w_k^c is obtained by computing the gradient of the final class output y^c with respect to the feature map A^k, i.e., ∂y^c/∂A^k, and globally averaging it over the width u and height h. A weighted sum of the feature maps A^k is then computed for class c using the weights w_k^c. The final heatmap is obtained by applying the ReLU operation to the weighted sum so that only positive values are retained. This heatmap highlights the features that are important to the network's final decision for class c. In this paper, we use the Grad-CAM method to explain process fault diagnosis results.
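The heatmap computation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this work: it assumes the feature maps of the last convolution layer and the gradients of the class output with respect to them have already been extracted from a trained network, and the function name `grad_cam_heatmap` and the array layout (feature map index first) are hypothetical choices made for the example.

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients):
    """Grad-CAM heatmap from precomputed arrays (illustrative sketch).

    feature_maps: array of shape (K, h, u), the K feature maps of the
                  last convolution layer (assumed already extracted).
    gradients:    array of shape (K, h, u), the gradient of the class
                  output with respect to each feature map entry.
    Returns an (h, u) array of non-negative importance values.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must share a shape")

    # Importance weight per feature map: global average of the
    # gradient over the height and width axes.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)

    # Weighted sum of the feature maps using those weights.
    weighted_sum = np.einsum("k,khu->hu", weights, feature_maps)

    # ReLU keeps only the positive contributions to the class.
    return np.maximum(weighted_sum, 0.0)
```

In practice the two input arrays would come from the automatic differentiation facilities of whichever deep learning framework holds the trained model; that step is deliberately left out here because it depends on the framework.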

In this paper, we use an explainable CNN methodology that classifies the normal and faulty states of the process in real time. The proposed methodology consists of a CNN with a 2D convolution layer that is trained to solve a fault detection problem. Grad-CAM is then applied to the trained model to understand which parts of the input data contribute most to detecting a fault [5]. The contributions of the different features, visualized through the heatmap generated for each sample, help to identify the key factors in the occurrence of a fault. The performance of the proposed methodology is demonstrated on simulated time series classification datasets that contain different input variables and categorical outputs. The CNN was trained, and the heatmaps generated using Grad-CAM provide an accurate explanation of the predicted outcome. We will describe the proposed methodology as well as the results obtained from the simulated dataset, showing that it makes effective use of the time series dimension to provide intuitive explanations for the outputs. Results from fault diagnosis on the Tennessee Eastman challenge problem will also be presented [6].
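As a rough illustration of how a multivariate time series could be prepared for a 2D convolution layer and how a per-sample heatmap could be read, consider the sketch below. It rests on assumptions that the abstract does not state: each sample is taken to be a sliding window of shape (time steps, variables), and the heatmap is taken to have the same shape as the window. The helper names `make_windows` and `rank_variables` are hypothetical.

```python
import numpy as np

def make_windows(series, window):
    """Slice a (T, n_vars) series into overlapping (window, n_vars) samples.

    Assumption: a stride of one time step between consecutive windows.
    """
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - window + 1
    if n_windows < 1:
        raise ValueError("window is longer than the series")
    return np.stack([series[i:i + window] for i in range(n_windows)])

def rank_variables(heatmap, names):
    """Order variable names by total heatmap value over the time axis.

    Assumption: heatmap has shape (window, n_vars), aligned with the input.
    """
    scores = np.asarray(heatmap, dtype=float).sum(axis=0)
    order = np.argsort(-scores, kind="stable")   # largest score first
    return [(names[i], float(scores[i])) for i in order]
```

Summing over the time axis is only one possible way to summarize a heatmap; it discards the information about when a variable became important, which the full heatmap retains.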


[1] P. Park, P. D. Marco, H. Shin, and J. Bang, “Fault Detection and Diagnosis Using Combined Autoencoder and Long Short-Term Memory Network,” Sensors, vol. 19, no. 21, p. 4612, Oct. 2019, doi: 10.3390/s19214612.

[2] H. Wu and J. Zhao, “Deep convolutional neural network model based chemical process fault diagnosis,” Comput. Chem. Eng., vol. 115, pp. 185–197, Jul. 2018, doi: 10.1016/j.compchemeng.2018.04.009.

[3] V. Pakkiriswamy and R. Srinivasan, “An Explainable Artificial Intelligence Based Approach for Interpretation of Fault Detection Results from Deep Neural Networks,” presented at the 2020 Virtual AIChE Annual Meeting, 2020, [Online]. Available: https://www.aiche.org/academy/conferences/aiche-annual-meeting/2020/proc....

[4] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization,” arXiv:1610.02391 [cs], Dec. 2019, doi: 10.1007/s11263-019-01228-7.

[5] R. Assaf and A. Schumann, “Explainable Deep Neural Networks for Multivariate Time Series Predictions,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, Aug. 2019, pp. 6488–6490, doi: 10.24963/ijcai.2019/932.

[6] J. J. Downs and E. F. Vogel, “A plant-wide industrial process control problem,” Comput. Chem. Eng., vol. 17, no. 3, pp. 245–255, Mar. 1993, doi: 10.1016/0098-1354(93)80018-I.