# (509e) Automated Parameter Selection for an Autoencoder for Use in Anomaly Detection

#### 2023 AIChE Annual Meeting

#### Computing and Systems Technology Division

#### Process monitoring & fault detection II

#### Thursday, November 9, 2023 - 5:10pm to 5:35pm

A novel method to select the optimal number of latent variables with principal component analysis (PCA) was recently proposed in our previous work [2]. In this method, the dataset X ∈ R^{m×n}, where m is the number of observations and n is the number of variables, is split into a calibration data block, X_c ∈ R^{m_c×n}, and a validation data block, X_v ∈ R^{m_v×n}, where m_c + m_v = m. After these data blocks are identified, a faulty data block corresponding to anomalous datapoints is created from the validation block by randomly permuting the elements of each column of data separately. The faulty block, X_f ∈ R^{m_v×n}, is used together with the validation block to assess the discriminative ability of the anomaly detection algorithm. Generating the faulty data block addresses the common issue of training a classifier on an imbalanced dataset with few anomalous observations, by simulating such observations from the validation block. Once these three data blocks are obtained, the calibration block is used to train a PCA model with a specified number of latent variables. After training, the validation and faulty data blocks are embedded and reconstructed using the PCA model, and the reconstruction errors are computed using the Q statistic [3]. The Q statistic is the sum-of-squares reconstruction error for each datapoint and is a summary statistic commonly used to detect anomalies with a PCA model. To quantify the ability to distinguish normal datapoints from anomalous datapoints using the Q statistics, a sweep of Q statistic thresholds is used to construct a Receiver Operating Characteristic (ROC) curve. From the ROC curve, the Area Under the Curve (AUC) is computed to describe the performance of the anomaly detection model for the specified number of latent variables.
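Restated symbolically (our notation; x_i denotes the i-th observation of a data block and x̂_i its reconstruction through the PCA model), the Q statistic described above is the squared reconstruction residual:

```latex
Q_i = \lVert x_i - \hat{x}_i \rVert_2^2 = \sum_{j=1}^{n} \left( x_{ij} - \hat{x}_{ij} \right)^2
```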
This method to evaluate the performance of an anomaly detection model can be used to select the number of latent variables by repeating the analysis over a specified range of latent variables and selecting the number that yields the highest AUC. While this method has proven successful for PCA, its use with other types of dimensionality reduction and reconstruction algorithms has not been explored to date.
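The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the synthetic data, split sizes, and use of scikit-learn's `PCA` are assumptions made for the sketch.

```python
# Sketch of AUC-based selection of the number of principal components:
# split the data, build a "faulty" block by permuting each column of the
# validation block, then sweep the number of components and keep the one
# with the highest ROC AUC on the Q statistic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 latent factors mixed into 10 observed variables.
latent = rng.normal(size=(600, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(600, 10))

X_c, X_v = X[:400], X[400:]  # calibration / validation blocks

# Faulty block: permute each column of the validation block independently,
# destroying inter-variable correlations but keeping marginal distributions.
X_f = np.column_stack([rng.permutation(X_v[:, j]) for j in range(X_v.shape[1])])

def q_statistic(model, X_block):
    """Sum-of-squares reconstruction error (Q statistic) per observation."""
    X_hat = model.inverse_transform(model.transform(X_block))
    return np.sum((X_block - X_hat) ** 2, axis=1)

auc_per_k = {}
for k in range(1, X.shape[1]):
    pca = PCA(n_components=k).fit(X_c)
    scores = np.concatenate([q_statistic(pca, X_v), q_statistic(pca, X_f)])
    labels = np.concatenate([np.zeros(len(X_v)), np.ones(len(X_f))])
    # roc_auc_score performs the threshold sweep over Q implicitly.
    auc_per_k[k] = roc_auc_score(labels, scores)

best_k = max(auc_per_k, key=auc_per_k.get)
```

With well-separated latent structure, the AUC is expected to peak near the true number of factors; in practice one inspects the full AUC-versus-k curve rather than the argmax alone.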

PCA, being a linear model, may be a poor choice in cases where the data exhibit significant nonlinear structure. In such cases, the optimal AUC is likely obtained for a number of latent variables, or principal components (PCs), that does not match the true dimensionality. There are two possible scenarios: 1) too few PCs are chosen, so that not all features of the high-dimensional dataset are captured during reduction, or 2) too many are chosen, causing noise to be captured in the lower-dimensional representation. In turn, a suboptimal number of PCs is expected to hinder anomaly detection performance. To address this limitation, we explore the use of the automatic tuning method with an Autoencoder model (previously known as an auto-associative neural network). Such models can identify nonlinear features of low-dimensional manifolds better than PCA [1]. However, an objective and optimal tuning method for nonlinear PCA models, such as the Autoencoder, does not exist.

We hypothesize that the optimal number of latent variables in nonlinear PCA models can be determined using the same method and criterion (AUC) as used for PCA. To test this hypothesis, we employ two datasets to investigate different aspects of the tuning method's performance. First, we use a trefoil knot to determine whether the method can identify the true number of latent variables. The trefoil knot dataset consists of a set of 3-dimensional coordinates describing a curve computed from three parametric equations of a single independent variable. Second, we use the MNIST database, consisting of 60,000 28x28 pixel images of hand-written digits in the training set and 10,000 images in the test set [4]. This dataset was used to determine whether a global maximum AUC score exists across models using different numbers of latent variables.

First, the ability of the automatic tuning method to determine the true number of latent variables in a trefoil knot was tested. Although the trefoil knot is computed from a single variable, no continuous inverse mapping from the knot to 1D exists. However, an inverse mapping to 2D does exist, since the trefoil knot is homeomorphic to a circle. Given this inverse mapping, the expected number of latent variables to be detected by the tuning method for a trefoil knot is two. After sweeping across the three possible numbers of latent variables, the 2D embedding produced the highest AUC using the automatic tuning method. The ROC curves for the three numbers of latent variables are shown on the left in Figure 1.
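For concreteness, a trefoil knot dataset can be generated as below. The exact parametric equations used in the study are not given in this abstract, so the standard textbook parameterization here is an assumption.

```python
# Generate a trefoil knot: 3-D coordinates from three parametric equations
# of a single parameter t (standard parameterization, assumed here).
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
X_trefoil = np.column_stack([
    np.sin(t) + 2.0 * np.sin(2.0 * t),   # x(t)
    np.cos(t) - 2.0 * np.cos(2.0 * t),   # y(t)
    -np.sin(3.0 * t),                    # z(t)
])
# X_trefoil has shape (1000, 3): a 1-parameter curve embedded in 3-D,
# whose lowest-dimensional continuous embedding is 2-D (it is homeomorphic
# to a circle), matching the expected result of the tuning method.
```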

After showing that the tuning method can determine the correct number of latent variables, we tested its ability to find an optimal number of latent variables for a higher-dimensional dataset. The proposed method is built on the assumption that the optimal number of latent variables will maximize the AUC of the anomaly detection model tested on the validation and randomized data blocks. To test this assumption, we used 7000 images of the digit "1" in the MNIST dataset to automatically tune the number of neurons, akin to the number of latent variables, in the bottleneck layer of a convolutional Autoencoder. After sweeping across the number of latent variables, a global maximum AUC score of 0.9928 was observed at 16 latent variables, allowing selection of the optimal number of latent variables. With the PCA model, the maximum AUC was comparable; however, many more latent variables (122) were required. A plot of the AUC versus the number of latent variables is shown on the right in Figure 1.
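The same AUC criterion transfers to an autoencoder by sweeping the bottleneck width instead of the number of PCs. The study used a convolutional autoencoder on MNIST; the sketch below instead uses a small fully connected autoencoder (scikit-learn's `MLPRegressor` trained to reconstruct its own input) on synthetic nonlinear data, so the architecture, data, and candidate widths are all assumptions made to keep the example lightweight.

```python
# Sweep the bottleneck width of a simple autoencoder using the same
# permutation-based faulty block and ROC-AUC criterion as for PCA.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic nonlinear data: 2 latent factors pushed through tanh into 8 variables.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 8))
X = np.tanh(latent @ mixing) + 0.05 * rng.normal(size=(300, 8))

X_c, X_v = X[:200], X[200:]  # calibration / validation blocks
X_f = np.column_stack([rng.permutation(X_v[:, j]) for j in range(X_v.shape[1])])

aucs = {}
for k in (1, 2, 4):  # candidate bottleneck widths
    # Hidden layers (16, k, 16): the middle layer is the bottleneck;
    # fitting X -> X makes the network an autoencoder.
    ae = MLPRegressor(hidden_layer_sizes=(16, k, 16), activation="tanh",
                      max_iter=2000, random_state=0).fit(X_c, X_c)
    q_v = np.sum((X_v - ae.predict(X_v)) ** 2, axis=1)  # Q statistic, validation
    q_f = np.sum((X_f - ae.predict(X_f)) ** 2, axis=1)  # Q statistic, faulty
    labels = np.concatenate([np.zeros(len(q_v)), np.ones(len(q_f))])
    aucs[k] = roc_auc_score(labels, np.concatenate([q_v, q_f]))

best_width = max(aucs, key=aucs.get)
```

The outer loop is the only change relative to the PCA case: the model family is swapped while the faulty-block construction and the AUC criterion stay fixed.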

Using the MNIST and trefoil knot datasets, the automatic tuning method developed for PCA was demonstrated with an autoencoder. The trefoil knot demonstrated the ability of the method to select the true number of latent variables within a dataset. Testing the method on the MNIST dataset demonstrated the existence of a global optimum number of latent variables that could be found using AUC. Our main conclusions from this study follow:

- The first objective tuning method for nonlinear PCA models was proposed and tested.
- The proposed method works well for a simulated case where the true number of dimensions is known.
- In the case of the MNIST data, good performance is obtained with 7 latent variables, and the best AUC was achieved using 16 latent variables. A dimension reduction of 98% is accomplished with an autoencoder, versus 84% using a PCA model with equivalent performance.

In future work, we aim to address the following limitations:

- The performance of the models selected by the tuning method must be tested with datasets containing anomaly cases instead of the randomized data block to validate improved classification performance.
- Although the results from MNIST and the trefoil knot show promise, this method will need to be tested with more datasets of nontrivial structure and known number of true latent variables.
- The relative impact of the autoencoder's bottleneck layer versus its other layers on anomaly detection performance should be quantified. While PCA has only one hyper-parameter, autoencoder models have many hyper-parameters and near-limitless potential architectures, so the sensitivity of anomaly detection performance to the bottleneck layer must be verified relative to the other layers.

**Sources**

[1] M. Sakurada and T. Yairi, "Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction," in *Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis*, New York, NY, USA, Dec. 2014, pp. 4–11. doi: 10.1145/2689746.2689747.

[2] D. Chowdhury, M. McClain, and K. Villez, "Automated Dimension Reduction with Principal Component Analysis Using Area Under Curve," presented at the 2021 AIChE Annual Meeting, Nov. 2021. Accessed: Mar. 13, 2023. [Online]. Available: https://aiche.confex.com/aiche/2021/meetingapp.cgi/Paper/628428

[3] B. M. Wise and N. B. Gallagher, "The Process Chemometrics Approach to Process Monitoring and Fault Detection," *IFAC Proceedings Volumes*, vol. 28, no. 12, pp. 1–21, Jun. 1995, doi: 10.1016/S1474-6670(17)45398-5.

[4] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.