# (109d) Selection of Combined Index Weights to Optimize Anomaly Detection in Big Area Additive Manufacturing

#### AIChE Annual Meeting

#### 2021 Annual Meeting

#### Topical Conference: Next-Gen Manufacturing

#### Applied Artificial Intelligence, Big Data, and Data Analytics Methods for Next-Gen Manufacturing Efficiency I

#### Monday, November 8, 2021 - 1:30pm to 1:50pm

One traditional approach to monitoring a manufacturing process is statistical process control (SPC). Before SPC is applied, a model that adequately describes the sources of variation must be identified. Principal component analysis (PCA) [1] is one technique available to this end. The main idea is to describe normal variation by means of a number of principal components (PCs), K, that is smaller than the number of original variables, J. If there is no a priori knowledge about the data structure, it can be difficult to calibrate a PCA model automatically, that is, to select the correct number of PCs to retain. To overcome this issue, this study uses a previously proposed technique that calibrates a probabilistic PCA (PPCA) model [3] by means of an ignorance score [2]. The optimal model is the one whose number of PCs minimizes the ignorance score.
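The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of [2]: the ignorance score is taken here as the mean negative log predictive density (in nats) on a single held-out split rather than a full cross-validation, and the function names `ppca_ignorance` and `select_num_pcs` are hypothetical.

```python
import numpy as np

def ppca_ignorance(X_train, X_test, K):
    """Mean negative log density (nats) of X_test under a PPCA model
    with K PCs fitted to X_train (maximum-likelihood solution)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    S = Xc.T @ Xc / Xc.shape[0]                   # empirical covariance
    eigval, eigvec = np.linalg.eigh(S)            # ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]  # make it descending
    J = S.shape[0]
    sigma2 = eigval[K:].mean()                    # PPCA noise variance
    # Eigenvalues of the PPCA covariance: K retained, the rest set to sigma2
    lam = np.concatenate([eigval[:K], np.full(J - K, sigma2)])
    Z = (X_test - mu) @ eigvec                    # coordinates in the eigenbasis
    maha = (Z ** 2 / lam).sum(axis=1)             # squared Mahalanobis distance
    log_det = np.log(lam).sum()
    return 0.5 * (J * np.log(2.0 * np.pi) + log_det + maha.mean())

def select_num_pcs(X_train, X_test):
    """Return the K in 1..J-1 with the lowest held-out score, plus all scores."""
    J = X_train.shape[1]
    scores = {K: ppca_ignorance(X_train, X_test, K) for K in range(1, J)}
    return min(scores, key=scores.get), scores
```

A call such as `best_K, scores = select_num_pcs(X[:512], X[512:])` returns the candidate K with the lowest held-out score; averaging the score over several folds would bring the sketch closer to a cross-validated variant.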

After a model is selected using the ignorance score, SPC monitoring statistics such as Hotelling's T^{2} statistic (H) and the squared prediction error (Q) can be used to detect anomalous data during process operation [4]. Data are flagged as anomalous when the H statistic and/or the Q statistic exceeds its theoretical upper control limit (UCL). While most applications use two charts, one for Q and one for H, a combined index has been proposed more recently [5]. The combined index is a weighted sum of H and Q, as seen in Eq. (1). The two weights, w_{1} and w_{2}, are typically treated as separate parameters [5]. So far, the weights of the combined index have been calculated from the UCLs for H and Q [5,6]. These UCLs in turn depend on modeling choices: the UCL of the H statistic, for example, depends on the assumed distribution of H (a chi-squared distribution in the original work [5]) and on the selected confidence level. Using UCLs as weights in the combined index is motivated practically rather than theoretically. Consequently, UCL-based weights may make results obtained with the combined index difficult to transfer from one application to the next. To date, no procedure is available for setting the weights (w_{1} and w_{2}) so that optimal results are assured, either theoretically or empirically. In this work, we present the first results of a systematic study in which the weights are varied deliberately to explore their effect on the combined index as a classifier during process monitoring.
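As a rough illustration of the quantities discussed above, the sketch below computes H, Q, and a weighted sum of the two. Several things are assumptions and not taken from the abstract: the function names are hypothetical, the combined index is written as w1*H + w2*Q, and the limit-based weights use empirical quantiles of the training statistics as a stand-in for the theoretical chi-squared UCLs of [5,6].

```python
import numpy as np

def fit_pca(X):
    """Mean, eigenvalues (descending) and eigenvectors of the sample covariance."""
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / (X.shape[0] - 1)
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]
    return mu, eigval[order], eigvec[:, order]

def monitoring_statistics(X, mu, eigval, eigvec, K):
    """Hotelling's T^2 (H) on the K retained PCs and squared prediction error (Q)."""
    Z = (X - mu) @ eigvec                          # scores on all J directions
    H = (Z[:, :K] ** 2 / eigval[:K]).sum(axis=1)   # scaled retained scores
    Q = (Z[:, K:] ** 2).sum(axis=1)                # squared residual norm
    return H, Q

def limit_based_weights(H_train, Q_train, level=0.99):
    """Assumed stand-in for UCL-based weights: 1 / empirical quantile."""
    return 1.0 / np.quantile(H_train, level), 1.0 / np.quantile(Q_train, level)

def combined_index(H, Q, w1, w2):
    """Weighted sum of the two monitoring statistics."""
    return w1 * H + w2 * Q
```

Because the limits enter only through the two weights, a different assumed distribution or confidence level changes the ratio w2/w1 and therefore the relative influence of H and Q on the index.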

Importantly, the combined index and its use for process monitoring are invariant to scale. Multiplying w_{1} and w_{2} by the same positive factor changes the combined index and the upper control limit obtained for it proportionally, so the monitoring decisions are unchanged. To standardize our approach, we always choose the weights such that the sum of the eigenvalues of the empirical covariance matrix equals a weighted sum of the same eigenvalues, as seen in Eq. (2). The range of w_{2} is chosen so that w_{1} remains positive; w_{1} is then calculated from this constraint.
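The scale invariance can be written out in a few lines. The symbol \varphi for the combined index and the subscript c for the rescaled quantities are notational assumptions made here; the only premise is that the index is a weighted sum of H and Q and that its control limit is a quantile of its own distribution.

```latex
\begin{aligned}
\varphi &= w_1 H + w_2 Q, \\
\varphi_c &= (c\,w_1)\,H + (c\,w_2)\,Q = c\,\varphi, \qquad c > 0, \\
\mathrm{UCL}_c &= c\,\mathrm{UCL}
\;\Longrightarrow\;
\bigl(\varphi_c > \mathrm{UCL}_c\bigr) \iff \bigl(\varphi > \mathrm{UCL}\bigr).
\end{aligned}
```

Only the ratio of the two weights therefore influences which samples are flagged, which is why a normalization constraint can be imposed without loss of generality.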

Since the weights affect the estimated combined index, there likely exists a set of relative weights that optimizes the performance of the combined index across numerous data types. The hypothesis is that the relative PPCA weights w_{1}=1 and w_{2}=1/σ^{2} (where σ^{2} is the noise variance of the PPCA model) give the optimal anomaly detection performance. When these weights are applied, the combined index becomes equal to the squared Mahalanobis distance computed with the covariance matrix of the PPCA model [2,3], as seen in Eqs. (3-5). In essence, we suggest that the optimal weights found in this study be used consistently across the literature to provide the best anomaly detection with the combined index.
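The equality between the combined index and the squared Mahalanobis distance can be checked numerically. The sketch below assumes the maximum-likelihood PPCA solution of [3], in which the noise variance is the mean of the J-K discarded eigenvalues and the model covariance is W W^T plus that variance times the identity; the function name `ppca_mahalanobis_check` is hypothetical.

```python
import numpy as np

def ppca_mahalanobis_check(X, K):
    """Return (squared Mahalanobis distance under PPCA, H + Q / sigma2) per sample."""
    Xc = X - X.mean(axis=0)
    N, J = Xc.shape
    S = Xc.T @ Xc / N                               # empirical covariance
    eigval, eigvec = np.linalg.eigh(S)              # ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]  # make it descending
    sigma2 = eigval[K:].mean()                      # ML noise variance
    # ML loading matrix (arbitrary rotation fixed to the identity)
    W = eigvec[:, :K] * np.sqrt(eigval[:K] - sigma2)
    C = W @ W.T + sigma2 * np.eye(J)                # PPCA model covariance
    maha = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(C), Xc)
    Z = Xc @ eigvec
    H = (Z[:, :K] ** 2 / eigval[:K]).sum(axis=1)    # Hotelling's T^2
    Q = (Z[:, K:] ** 2).sum(axis=1)                 # squared prediction error
    phi = 1.0 * H + (1.0 / sigma2) * Q              # w1 = 1, w2 = 1 / sigma2
    return maha, phi
```

The two returned arrays agree up to floating-point error because the PPCA covariance shares its eigenvectors with the empirical covariance, keeps the K leading eigenvalues, and replaces the remaining ones with the noise variance.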

The effect of the relative weights on the combined index is explored with simulated data sets [2]. Data types B1, B2, and C with noise-to-signal ratios up to 55% are used to generate simulated data sets of 1024 samples each. The calculated combined indices are averaged over the samples. As seen in Figs. 1-2, the value of the combined index for a particular data set is a function of the selected weights. The vertical lines in these figures mark the locations of selected weight ratios: the PPCA weights and the UCL-based weights, the latter for chi-squared UCLs at specified confidence levels (90%, 95%, and 99%). The simulated data sets allow us to explore how the noise-to-signal ratio and a nonoptimal number of retained PCs affect the identification of the best-performing relative weights. To compare the performance of each set of relative weights, the area under the curve (AUC) metric is used [7]. The AUC of the receiver operating characteristic (ROC) curve describes how effectively the combined index, for a given set of relative weights, acts as a classifier that separates anomalous from normal samples. The set of relative weights that yields the maximum AUC (above 0.5, the value for a random classifier) gives the best classifier performance.
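A weight sweep of this kind might be organized as in the sketch below. It is not the study's code: the function names are hypothetical, w_{1} is fixed to 1 so that only the ratio w_{2}/w_{1} is varied (which suffices because of the scale invariance noted earlier), and the AUC is computed from the rank-sum (Mann-Whitney) identity under the assumption that the scores contain no ties.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """ROC AUC via the rank-sum identity; labels are True for anomalous samples.
    Assumes no tied scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def sweep_weight_ratio(H, Q, labels, ratios):
    """AUC of the index H + r * Q for each candidate ratio r = w2 / w1."""
    H = np.asarray(H, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return np.array([auc_from_scores(H + r * Q, labels) for r in ratios])
```

With the resulting array `aucs`, the best-performing ratio in the sweep is `ratios[np.argmax(aucs)]`, which can then be compared against the PPCA ratio and the UCL-based ratios.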

[1] Jolliffe, I.T. and Cadima, J., “Principal Component Analysis: A Review and Recent Developments,” *Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences*, 374(2065), 2016, pp. 20150202.

[2] Russo, S., Li, G., and Villez, K., “Automated Model Selection in Principal Component Analysis: A New Approach Based on the Cross-validated Ignorance Score,” *Industrial & Engineering Chemistry Research*, 58(3), 2019, pp. 13448-13468.

[3] Tipping, M.E. and Bishop, C.M., “Probabilistic Principal Component Analysis,” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 61(3), 1999, pp. 611-622.

[4] Montgomery, D., *Introduction to Statistical Quality Control*, 8^{th} ed., Wiley, 2019.

[5] Yue, H.H. and Qin, S.J., “Reconstruction-based Fault Identification Using a Combined Index,” *Industrial & Engineering Chemistry Research*, 40(20), 2001, pp. 4403-4414.

[6] Jackson, J.E. and Mudholkar, G.S., “Control Procedures for Residuals Associated with Principal Component Analysis,” *Technometrics*, 21(3), 1979, pp. 341-349.

[7] Fawcett, T., “An Introduction to ROC Analysis,” *Pattern Recognition Letters*, 27(8), 2006, pp. 861-874.