# (577e) A Nearest in Control Neighbour Based Method to Estimate Variable Contributions to the Hotelling's Statistic

#### AIChE Annual Meeting

#### 2008

#### 2008 Annual Meeting

#### Computing and Systems Technology Division

#### Poster Session: Computers in Operations and Information Processing

#### Wednesday, November 19, 2008 - 6:00pm to 8:30pm

Abstract

Hotelling's statistic, also called the T²-statistic, is widely used in statistical process control as a multivariate extension of the univariate Student's t chart to reliably detect out-of-control status in multivariate processes.
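As a minimal sketch of the statistic itself, the snippet below computes T² for a single observation against an in-control mean vector and covariance matrix. The function name `hotelling_t2` and the two-variable numbers are hypothetical, chosen only to show that two points with the same univariate deviations can have very different T² values when the variables are correlated.

```python
import numpy as np

def hotelling_t2(x, mean, cov):
    """Hotelling's T^2 of one observation against in-control mean and covariance."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = d instead of forming the explicit inverse of cov.
    return float(d @ np.linalg.solve(cov, d))

# Hypothetical in-control parameters for two standardized, correlated variables.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

x_typical = np.array([1.0, 1.0])    # agrees with the positive correlation
x_unusual = np.array([1.0, -1.0])   # breaks the correlation structure

t2_typical = hotelling_t2(x_typical, mean, cov)   # approximately 1.11
t2_unusual = hotelling_t2(x_unusual, mean, cov)   # approximately 10
```

Both points lie one standard deviation from the mean in each variable, so univariate charts would treat them alike; only the multivariate statistic separates them.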

Although it is a very efficient detection tool, on its own it gives no indication of the cause of the declared faulty status.

Several approaches have been proposed to estimate the effect of each variable's value on the overall value of the statistic. Some of these strategies work in the original measurement space, while others interpret results from analyses in latent variable spaces, such as principal component analysis (PCA) or independent component analysis (ICA).

Introduction

Statistical process monitoring involves three activities: detection of the out-of-control or faulty status, identification of the variable or variables that signal this condition, and diagnosis of the root cause of the abnormal behaviour. Although Hotelling's statistic is widely used to reliably detect out-of-control status, it offers no assistance in the identification stage. A number of strategies have been proposed to assign variable-contribution values to the T²-statistic, taking into account the multivariate nature of process data.

Mason et al. [1,2] proposed decomposing the T²-statistic value into a sum of N independent parts (where N is the number of measured variables).

The first term is calculated by squaring a univariate t statistic for one variable. The j-th term (j = 2, …, N) of the sum is the j-th measurement adjusted using estimates of the mean and standard deviation of its conditional probability distribution given the (j-1) previously considered variables. Since there is no fixed ordering of the variables, N! different but non-independent partitions can be obtained.
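The decomposition described above can be sketched as follows, assuming each term is the squared deviation of a variable from its conditional mean divided by its conditional variance. The function name `myt_terms` and the three-variable data are hypothetical. The sketch checks numerically that every one of the N! orderings sums to the same T² value, even though the individual terms differ between orderings.

```python
import numpy as np
from itertools import permutations

def myt_terms(x, mean, cov, order):
    """Terms of the T^2 decomposition for one ordering of the variables."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    terms = []
    for k, j in enumerate(order):
        prev = list(order[:k])          # variables already considered
        if not prev:
            # First term: squared univariate statistic of the first variable.
            cond_dev, cond_var = d[j], cov[j, j]
        else:
            s_jp = cov[j, prev]
            # Regression coefficients of variable j on the previous variables.
            w = np.linalg.solve(cov[np.ix_(prev, prev)], s_jp)
            cond_dev = d[j] - w @ d[prev]      # deviation from conditional mean
            cond_var = cov[j, j] - w @ s_jp    # conditional variance
        terms.append(float(cond_dev ** 2 / cond_var))
    return terms

# Hypothetical in-control parameters and one observation (N = 3).
mean = np.zeros(3)
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
x = np.array([1.0, -1.0, 0.5])

# One decomposition per ordering: 3! = 6 partitions of the same T^2 value.
decompositions = {order: myt_terms(x, mean, cov, order)
                  for order in permutations(range(3))}
```

Comparing the entries of `decompositions` shows why the ordering matters for interpretation: the same variable receives a different term depending on which variables were conditioned on before it.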

To work around this problem, the authors suggested focusing on only two of those terms for each partition: the term corresponding to the unadjusted contribution of a single selected variable, and the term containing the contribution of that variable after adjustment for the remaining (N-1) variables.

Nevertheless, when inspecting this reduced set of terms is not enough to reach a clear conclusion, all significant conditional terms must be compared with a critical value, which makes identifying the source of the fault more complex.

A straightforward alternative method that decomposes the T²-statistic into a unique sum of variable contributions was recently presented by Alvarez et al. [3]. This method also provides a clear interpretation of the positive and negative contributions that often result from the techniques mentioned above (and that had not previously been given an interpretation), and it estimates a bound for the negative ones.

Among the methods that work in latent variable spaces, Jackson [4] was the first to propose decomposing the T²-statistic into a sum of principal components and performing the identification in terms of the weight of each variable in the out-of-control component. However, in most industrial applications it is very difficult to associate a physical meaning with each principal component, and the variables associated with out-of-control signals cannot be determined easily.
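A minimal sketch of this principal-component view, assuming all components are retained so that T² equals the sum of squared scores divided by their eigenvalues. The covariance matrix and observation are hypothetical, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical covariance of two standardized variables and one observation.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = np.array([1.0, -1.0])

# eigh returns eigenvalues in ascending order; columns of P are the loadings.
eigvals, P = np.linalg.eigh(cov)

scores = P.T @ x                  # projection of x onto each principal component
pc_terms = scores ** 2 / eigvals  # contribution of each component to T^2
t2 = float(pc_terms.sum())        # approximately 10, dominated by the first term

# Weight of each original variable in the component with the largest term.
dominant = int(np.argmax(pc_terms))
variable_weights = np.abs(P[:, dominant])
```

Here the out-of-control component is the one with the smallest eigenvalue, and both variables carry the same weight in it, which illustrates how the loadings alone may not single out one variable.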

Miller et al. [5] and MacGregor et al. [6] proposed evaluating the contributions of each process variable to the scores that fall outside their confidence limits. Nomikos [7] presented an approach to calculate the contributions of each process variable to the T²-statistic instead of to the scores, for cases in which latent variables cannot be associated with a meaningful group of process variables.

Westerhuis et al. [8] extended the theory of contribution plots to latent variable models with correlated scores and introduced control limits for the contributions, which help to find the variables whose behaviour differs from that observed in the reference data set.

In all the above-mentioned methods, the contribution of each variable to the T²-statistic is estimated with the remaining N-1 variables fixed at their measured values. As a result, there is a single "parametric curve" defining all possible values of the T²-statistic as a function of only the analyzed variable, as pointed out by Alvarez et al. [3].

In this work, a novel method to estimate the variable contributions to the T²-statistic is presented that does not impose this restriction.

Given a measured point whose T²-value exceeds the critical value TC, the contribution of each variable is determined in terms of the minimum distance between the measured point and its closest neighbour whose T²-value equals TC.

The problem of finding this neighbour can be stated as an optimization problem in which the objective is to find an alternative point that minimizes a distance function to the measured point, subject to the constraint that the corresponding T²-value is less than or equal to TC.

Since all the variables must have been standardized beforehand, the resulting movements in each direction can be compared directly, used as estimates of the degree of deviation of each variable, and interpreted in the same way as classical contribution plots.
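The nearest in-control neighbour search can be sketched as follows, under assumptions that are not taken from the abstract: the distance function is Euclidean, the data are standardized with zero mean, and the constrained minimum is found by bisection on the Lagrange multiplier in the eigenbasis of the covariance matrix. The function names, the covariance matrix, and the value of `t_crit` are hypothetical, and this is not necessarily the solver the authors used.

```python
import numpy as np

def t2(x, cov):
    """Hotelling's T^2 of a standardized (zero-mean) observation."""
    x = np.asarray(x, dtype=float)
    return float(x @ np.linalg.solve(cov, x))

def nearest_in_control_neighbour(x, cov, t_crit, max_iter=200):
    """Euclidean-closest point y to x with T^2(y) <= t_crit (assumed distance)."""
    x = np.asarray(x, dtype=float)
    eigvals, V = np.linalg.eigh(np.asarray(cov, dtype=float))
    z = V.T @ x                               # x in the eigenbasis of cov

    def t2_at(mu):
        # Stationarity of the Lagrangian gives y_i = z_i / (1 + mu / lambda_i).
        y = z / (1.0 + mu / eigvals)
        return float(np.sum(y ** 2 / eigvals))

    if t2_at(0.0) <= t_crit:
        return x.copy()                       # already in control: no movement
    lo, hi = 0.0, 1.0
    while t2_at(hi) > t_crit:                 # bracket the multiplier
        hi *= 2.0
    for _ in range(max_iter):                 # T^2 decreases monotonically in mu
        mid = 0.5 * (lo + hi)
        if t2_at(mid) > t_crit:
            lo = mid
        else:
            hi = mid
    return V @ (z / (1.0 + hi / eigvals))

# Hypothetical covariance, out-of-control observation, and critical value.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = np.array([3.0, 0.5])
t_crit = 6.0

neighbour = nearest_in_control_neighbour(x, cov, t_crit)
# Movement per standardized variable, read here as its contribution.
contributions = x - neighbour
```

In this example the first variable has to move further than the second to bring the point back onto the control boundary, so it would be flagged as the larger contributor. A different distance function would generally give a different neighbour.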

Results show good performance when this technique is applied in the original variable space, and they are similar to those obtained using the OSS strategy proposed by Alvarez et al. [3].

When the proposed technique is applied to interpret results in latent variable spaces, the variable contributions to the T²-statistic are more consistent with the SPE contributions than when the classical generalized contribution plot technique is applied.

Furthermore, in this case it is possible to obtain a single "global" contribution plot by adding the constraint that the SPE value is less than or equal to its critical value SPEC.

The nearest neighbour then satisfies the requirements of both statistics.
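One way to write the combined problem is sketched below. The notation is an assumption introduced here for illustration: $\mathbf{x}$ is the standardized measured point, $\mathbf{y}$ the candidate neighbour, $d$ the chosen distance function, and $T_C$ and $\mathrm{SPE}_C$ the critical values written TC and SPEC in the text.

```latex
\begin{aligned}
\mathbf{y}^{*} = \arg\min_{\mathbf{y}} \quad & d(\mathbf{x}, \mathbf{y}) \\
\text{subject to} \quad & T^{2}(\mathbf{y}) \le T_C, \\
                        & \mathrm{SPE}(\mathbf{y}) \le \mathrm{SPE}_C
\end{aligned}
```

Under this reading, the global contribution of variable $j$ would be taken from the movement $x_j - y^{*}_j$; dropping the SPE constraint recovers the T²-only problem.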

References

[1] R.L. Mason, N.D. Tracy, J.C. Young, J. Qual. Technol. 27 (1995) 99–108.

[2] R.L. Mason, N.D. Tracy, J.C. Young, J. Qual. Technol. 29 (1997) 396–406.

[3] R. Alvarez, A. Brandolin, M. Sánchez, Chemometr. Intell. Lab. Syst. 88 (2007) 189–196.

[4] J.E. Jackson, A User's Guide to Principal Components, John Wiley & Sons Inc., New York, 1991.

[5] P. Miller, R. Swanson, C. Heckler, Appl. Math. Comput. Sci. 8 (1998) 775–792.

[6] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi, AIChE J. 40 (1994) 826–838.

[7] P. Nomikos, ISA Trans. 35 (1996) 259–266.

[8] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Chemometr. Intell. Lab. Syst. 51 (2000) 95–114.

[9] S. Valle, W. Li, S.J. Qin, Ind. Eng. Chem. Res. 38 (1999) 4389–4401.

[10] R. De Maesschalck, D. Jouan-Rimbaud, D.L. Massart, Chemometr. Intell. Lab. Syst. 50 (2000) 1–18.
