(767g) An Extended Algorithm of Generalized Linear Model to Cope with Multicollinearity and Nonlinearity

Authors:
Kyoto University
Kyoto University

In increasingly fierce market competition, the process industry is required to
continually provide high-grade products at low production cost in a
sustainable production environment. However, manufacturing processes are
always subject to disturbances, including process perturbations, equipment
malfunctions, and inappropriate operations. These disturbances result in
defective products, which not only reduce product quality but also increase
manufacturing costs, energy consumption, and lead time. One way to reduce
product defects is to develop a soft sensor that predicts the number of defects
(response variable) from process operating conditions (predictor variables),
and to control the process based on the model so that defects are less likely
to occur.

In the process industry, linear regression methods such as partial least
squares (PLS) have been widely used to construct soft sensors [1,2]. However,
they cannot properly cope with count data: they assume that the response
variable follows a normal distribution with constant variance, and they may
yield negative estimates, which are invalid for defect counts. Thus, when the
response variable is binary or count data, the generalized linear model (GLM)
is widely used [3]. A GLM assumes that the response variable follows a
distribution belonging to the exponential family, such as the Bernoulli or
Poisson distribution; the logistic regression model and the Poisson regression
model are special cases.

In the framework of GLM, a regression model is
constructed assuming that the predictor variables and the response variable
have a linear relationship through a link function. Hence, GLM can cope with
binary and count data but cannot cope with strong nonlinearity or
multicollinearity.
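
As a concrete instance of the link-function formulation, the Poisson regression model with the canonical log link can be written as below. The symbols ($y_i$ for the defect count, $\mathbf{x}_i$ for the predictor vector, $\beta_0$ and $\boldsymbol{\beta}$ for the coefficients) are notation introduced here for illustration and do not appear in the abstract.

```latex
y_i \mid \mathbf{x}_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad
\hat{\mu}_i = \exp\!\bigl(\hat{\beta}_0 + \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}\bigr) > 0 .
```

The predicted mean is always positive, which is why the model suits count data. The predictors enter only through the linear term $\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}$, so strongly nonlinear effects are not captured, and nearly collinear predictors make the coefficient estimates unstable.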

To overcome the problems of nonlinearity and multicollinearity, partial least
squares generalized linear regression (PLS-GLR) and local likelihood estimation
(LLE) were proposed [4,5]. PLS-GLR copes with multicollinearity by
incorporating the characteristics of PLS into GLM. LLE, on the other hand,
copes with nonlinearity by constructing a local GLM each time a prediction of
the response variable is required. However, neither of these GLM-based methods
can cope with a complex system that exhibits multicollinearity and nonlinearity
at the same time.
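
The local-fitting idea behind LLE can be illustrated with a minimal Python sketch. It assumes a single predictor, a Poisson response with a log link, and Gaussian kernel weights with a hand-picked bandwidth. The function name `local_poisson_predict`, the toy data, and the damped Newton solver are illustrative choices, not the formulation of [5] or of the authors.

```python
import math


def local_poisson_predict(x_train, y_train, x_query, bandwidth=0.5, n_iter=50):
    """Estimate the Poisson mean at x_query from a kernel-weighted log-linear fit."""
    # Assumed Gaussian kernel: samples close to the query get larger weights.
    w = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in x_train]
    # Center the predictor at the query so that exp(b0) is the local prediction.
    z = [x - x_query for x in x_train]
    weighted_mean = sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)
    b0, b1 = math.log(max(weighted_mean, 1e-8)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * zi) for zi in z]
        # Gradient of the weighted Poisson log-likelihood.
        g0 = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y_train, mu))
        g1 = sum(wi * zi * (yi - mi) for wi, zi, yi, mi in zip(w, z, y_train, mu))
        # Weighted Fisher information (a 2 x 2 matrix).
        h00 = sum(wi * mi for wi, mi in zip(w, mu))
        h01 = sum(wi * zi * mi for wi, zi, mi in zip(w, z, mu))
        h11 = sum(wi * zi * zi * mi for wi, zi, mi in zip(w, z, mu))
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Damp large Newton steps so that exp() stays well behaved.
        scale = max(1.0, abs(d0), abs(d1))
        b0 += d0 / scale
        b1 += d1 / scale
        if abs(d0) + abs(d1) < 1e-10:
            break
    return math.exp(b0)


# Toy nonlinear data: deterministic "counts" that follow 5 + 4*sin(x).
x_train = [0.25 * i for i in range(25)]
y_train = [round(5 + 4 * math.sin(x)) for x in x_train]

near_peak = local_poisson_predict(x_train, y_train, math.pi / 2)
near_trough = local_poisson_predict(x_train, y_train, 3 * math.pi / 2)
print(near_peak, near_trough)
```

A new weighted fit is solved for every query point, so the prediction follows the local level of the data, which a single global log-linear model cannot do. The bandwidth controls how local the fit is and would normally be tuned on validation data.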

In this research, we developed partial least squares local likelihood
estimation (PLS-LLE), which copes with both nonlinearity and multicollinearity
by constructing a local PLS-GLR model each time a prediction of the response
variable is required.

To examine the effectiveness of the proposed method, a numerical experiment was
conducted in which the predictor variables are linearly related to one another,
the predictor and response variables are nonlinearly related, and the response
variable follows a Poisson distribution. In this numerical example, only three
of the ten predictor variables are independent; the other seven are generated
as linear combinations of the three independent variables. First, training,
validation, and test data sets of 100 samples each were generated. Second,
regression models were built by PLS-GLR, LLE, and PLS-LLE using the training
data. Third, the tuning parameters of each model were selected using the
validation data. Finally, the root mean squared error (RMSE) and the
coefficient of determination (R2) were calculated on the test data to measure
the accuracy of the models. Fig. 1 shows the results of repeating the above
procedure 50 times for each method. PLS-LLE achieved higher prediction accuracy
than the existing methods.
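
The data-generating setup and the two accuracy measures can be sketched as follows. The abstract does not give the mixing coefficients, the nonlinear function, or the input ranges, so the ones below are assumptions chosen only for illustration. `make_dataset`, `sample_poisson`, `rmse`, and `r_squared` are hypothetical helper names, and the mean-only baseline merely marks where the predictions of PLS-GLR, LLE, or PLS-LLE would be evaluated.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def make_dataset(n_samples, rng, mixing):
    """Return (predictors, counts) with 3 independent and 7 collinear predictors."""
    rows, counts = [], []
    for _ in range(n_samples):
        t = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # independent variables
        # Seven further predictors are exact linear combinations of the three.
        dependent = [sum(c * ti for c, ti in zip(coef, t)) for coef in mixing]
        # Assumed nonlinear mean function; the abstract does not specify one.
        mean = math.exp(1.0 + math.sin(2.0 * t[0]) + t[1] * t[2])
        rows.append(t + dependent)
        counts.append(sample_poisson(mean, rng))
    return rows, counts


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
mixing = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(7)]  # assumed
x_train, y_train = make_dataset(100, rng, mixing)
x_valid, y_valid = make_dataset(100, rng, mixing)
x_test, y_test = make_dataset(100, rng, mixing)

# Placeholder model: predict the training mean for every test sample.
baseline = [sum(y_train) / len(y_train)] * len(y_test)
print(rmse(y_test, baseline), r_squared(y_test, baseline))
```

Repeating the generation with different seeds and collecting RMSE and R2 for each method would give the kind of 50-run comparison described above.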

PLS-LLE is an effective regression method that can cope with nonlinearity and
multicollinearity, provided that an appropriate distribution can be assumed for
the response variable. We are now applying the proposed PLS-LLE to real
industrial data, aiming to predict the number of defects in products from the
process operating conditions. The application results will be presented at the
meeting.


References

[1] M. Kano and M. Ogawa, “The state of the art in chemical process control in
Japan: Good practice and questionnaire survey,” J. Process Control, vol. 20,
no. 9, pp. 969–982, 2010.

[2] M. Kano and K. Fujiwara, “Virtual sensing technology in process industries:
Trends and challenges revealed by recent industrial applications,” J. Chem.
Eng. Japan, vol. 46, no. 1, pp. 1–17, 2013.

[3] A. Dobson and A. Barnett, “An Introduction to Generalized Linear Models,
Third Edition,” Chapman and Hall, 2008.

[4] B. Marx, “Iteratively Reweighted Partial Least Squares Estimation for
Generalized Linear Regression,” Technometrics, vol. 38, no. 4, pp. 374–381,
1996.

[5] R. Tibshirani and T. Hastie, “Local Likelihood Estimation,” J. American
Statistical Association, vol. 82, no. 398, pp. 559–567, 1987.