# (767g) An Extended Algorithm of Generalized Linear Model to Cope with Multicollinearity and Nonlinearity

- Conference: AIChE Annual Meeting
- Year: 2019
- Proceeding: 2019 AIChE Annual Meeting
- Group: Computing and Systems Technology Division
- Session:
- Time: Friday, November 15, 2019 - 2:24pm-2:43pm

Under increasingly fierce market competition, the process industry is required to continually provide high-grade products at low production cost in a sustainable production environment. However, manufacturing processes are always subject to variations, including process perturbations, equipment malfunctions, and inappropriate operations. These variations result in defective products, which not only reduce product quality but also increase manufacturing costs, energy consumption, and lead time. One way to reduce product defects is to develop a soft sensor that predicts the number of defects (response variable) from process operating conditions (predictor variables), and to control the process based on that model so that defects are less likely to occur.

In the process industry, linear regression methods such as partial least squares (PLS) have been widely used to construct soft sensors [1,2]. However, they cannot properly cope with count data: they assume that the response variable follows a normal distribution with equal variance, and they may yield negative estimates of the response variable, which is invalid for defect counts. Thus, when the response variable is binary or count data, the generalized linear model (GLM) is widely used [3]. GLM assumes that the response variable follows a distribution belonging to the exponential family, such as a Bernoulli or Poisson distribution, and it includes the logistic regression model and the Poisson regression model as special cases.
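As a minimal sketch of this contrast (not taken from the original work), the following NumPy code fits a Poisson regression model by iteratively reweighted least squares (IRLS) and compares it with ordinary least squares on synthetic count data. The function names, the simulated coefficients, and the sample size are illustrative assumptions.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=25, tol=1e-8):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = Xd @ beta                # linear predictor
        mu = np.exp(eta)               # inverse log link: always positive
        z = eta + (y - mu) / mu        # working response
        XtW = Xd.T * mu                # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ Xd, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def predict_poisson_glm(beta, X):
    """Return the predicted Poisson mean for each row of X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.exp(Xd @ beta)


# Synthetic count data (assumed coefficients: 0.5, 1.0, -0.8).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = rng.poisson(np.exp(0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]))

beta_hat = fit_poisson_glm(X, y)
y_glm = predict_poisson_glm(beta_hat, X)

# Ordinary least squares on the same data, for comparison.
Xd = np.column_stack([np.ones(len(X)), X])
beta_ols, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_ols = Xd @ beta_ols

print("smallest GLM prediction:", y_glm.min())
print("smallest OLS prediction:", y_ols.min())
```

Because the Poisson model predicts through an exponential, its estimates are positive by construction, whereas nothing prevents the least-squares fit from dropping below zero where the true mean is small.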

In the GLM framework, a regression model is constructed on the assumption that the response variable is related to a linear combination of the predictor variables through a link function. Hence, GLM can cope with binary and count data but cannot cope with strong nonlinearity or multicollinearity.
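In symbols, the GLM assumption can be written as follows. The notation is introduced here for illustration and does not come from the abstract: $y_i$ is the response, $\mathbf{x}_i$ the predictor vector, $\beta_0$ and $\boldsymbol{\beta}$ the coefficients, and $g$ the link function.

$$
g(\mu_i) = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
\mu_i = \mathrm{E}\left[\,y_i \mid \mathbf{x}_i\,\right]
$$

For Poisson regression with the canonical log link, $g(\mu) = \log \mu$, so $\mu_i = \exp(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}) > 0$. For logistic regression, $g(\mu) = \log\{\mu / (1 - \mu)\}$. The right-hand side is linear in $\mathbf{x}_i$, which is why a single global GLM struggles with strongly nonlinear relationships. The usual fitting procedure also solves weighted least-squares problems in the predictors, which become ill-conditioned when the predictors are collinear.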

To overcome the problems of nonlinearity and multicollinearity, partial least squares generalized linear regression (PLS-GLR) and local likelihood estimation (LLE) were proposed [4,5]. PLS-GLR copes with multicollinearity by incorporating the characteristics of PLS into GLM. LLE, on the other hand, copes with nonlinearity by constructing a local GLM each time a predicted value of the response variable is required. However, neither of these GLM-based methods can cope with a complex system that exhibits multicollinearity and nonlinearity at the same time.
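The idea of building a local GLM per query can be sketched as a kernel-weighted Poisson fit. This is a simplified illustration, not the formulation of [5]: the Gaussian kernel, the bandwidth value, the function name `local_poisson_predict`, and the synthetic data are all assumptions made here.

```python
import numpy as np


def local_poisson_predict(X_train, y_train, x_query, bandwidth=0.15, n_iter=50):
    """Predict a Poisson mean at x_query from a kernel-weighted local GLM."""
    # Gaussian kernel weights (assumed): samples near the query dominate the fit.
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / bandwidth ** 2)
    # Center the predictors at the query so the intercept is the local log-mean.
    Z = np.column_stack([np.ones(len(X_train)), X_train - x_query])
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(np.average(y_train, weights=k) + 1e-8)
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)
        mu = np.exp(eta)
        z = eta + (y_train - mu) / mu      # working response
        ZtW = Z.T * (k * mu)               # kernel weight times GLM weight
        lhs = ZtW @ Z + 1e-10 * np.eye(Z.shape[1])  # tiny ridge for stability
        beta_new = np.linalg.solve(lhs, ZtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < 1e-8
        beta = beta_new
        if converged:
            break
    return float(np.exp(beta[0]))


# Synthetic one-dimensional example with a nonlinear log-mean (assumed form).
rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(400, 1))
y_train = rng.poisson(np.exp(1.0 + np.sin(3.0 * X_train[:, 0])))

x_query = np.array([0.5])
true_mean = float(np.exp(1.0 + np.sin(3.0 * 0.5)))
local_mean = local_poisson_predict(X_train, y_train, x_query)

print("true mean at query:  ", true_mean)
print("local GLM prediction:", local_mean)
```

The model is refitted for every query point, so the cost of a prediction grows with the size of the training set. The bandwidth controls the trade-off between bias (too wide) and variance (too narrow).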

In this research, we developed partial least squares local likelihood estimation (PLS-LLE), which copes with both nonlinearity and multicollinearity by constructing a local PLS-GLR model each time a predicted value of the response variable is required.
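The abstract does not give the algorithmic details of PLS-LLE, so the following is only one possible reading and should not be taken as the authors' implementation. It assumes a Gaussian kernel around the query and replaces each weighted least-squares step of the local Poisson fit with a weighted PLS1 regression, in the spirit of the iteratively reweighted PLS of [4]. All function names, the bandwidth, the number of components, and the synthetic data are hypothetical.

```python
import numpy as np


def weighted_pls1(X, z, w, n_components):
    """Weighted PLS1 regression of z on X; returns (intercept, coefficients)."""
    x_mean = np.average(X, axis=0, weights=w)
    z_mean = np.average(z, weights=w)
    s = np.sqrt(w)
    Xs = (X - x_mean) * s[:, None]   # centered, sqrt-weighted predictors
    zs = (z - z_mean) * s            # centered, sqrt-weighted response
    dirs, loads, q = [], [], []
    for _ in range(n_components):
        direction = Xs.T @ zs
        norm = np.linalg.norm(direction)
        if norm < 1e-10:             # nothing left to explain
            break
        direction = direction / norm
        t = Xs @ direction           # score vector
        tt = t @ t
        if tt < 1e-12:
            break
        p = Xs.T @ t / tt            # loading vector
        q.append(zs @ t / tt)
        Xs = Xs - np.outer(t, p)     # deflate predictors
        zs = zs - q[-1] * t          # deflate response
        dirs.append(direction)
        loads.append(p)
    if not dirs:
        return z_mean, np.zeros(X.shape[1])
    Wl, P = np.array(dirs).T, np.array(loads).T
    coef = Wl @ np.linalg.solve(P.T @ Wl, np.array(q))
    return z_mean - x_mean @ coef, coef


def pls_lle_predict(X_train, y_train, x_query, n_components=3,
                    bandwidth=1.5, n_iter=30):
    """Local Poisson fit at x_query with PLS1 in place of least squares."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / bandwidth ** 2)        # assumed Gaussian kernel
    eta = np.full(len(y_train), np.log(np.average(y_train, weights=k) + 1e-8))
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y_train - mu) / mu             # working response
        b0, b = weighted_pls1(X_train, z, k * mu, n_components)
        eta = np.clip(b0 + X_train @ b, -30.0, 30.0)
    return float(np.exp(np.clip(b0 + x_query @ b, -30.0, 30.0)))


# Collinear synthetic data: 3 independent predictors plus 7 linear combinations.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 7))
latent = rng.uniform(-1.0, 1.0, size=(200, 3))
X_train = np.hstack([latent, latent @ mixing])
y_train = rng.poisson(np.exp(1.0 + np.sin(2.0 * latent[:, 0])
                             + 0.5 * latent[:, 1] - 0.5 * latent[:, 2]))

q_latent = np.array([0.2, -0.3, 0.1])
x_query = np.hstack([q_latent, q_latent @ mixing])
true_mean = float(np.exp(1.0 + np.sin(0.4) - 0.15 - 0.05))
pred = pls_lle_predict(X_train, y_train, x_query)

print("true mean at query:", true_mean)
print("local PLS-GLR type prediction:", pred)
```

In this toy data the ten predictors span only three directions, so a plain local least-squares step would face a singular system. The PLS step avoids the problem by regressing on at most `n_components` latent directions.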

To examine the effectiveness of the proposed method, a numerical experiment was conducted in which the predictor variables are linearly related to one another, the predictor and response variables are nonlinearly related, and the response variable follows a Poisson distribution. In this numerical example, only three of the ten predictor variables are independent, and the other seven are generated as linear combinations of those three. First, training, validation, and test data sets of 100 samples each were generated. Second, regression models were built by PLS-GLR, LLE, and PLS-LLE using the training data. Third, the model parameters were tuned using the validation data. Finally, the root mean squared error (RMSE) and the coefficient of determination (R2) were calculated on the test data to measure the accuracy of the models. Fig. 1 shows the results of repeating this procedure 50 times for each method. PLS-LLE was confirmed to achieve higher prediction accuracy than the existing methods.
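A data set with the structure described above could be generated along the following lines. The abstract does not state the mixing coefficients, the predictor ranges, or the nonlinear function, so `mixing`, the uniform range, and the form of `log_rate` are placeholders chosen for illustration. Only the dimensions (three independent and seven dependent predictors, 100 samples per set) and the two accuracy measures follow the text.

```python
import numpy as np


def make_dataset(rng, mixing, n_samples=100):
    """Generate 10 collinear predictors and a Poisson-distributed response."""
    latent = rng.uniform(-1.0, 1.0, size=(n_samples, 3))  # 3 independent variables
    X = np.hstack([latent, latent @ mixing])              # plus 7 linear combinations
    # Hypothetical nonlinear log-mean; the abstract does not specify it.
    log_rate = (1.0 + np.sin(2.0 * latent[:, 0])
                + latent[:, 1] ** 2 - 0.5 * latent[:, 2])
    y = rng.poisson(np.exp(log_rate))
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(2019)
mixing = rng.normal(size=(3, 7))  # shared by all three data sets

X_train, y_train = make_dataset(rng, mixing)
X_val, y_val = make_dataset(rng, mixing)
X_test, y_test = make_dataset(rng, mixing)

# Trivial baseline: always predict the training mean.
baseline = np.full(len(y_test), y_train.mean())
print("baseline RMSE:", rmse(y_test, baseline))
print("baseline R2:  ", r2_score(y_test, baseline))
```

Any of the three regression methods can then be scored by passing its test-set predictions to `rmse` and `r2_score`. Repeating the whole procedure with different seeds gives a distribution of scores per method.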

PLS-LLE is an effective regression method that can cope with nonlinearity and multicollinearity, provided that an appropriate distribution can be assumed for the response variable. We are now applying the proposed PLS-LLE to real industrial data with the aim of predicting the number of product defects from the process operating conditions. The application results will be presented at the meeting.


**References**

[1] M. Kano and M. Ogawa, “The state of the art in chemical process control in Japan: Good practice and questionnaire survey,” J. Process Control, vol. 20, no. 9, pp. 969–982, 2010.

[2] M. Kano and K. Fujiwara, “Virtual sensing technology in process industries: Trends and challenges revealed by recent industrial applications,” J. Chem. Eng. Japan, vol. 46, no. 1, pp. 1–17, 2013.

[3] A. Dobson and A. Barnett, “An Introduction to Generalized Linear Models, Third Edition,” Chapman and Hall, 2008.

[4] B. Marx, “Iteratively Reweighted Partial Least Squares Estimation for Generalized Linear Regression,” Technometrics, vol. 38, no. 4, pp. 374–381, 1996.

[5] R. Tibshirani and T. Hastie, “Local Likelihood Estimation,” J. American Statistical Association, vol. 82, no. 398, pp. 559–567, 1987.