(345u) A Machine Learning Approach for the Design of a Soft Sensor in an Oil Refinery's Distillation Column
AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, November 9, 2021 - 3:30pm to 5:00pm
The soft sensor takes as inputs the data that are collected on a frequent basis and provides as outputs the molar fractions of C3 and C4 hydrocarbons. The sensor model is learned from historical data using Kaizen programming (KP; de Melo and Banzhaf, 2018), implemented in Python (Ferreira et al., 2019). KP is an iterative algorithm for solving symbolic regression problems of the form $y = \sum_{i=1}^{p} \beta_i f_i$.
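As a minimal illustration of this model form, the sketch below evaluates a sum of weighted functional bases for one input sample. The three bases and their coefficients are hypothetical placeholders chosen for the example; they are not the bases found for the soft sensor.

```python
import math

# Hypothetical functional bases f_i(x); x is a tuple of input measurements.
bases = [
    lambda x: x[0],
    lambda x: math.exp(-x[1]),
    lambda x: x[0] * math.sin(x[1]),
]

# Hypothetical coefficients beta_i (in KP these come from linear regression).
betas = [2.0, -1.0, 0.5]


def predict(x, bases, betas):
    """Evaluate y = sum_i beta_i * f_i(x) for one sample x."""
    return sum(b * f(x) for b, f in zip(betas, bases))


# One sample with two inputs: 2.0*1.0 - 1.0*exp(0) + 0.5*1.0*sin(0)
y = predict((1.0, 0.0), bases, betas)
```

Because the model is linear in the coefficients, the bases can be arbitrarily nonlinear while the coefficients are still obtained by ordinary linear regression.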
KP starts by generating an initial set of functional bases ($f_i$), i.e., mathematical expressions built from simple functions (exp, log, sin, etc.) and mathematical operators (+, -, *, /). These functional bases are represented as expression trees and modified during execution of the algorithm using genetic programming (GP) techniques (Poli et al., 2008). The maximum number of bases in the model ($p$) and the mutation and crossover probabilities are parameters of the algorithm. At each iteration, new bases are created from those of the previous iteration, and the coefficients ($\beta_i$) are fitted by linear regression. Finally, some bases are discarded according to their p-value or the value of their coefficient. In the context of linear regression, the p-value of a coefficient is the probability, under the null hypothesis that the coefficient is zero, of obtaining an estimate at least as extreme as the one observed; a small p-value indicates that the corresponding basis contributes significantly to the model. The search terminates when a maximum number of iterations is reached or when the regression error falls below a given threshold. The main difference between this approach and previous contributions such as Kriging (or Gaussian process regression; Krige, 1951), support vector regression (Smola and Schölkopf, 2004), or ALAMO (Cozad et al., 2014) is that the functional bases are unknown a priori: the algorithm searches for them, which may lead to bases that have physical meaning.
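The loop described above can be sketched in plain Python under strong simplifying assumptions. This is not the authors' implementation: candidate bases are drawn at random from a small fixed pool (`POOL`, hypothetical) as a stand-in for GP mutation and crossover on expression trees, p-values use a normal approximation to the t-distribution, and all function names and default parameters are illustrative.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [a - k * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols_with_pvalues(F, y):
    """Least-squares fit; F holds one row of basis values per sample."""
    n, p = len(F), len(F[0])
    XtX = [[sum(r[i] * r[j] for r in F) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(F, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(F, y))
    sigma2 = sse / (n - p)
    pvals = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        var_j = sigma2 * solve(XtX, unit)[j]  # sigma^2 * [(X'X)^-1]_jj
        z = abs(beta[j]) / math.sqrt(var_j) if var_j > 0 else float("inf")
        # Two-sided p-value; normal approximation (assumption, not exact t)
        pvals.append(math.erfc(z / math.sqrt(2)))
    return beta, pvals, sse


# Hypothetical pool of candidate bases of a single input x.
POOL = {
    "1": lambda x: 1.0,
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(-x)": lambda x: math.exp(-x),
    "log(1+x)": math.log1p,
}


def fit(names, xs, ys):
    """Build the design matrix for the named bases and fit it."""
    return ols_with_pvalues([[POOL[n](x) for n in names] for x in xs], ys)


def kp_sketch(xs, ys, p_max=3, alpha=0.05, max_iter=50, tol=1e-8, seed=0):
    rng = random.Random(seed)
    model, sse = [], float("inf")
    for _ in range(max_iter):
        # New "idea": a random unused basis (stand-in for GP operators)
        candidate = rng.choice([n for n in POOL if n not in model])
        trial = model + [candidate]
        _, pvals, _ = fit(trial, xs, ys)
        # Keep at most p_max bases whose p-value is below alpha
        ranked = sorted(zip(pvals, trial))
        model = [name for pv, name in ranked if pv <= alpha][:p_max]
        sse = fit(model, xs, ys)[2] if model else float("inf")
        if sse < tol:  # error threshold reached
            break
    return model, sse


# Synthetic, noise-free data for illustration only.
xs = [3.0 * i / 39 for i in range(40)]
ys = [2.0 * math.sin(x) + 0.5 * x * x for x in xs]
model, sse = kp_sketch(xs, ys)
```

A fuller implementation would evolve expression trees instead of sampling a fixed pool, and would use the exact t-distribution (for example from a statistics library) when computing p-values.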
The talk will show the final model, as well as cover the overall development of the soft sensor, which includes preprocessing the collected data, definition of learning and validation data sets, and learning and validation of the models.
References
- Cozad, A., Sahinidis, N. V., and Miller, D.C., Learning surrogate models for simulation-based optimization, AIChE Journal, 60, 2211-2227 (2014)
- de Melo, V. V., and Banzhaf, W., Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid, Information Sciences, 430, 287-313 (2018)
- Ferreira, J., Torres, A. I., and Pedemonte, M., A Comparative Study on the Numerical Performance of Kaizen Programming and Genetic Programming for Symbolic Regression Problems, 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Guayaquil, Ecuador, 1-6, (2019)
- Poli, R., Langdon, W.B., McPhee, N.F., and Koza, J.R., A field guide to genetic programming. Lulu Press, (2008)
- Krige, D.G., A statistical approach to some mine valuations and allied problems at the Witwatersrand, Master's thesis, University of the Witwatersrand (1951)
- Smola, A. and Schölkopf, B., A tutorial on support vector regression, Statistics and Computing, 14, 199-222 (2004)