(345u) A Machine Learning Approach for the Design of a Soft Sensor in an Oil Refinery's Distillation Column | AIChE

Authors 

Ferreira, J. - Presenter, Facultad de Ingeniería, Universidad de la República
Torres, A. I., Facultad de Ingeniería, Universidad de la República
Pedemonte, M., Facultad de Ingeniería, Universidad de la República
In this work we present the application of a novel machine learning method to the development of a soft sensor for monitoring the operation of a distillation column in an oil refinery (ANCAP, the state-owned refinery of Uruguay). The column is a C3/C4 splitter for which temperatures at the feed, distillate, bottoms, column head and reboiler, as well as the bottom pressure and the volumetric flows of the feed, distillate, bottoms and reflux, are collected every 30 s. In contrast, the distillate and bottoms compositions, which are used to calculate indicators of column performance, can only be measured once a day in the laboratory. The goal of the soft sensor is to estimate these performance indicators frequently, so that corrective actions can be taken without delay.

The soft sensor takes as inputs the data collected at high frequency and provides as outputs the molar fractions of the C3 and C4 hydrocarbons. The model for the sensor is learned from historical data using Kaizen Programming (KP; de Melo and Banzhaf, 2018) implemented in Python (Ferreira et al., 2019). KP is an iterative algorithm for solving symbolic regression problems of the form y = Σi βi fi, i = 1, 2, ..., p.
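As a minimal illustration of this model form (with made-up bases and synthetic data, not the refinery's), once a set of candidate bases fi is fixed, the coefficients βi reduce to an ordinary least-squares fit:

```python
import numpy as np

# Illustrative sketch: fitting y = sum_i beta_i * f_i(x) for a fixed set of
# candidate bases. The bases below are arbitrary examples, not those found by KP.
bases = [np.sin, np.exp, lambda x: x**2]

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 2.0 * np.sin(x) + 0.5 * x**2          # synthetic target: beta = (2, 0, 0.5)

F = np.column_stack([f(x) for f in bases])  # design matrix, one column per basis
beta, *_ = np.linalg.lstsq(F, y, rcond=None)
```

Since the target is an exact linear combination of the candidate bases, the fit recovers beta ≈ (2.0, 0.0, 0.5); KP's contribution is to search for the bases themselves rather than assume them.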

KP starts with the generation of an initial set of functional bases (fi), i.e., mathematical expressions built from simple functions (exp, log, sin, etc.) and mathematical operators (+, -, *, /). These bases are represented as expression trees and modified during execution of the algorithm using genetic programming (GP) techniques (Poli et al., 2008). The maximum number of bases in the model (p) and the mutation and crossover probabilities are parameters of the algorithm. At each iteration, new bases are created from those of the previous iteration, and the coefficients (βi) are fitted by linear regression. Some bases are then discarded based on the p-value or the magnitude of the corresponding coefficient. In this context, the p-value of a coefficient is the probability, under the null hypothesis that the coefficient is zero, of observing an estimate at least as extreme as the one obtained; bases whose coefficients are not statistically significant are removed. The search finishes when a maximum number of iterations is reached or when the error of the linear regression falls below a threshold. The main difference between this approach and previous contributions such as Kriging (or Gaussian process regression; Krige, 1951), support vector regression (Smola and Schölkopf, 2004) or ALAMO (Cozad et al., 2014) is that the functional bases are not fixed a priori: the algorithm searches for them, potentially allowing bases that have physical meaning to be found.
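The fit-and-prune step of each iteration can be sketched as follows; the candidate bases, synthetic data, and 0.05 significance threshold are illustrative assumptions, not values from the refinery model, and the GP generation of new bases is omitted:

```python
import numpy as np
from scipy import stats

# Hypothetical sketch of one KP-style pruning step: fit the coefficients of
# candidate bases by ordinary least squares, then flag for removal any basis
# whose coefficient is not significantly nonzero (null hypothesis: beta_i = 0).
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 2.0, 300)
y = 3.0 * np.log(x) + 1.5 * x + rng.normal(0.0, 0.05, x.size)  # synthetic data

candidates = {"log(x)": np.log(x), "x": x, "cos(x)": np.cos(x)}
F = np.column_stack(list(candidates.values()))
n, p = F.shape

# Ordinary least squares and classical t-test p-values for each coefficient.
beta, *_ = np.linalg.lstsq(F, y, rcond=None)
resid = y - F @ beta
sigma2 = resid @ resid / (n - p)                  # residual variance estimate
cov = sigma2 * np.linalg.inv(F.T @ F)             # covariance of the estimates
t_stats = beta / np.sqrt(np.diag(cov))
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)

# Keep only bases whose coefficients are statistically significant.
kept = [name for name, pv in zip(candidates, p_values) if pv < 0.05]
```

In the full algorithm this pruning alternates with GP mutation and crossover of the expression trees, so the pool of candidate bases evolves across iterations.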

The talk will present the final model and cover the overall development of the soft sensor, which includes preprocessing the collected data, defining the learning and validation data sets, and learning and validating the models.

References

  1. Cozad, A., Sahinidis, N. V., and Miller, D.C., Learning surrogate models for simulation-based optimization. AIChE Journal. 60, 2211–2227. (2014)
  2. de Melo, V. V., and Banzhaf, W., Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid, Information Sciences, 430, 287-313 (2018)
  3. Ferreira, J., Torres, A. I., and Pedemonte, M., A Comparative Study on the Numerical Performance of Kaizen Programming and Genetic Programming for Symbolic Regression Problems, 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Guayaquil, Ecuador, 1-6, (2019)
  4. Poli, R., Langdon, W.B., McPhee, N.F., and Koza, J.R., A field guide to genetic programming. Lulu Press, (2008)
  5. Krige, D.G., A statistical approach to some mine valuations and allied problems at the Witwatersrand. Master's thesis, University of the Witwatersrand. (1951)
  6. Smola, A. and Schölkopf, B., A tutorial on support vector regression, Statistics and Computing, 14, 199–222 (2004)