(199c) Comparing Several Methods for Product Properties Prediction | AIChE

Comparing several methods for product properties prediction

Jean Jérôme Da Costa (1), Benoit Celse (1)*, Fabien Chainet (1), Marion Lacoue Nègre (1), Cyril Ruckebush (2), Didier Espinat (1)

  1. IFP Energies Nouvelles, Rond-Point de l’Echangeur de Solaize, 69360 Solaize, France
  2. LASIR CNRS, Université Lille Nord de France, Sciences et Technologies, Cité scientifique, bât C5, 59665 Villeneuve d’Ascq Cedex - France

*E-mail: benoit.celse@ifpen.fr


  • Comparison between linear regression models and interpolation methods on toy examples
  • Interpolation methods are much more efficient, especially for nonlinear data
  • Comparison between kriging and spline interpolation
  • Kriging models can provide a posteriori analysis of the results
  • Application on cloud point and viscosity index prediction


To meet product specifications, the refining industry increasingly needs to predict the properties of petroleum products. Machine learning is therefore used more and more to develop predictive models from process and physicochemical analytical data. The most classical models rely on (multilinear and nonlinear) regression, using equations that describe physicochemical phenomena or that are derived from empirical observations (Riazi 2005). However, regression models require identifying a suitable analytical expression for the regression function, which may be very complex in some situations.

Many chemometric applications for modeling petroleum product properties have also been proposed using partial least squares (PLS) regression (Braga, Junior and Martins 2014). Although these models perform relatively well, they use spectroscopic data such as NMR (nuclear magnetic resonance) or NIR (near-infrared spectroscopy). This type of data is not currently available to refiners.

Some authors (Piloto-Rodríguez et al. 2013) proposed using artificial neural networks (ANNs) to model complex properties. The results are good. However, optimizing an ANN requires very stringent statistical steps to ensure model robustness and considerable effort to prevent over-fitting, which can be very tricky.

The aim of this work is to compare interpolation and linear regression methods on toy and real examples (cloud point, viscosity index, …) in order to suggest the best methodology for product properties prediction.


Kriging is an interpolation method based on stochastic multi-Gaussian assumptions (Roustant, Ginsbourger and Deville 2012). It aims to provide the best linear unbiased estimate (BLUE) and was first used in applied geostatistics to model properties related to natural resources. More recently, other applications of kriging have been developed, such as metamodeling, for which there is an increasing need in computer science (Roustant, Ginsbourger and Deville 2012). A major advantage of kriging is its ability to provide a measure of the prediction uncertainty that depends on the data configuration (Roustant, Ginsbourger and Deville 2012).
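As an illustration of how a kriging predictor returns both an estimate and an uncertainty, here is a minimal sketch of simple kriging in one dimension. It is not the DiceKriging implementation used by the authors: the zero mean, the Gaussian covariance, the fixed `variance` and `length_scale` values, the small `nugget`, and the sine toy data are all assumptions made for the example.

```python
import numpy as np

def gaussian_cov(a, b, variance=1.0, length_scale=1.0):
    """Gaussian (squared-exponential) covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def simple_kriging(x_train, y_train, x_new,
                   variance=1.0, length_scale=1.0, nugget=1e-10):
    """Simple kriging with an assumed zero mean.

    Returns the predicted mean and the kriging variance at each new point.
    """
    K = gaussian_cov(x_train, x_train, variance, length_scale)
    K = K + nugget * np.eye(len(x_train))      # small jitter for numerical stability
    k_star = gaussian_cov(x_train, x_new, variance, length_scale)
    weights = np.linalg.solve(K, k_star)       # kriging weights, one column per new point
    mean = weights.T @ y_train                 # linear combination of the observations
    var = variance - np.sum(k_star * weights, axis=0)
    return mean, np.maximum(var, 0.0)          # clip tiny negative round-off

# Hypothetical toy data: five calibration points on a sine curve.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)

# One point in the calibration set, one between points, one far away.
x_new = np.array([2.0, 2.5, 10.0])
mean, var = simple_kriging(x_train, y_train, x_new)

for x, m, v in zip(x_new, mean, var):
    print(f"x = {x:5.2f}  prediction = {m:8.4f}  kriging variance = {v:.4f}")
```

In this sketch the kriging variance is close to zero at a calibration point and grows as the new point moves away from the calibration data, which is the dependence on data configuration described above.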

The aim of spline interpolation is to draw contour lines “as smooth as possible”, that is, a map that looks like what a draftsman would produce by hand (van Rooij and Schurer 1973). Splines consist of local polynomials of degree m, each describing a piece of a line or surface. For degree m = 1, 2 or 3, a spline is called linear, quadratic or cubic, respectively. This method is currently used in non-parametric statistical learning (Schumaker 2015).
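To make the piecewise-polynomial idea concrete, the sketch below builds a natural cubic spline (m = 3) with NumPy and compares it with a linear spline (m = 1, via `np.interp`). The natural boundary condition (zero second derivative at both ends), the function name `natural_cubic_spline`, and the sine toy data are assumptions chosen for illustration, not details taken from the study.

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)                                  # interval lengths
    # Linear system for the second derivatives m at the knots,
    # with m[0] = m[-1] = 0 (natural boundary conditions).
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)

    def evaluate(t):
        t = np.asarray(t, dtype=float)
        # Index of the interval [x[i], x[i+1]] containing each t.
        i = np.clip(np.searchsorted(x, t) - 1, 0, n - 2)
        left, right = x[i + 1] - t, t - x[i]
        return ((m[i] * left**3 + m[i + 1] * right**3) / (6.0 * h[i])
                + (y[i] / h[i] - m[i] * h[i] / 6.0) * left
                + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * right)

    return evaluate

# Hypothetical toy data: seven knots on a sine curve.
x_knots = np.linspace(0.0, np.pi, 7)
y_knots = np.sin(x_knots)
spline = natural_cubic_spline(x_knots, y_knots)

x_fine = np.linspace(0.0, np.pi, 200)
cubic_err = float(np.max(np.abs(spline(x_fine) - np.sin(x_fine))))
linear_err = float(np.max(np.abs(np.interp(x_fine, x_knots, y_knots) - np.sin(x_fine))))
print(f"max error, cubic spline:  {cubic_err:.5f}")
print(f"max error, linear spline: {linear_err:.5f}")
```

Both splines pass exactly through the knots; on this smooth toy function the cubic pieces follow the curve between knots more closely than the straight segments do.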


Results on toy examples show that interpolation methods adapt well to both linear and nonlinear situations without requiring prior analysis. Although kriging and splines perform very similarly in some situations, kriging is more valuable because it provides a measure of the uncertainty of the predicted value, based on stochastic assumptions. This measure relies on a specific distance between the new sample to predict and the calibration database. This point is very important when modeling physicochemical properties of petroleum products, and it becomes essential when the number of descriptors increases significantly (more than 3).
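The kind of toy comparison described above can be sketched as follows. This is not the authors' benchmark: the two test functions, the noise-free data, the sampling grids, and the use of piecewise-linear interpolation (`np.interp`) as a stand-in for the interpolation methods are assumptions made for illustration.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def compare(f, x_train, x_test):
    """Return (regression RMSE, interpolation RMSE) on the test points."""
    y_train, y_test = f(x_train), f(x_test)
    # Linear regression: least-squares straight line through the training data.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    y_reg = slope * x_test + intercept
    # Interpolation: piecewise-linear spline through the training data.
    y_int = np.interp(x_test, x_train, y_train)
    return rmse(y_reg, y_test), rmse(y_int, y_test)

x_train = np.linspace(0.0, 2.0 * np.pi, 15)
x_test = np.linspace(0.1, 6.0, 50)

# Hypothetical linear and nonlinear toy functions.
linear_case = compare(lambda x: 2.0 * x + 1.0, x_train, x_test)
nonlinear_case = compare(np.sin, x_train, x_test)

print("linear data:    regression RMSE = %.2e, interpolation RMSE = %.2e" % linear_case)
print("nonlinear data: regression RMSE = %.2e, interpolation RMSE = %.2e" % nonlinear_case)
```

On the linear function both approaches reproduce the data; on the nonlinear one the straight-line fit cannot follow the curve, while the interpolant adapts without any change to the procedure.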

Kriging and multiple linear regression (MLR) models are developed and compared on real examples: cloud point and viscosity index. Although MLR performs well, with errors in line with the uncertainty of the standard measurement, kriging improves accuracy. To our knowledge, this is the first kriging model for the prediction of these two properties.

Overall, interpolation methods are well suited to modeling petroleum product properties since they perform well when the number of samples is limited. Applying kriging methods to high-dimensional problems remains challenging.


Braga, Jez Willian B.; Junior, Araci Araújo Dos Santos; Martins, Ingrid S. (2014) Determination of viscosity index in lubricant oils by infrared spectroscopy and PLSR. In : Fuel, vol. 120, p. 171–178. DOI: 10.1016/j.fuel.2013.12.017.

Piloto-Rodríguez, Ramón; Sánchez-Borroto, Yisel; Lapuerta, Magin; Goyos-Pérez, Leonardo; Verhelst, Sebastian (2013) Prediction of the cetane number of biodiesel using artificial neural networks and multiple linear regression. In : Global Conference on Renewable energy and Energy Efficiency for Desert Regions 2011 "GCREEDER 2011", vol. 65, n° Supplement C, p. 255–261. DOI: 10.1016/j.enconman.2012.07.023.

Riazi, M. R. (2005) Characterization and properties of petroleum fractions. 1st ed (Online-Ausg.). West Conshohocken, Pa : ASTM International (ASTM manual series MNL 50).

Rooij, P. L. J. van; Schurer, F. (1973) A bibliography on spline functions. Eindhoven : Technische Hogeschool (TH report / Technische Hogeschool Eindhoven Nederland, 73-WSK-01).

Roustant, Olivier; Ginsbourger, David; Deville, Yves (2012) DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization. In : Journal of Statistical Software, vol. 51, n° 1.

Schumaker, Larry L. (2015) Spline functions. Computational methods. Philadelphia : Society for Industrial and Applied Mathematics (Other titles in applied mathematics, 142).