(269b) QSRR Application to Two-Dimensional Gas Chromatography | AIChE

Authors 

Kamma, K. - Presenter, The University of Tokyo
Kaneko, H., The University of Tokyo
Funatsu, K., The University of Tokyo



QSRR application to two-dimensional gas chromatography

1. Introduction

Gas chromatography is a widely used method for compound identification and structure elucidation. A target sample is carried through a column containing a stationary phase, and the retention time of each compound in the sample is observed. The structure of a compound can be identified from its retention time if the retention times of the candidate compounds expected to be in the sample are known from past observations or from prior measurements. However, this approach cannot identify compounds whose retention times are unknown.

Hence the quantitative structure-retention relationship (QSRR) [1] has been proposed to estimate the chromatographic retention time of any compound. QSRR is a regression approach that models the relationship between chemical structure and retention time. First, in structure descriptor calculation, chemical structures are converted into numerical values that can be handled mathematically. Then a regression model is constructed with the retention time as the objective variable and the molecular structure descriptors as explanatory variables. In prediction, a new chemical structure is converted into structure descriptors, and the constructed model predicts its retention time from them. A. Morsali et al. proposed QSRR models for BTEX and other substituted benzenes based on quantum chemical parameters [2]. Y.-H. Zhang et al. proposed QSRR models for polybrominated diphenyl ethers [3].

Although QSRR is useful for structure elucidation, large errors in the predicted retention time can be a crucial problem, so it is important to construct a highly accurate model. Even with accurate models, however, some prediction error remains. To deal with this problem, QSRR for two-dimensional chromatography [4] is considered useful. In two-dimensional chromatography, two different types of columns connected in series are used for retention time observation. The total retention time through the two columns is observed at the exit of the second column, and the retention time in each column is then obtained. Constructing a QSRR model for the retention time in each column enables structure elucidation based on two predicted retention times. If one of the predicted retention times has a large error, structure elucidation is still possible based on the other.
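As an illustration of how two predicted retention times might be used together, the following minimal sketch ranks candidate compounds by the scaled distance between the observed pair of retention times and each candidate's predicted pair. The function name rank_candidates, the scale factors and all numerical values are hypothetical and are not taken from this study; combining the two columns into a single distance is also only one possible choice.

```python
import math

def rank_candidates(observed, predicted, scales):
    """Rank candidates by scaled distance between observed and predicted times.

    observed:  (t1, t2) measured for the unknown peak
    predicted: {name: (t1_pred, t2_pred)} from the two QSRR models
    scales:    (s1, s2) assumed typical prediction error of each model,
               used to make the two columns comparable
    """
    def distance(pred):
        return math.sqrt(sum(((o - p) / s) ** 2
                             for o, p, s in zip(observed, pred, scales)))
    return sorted(predicted, key=lambda name: distance(predicted[name]))

# Hypothetical values: first-column times in minutes, second-column in seconds.
candidates = {
    "compound A": (12.4, 3.1),
    "compound B": (12.6, 5.0),
    "compound C": (30.2, 3.0),
}
ranking = rank_candidates(observed=(12.5, 4.8), predicted=candidates,
                          scales=(1.0, 0.5))
print(ranking)
```

In this toy example compounds A and B are nearly indistinguishable in the first column, and the second retention time separates them.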

The objective of this research is to improve the structure elucidation ability of two-dimensional gas chromatography by constructing QSRR models with high accuracy. We applied the partial least squares (PLS) method, the genetic algorithm-based PLS (GAPLS) method, and the ensemble PLS (EPLS) method to data observed by two-dimensional gas chromatography, and compared the prediction accuracy of the resulting models.

2. Method

2.1. PLS

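PLS is a linear regression method. It extracts from the explanatory variables a small number of components that are strongly correlated with the objective variables, and then models the relationship between the extracted components and the objective variables. Because the extracted components are mutually uncorrelated, PLS can construct a robust and accurate model even when the explanatory variables are numerous or correlated with each other.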
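The component extraction described above can be sketched for a single objective variable (PLS1) with the NIPALS procedure. This is a dependency-free illustration, not the implementation used in this study; the function names fit_pls1 and predict_pls1 and the toy data are hypothetical, and the variables are only mean-centered (autoscaling is omitted for brevity).

```python
def fit_pls1(X, y, n_components):
    """Fit a PLS1 model by NIPALS. X is a list of rows, y a list of values."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component is orthogonal to this one.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predict y for one sample by repeating the deflation on the new sample."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Hypothetical toy data generated from y = 2*x1 - x2 + 3.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y = [3.0, 6.0, 4.0, 8.0, 5.0]
model = fit_pls1(X, y, n_components=2)
print(predict_pls1(model, [6.0, 4.0]))
```

With as many components as linearly independent variables, PLS coincides with ordinary least squares, so the toy prediction should be very close to 11. In practice fewer components are used, and their number is typically chosen by cross-validation.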

2.2. GAPLS

A genetic algorithm (GA) is an optimization method that imitates the process of evolution. GAPLS is a variable selection method that combines GA and PLS: GA searches for the combination of explanatory variables that improves the prediction accuracy of the PLS model.
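The search over variable combinations can be sketched as a simple GA over on/off masks. In GAPLS the fitness of a mask would be the prediction accuracy (for example the cross-validated Q2) of a PLS model built on the selected variables; to keep the sketch short, a toy fitness function stands in for that step here. The function name ga_select, the GA settings (population size, one-point crossover, mutation rate, elitism) and the toy fitness are all hypothetical and are not the settings of this study.

```python
import random

def ga_select(n_vars, fitness, pop_size=20, n_generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a variable mask (list of booleans) that maximizes fitness."""
    rng = random.Random(seed)

    def repair(mask):
        # Guarantee that at least one variable is selected.
        if not any(mask):
            mask[rng.randrange(n_vars)] = True
        return mask

    population = [repair([rng.random() < 0.5 for _ in range(n_vars)])
                  for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]           # bit-flip mutation
            children.append(repair(child))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history

# Toy stand-in for a Q2-based fitness: variables 0 and 3 are "informative",
# and every selected variable carries a small penalty.
def toy_fitness(mask):
    return mask[0] + mask[3] - 0.1 * sum(mask)

best_mask, best_score, history = ga_select(n_vars=8, fitness=toy_fitness)
print(best_mask, best_score)
```

Because the best mask of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.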

2.3. EPLS

In EPLS, a subset of the explanatory variables is selected at random, and a PLS submodel is constructed between the selected variables and the objective variables; this is repeated to build many submodels. Each submodel predicts the values of the objective variables, and the mean or median of these predictions is taken as the final predicted value. EPLS can reduce the influence of the prediction errors of individual submodels and increase the prediction accuracy of the model.
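The ensemble procedure can be sketched as follows. To keep the example short, each submodel is a PLS1 model with a single component, and the median is used for aggregation; the function names, the number of submodels, the number of selected variables and the toy data are hypothetical choices, not those of this study.

```python
import random
from statistics import median

def fit_pls1_one_component(X, y):
    """One-component PLS1 submodel on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(m)] for r in X]
    f = [v - y_mean for v in y]
    w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0   # avoid division by zero
    w = [v / norm for v in w]
    t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    q = sum(f[i] * t[i] for i in range(n)) / tt
    return x_mean, y_mean, w, q

def predict_one(model, x):
    x_mean, y_mean, w, q = model
    return y_mean + q * sum((xj - mj) * wj
                            for xj, mj, wj in zip(x, x_mean, w))

def fit_ensemble(X, y, n_submodels, n_selected, seed=0):
    """Build submodels, each on a random subset of the variables."""
    rng = random.Random(seed)
    m = len(X[0])
    ensemble = []
    for _ in range(n_submodels):
        cols = rng.sample(range(m), n_selected)
        X_sub = [[row[j] for j in cols] for row in X]
        ensemble.append((cols, fit_pls1_one_component(X_sub, y)))
    return ensemble

def predict_ensemble(ensemble, x):
    """Final prediction: median of the submodel predictions."""
    return median(predict_one(model, [x[j] for j in cols])
                  for cols, model in ensemble)

# Hypothetical toy data: 6 samples, 4 descriptors.
X = [[1.0, 0.5, 3.0, 2.0], [2.0, 1.5, 1.0, 2.5], [3.0, 2.0, 4.0, 1.0],
     [4.0, 3.5, 2.0, 3.0], [5.0, 4.0, 5.0, 0.5], [6.0, 5.5, 0.0, 1.5]]
y = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0]
ensemble = fit_ensemble(X, y, n_submodels=25, n_selected=2)
print(predict_ensemble(ensemble, [3.5, 3.0, 2.5, 2.0]))
```

The median is less sensitive than the mean to a few submodels that happen to be built on uninformative variables.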

3. Results

3.1. Data

The retention times of 155 compounds were observed by two-dimensional gas chromatography. A non-polar column and a mid-polar column were used as the first and second columns, respectively. The retention times in the first and second columns were set to be below 60 minutes and 8 seconds, respectively.

The software Dragon6 [5] was used for structure descriptor calculation. Descriptors that could not be calculated were deleted, as were descriptors for which more than 50% of the samples had the same value. Correlation coefficients between all pairs of descriptors were then calculated, and for each pair whose correlation coefficient exceeded 0.9, one of the two descriptors was deleted. In the end, 164 descriptors remained.

One hundred randomly selected compounds were used as the training data, and the remaining 55 compounds were used as the test data.

3.2. QSRR Model Construction

The PLS, GAPLS and EPLS models were constructed with the training data and used to predict the retention times of the test data. Table 1 shows the results. The retention times in the first and second columns are denoted Y1 and Y2, respectively. R2, the coefficient of determination, is the evaluation criterion for model accuracy. Q2 is the cross-validated R2 calculated by 5-fold cross-validation, and R2pred is R2 for the test data. The closer these values are to 1, the more accurate the model is in prediction.
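These criteria can be sketched as follows, assuming the usual definition R2 = 1 - SS_res / SS_tot with the total sum of squares taken about the mean of the evaluated data. The function names r2 and q2, the interleaved fold assignment and the fit_predict callback are hypothetical; any regression method, such as PLS, can be plugged in through the callback.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def q2(X, y, fit_predict, n_folds=5):
    """Cross-validated R2.

    fit_predict(X_train, y_train, X_test) must return the predictions for
    X_test from a model fitted on the training part only.
    """
    n = len(y)
    y_cv = [None] * n
    for k in range(n_folds):
        test_idx = list(range(k, n, n_folds))   # interleaved folds (assumption)
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_cv[i] = p
    return r2(y, y_cv)

# Stand-in model that always predicts the training mean; its Q2 is negative.
def mean_model(X_train, y_train, X_test):
    return [sum(y_train) / len(y_train)] * len(X_test)

X = [[float(i)] for i in range(1, 11)]
y = [float(i) for i in range(1, 11)]
print(q2(X, y, mean_model))
```

R2pred is obtained by calling r2 on the observed and predicted values of the test data. A Q2 or R2pred much lower than R2 indicates that the model fits the training data better than it predicts new samples.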

For Y1, the R2, Q2 and R2pred values of the PLS model were 0.984, 0.941 and 0.957, respectively, which means that the PLS model predicts Y1 with high accuracy. The GAPLS and EPLS models also had high prediction accuracy for Y1.

For Y2, the R2, Q2 and R2pred values of the PLS model were 0.925, 0.743 and 0.676, respectively, so the prediction accuracy of the PLS model was not very high. Unnecessary variables among the explanatory variables are considered to have decreased the prediction accuracy. For the GAPLS model, the R2, Q2 and R2pred values were 0.929, 0.882 and 0.754, respectively. The R2pred value of the EPLS model was 0.740, close to that of the GAPLS model. The GAPLS and EPLS methods thus improved R2pred for Y2 from 0.676 to 0.754 and 0.740, respectively.

For all models, the prediction accuracy for Y2 was lower than that for Y1. A likely reason is observation error in Y2: because the second column was short, so that the measurements took less than 8 seconds, the observation errors of Y2 could be large.

Table 1. Modeling and prediction results

         Y1                       Y2
Model    R2      Q2      R2pred   R2      Q2      R2pred
PLS      0.984   0.941   0.957    0.925   0.743   0.676
GAPLS    0.990   0.983   0.966    0.929   0.882   0.754
EPLS     -       -       0.953    -       -       0.740

4. Conclusion

We aimed to construct highly accurate QSRR models for two-dimensional gas chromatography by using the PLS, GAPLS and EPLS methods. As a result, we constructed high-performance prediction models for both Y1 and Y2, although the prediction accuracy for Y2 was lower than that for Y1.

In this research, we assumed that the candidate compounds in the target sample are known in advance. In most cases, however, it is difficult to presume the candidate compounds beforehand. When the candidate compounds are entirely unknown, inverse analysis of the constructed QSRR models can be applied to predict candidate chemical structures from the observed retention times. Inverse analysis is expected to make structure elucidation more useful.

5. Acknowledgement

The authors thank Asahi Kasei Corp. for providing the two-dimensional gas chromatography data.

6. References

[1]    R. Kaliszan. Structure and Retention in Chromatography: A Chemometrics Approach. CRC Press, May 22, 1997.

[2]    A. Morsali, S. A. Beyramabadi, M. R. Bozorgmehr, M. Raanaee, B. Keyvani and G. R. Jafari. Prediction of gas chromatography retention of BTEX and other substituted benzenes based on quantum chemical parameters. Scientific Research and Essays, Vol. 5 (3), pp. 349-351, 4 February 2010.

[3]    Y.-H. Zhang, S.-S. Liu and H.-Y. Liu. Predicting the Gas Chromatographic Relative Retention Time of Polybrominated Diphenyl Ethers by MEDV-13 Descriptors. Chromatographia, Vol. 65, Issue 5-6, pp. 319-324, March 2007.

[4]    T. Hyötyläinen. Comprehensive Two-Dimensional Chromatography. Springer, Berlin, February 2013.

[5]    Talete srl. http://www.talete.mi.it/ 13 May 2013.
