(189ci) QSAR Modeling for Predicting Elimination Half-Life of Industrial Chemical Compounds

Papadaki, K., Aristotle University of Thessaloniki
Sarigiannis, D., Aristotle University of Thessaloniki
Karakitsios, S., Aristotle University of Thessaloniki
In recent years, there has been increasing interest in the development of Physiologically Based Toxicokinetic (PBTK) models, which provide quantitative descriptors of the Absorption, Distribution, Metabolism and Excretion (ADME) of environmental or pharmaceutical chemicals. However, their application in toxicity testing and health risk assessment is limited by the lack of the input parameters required for their development. PBTK models can be properly parameterized using advanced Quantitative Structure-Activity Relationships (QSARs). QSARs are regression or classification models with many applications in the toxicological sciences, aiming at modelling the biochemical interactions of chemical compounds with molecular targets and living tissues. A so far limited but very promising application of QSAR models is the estimation of PBTK model parameters. The aim of this study is to develop QSARs for predicting the elimination half-life of industrial chemical compounds, as it is considered one of their major ADME properties. Moreover, elimination half-life is particularly important for the PK models used for data-poor compounds, as well as for providing first-cut estimates of intake starting from human biomonitoring data.

The methodological approach for modeling the elimination half-life of environmental chemical compounds is presented. The first step of QSAR modeling was the preparation of the input data, which included the existing experimental values of elimination half-life and the molecular descriptors of the corresponding chemical compounds. The input molecular descriptors were divided into two separate sets: the Linear Free Energy Relationship (LFER) descriptors and the PaDEL descriptors. The first set comprised the excess molar refraction (E), dipolarity/polarizability (S), hydrogen-bond acidity (A), hydrogen-bond basicity (B) and McGowan volume (V), which are the terms of the LFER equation proposed by Abraham (1993). The second set consisted of 1444 1D and 2D descriptors, known as PaDEL descriptors. These descriptors are related to the molecular structure of the chemicals and are characterized as constitutional, topological, geometrical or electronic.
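
For orientation, the Abraham LFER is commonly written in the general form below. Here SP stands for the modelled solute property; using the logarithm of the elimination half-life as SP is an assumption made for illustration, since the abstract does not state which transform was applied. The lower-case coefficients c, e, s, a, b and v are fitted by regression, while E, S, A, B and V are the compound descriptors listed above.

```latex
\log SP = c + e\,E + s\,S + a\,A + b\,B + v\,V
```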

The dataset consisted of 199 industrial chemical compounds, which were randomly split into a training set and a prediction set. The derived molecular descriptors were subjected to a prereduction step in order to exclude the semi-constant and intercorrelated ones. Principal Component Analysis (PCA) was used to examine the distribution of the chemical compounds in descriptor space. The score plot indicated that the molecules clustered by structure, the loading plot showed the descriptors most influential for the categorization of the chemicals, and the scree plot identified the most significant principal components for data analysis. A genetic algorithm was then used to select the optimal set of descriptors for the models.
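
The prereduction and random split described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the 80% "semi-constant" cutoff, the 0.95 correlation cutoff and the 20% prediction-set fraction are all assumptions, as the abstract does not report these settings.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def prereduce(columns, max_same_fraction=0.8, max_abs_corr=0.95):
    """Return the names of descriptors that survive prereduction.

    columns maps a descriptor name to its list of values (one per compound).
    Both thresholds are illustrative assumptions.
    """
    kept = []
    for name, values in columns.items():
        # Semi-constant: one value dominates the column.
        most_common = max(values.count(v) for v in set(values))
        if most_common / len(values) > max_same_fraction:
            continue
        # Intercorrelated: too similar to a descriptor that was already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


def random_split(n_items, test_fraction=0.2, seed=0):
    """Randomly split indices 0..n_items-1 into (training, prediction) lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

Note that this greedy filter keeps the first descriptor of each correlated pair, so the outcome depends on column order; dedicated QSAR tools may apply a different rule for choosing which descriptor of a pair to discard.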

The datasets were analyzed using two statistical methods, Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN). MLR was implemented using the QSARINS software, while the ANN technique was implemented using the Neural Network Toolbox in Matlab. The Applicability Domain (AD) of the developed models was determined using the AD Toolbox in Matlab, based on several approaches: bounding box, convex hull, leverage, distance to centroid, k Nearest Neighbors (kNN) and Probability Density Function (PDF)-based methods. The fitting performance (R²) of the MLR and ANN models was 0.70 and 0.87, respectively. The Leave One Out (LOO) cross-validation coefficient (Q²LOO) was 0.65 and 0.81, while the external validation coefficient (R²ext) was 0.74 and 0.91 for MLR and ANN, respectively. The Mean Squared Error (MSE) for the training set ranged from 0.26 to 0.62, while that for the prediction set ranged from 0.25 to 0.53. The AD analysis showed that there were no outliers, supporting the reliability of each of the developed QSAR models. Elimination half-life was found to be strongly associated with plasma protein binding, as it is well known that chemical compounds with high protein binding tend to have longer elimination half-lives. The derived descriptors define electrostatic, polarizability- and dispersion-type interactions, which strongly influence the affinity of molecules for the binding sites of plasma proteins and consequently their elimination half-life. Of the two descriptor sets, the PaDEL descriptors gave by far the better performance.
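
To make the validation statistics and the leverage-based AD check concrete, here is a minimal pure-Python sketch for an MLR model. It is not the QSARINS or AD Toolbox implementation; all function names are hypothetical, and the warning leverage h* = 3(p + 1)/n (p descriptors, n training compounds) is the cutoff commonly used in QSAR practice, assumed here because the abstract does not state the threshold applied.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(X, y):
    """Ordinary least squares with intercept; returns (coefficients, (X'X)^-1)."""
    design = [[1.0] + list(row) for row in X]
    xt = transpose(design)
    xtx_inv = inverse(matmul(xt, design))
    xty = [sum(a * b for a, b in zip(col, y)) for col in xt]
    beta = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]
    return beta, xtx_inv


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r2_and_q2_loo(X, y):
    """Return (R2, Q2_LOO): 1 - RSS/TSS and 1 - PRESS/TSS on the training set."""
    beta, _ = fit_mlr(X, y)
    y_mean = sum(y) / len(y)
    tss = sum((v - y_mean) ** 2 for v in y)
    rss = sum((v - predict(beta, row)) ** 2 for row, v in zip(X, y))
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict the held-out compound.
        beta_i, _ = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta_i, X[i])) ** 2
    return 1 - rss / tss, 1 - press / tss


def leverage(xtx_inv, row):
    """Leverage h = x' (X'X)^-1 x of a compound, with x including the intercept."""
    x = [1.0] + list(row)
    tmp = [sum(a * b for a, b in zip(r, x)) for r in xtx_inv]
    return sum(a * b for a, b in zip(x, tmp))


def leverage_threshold(n_train, n_descriptors):
    """Assumed warning leverage h* = 3(p + 1)/n."""
    return 3 * (n_descriptors + 1) / n_train
```

A query compound whose leverage exceeds `leverage_threshold(...)` would be flagged as lying outside the structural domain of the training set, so its prediction should be treated as an extrapolation.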

Both models produced satisfactory results, with ANN showing the best overall performance. The proposed models were checked for their fitting capacity, validity and applicability. They were found to be stable, reliable and capable of predicting physicochemical parameters of “data poor” chemical compounds that fall within the applicability domain, which includes compounds with highly diverse properties such as VOCs, PBDEs, PCDDs and PCDFs. In this way, animal testing and laboratory experimentation could be reduced and the wider use of PBTK and PK models could be reinforced. Moreover, the “safe by design” concept for environmental chemicals is supported by enabling the prediction of toxicokinetic behaviour from molecular parameters, thus promoting green chemistry and reducing the cost of product development.