(462g) QSARs for Predicting Physicochemical and Biochemical Properties of Industrial Chemicals
Conference: AIChE Annual Meeting
Year: 2015
Proceeding: 2015 AIChE Annual Meeting
Group: Computational Molecular Science and Engineering Forum
Session: Data Mining and Machine Learning in Molecular Sciences I
Time: Wednesday, November 11, 2015 - 10:30am-10:45am
Authors: Papadaki, K., Aristotle University of Thessaloniki; Sarigiannis, D., Aristotle University of Thessaloniki; Kontoroupis, P., Aristotle University of Thessaloniki; Karakitsios, S., Aristotle University of Thessaloniki

A current obstacle to introducing Physiologically Based Biokinetic (PBBK) models into the risk assessment arena is that these models lack a generic character. To expand the applicability domain of PBBK models, model parameterization for data-poor chemicals is developed using advanced Quantitative Structure-Activity Relationships (QSARs). QSARs are regression or classification models that relate the biological effects of a chemical compound to its chemistry. They comprise three parts: 1) the activity data to be modeled, 2) the data with which to model, and 3) a method to formulate the model. A QSAR model expresses activity or potency (Y) as a function of one or more descriptors (X). It is used in particular to estimate physicochemical properties and biological effects, and to understand the physicochemical features governing a biological response. The biological effect is normally the property to be modeled, and it is linked to the physical or structural chemistry of the molecules. The methodological approach presented in this study builds on the Linear Free Energy Relationship (LFER) proposed by Abraham, applied to a large number of chemical compounds, in order to address the biological properties of the main human tissues.
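As a minimal sketch of what an Abraham-type LFER looks like in practice, the snippet below evaluates the commonly cited general form log SP = c + eE + sS + aA + bB + vV. The abstract does not state which form of the equation was used, so this form is an assumption; the names `AbrahamDescriptors`, `AbrahamCoefficients` and `log_property` are hypothetical, and all numeric values are placeholders rather than fitted or measured data.

```python
from typing import NamedTuple


class AbrahamDescriptors(NamedTuple):
    """Solute descriptors of one chemical (assumed general Abraham form)."""
    E: float  # excess molar refraction
    S: float  # dipolarity/polarizability
    A: float  # hydrogen-bond acidity
    B: float  # hydrogen-bond basicity
    V: float  # McGowan characteristic volume


class AbrahamCoefficients(NamedTuple):
    """System coefficients for one property, e.g. one tissue/blood pair."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_property(d: AbrahamDescriptors, k: AbrahamCoefficients) -> float:
    """Return log SP = c + e*E + s*S + a*A + b*B + v*V."""
    return k.c + k.e * d.E + k.s * d.S + k.a * d.A + k.b * d.B + k.v * d.V


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not taken from the study.
    solute = AbrahamDescriptors(E=1.0, S=1.0, A=0.0, B=0.0, V=2.0)
    system = AbrahamCoefficients(c=0.1, e=0.2, s=-0.3, a=-0.4, b=-0.5, v=0.6)
    print(f"log SP = {log_property(solute, system):.3f}")
```

In this form, one set of coefficients would be fitted per property (for example per tissue/blood partition coefficient), while the descriptors belong to the chemical.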
The approach involved collecting the necessary input data, performing the statistical analysis, and applying the model to a large number of chemical compounds. The equation was analyzed using two statistical techniques: Non-Linear Regression (NLR) and Artificial Neural Networks (ANN). Modeling results from the two methods were compared with corresponding literature data for environmental chemicals. NLR was used to estimate the parameters in Abraham’s equation. The algorithm selected for fitting the nonlinear regression to the observations was Least Squares (LS) coupled with the Levenberg-Marquardt algorithm. ANNs were used to develop a nonlinear model based on the LFER. A Multi-Layer Perceptron (MLP) model was selected, and the network was trained with the scaled conjugate gradient back-propagation algorithm. The network consisted of a single input layer, which received the values of the molecular descriptors and the experimental values of the physicochemical and biochemical parameters; one hidden layer; and an output layer with a log-sigmoid transfer function. The scaled conjugate gradient back-propagation algorithm was selected for its good convergence rate on the limited sample size used. The most efficient of the examined methods was the LFER analyzed using ANN. Data sets of 33 and 29 industrial chemicals were used to predict, respectively, the tissue/blood partition coefficients for seven main human tissues (heart, adipose, muscle, kidney, lung, liver, brain) and the constants of metabolism (the maximal velocity of metabolism, normalized to human body weight, and the Michaelis-Menten constant). The initial data sets were divided into training (70%), validation (15%) and test (15%) sets.
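The 70/15/15 division described above can be sketched as follows. The abstract does not say how the chemicals were assigned to each subset or how fractional counts were rounded, so the random shuffle, the rounding rule, the seed and the function name `split_indices` are all assumptions made for illustration.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists.

    Assumption: counts are rounded to the nearest integer and the test set
    takes whatever remains, so the three parts always sum to n.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Data set sizes reported in the abstract: 33 and 29 chemicals.
    print([len(part) for part in split_indices(33)])  # [23, 5, 5]
    print([len(part) for part in split_indices(29)])  # [20, 4, 5]
```

With data sets this small, each validation and test subset holds only four or five chemicals, so the reported fit statistics are sensitive to which chemicals land in which subset.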
The coefficient of determination, R², was 0.97, 0.81 and 0.95 for the kidney/blood, heart/blood and adipose/blood partition coefficients, and 0.92, 0.91, 0.96 and 0.81 for the liver/blood, muscle/blood, brain/blood and lung/blood partition coefficients, respectively. In addition, the squared correlation coefficient for the maximal velocity of metabolism and the Michaelis-Menten constant was 0.99 and 0.82, respectively. The combination of Abraham’s equation and the ANN method was then used to estimate tissue/blood partition coefficients for the main human tissues for several chemical compounds with unknown partition coefficients, in order to expand the chemical space covered by the developed QSAR model. These compounds were grouped by chemical structure into families, including hydrocarbons, aromatic and halogenated hydrocarbons, alcohols, ketones, ethers and esters. To validate the results obtained from the derived QSAR model, a simpler model from the literature was used, which considers the fractional contents of lipids and water in tissues and blood. The LFER combined with ANN provides a generic model that can estimate a satisfactory number of biological properties, including tissue/blood partition coefficients and metabolic constants. The correlation between experimental and predicted values is better with ANN than with NLR. This is because the ANN method can detect and account for non-linear relationships between the independent and dependent variables of the developed model. When significant interactions exist between the input variables, the hidden layer of the ANN captures them. This cannot easily be accomplished with NLR, because the input or output parameters must first be appropriately transformed in order to capture the nonlinearities and improve the model fit.
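The goodness-of-fit figures quoted above are described both as a coefficient of determination and as a squared correlation coefficient. The abstract does not say which formula was used, and the two agree only in special cases, so the sketch below computes both on placeholder numbers. The function names and the sample values are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def squared_pearson(observed, predicted):
    """Square of the Pearson correlation between observed and predicted."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (var_obs * var_pred)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    experimental = [1.0, 2.0, 3.0, 4.0]
    model_output = [1.1, 1.9, 3.2, 3.8]
    print(f"R^2 (1 - SS_res/SS_tot): {r_squared(experimental, model_output):.4f}")
    print(f"squared Pearson r:       {squared_pearson(experimental, model_output):.4f}")
```

The first measure penalizes systematic bias in the predictions, while the second only measures how linearly the predictions track the observations, which is worth keeping in mind when comparing values across the two groups of parameters.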
The relative importance of the input descriptors to the predicted parameters was estimated via the ANN model. The most critical descriptor for the estimated tissue/blood partition coefficients, except for the heart/blood partition coefficient, was found to be the McGowan volume. The distribution of a chemical into a tissue depends on the rate of blood flow to the tissue, the tissue mass, and the partition characteristics between blood and tissue. A compound's distribution between tissues and blood at equilibrium is a function of the respective lipid and water fractions in each matrix (lipid content), and therefore of the compound's lipophilicity. Thus, it is logical that one of the major determinants of partitioning between blood and tissue is the McGowan volume, as it is a measure of lipophilicity. Polarizability was found to be the most important descriptor influencing the heart/blood partition coefficient. Regarding the metabolic parameters, the McGowan volume is also the most important descriptor for the Michaelis-Menten constant, while the most important one for the maximal velocity of metabolism is by far the polarizability descriptor. Both the Michaelis-Menten constant and the maximal velocity of metabolism characterize the rate of metabolism of a chemical, which is an enzyme-catalyzed reaction. Such reactions are influenced by the ionization state of the substrate or of the enzyme's substrate-binding site. The maximal velocity reflects the turnover number of an enzyme, which is the number of substrate molecules converted into product by one enzyme molecule per unit time when the enzyme is fully saturated with substrate. It is therefore directly related to ionization, as well as to polarizability. In general, the results indicate that the molecular descriptors used as inputs to the QSAR model are suitable for estimating the parameters that characterize the physicochemical and biochemical phenomena.
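To illustrate how the two metabolic constants discussed above enter a rate calculation, the sketch below evaluates the standard Michaelis-Menten rate law, v = Vmax * C / (Km + C). The abstract does not show how the predicted constants are used downstream, so the function name `metabolic_rate`, the argument names and the numeric values are hypothetical, and units are left to the caller.

```python
def metabolic_rate(concentration, vmax, km):
    """Michaelis-Menten rate: v = Vmax * C / (Km + C).

    vmax is the maximal velocity (rate at full enzyme saturation) and km is
    the substrate concentration at which the rate equals vmax / 2.
    """
    return vmax * concentration / (km + concentration)


if __name__ == "__main__":
    # Placeholder constants for illustration only.
    vmax, km = 10.0, 2.0
    print(metabolic_rate(km, vmax, km))           # 5.0, half of vmax at C = Km
    print(metabolic_rate(1000.0 * km, vmax, km))  # approaches vmax at saturation
```

At concentrations far below Km the rate is approximately (Vmax / Km) * C, so errors in either predicted constant propagate directly into the estimated metabolic clearance.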
The improved performance of the proposed QSAR model, which combines the Linear Free Energy Relationship with ANN analysis, can be attributed to the ability of the LFER input parameters to describe the biological responses, as well as to the capacity of the ANN method to represent mathematically the complex metabolic and physicochemical phenomena. In general, the proposed QSAR model offers four major advantages: 1) the descriptors of each chemical are easy to obtain; 2) a satisfactory number of biological properties are predicted with the same relationship; 3) it fills the data gaps of data-poor chemicals, allowing the wide use of PBBK models; and 4) it supports the “safe by design” concept for industrial chemicals by allowing the successful prediction of toxicokinetic behavior from molecular parameters.