(543g) Premexotac: Bitterants in-Silico Screening Using Machine Learning for Advanced Pharmaceutical Development | AIChE

Authors 

Salar-Behzadi, S., Research Center Pharmaceutical Engineering GmbH
Bitter, along with sweet, sour, salty and umami, is one of the five basic taste modalities. Bitter substances are detected by the TAS2Rs, a family of G-protein-coupled receptors. TAS2R-expressing tissues have been identified throughout the human body (e.g. gastrointestinal tract, airways, brain and testis). Pharmaceutical interest in these extra-oral receptors has grown because stimulating them has potential therapeutic applications in type 2 diabetes, neuroblastoma, breast cancer, immunological diseases, overactive bladder, airway irritation, chronic inflammation, gut self-protection against toxic compounds, metabolic disease and obesity. Taste masking is another area that relies on bitterness assessment: evaluating the bitterness of an active pharmaceutical ingredient (API) informs the development of taste-masking strategies that improve patient adherence to medication. Established in vivo taste assessment methods raise ethical, economic and time challenges, and in vitro methods are often costly and time-consuming. To address these challenges, in silico tools have become widespread. Machine learning methods allow practical and rapid screening of bioactivity across marketed APIs as well as novel compounds, significantly reducing costs and time.

An in silico bitterness predictor was constructed following the procedure shown in Figure 1. A database of bitter and non-bitter compounds was compiled from the literature. During preprocessing, salts and disconnected structures were removed by keeping only the largest fragment of each entry. The final database contained 932 bitterants and 1908 presumed non-bitter compounds. For feature extraction, two sets of molecular descriptors were calculated: Extended Connectivity Fingerprints (ECFP) with a length of 1024 bits, and a collection of 22 physicochemical and topological descriptors. Mutual information (MI) was used for feature selection. Four algorithms were trained: Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Random Forest (RF) and Adaptive Boosting (AdaBoost). For external validation, the UNIMI set of 56 compounds was used, and performance was evaluated with specificity, recall and F1 score.
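Of the four algorithms, kNN is the simplest to illustrate. The sketch below is a minimal, hypothetical example, not the PREMEXOTAC implementation: it assumes each fingerprint is stored as the set of its on-bit indices and compares fingerprints with Tanimoto similarity, a common choice for binary fingerprints that the abstract does not specify. The value of k and the toy fingerprints are also assumptions; real 1024-bit ECFP vectors would come from a cheminformatics toolkit such as RDKit.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: frozenset, train: list, k: int = 3) -> int:
    """Predict 1 (bitter) or 0 (non-bitter) by majority vote of the k training
    fingerprints most similar to `query`.

    Assumptions: Tanimoto similarity as the metric and k = 3 by default;
    `train` is a list of (on-bit set, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training data (on-bit set, label), invented for illustration only.
TRAIN = [
    (frozenset({1, 5, 9, 12}), 1),
    (frozenset({1, 5, 9, 30}), 1),
    (frozenset({1, 5, 12, 40}), 1),
    (frozenset({2, 7, 22, 33}), 0),
    (frozenset({2, 7, 22, 50}), 0),
    (frozenset({3, 7, 33, 50}), 0),
]

print(knn_predict(frozenset({1, 5, 9, 44}), TRAIN))  # 1
```

Representing fingerprints as sets reduces the similarity computation to set operations, and with an odd k and two classes the majority vote cannot tie.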

Feature selection identified the Wiener index (WPath), molecular weight (MW), the atom-bond connectivity (ABC) index, Crippen-Wildman molar refractivity (SMR) and the Graovac-Ghorbani ABC index (ABCGG) as the top five descriptors by MI score. These descriptors provide key information for classifying bitter compounds. For the ECFP, the ten substructures with the highest MI scores were identified as key descriptors for bitterness prediction (Table 1). This updates previous work by Rodgers et al. (2006).
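Mutual information measures how much knowing a feature reduces uncertainty about the bitter/non-bitter label, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. The following sketch, a hedged illustration rather than the authors' code, computes this quantity for discrete features such as ECFP bits and ranks them; the feature names and values are invented toy data. Continuous descriptors such as MW or WPath would first need discretization, or an estimator such as scikit-learn's `mutual_info_classif`.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X;Y) in bits between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of observed (x, y) pairs
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of class labels
    mi = 0.0
    for (x, y), count in joint.items():    # only observed pairs, so no log(0)
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels, top=5):
    """Return the `top` (name, MI) pairs, highest MI first."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical toy data: three fingerprint bits for eight compounds (1 = bitter).
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
COLUMNS = {
    "bit_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "bit_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "bit_c": [1, 1, 1, 0, 0, 0, 0, 0],  # partially informative
}

print(rank_features(COLUMNS, LABELS, top=3))
```

Here bit_a carries the full 1 bit of label entropy, bit_b carries none, and bit_c falls in between, so the ranking is bit_a, bit_c, bit_b.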

The performance of the best models was compared with predictors available in the literature, using the metrics reported on the UNIMI set (Table 2). The difference in F1 score between the top performer and PREMEXOTAC was 0.08. The compared models all used different descriptor sets, data preprocessing and modelling approaches. The comparison indicates that, with current methods and the available experimentally confirmed bitterants and non-bitterants, performance has reached a plateau. Novel approaches to feature extraction and model training are constantly being developed, which may make it possible in the future to build models that significantly surpass this plateau. Dataset size is also an important factor for improving performance. However, significantly enlarging a bitterness database would be costly and time-consuming, so a substantial performance gain from additional data is unlikely in the short term. Nevertheless, the current models already perform very well and could significantly reduce the cost of subsequent in vivo and in vitro validation. Moreover, since machine learning algorithms are well suited to pattern recognition, the descriptors ranked by MI feature selection offer significant insight into the key physicochemical and topological characteristics of bitter compounds.
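Specificity, recall and F1 score follow the standard confusion-matrix definitions. The sketch below assumes bitter is the positive class (label 1); the labels are toy values, not results on the UNIMI set, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Recall, specificity, precision and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    recall = tp / (tp + fn) if tp + fn else 0.0       # share of bitter compounds flagged
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of non-bitter compounds cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "specificity": specificity, "precision": precision, "f1": f1}


# Toy labels (1 = bitter, 0 = non-bitter): 3 TP, 1 FN, 1 FP, 5 TN.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

print(classification_metrics(Y_TRUE, Y_PRED))
```

F1 does not use true negatives, which is why specificity is reported alongside it when judging how well non-bitter compounds are recognized.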

Acknowledgment: HERMES-Johannes-Burges-Stiftung is funding this project.