(526g) Machine-Learning Based Prediction of Membranolytic Peptides with Anticancer Activities
AIChE Annual Meeting
Wednesday, November 16, 2022 - 2:00pm to 2:18pm
Cancer is the second leading cause of death worldwide. Recently, membranolytic anticancer peptides (ACPs) have received considerable attention for their ability to target and kill cancer cells. Identifying ACPs experimentally is costly and usually time-consuming, so efficient computational methods are of great importance for identifying potential ACP candidates. In the current study, we developed support vector machine (SVM) models to predict membranolytic anticancer activity from a peptide sequence. Oscillations in physicochemical properties along protein sequences have been shown to be predictive of protein structure and function, and in this work we take advantage of these known periodicities to predict whether a peptide has ACP activity given its amino acid sequence. To this end, peptides targeting breast and lung cancer cells were collected from the CancerPPD database and converted into physicochemical vectors using 10 property factors for the 20 natural amino acids. Fourier transforms were then applied to the property factor vectors to measure the amplitudes of the physicochemical oscillations, which served as the features for our SVM models. We also combined the two datasets to investigate how well a model discriminates active from inactive anticancer peptides regardless of the type of cancer they target. We used cross-validation over multiple training and testing sets to train and tune the models, obtaining accuracies of about 75% and 80% for the models predicting peptides that target breast and lung cancer cells, respectively. The model trained on the combined dataset reached an accuracy of almost 78%. For all models, approximately 100 to 150 features are the minimum needed to maintain these accuracies. However, because our ultimate goal is to design more active peptides, we took several further steps to improve how well the models separate active from inactive ACPs.
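The feature construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TOY_PROPERTIES`, `dft_amplitudes`, `fourier_features`, and `n_points` are hypothetical, the single property table holds made-up placeholder values rather than the 10 property factors used in the study, and zero-padding each profile to a fixed length is an assumption made here so that peptides of different lengths would yield feature vectors of equal size. The resulting amplitude vector is the kind of input that could then be passed to an SVM.

```python
import cmath
import math

# Toy placeholder values, NOT real property factors: a single hypothetical
# property that is +1 for leucine (L) and -1 for lysine (K).
TOY_PROPERTIES = {"L": [1.0], "K": [-1.0]}


def dft_amplitudes(signal, n_points):
    """Return |X_k| for k = 0..n_points//2 of a real signal.

    The signal is truncated or zero-padded to n_points (an assumption of
    this sketch) so that every peptide gives the same number of features.
    """
    padded = list(signal[:n_points]) + [0.0] * max(0, n_points - len(signal))
    amplitudes = []
    for k in range(n_points // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_points)
            for n, x in enumerate(padded)
        )
        amplitudes.append(abs(coeff))
    return amplitudes


def fourier_features(sequence, properties, n_points=16):
    """Concatenate the amplitude spectra of every property profile."""
    n_properties = len(next(iter(properties.values())))
    features = []
    for p in range(n_properties):
        # Profile of property p along the peptide, one value per residue.
        profile = [properties[aa][p] for aa in sequence]
        features.extend(dft_amplitudes(profile, n_points))
    return features


# A peptide whose toy property alternates every residue (period 2).
features = fourier_features("LKLKLKLK", TOY_PROPERTIES, n_points=8)
peak_index = max(range(len(features)), key=features.__getitem__)
```

In this toy case the property alternates with a period of two residues, so the amplitude is concentrated in the highest frequency bin (`peak_index` is 4 for an 8-point transform). With 10 property factors, the same function would produce 10 concatenated spectra per peptide.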
We extended the classification to multiple classes to predict the activity of a peptide more precisely. Instead of training a single model with just two classes, active versus inactive ACPs, we developed three binary classification models that separately classify highly active versus moderately active ACPs, highly active versus inactive ACPs, and moderately active versus inactive ACPs. Because we ultimately aim to design highly active peptides, the predictions of the three individual classifiers were then combined to predict whether a proposed peptide is inactive, moderately active, or highly active. Furthermore, to try to improve prediction accuracy, we incorporated other sets of physicochemical features and amino acid properties from the literature into our models and repeated the cross-validation process. To further validate our models, we also compared our predictions with those of models based on these additional features and properties.
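One common way to combine three pairwise classifiers into a single three-class prediction is a majority vote, sketched below in Python. The abstract does not state how the predictions were actually combined, so the voting rule, the tie-break toward the least active class, and all names (`combine_pairwise`, `predict_activity`, and the three score-threshold functions standing in for trained SVMs) are assumptions made purely for illustration.

```python
from collections import Counter

# Ordered from least to most active; used only for the tie-break below.
CLASSES = ["inactive", "moderate", "high"]


def combine_pairwise(votes):
    """Majority vote over the labels returned by the three pairwise models.

    A three-way tie (each class receives one vote) is resolved toward the
    least active class, which is an assumption of this sketch.
    """
    counts = Counter(votes)
    best = max(counts.values())
    tied = [c for c in CLASSES if counts[c] == best]
    return tied[0]


# Hypothetical stand-ins for the three trained binary classifiers. Each one
# thresholds a made-up score instead of calling a real SVM.
def high_vs_moderate(score):
    return "high" if score > 0.7 else "moderate"


def high_vs_inactive(score):
    return "high" if score > 0.5 else "inactive"


def moderate_vs_inactive(score):
    return "moderate" if score > 0.3 else "inactive"


def predict_activity(score):
    """Collect one vote per pairwise model and combine them."""
    votes = [
        high_vs_moderate(score),
        high_vs_inactive(score),
        moderate_vs_inactive(score),
    ]
    return combine_pairwise(votes)
```

For example, `predict_activity(0.9)` collects two votes for "high" and one for "moderate", so the combined label is "high". Resolving ties toward the less active class is a conservative choice when the aim is to avoid overestimating activity; other rules, such as using classifier confidence scores, would be equally plausible.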