(284d) Data-Driven Approach for the Prediction of MHC Class II Epitopes Using Oscillations of Physicochemical Properties | AIChE

Song, H. - Presenter, Inha University
Kieslich, C., Auburn University
Identifying T-cell epitopes in antigens is of great interest, both to improve understanding of cellular immunity and to aid the design of peptide-based vaccines, therapeutics and diagnostics. Traditional epitope identification relies on costly and time-consuming experimental techniques. Computational epitope prediction methods have therefore emerged for in silico screening of peptides, made possible by the growing wealth of gene and protein sequence data generated by high-throughput technologies.

Major histocompatibility complex (MHC) class II molecules, expressed on the surface of antigen-presenting cells (APCs), present peptides for recognition by CD4+ T-cells, which can elicit a range of host immune responses. Binding of peptides derived from protein antigens to MHC molecules is therefore a prerequisite for T-cell immunogenicity. One approach to the computational prediction of peptide-MHC binding is data-driven machine learning, which predicts binding affinities from the sequences of the peptide and the MHC molecule. Numerous prediction tools have been developed for peptide-MHC class II binding, but the problem remains challenging because of the polymorphic nature of MHC class II molecules and the variation in peptide length.

The presented work evaluates the performance of multiple allele-specific support vector machine (SVM) models combined with a previously proposed SVM-based feature selection algorithm. The SVM models aim to classify MHC class II binding and non-binding peptides based on their amino acid sequences and derived features. In developing the SVM models, we take advantage of underlying periodicities in physicochemical properties along the peptide sequence, which have been shown to be predictive features. Once the physicochemical descriptors are generated, Fourier transforms are applied so that peptide sequences of varying lengths can be encoded as feature vectors of a common length. To train and test the models, a comprehensive dataset of MHC class II binding peptides is taken from the Immune Epitope Database (IEDB), and cross-validation and grid search are applied across multiple training and test datasets. A feature selection algorithm is also incorporated into model development to identify an essential set of predictive features.
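The abstract does not specify the descriptors or the exact Fourier encoding, so the following is only a minimal sketch under stated assumptions. It uses a single physicochemical property (the Kyte-Doolittle hydropathy scale, chosen here for illustration) and a hypothetical encoding that samples normalized DFT magnitudes on a fixed grid of frequencies. This shows how peptides of different lengths can be mapped to feature vectors of the same length. The functions `property_profile`, `fourier_features` and `encode` and the `n_freqs` parameter are illustrative names, not the authors' implementation.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale: one example physicochemical property.
# ASSUMPTION: the abstract does not say which descriptors the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(peptide, scale=KYTE_DOOLITTLE):
    """Map a peptide to a per-residue property signal.

    Raises KeyError for residues missing from the scale.
    """
    return [scale[aa] for aa in peptide.upper()]


def fourier_features(signal, n_freqs=8):
    """Hypothetical fixed-length encoding of a variable-length signal.

    Evaluates the discrete-time Fourier transform of the mean-centred
    signal at n_freqs evenly spaced frequencies in (0, 0.5] cycles per
    residue and returns the length-normalized magnitudes, so every
    peptide yields exactly n_freqs values regardless of its length.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the zero-frequency component
    features = []
    for k in range(1, n_freqs + 1):
        freq = 0.5 * k / n_freqs  # cycles per residue, up to the Nyquist limit
        coeff = sum(
            x * cmath.exp(-2j * math.pi * freq * pos)
            for pos, x in enumerate(centered)
        )
        features.append(abs(coeff) / n)  # divide by length for comparability
    return features


def encode(peptide, n_freqs=8):
    """Encode a peptide as [mean property value] + Fourier magnitudes."""
    signal = property_profile(peptide)
    return [sum(signal) / len(signal)] + fourier_features(signal, n_freqs)


if __name__ == "__main__":
    # Peptides of different lengths map to vectors of the same length.
    for pep in ("PKYVKQNTLKLAT", "KYVKQNTLKL"):
        vec = encode(pep)
        print(pep, len(vec), [round(v, 2) for v in vec])
```

Taking magnitudes discards phase, so the features mainly describe how strongly the property oscillates at each frequency rather than exactly where along the peptide the oscillation sits. In a fuller pipeline, one would presumably concatenate such vectors over several property scales and pass them to an SVM classifier (for example scikit-learn's `SVC`, tuned with `GridSearchCV`). That pairing is an assumption for illustration, not a detail given in the abstract.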