(575c) Machine Learning-Based Meta-Analysis of the Association between HLA-Peptide Binding Interactions and HLA-Linked Disease Susceptibility | AIChE

Authors 

Song, H. - Presenter, Inha University
Kieslich, C., Auburn University
Islam, S., Auburn University
The human leukocyte antigen (HLA) region on chromosome 6 contains the most highly polymorphic family of human genes and encodes the Major Histocompatibility Complex (MHC) molecules. Over a hundred diseases are associated with the HLA region, including cancers, food allergies, infectious diseases, and autoimmune diseases. The peptide-binding characteristics of MHC molecules may influence susceptibility to these diseases. Understanding the principles that govern peptide-MHC (pMHC) interactions may therefore provide biological insight into the role of HLA alleles in immunopathology. Characterizing the interaction between MHC molecules and antigen-derived peptides is a key challenge in immunoinformatics, since this interaction is a crucial step in T-cell recognition and in triggering adaptive immune responses. Computational tools have therefore been developed to predict the binding of peptide antigens to MHC molecules, with the ultimate aim of discovering T-cell epitopes for vaccine design. These tools use a variety of methods, including machine learning algorithms, to predict pMHC binding affinity.
This work investigates potential associations between pMHC interactions and susceptibility to HLA-associated diseases. Machine learning-based binding prediction models for individual HLA alleles are developed by training on large experimental binding affinity datasets obtained from the Immune Epitope Database (IEDB). To encode peptides, a BLOSUM matrix is used as a numeric representation of amino acids, and input features are extracted with the Fourier transform, which converts peptide sequences of varying lengths into the same number of features. A feature selection algorithm ranks the features by their contribution, and the ranking is used to derive a “fingerprint” characterizing the peptide-binding characteristics of each allele. The proposed allele-specific models are validated against current state-of-the-art pHLA binding predictors, and the HLA alleles are clustered by feature preference and analyzed together with HLA allele-disease associations.
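The abstract does not say how the Fourier transform is applied to the encoded peptides. One plausible reading is sketched below, assuming each encoding column is treated as a signal along the peptide, zero-padded to a fixed length, and summarized by the magnitudes of its first few discrete Fourier transform coefficients. The function names, the parameters `n_points` and `n_freqs`, and the two-value `TOY_ENCODING` are illustrative assumptions, not the authors' implementation. In the setting described above, each residue would instead map to its row of a BLOSUM matrix.

```python
import cmath
import math

# Hypothetical placeholder encoding (NOT BLOSUM scores): each residue maps to
# a short numeric vector. A real pipeline would use the BLOSUM row per residue.
TOY_ENCODING = {
    "A": (1.0, -1.0),
    "C": (0.0, 2.0),
    "D": (2.0, 0.5),
}


def dft_magnitudes(signal, n_points, n_freqs):
    """Magnitudes of the first n_freqs DFT coefficients of the signal,
    after zero-padding it to n_points samples."""
    padded = list(signal) + [0.0] * (n_points - len(signal))
    magnitudes = []
    for k in range(n_freqs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n_points)
            for t, x in enumerate(padded)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


def peptide_features(peptide, encoding, n_points=16, n_freqs=4):
    """Map a peptide of any length up to n_points to a fixed-length vector.

    Each encoding column becomes one signal along the peptide; the output
    holds n_freqs magnitudes per column, so its length does not depend on
    the peptide length.
    """
    if not 0 < len(peptide) <= n_points:
        raise ValueError("peptide length must be between 1 and n_points")
    rows = [encoding[residue] for residue in peptide]
    n_channels = len(rows[0])
    features = []
    for channel in range(n_channels):
        signal = [row[channel] for row in rows]
        features.extend(dft_magnitudes(signal, n_points, n_freqs))
    return features


# Peptides of different lengths yield feature vectors of the same size.
short = peptide_features("ACD", TOY_ENCODING)
longer = peptide_features("ACDACDACD", TOY_ENCODING)
print(len(short), len(longer))
```

Using magnitudes discards phase, which also discards some positional information. Whether that trade-off suits binding prediction is a modeling choice that this sketch does not settle.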