(157ac) Machine Learning Guides Combinatorial Protein Library Design | AIChE

Authors 

Woldring, D. - Presenter, HHMI/Brandeis University
Mardikoraem, M., Michigan State University
Engineered protein ligands are powerful tools for therapeutics and diagnostics. The medical community would greatly benefit from increased availability of agents that can accurately detect and reliably treat aggressive diseases such as pancreatic and breast cancer. Traditionally, antibodies have played a dominant role in the development of drugs and imaging probes. However, small engineered proteins (e.g., affibody, Gp2, and monobody) have multiple advantages over antibodies, including superior biodistribution properties and remarkably inexpensive production. Hence, it is critical to improve our understanding of the structural and functional traits that enable the strong, selective interactions required for clinical utility. In this study, we use high-throughput selection of extremely large collections of proteins (>10^9 unique proteins) to generate novel binders to a diverse group of clinically relevant targets. In doing so, an important challenge is deciding which protein variations to include in our initial collection of proteins. For a small protein only 50 amino acids long, there are over 10^65 possible unique variants (20^50, assuming the 20 canonical amino acids at each position), far more than any library can sample. To predict which variations or mutations will yield strongly binding, highly selective proteins, we draw on several resources: deep sequencing data sets obtained from our directed evolution experiments, structural data from X-ray crystallography and cryo-EM, computational docking simulations in Rosetta, Bayesian phylogenetic analysis, and mutational stability analysis. These data are then used to guide a machine learning framework that more accurately determines which protein mutations to test experimentally for optimal binding properties. Collectively, this workflow provides a more efficient approach for discovering the next generation of clinical tools for detecting and treating some of the world’s most deadly diseases.
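
The sequence-space estimate in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' workflow. It assumes 20 canonical amino acids per position and a library of exactly 10^9 variants, and the names `sequence_space_size` and `max_fully_randomized_positions` are hypothetical.

```python
import math

NUM_AMINO_ACIDS = 20  # assumption: the 20 canonical amino acids at every position


def sequence_space_size(length: int, alphabet_size: int = NUM_AMINO_ACIDS) -> int:
    """Number of distinct sequences of a given length (exact integer)."""
    return alphabet_size ** length


def max_fully_randomized_positions(library_size: int,
                                   alphabet_size: int = NUM_AMINO_ACIDS) -> int:
    """Largest number of positions that can carry every amino acid
    while the full combinatorial set still fits in the library."""
    positions = 0
    while alphabet_size ** (positions + 1) <= library_size:
        positions += 1
    return positions


protein_length = 50      # length quoted in the abstract
library_size = 10 ** 9   # assumption: lower bound quoted in the abstract

full_space = sequence_space_size(protein_length)
fraction_sampled = library_size / full_space
full_positions = max_fully_randomized_positions(library_size)

print(f"log10(sequence space) = {math.log10(full_space):.2f}")
print(f"fraction of space covered by the library = {fraction_sampled:.2e}")
print(f"positions that can be fully randomized = {full_positions}")
```

Under these assumptions, a 10^9-member library can exhaustively cover only a handful of fully randomized positions, which is why choosing the sites and amino acids to vary matters so much.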