(608f) Machine-Learning Guided Mutagenesis for Directed Evolution of Recombinant Proteins | AIChE

Authors 

Umetsu, M. - Presenter, Tohoku University
Nakazawa, H., Tohoku University
Saito, Y., Advanced Industrial Science and Technology
Oikawa, M., Tohoku University
Kameda, T., Advanced Industrial Science and Technology
Tsuda, K., RIKEN
Molecular evolution based on mutagenesis is widely used in protein engineering: critical amino acid residues of a target protein are identified from available structural information and mutated to alter or mature its function. In iterative saturation mutagenesis (ISM), one of the principal molecular evolution methods, mutagenesis proceeds in a stepwise manner; however, ISM does not always lead to the optimal sequence, because the effects of mutations on function are often synergistic or antagonistic. The library approach, by contrast, mutates all critical residues simultaneously via evolution operations and can therefore discover optimal sequences even under synergistic or antagonistic coupling. Recent advances in genetic engineering have made it possible to prepare extremely large libraries, beyond the limit of organic synthesis. Such large libraries, however, lead to high costs in screening experiments. The success of protein engineering therefore depends crucially on preparing a small library that is highly enriched in functional proteins.
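
The limitation of stepwise mutagenesis under coupled mutations can be illustrated with a toy example. The sketch below is not taken from the study: the two-residue protein, the three-letter alphabet, and every fitness value in `FITNESS` are hypothetical, chosen only so that the best double mutant ("BB") is reachable solely through single mutants that look worse than the wild type.

```python
from itertools import product

ALPHABET = "ABC"   # hypothetical reduced amino acid alphabet
WILD_TYPE = "AA"   # hypothetical two-residue starting sequence

# Hypothetical fitness table. "BB" is the global optimum, but both single
# mutants leading to it ("BA" and "AB") are worse than the wild type,
# i.e. the two mutations are synergistic.
FITNESS = {
    "AA": 1.0, "AB": 0.6, "AC": 1.2,
    "BA": 0.5, "BB": 2.0, "BC": 0.4,
    "CA": 0.9, "CB": 0.7, "CC": 0.8,
}


def stepwise_search(start, order):
    """ISM-like search: saturate one position at a time and keep the best."""
    current = start
    for pos in order:
        candidates = [current[:pos] + aa + current[pos + 1:] for aa in ALPHABET]
        current = max(candidates, key=FITNESS.__getitem__)
    return current


def library_search():
    """Library-like search: evaluate all positions mutated simultaneously."""
    library = ("".join(p) for p in product(ALPHABET, repeat=len(WILD_TYPE)))
    return max(library, key=FITNESS.__getitem__)


if __name__ == "__main__":
    for order in [(0, 1), (1, 0)]:
        best = stepwise_search(WILD_TYPE, order)
        print("stepwise", order, best, FITNESS[best])
    best = library_search()
    print("library", best, FITNESS[best])
```

In this toy landscape, both visiting orders of the stepwise search stop at "AC", while the simultaneous search recovers "BB". The simultaneous search, however, has to evaluate every combination, which is the screening cost discussed above.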

Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, mutagenesis is iterated, and the library of protein variants from the preceding round is used to train a machine-learning model that guides the next round of mutagenesis. This makes it possible to prepare a small library, suited to screening experiments, that is highly enriched in functional proteins. A first library of variants is generated, and the sequence and functional data acquired from these variants are used to train a machine-learning model that designs the second-round library. The library containing the positive candidate variants predicted by the model is then analyzed, and the resulting data are used to train the model again. We show the potential of our approach as a powerful platform for accelerated discovery of functional proteins.
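
The iterative cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the model, so ridge regression on one-hot encoded sequences is swapped in as a stand-in; `simulated_assay` is a hypothetical replacement for the wet-lab screening step; and the alphabet, number of sites, library size, and number of rounds are all made up.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"   # assumption: reduced alphabet at each mutated site
N_SITES = 4         # assumption: number of critical residues
rng = np.random.default_rng(0)

# Hypothetical fitness landscape: per-site effects plus one pairwise coupling.
SITE_EFFECT = rng.normal(size=(N_SITES, len(ALPHABET)))
PAIR_EFFECT = rng.normal(scale=0.5, size=(len(ALPHABET), len(ALPHABET)))


def simulated_assay(seq):
    """Stand-in for the screening experiment (hypothetical)."""
    idx = [ALPHABET.index(aa) for aa in seq]
    additive = sum(SITE_EFFECT[i, a] for i, a in enumerate(idx))
    return float(additive + PAIR_EFFECT[idx[0], idx[1]])


def one_hot(seq):
    """Encode a sequence as a flat one-hot vector."""
    x = np.zeros((N_SITES, len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def fit_ridge(seqs, scores, alpha=1.0):
    """Ridge regression on centered scores; returns (weights, mean score)."""
    X = np.array([one_hot(s) for s in seqs])
    y = np.array(scores)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ (y - y.mean()))
    return w, y.mean()


def predict(model, seqs):
    w, y_mean = model
    return np.array([one_hot(s) for s in seqs]) @ w + y_mean


def guided_evolution(n_rounds=3, library_size=20):
    all_seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
    # Round 1: a random first library is screened.
    first = rng.permutation(len(all_seqs))[:library_size]
    measured = {all_seqs[i]: simulated_assay(all_seqs[i]) for i in first}
    history = [max(measured.values())]
    for _ in range(n_rounds):
        # Train on all data so far, then screen the top-predicted untested variants.
        model = fit_ridge(list(measured), list(measured.values()))
        untested = [s for s in all_seqs if s not in measured]
        top = np.argsort(predict(model, untested))[::-1][:library_size]
        for i in top:
            measured[untested[i]] = simulated_assay(untested[i])
        history.append(max(measured.values()))
    return measured, history


if __name__ == "__main__":
    measured, history = guided_evolution()
    print("variants screened:", len(measured))
    print("best fitness after each round:", [round(h, 2) for h in history])
```

Each round screens only `library_size` variants, so the total screening effort grows linearly with the number of rounds rather than with the size of the full combinatorial library. An additive model such as the ridge regression used here cannot represent coupling between sites; a model with pairwise or nonlinear terms would be needed to capture the synergistic and antagonistic effects that motivate the approach.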