(119g) Computational Analysis of the Protein Sequence Space: A New Method to Identify Beneficial Mutations

Bansal, P., Georgia Institute of Technology
Hall, M., Georgia Institute of Technology
Realff, M., Georgia Institute of Technology
Lee, J. H., Korea Advanced Institute of Science and Technology (KAIST)
Bommarius, A. S., Georgia Institute of Technology

Enzyme engineering aimed at improving specific enzyme characteristics, such as stereoselectivity, thermostability, or catalytic activity, requires changing protein sequences via mutations. The first and second waves of protein engineering, rational design and directed evolution respectively, have proven successful over the last ten to fifteen years. However, in the absence of a high-throughput assay (required for directed evolution) and of extensive knowledge of the role of specific protein residues (required for rational protein design), mutations must be picked wisely, guided by some other methodology, to avoid generating oversized libraries of mutants. In this work, we present a new method to identify beneficial mutations in a protein through sequence analysis of a library composed of members of that protein's family. This method differs from the consensus method. Mathematically, the steps consist of identifying structures in the library's sequence space with multivariate statistical analysis and then choosing the variants that satisfy those structures. The residues at which these variants differ from the analyzed library are then identified as the target mutations. Some of the questions that need to be answered when picking mutations are: (1) Which positions should be chosen? (2) What should the chosen residues be mutated to? (3) How likely is it that the suggested mutations lead to a dead mutant? (4) Are there any covariance/co-evolution patterns among the residues? To address these questions, the method was tested on proteins whose function-sequence landscape is well known. A major fraction of the predicted mutations were found to be associated with increased activity, which suggests that this method can be very useful in aiding protein design.
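The abstract does not name the specific multivariate technique used, so the following is only a minimal sketch of the general workflow under stated assumptions: aligned, gap-free sequences are one-hot encoded, principal component analysis (PCA, chosen here purely for illustration) exposes the dominant patterns in the library's sequence space, and the positions at which a chosen variant differs from a reference sequence are listed as candidate mutations. The function names, the toy library, and the use of PCA are all hypothetical and are not taken from the authors' method.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; gaps are not handled


def one_hot(seqs):
    """Encode equal-length aligned sequences as an (n_seqs, length * 20) matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_aa = len(AMINO_ACIDS)
    X = np.zeros((len(seqs), len(seqs[0]) * n_aa))
    for row, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[row, pos * n_aa + index[aa]] = 1.0
    return X


def principal_components(X, k=2):
    """Return the first k PCA scores and loadings via SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k]


def differences(target, reference):
    """List (1-based position, reference residue, target residue) where the two differ."""
    return [
        (i + 1, ref, tgt)
        for i, (ref, tgt) in enumerate(zip(reference, target))
        if ref != tgt
    ]


# Hypothetical toy alignment standing in for a protein family library.
library = ["ACDE", "ACDF", "GCDE", "GCDF"]
X = one_hot(library)
scores, loadings = principal_components(X, k=2)

# Candidate mutations: where a selected variant differs from a reference sequence.
candidates = differences(target="ACDF", reference="ACDE")
```

In a real application, the criterion for selecting variants from the PCA scores, the treatment of alignment gaps, and any analysis of covarying positions would need to follow the method itself; none of those choices is specified in the abstract.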