Precision Engineering of Biomolecular Function with Massively Multiplexed Genotype-Phenotype Measurements and Machine Learning | AIChE

Authors 

Ross, D. - Presenter, National Institute of Standards and Technology
Tack, D. S., University of Texas at Austin
Romantseva, E. F., Material Measurements Laboratory, National Institute of Standards and Technology
Olson, N. D., NIST
Pressman, A., NIST
Levy, S. F., SLAC National Accelerator Laboratory
For engineering biology to become a mature engineering discipline, there is a need for methods to engineer complex biological functions with quantitative, made-to-specification precision. In this presentation, we’ll describe results for a new approach to biomolecular engineering: Inspired by directed evolution, we start with a large library of biomolecular variants. But, instead of picking a few “winners” from a laboratory selection, we measure the genotype and corresponding phenotype for every biomolecule in the library across several different chemical environments. This “measure everything” approach enables precise biomolecular engineering in two ways: First, we can identify genotypes from the library that satisfy quantitatively targeted phenotypic specifications without the limitations of a laboratory selection. Second, the large datasets, together with interpretable machine learning models, can reveal systematic sequence-structure-function design rules that provide a predictive understanding of the targeted biomolecular function.

In our first demonstration of this approach, we created a library of nearly 100,000 variants of the LacI sensor in E. coli. We used laboratory automation and a combination of long- and short-read sequencing to measure the full dose-response curve and corresponding DNA sequence for every variant. With the resulting data, we identified LacI genotypes with precisely targeted dose-response curves. For example, we engineered sensors with sensitivities (i.e., EC50 values) spanning 3 orders of magnitude with 1.25-fold accuracy. In addition, we used the data to train interpretable machine learning models that provide an intuitive route to engineering new LacI variants with quantitatively predictable dose-response.
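The idea of picking library members against a quantitative specification can be illustrated with a minimal sketch. It assumes each variant's dose-response has already been summarized by Hill-equation parameters; the abstract does not state which curve model was used, so the Hill form, the parameter names (`g0`, `g_inf`, `ec50`, `n`), the function names, and the four example variants are all hypothetical and are not measured data.

```python
import math

def hill(x, g0, g_inf, ec50, n):
    """Hill dose-response (an assumed model): output at inducer concentration x."""
    return g0 + (g_inf - g0) * x**n / (ec50**n + x**n)

def within_fold(value, target, fold):
    """True if value lies within a multiplicative factor `fold` of target."""
    return abs(math.log(value / target)) <= math.log(fold)

def select_variants(library, target_ec50, fold=1.25):
    """Return names of variants whose EC50 meets the target within `fold`."""
    return [
        name
        for name, params in library.items()
        if within_fold(params["ec50"], target_ec50, fold)
    ]

# Hypothetical library: made-up parameters for illustration only.
library = {
    "variant_a": {"g0": 100.0, "g_inf": 20000.0, "ec50": 9.0, "n": 1.8},
    "variant_b": {"g0": 150.0, "g_inf": 18000.0, "ec50": 95.0, "n": 2.0},
    "variant_c": {"g0": 120.0, "g_inf": 21000.0, "ec50": 130.0, "n": 1.5},
    "variant_d": {"g0": 90.0, "g_inf": 19000.0, "ec50": 1100.0, "n": 2.2},
}

# Specification: EC50 of 100 (arbitrary concentration units), 1.25-fold tolerance.
hits = select_variants(library, target_ec50=100.0)
print(hits)
```

Comparing on a logarithmic scale makes the tolerance symmetric: a variant passes whether its EC50 is up to 1.25 times above or 1.25 times below the target. In a real analysis the parameters would come from curve fits with uncertainties, which this sketch leaves out.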

Remarkably, we also found many LacI variants with phenotypes that differ qualitatively from the wild type, including variants with inverted dose-response and never-before-seen band-stop (on-off-on) variants. These qualitative phenotypic changes are particularly interesting because they can provide specific insight into the biophysics of sensor proteins and because they highlight the ability of large-scale genotype-phenotype measurements to discover rare and useful biomolecule variants.