(53b) Fully Automated in silico Enzyme Engineering with a Simulation and Machine Learning Feedback Loop | AIChE


Burgin, T. - Presenter, University of Washington
Beck, D., University of Washington
Pfaendtner, J., University of Washington
Current state-of-the-art practices in machine learning-based enzyme engineering make predictions from either experimental results or unlabeled sequence data and then test those predictions with benchtop experiments. Although molecular simulation has been widely used to explain and motivate experiments, it currently plays little or no role in most machine learning-based enzyme engineering strategies, owing to both the perceived and the real difficulty and expense of measuring key quantities of interest from simulations at scale. Our recent work has shown that one of the most important experimental observables, enzymatic activity, can be predicted quickly and accurately using cheap, automated atomistic simulations. Herein, we describe the integration of such simulations into a fully automated enzyme engineering pipeline. The pipeline combines already-available labeled and unlabeled data in a predictive neural network, generates candidate enzyme variants and tests them using simulations, and augments the neural network with the results in an active learning feedback loop. In this way it dynamically explores and characterizes sequence space and provides experimentalists with high-quality variant candidates. We describe the application of this tool to a real enzyme engineering problem and show that predictions obtained by combining experimental and simulation data are more specific and of higher quality than is possible with experimental data alone. Upon its completion, we will publish our tool as an open-source software package that automates the process of obtaining high-quality variant candidates for any enzyme of interest.
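
The feedback loop described in the abstract (fit a model, propose variants, score the most promising ones by simulation, fold the results back into the training data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it is the authors' tool: `simulate_activity` is a hypothetical stand-in that scores a sequence against a hidden reference rather than running an atomistic simulation, `AdditiveSurrogate` is a simple per-position averaging model swapped in for the neural network, and the sequences, function names, and parameters are invented.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simulate_activity(seq, target):
    """Hypothetical stand-in for an atomistic simulation (assumption).

    Returns the fraction of positions that match a hidden reference sequence.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


class AdditiveSurrogate:
    """Toy surrogate model, used here in place of a neural network.

    Stores the mean observed activity for each (position, residue) pair.
    """

    def fit(self, labeled):
        sums, counts = {}, {}
        for seq, activity in labeled.items():
            for key in enumerate(seq):  # key is (position, residue)
                sums[key] = sums.get(key, 0.0) + activity
                counts[key] = counts.get(key, 0) + 1
        self.table = {key: sums[key] / counts[key] for key in sums}
        # Fallback for (position, residue) pairs never seen in training.
        self.default = sum(labeled.values()) / len(labeled)
        return self

    def predict(self, seq):
        values = [self.table.get(key, self.default) for key in enumerate(seq)]
        return sum(values) / len(values)


def propose_variants(parents, n, rng):
    """Generate n distinct single-point mutants of randomly chosen parents."""
    variants = set()
    while len(variants) < n:
        parent = rng.choice(parents)
        pos = rng.randrange(len(parent))
        variants.add(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    return variants


def active_learning_loop(wild_type, target, rounds=5, pool=40, batch=5, seed=0):
    """Run the fit / propose / simulate / augment cycle for a few rounds."""
    rng = random.Random(seed)
    labeled = {wild_type: simulate_activity(wild_type, target)}
    for _ in range(rounds):
        model = AdditiveSurrogate().fit(labeled)
        # Mutate the best sequences simulated so far.
        parents = sorted(labeled, key=labeled.get, reverse=True)[:batch]
        candidates = {s for s in propose_variants(parents, pool, rng)
                      if s not in labeled}
        # Rank by predicted activity; break ties alphabetically.
        ranked = sorted(candidates, key=lambda s: (-model.predict(s), s))
        # "Simulate" the top candidates and add the results as new labels.
        for seq in ranked[:batch]:
            labeled[seq] = simulate_activity(seq, target)
    return labeled


if __name__ == "__main__":
    # Both sequences are invented for the example.
    results = active_learning_loop("MKTAYIAK", "MKTWYIAR")
    best = max(results, key=results.get)
    print(best, results[best])
```

In this sketch the selection step is purely greedy (it always simulates the candidates with the highest predicted activity). A practical active learning strategy would typically also weigh model uncertainty when choosing what to simulate, which is omitted here for brevity.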