(305c) RGN2- Single-Sequence Protein Structure Prediction with Applications in Protein Design and Novel Biomaterials | AIChE

Authors 

Chowdhury, R. - Presenter, Harvard Medical School

Recent advances in protein modeling (e.g., AlphaFold2) have made it possible to predict protein structures with high fidelity from alignments of homologous protein sequences, at significant computational cost. While these systems are groundbreaking, they leave three outstanding challenges unaddressed: (i) prediction of structure from individual sequences, necessary for orphan proteins, de novo design, rapidly evolving proteins, and modeling genetic variation; (ii) fast prediction, necessary for protein design and whole-proteome analyses; and (iii) scientific understanding of the sequence-to-structure relationships that underpin protein folding. Here we report RGN2, an end-to-end differentiable system for predicting protein structure from single protein sequences. RGN2 maps protein sequences to latent representations learned by a self-supervised sequence modeling task, then uses these learned representations to predict protein structure in a differentiable manner. To improve accuracy, we augment RGN2 with physics-based refinement at the cost of additional computation. Because it does not need to derive protein sequence alignments, RGN2 provides a million-fold gain in prediction speed over the publicly available trRosetta system when no physics-based refinement is used, and a 30-fold gain with refinement. We assessed RGN2 accuracy by predicting structures of proteins with no homologous sequences available (196 natural and 35 de novo designed proteins) and observed that RGN2 outperforms trRosetta in both cases, even though trRosetta was used to design the de novo proteins. To our knowledge, this represents the first end-to-end differentiable system for predicting protein structure from individual sequences, devoid of any explicit form of evolutionary information, and it provides an alternative route to accurate and fast protein structure prediction.
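
The two-stage idea in the abstract (sequence to latent representation, then latent representation to structure) can be illustrated with a toy forward pass. This is only a sketch under loud assumptions, not the RGN2 implementation: the abstract does not describe the model's internals, so the random embedding table stands in for the learned representation, the linear "structure head" is hypothetical, and the internal-coordinate chain builder is a generic technique chosen for illustration. All function names are invented here. NumPy is used for brevity; a real end-to-end differentiable system would use an automatic-differentiation framework.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence, dim=16, seed=0):
    """Stand-in for a learned latent representation (ASSUMPTION):
    a fixed random embedding table indexed by residue type."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(AMINO_ACIDS), dim))
    return table[[AMINO_ACIDS.index(aa) for aa in sequence]]  # (L, dim)


def predict_angles(latents, seed=1):
    """Hypothetical structure head: a linear map from each latent vector
    to one bond angle and one dihedral angle (untrained, random weights)."""
    rng = np.random.default_rng(seed)
    weights = 0.1 * rng.normal(size=(latents.shape[1], 2))
    raw = latents @ weights
    bond_angles = np.pi / 2 + 0.5 * np.tanh(raw[:, 0])  # kept away from 0 and pi
    dihedrals = np.pi * np.tanh(raw[:, 1])               # in (-pi, pi)
    return bond_angles, dihedrals


def build_trace(bond_angles, dihedrals, bond_length=3.8):
    """Place one point per residue from internal coordinates.
    Every step is smooth arithmetic, so gradients could flow through it
    in an autodiff framework."""
    n_res = len(bond_angles)
    if n_res < 3:
        raise ValueError("need at least three residues")
    coords = np.zeros((n_res, 3))
    coords[1] = [bond_length, 0.0, 0.0]
    t = bond_angles[2]
    coords[2] = coords[1] + bond_length * np.array([-np.cos(t), np.sin(t), 0.0])
    for i in range(3, n_res):
        a, b, c = coords[i - 3], coords[i - 2], coords[i - 1]
        bc = (c - b) / np.linalg.norm(c - b)   # unit vector along the last bond
        n = np.cross(b - a, bc)
        n /= np.linalg.norm(n)                 # normal to the plane of a, b, c
        m = np.cross(n, bc)                    # completes the local frame
        t, p = bond_angles[i], dihedrals[i]
        local = bond_length * np.array(
            [-np.cos(t), np.sin(t) * np.cos(p), np.sin(t) * np.sin(p)]
        )
        coords[i] = c + local[0] * bc + local[1] * m + local[2] * n
    return coords


def predict_structure(sequence):
    """Single-sequence pipeline: no alignment step anywhere."""
    return build_trace(*predict_angles(embed(sequence)))


coords = predict_structure("MKTAYIAKQRQI")
```

Because the weights are random and untrained, the resulting coordinates carry no biological meaning; the sketch only shows that a single sequence can be mapped to 3D coordinates through operations that never consult an alignment.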

This paper has an Extended Abstract file available; you must purchase the conference proceedings to access it.
