(582dc) A Novel Protein Structure Refinement Method: Blind Predictions During CASP10, and An Optimized Protocol to Maximize Performance

Authors: 
Khoury, G. A., Pennsylvania State University-University Park
Tamamis, P., Texas A&M University
Pinnaduwage, N., Princeton University
Smadbeck, J., Princeton University
Kieslich, C. A., Texas A&M University
Floudas, C. A., Princeton University



The protein structure prediction problem remains unsolved. Stated succinctly: given a primary amino acid sequence, accurately determine the full three-dimensional structure of the protein. Protein structure refinement addresses a different problem: given a three-dimensional structure, perform a set of operations that consistently improves the accuracy of that structure relative to its experimentally determined structure and improves its structural quality by removing unphysical features such as clashes.

The refinement problem is plagued by many challenges, with most methods consistently degrading the structure rather than improving it, as demonstrated by the blind predictions during the two most recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments [1, 2]. This results from many generated models being highly similar to one another and from the inability of current force fields to distinguish the better models from the worse ones. This may not be surprising, given that predicted structures have already been optimized to a conformation in a deep local minimum, where the probability of making a movement that improves the structure is much smaller than the probability of degrading it.

In this presentation, we introduce a novel optimized framework that consistently refines structures, both in terms of improved structure accuracy (GDT_TS and RMSD) and quality (number of clashes and Ramachandran violations). The initial version of our refinement method ranked 5th among all groups in the international blind structure prediction experiment CASP10. The method employed biological filters based on the high correlation observed between sequence length and the number of hydrogen bonds, as well as on solvent-accessible surface area (SASA) as a function of a protein's molecular weight. The method achieved the peak performance for CASP target TR722, that is, the maximum refinement in GDT_TS among all groups and methods. During post-CASP assessment, the method was automated, optimized to include a support vector machine classification model that distinguishes refined from degraded structures and enriches the probability of selecting a refined structure from a set of decoys, and enhanced with a molecular dynamics refinement step. Each step is tuned to maximize its performance and the probability of refining the structure. The optimized method that will be presented has been benchmarked on the CASP10, 9, 8, and 7 refinement targets, achieving a refinement success rate between 66% and 80% in terms of GDT_TS with a single model. A web tool has been developed to disseminate the method used during CASP10 to the broader community and is freely available for academic use at http://atlas.princeton.edu/refinement. The optimized method will be made available online in the near future.
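To make the accuracy metrics above concrete, the following is a minimal sketch, not the authors' implementation, of how RMSD, a simplified GDT_TS, and a refinement success rate could be computed. GDT_TS is conventionally the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of C-alpha atoms within that cutoff of the reference, maximized over superpositions. This sketch assumes the model and reference coordinates are already superposed and residue-matched, so it evaluates only a single fixed superposition and can underestimate the official score. All function names are hypothetical, and counting any strict increase in GDT_TS as a success is an assumption.

```python
import math


def ca_distances(model, reference):
    """Per-residue distances (Angstroms) between matched C-alpha coordinates.

    Assumes both lists hold (x, y, z) tuples for the same residues, in the
    same order, already superposed onto a common frame.
    """
    if not model or len(model) != len(reference):
        raise ValueError("model and reference must be non-empty and equal length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation over matched C-alpha atoms."""
    d = ca_distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts_fixed_superposition(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for one fixed superposition.

    The official score searches over many superpositions; this sketch does not.
    """
    d = ca_distances(model, reference)
    fractions = [sum(1 for x in d if x <= c) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def refinement_success_rate(score_pairs):
    """Fraction of targets whose refined GDT_TS exceeds the starting GDT_TS.

    score_pairs: iterable of (gdt_ts_start, gdt_ts_refined) tuples.
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("score_pairs must be non-empty")
    improved = sum(1 for start, refined in pairs if refined > start)
    return improved / len(pairs)


if __name__ == "__main__":
    # Toy four-residue example with made-up coordinates (illustration only).
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    start = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
    refined = [(0.0, 0.5, 0.0), (3.8, 0.9, 0.0), (7.6, 3.0, 0.0), (11.4, 5.0, 0.0)]
    print(gdt_ts_fixed_superposition(start, reference))    # 56.25
    print(gdt_ts_fixed_superposition(refined, reference))  # 68.75
    print(rmsd(refined, reference) < rmsd(start, reference))  # True
```

In this toy case the refined model improves on both measures; applying `refinement_success_rate` to the per-target score pairs of a benchmark set would give the kind of success fraction quoted above.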

 

References

1. Protein Structure Prediction Center. 10th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction. 2013 [cited 4/15/2013]; Available from: www.predictioncenter.org/casp10/.

2. MacCallum, J.L., et al., Assessment of protein structure refinement in CASP9. Proteins: Structure, Function, and Bioinformatics, 2011. 79(S10): p. 74-90.