(591b) WeFold: A Collaborative Protein Structure Prediction Experiment

Khoury, G. A., Pennsylvania State University-University Park
Floudas, C. A., Princeton University
Smadbeck, J., Princeton University
Liwo, A., Cornell University
Krupa, P., Cornell University
Mozolewska, M., University of Gdansk
Wirecki, T., University of Gdansk
Baker, D., University of Washington
Scheraga, H. A., Cornell University
Skolnick, J., Georgia Institute of Technology
Keasar, C., Ben-Gurion University

The protein structure prediction problem continues to elude scientists. Although many new methods have been introduced, certain classes of prediction targets, such as free modeling targets, remain a challenge, as blind predictions in several previous Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments have shown [1]. To meet this challenge, thirteen labs, each with its own specialties and approaches to the problem, undertook a large-scale collaborative effort called WeFold.

In this talk, we will present the methods, or branches, that were collaboratively designed and tested during the WeFold experiment, as well as their predictive ability, outcomes, and lessons learned. Independent branches of the collaboration yielded several high-ranking predictions, among all group and method submissions in CASP10, for human, free modeling (template-free), and refinement targets. Remarkably, two WeFold methods produced the very best predictions in the refinement category for two different targets, starting from the provided model and judged by several different accuracy metrics. Contributions were made by junior and seasoned scientists alike in an open and accessible collaborative environment.

The methods contributed to WeFold include ICOS [3] for contact prediction (Bacardit Lab); CONCORD [4] for secondary structure prediction, BeST [5] for ab initio beta-sheet topology prediction, contact prediction [6, 7], and ICON for traveling-salesman-problem-based clustering (Floudas Lab); the online multiplayer game Foldit [8] and Rosetta [9] for sampling and selection (Baker Lab); UNRES [10-14] for sampling via Multiplexed Replica Exchange Molecular Dynamics (Liwo Lab and Scheraga Lab); GOAP [15] for knowledge-based scoring, TASSER [16] for sampling, and SPICKER [17] for clustering (Skolnick Lab); KoBaMIN [18, 19] for refinement (Levitt Lab); APOLLO [20] for consensus-based quality assessment (Cheng Lab); Replica Exchange Molecular Dynamics in GROMACS [21] for sampling (University of São Paulo, Brazil); and MESHI [22] for quality assessment (Keasar Lab).
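Several of the listed components rank or select among many candidate models rather than generate them; APOLLO, for example, is listed for consensus-based quality assessment. A minimal Python sketch of the general consensus idea follows, assuming each model is a list of C-alpha coordinates for the same sequence: every model is scored by its mean similarity to all other models in the pool, so models that agree with the majority rank highest. This is not APOLLO's implementation. Consensus methods commonly use a superposition-based score such as GDT-TS; here a superposition-free distance RMSD (dRMSD) and the transform 1/(1 + dRMSD) are swapped in as hypothetical stand-ins, and all function names are illustrative only.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca: Sequence[Coord]) -> List[List[float]]:
    """All pairwise C-alpha distances; unchanged by rotation or translation."""
    return [[math.dist(p, q) for q in ca] for p in ca]


def drmsd(model_1: Sequence[Coord], model_2: Sequence[Coord]) -> float:
    """Distance RMSD between two models of the same sequence (no superposition)."""
    n = len(model_1)
    if n != len(model_2) or n < 2:
        raise ValueError("models need the same length and at least two residues")
    d1, d2 = distance_matrix(model_1), distance_matrix(model_2)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    squared = sum((d1[i][j] - d2[i][j]) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))


def consensus_scores(models: Dict[str, Sequence[Coord]]) -> Dict[str, float]:
    """Mean pairwise similarity of each model to every other model in the pool."""
    if len(models) < 2:
        raise ValueError("consensus scoring needs at least two models")
    totals = {name: 0.0 for name in models}
    for a, b in combinations(models, 2):
        # Hypothetical similarity transform: 1 for identical shapes, toward 0 as dRMSD grows
        similarity = 1.0 / (1.0 + drmsd(models[a], models[b]))
        totals[a] += similarity
        totals[b] += similarity
    return {name: total / (len(models) - 1) for name, total in totals.items()}


if __name__ == "__main__":
    straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    shifted = [(x + 10.0, y, z) for x, y, z in straight]  # same shape, translated
    folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    scores = consensus_scores({"straight": straight, "shifted": shifted, "folded": folded})
    for name in sorted(scores, key=scores.get, reverse=True):
        print(f"{name}: {scores[name]:.3f}")
```

In the toy run, the two models with the same shape reinforce each other and outrank the outlier. Because every pair of models is compared, the cost grows quadratically with the pool size, which matters for large decoy sets.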

The performance of the synergistic branches will be compared with that of the base methods that comprise them. In total, over a 3.5-month period, the collaboration used over 1.5 million CPU hours and processed and evaluated over 8 million candidate structure models (~100 times the size of the Protein Data Bank [23]). All discussions and data generated during the collaboration are publicly accessible at http://www.wefold.org.
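The resource figures above can be put in perspective with some back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted in this abstract plus one labeled assumption (a 30-day month). Since the totals are stated as lower bounds ("over"), the derived quantities are rough averages over the whole effort, not measured values.

```python
# Figures quoted in the abstract
CPU_HOURS = 1.5e6      # "over 1.5 million CPU hours" (a lower bound)
MODELS = 8e6           # "over 8 million candidate structure models" (a lower bound)
MONTHS = 3.5           # "a 3.5-month period"
PDB_MULTIPLE = 100     # "~100 times the size of the Protein Data Bank"

# Assumption, for a rough estimate only: a 30-day month
DAYS_PER_MONTH = 30

wall_clock_hours = MONTHS * DAYS_PER_MONTH * 24
avg_busy_cores = CPU_HOURS / wall_clock_hours      # cores busy around the clock, on average
cpu_minutes_per_model = CPU_HOURS * 60 / MODELS    # all CPU time spread evenly over all models
implied_pdb_entries = MODELS / PDB_MULTIPLE        # PDB size implied by the ~100x comparison

print(f"wall-clock hours:      {wall_clock_hours:,.0f}")
print(f"average busy cores:    {avg_busy_cores:,.0f}")
print(f"CPU-minutes per model: {cpu_minutes_per_model:.2f}")
print(f"implied PDB entries:   {implied_pdb_entries:,.0f}")
```

Under these assumptions, the effort corresponds to roughly 595 CPU cores busy around the clock for the whole period, about 11 CPU-minutes per candidate model when all CPU time is attributed to the models, and a PDB of roughly 80,000 entries implied by the ~100x comparison.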


1.         Dill, K.A. and J.L. MacCallum, The Protein-Folding Problem, 50 Years On. Science, 2012. 338(6110): p. 1042-1046.

2.         Soding, J., A. Biegert, and A.N. Lupas, The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 2005. 33(Web Server issue): p. W244-W248.

3.         Bacardit, J., et al., Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features. Bioinformatics, 2012.

4.         Wei, Y., J. Thompson, and C.A. Floudas, CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2012. 468(2139): p. 831-850.

5.         Subramani, A. and C.A. Floudas, β-sheet Topology Prediction with High Precision and Recall for β and Mixed α/β Proteins. PLoS ONE, 2012. 7(3): p. e32461.

6.         Rajgaria, R., S.R. McAllister, and C.A. Floudas, Towards accurate residue–residue hydrophobic contact prediction for α helical proteins via integer linear optimization. Proteins: Structure, Function, and Bioinformatics, 2009. 74(4): p. 929-947.

7.         Rajgaria, R., Y. Wei, and C.A. Floudas, Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins: Structure, Function, and Bioinformatics, 2010. 78(8): p. 1825-1846.

8.         Cooper, S., et al., Predicting protein structures with a multiplayer online game. Nature, 2010. 466(7307): p. 756-760.

9.         Leaver-Fay, A., et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in Enzymology, 2011. 487: p. 545-574.

10.       Czaplewski, C., et al., Application of Multiplexed Replica Exchange Molecular Dynamics to the UNRES Force Field: Tests with α and α+β Proteins. Journal of Chemical Theory and Computation, 2009. 5(3): p. 627-640.

11.       He, Y., et al., Exploring the parameter space of the coarse-grained UNRES force field by random search: Selecting a transferable medium-resolution force field. Journal of Computational Chemistry, 2009. 30(13): p. 2127-2135.

12.       Liwo, A., et al., Simulation of protein structure and dynamics with the coarse-grained UNRES force field. Coarse-Graining of Condensed Phase and Biomolecular Systems, 2008. 1: p. 1391-1411.

13.       Liwo, A., et al., Modification and Optimization of the United-Residue (UNRES) Potential Energy Function for Canonical Simulations. I. Temperature Dependence of the Effective Energy Function and Tests of the Optimization Method with Single Training Proteins. The Journal of Physical Chemistry B, 2006. 111(1): p. 260-285.

14.       Liwo, A., et al., Implementation of Molecular Dynamics and Its Extensions with the Coarse-Grained UNRES Force Field on Massively Parallel Systems: Toward Millisecond-Scale Simulations of Protein Structure, Dynamics, and Thermodynamics. Journal of Chemical Theory and Computation, 2010. 6(3): p. 890-909.

15.       Zhou, H. and J. Skolnick, GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction. Biophysical Journal, 2011. 101(8): p. 2043-2052.

16.       Zhang, Y. and J. Skolnick, Automated structure prediction of weakly homologous proteins on a genomic scale. Proceedings of the National Academy of Sciences of the United States of America, 2004. 101(20): p. 7594-7599.

17.       Zhang, Y. and J. Skolnick, SPICKER: A clustering approach to identify near-native protein folds. Journal of Computational Chemistry, 2004. 25(6): p. 865-871.

18.       Rodrigues, J.P.G.L.M., M. Levitt, and G. Chopra, KoBaMIN: a knowledge-based minimization web server for protein structure refinement. Nucleic Acids Research, 2012. 40(W1): p. W323-W328.

19.       Chopra, G., N. Kalisman, and M. Levitt, Consistent refinement of submitted models at CASP using a knowledge-based potential. Proteins: Structure, Function, and Bioinformatics, 2010. 78(12): p. 2668-2678.

20.       Wang, Z., J. Eickholt, and J. Cheng, APOLLO: A Quality Assessment Service for Single and Multiple Protein Models. Bioinformatics, 2011. 27: p. 1715-1716.

21.       Hess, B., et al., GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. Journal of Chemical Theory and Computation, 2008. 4(3): p. 435-447.

22.       Kalisman, N., et al., MESHI: a new library of Java classes for molecular modeling. Bioinformatics, 2005. 21(20): p. 3931-3932.

23.       Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Research, 2000. 28(1): p. 235-242.