(7a) A First Principles Based Structure Prediction Algorithm for Beta and Mixed Alpha/Beta Proteins

Subramani, A., Princeton University
Wei, Y., Princeton University
Floudas, C. A., Princeton University

We present the latest developments in the prediction of pure beta and mixed alpha/beta proteins using our first-principles-based structure prediction framework, ASTRO-FOLD 2.0. The framework has been enhanced with improvements in secondary structure prediction, beta sheet topology prediction, residue-to-residue contact prediction, tertiary structure prediction and near-native structure identification [1].

For secondary structure prediction, two sets of algorithms have been developed; the choice between them depends on the sequence similarity of the target protein to sequences in the PDB. First, a consensus algorithm, CONCORD, combines the predictions of seven well-known secondary structure prediction servers using an integer linear optimization model in order to enhance prediction accuracy [2]. Alternatively, single-sequence formulations predict the secondary structure of the protein without the use of profile information. For pure beta and mixed alpha/beta proteins, we use a novel integer-optimization-based algorithm to derive the most likely beta sheet topologies. These topologies are re-ranked using linear programming formulations to predict the best topology for a given predicted secondary structure [3]. Contacts between alpha helices in a mixed alpha/beta protein are then predicted using an integer-optimization-based algorithm in which the contacts between beta strands are fixed to the predictions made at the previous stage [4]. A novel flexible-stem loop structure prediction algorithm is employed to derive tight bounds on the backbone dihedral angles of all loop residues [5]. Separately, ASTRO-FOLD allows the secondary structure prediction steps to be bypassed by building the secondary structure from 3D-Jury results and using fixed-stem loop structure prediction algorithms to refine the structure of the loop regions. All of these constraints are used to predict the three-dimensional structure of the protein through a combination of deterministic global optimization, stochastic conformational space annealing and torsion angle dynamics.
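As a rough illustration of the consensus idea behind the secondary structure step, the sketch below combines several per-residue predictions by weighted voting. It is not the CONCORD model [2]: the function name `consensus`, the three-state alphabet H/E/C, the weights and the three toy input strings are all assumptions made for illustration. Weighted voting is what a one-state-per-residue integer program reduces to when no constraints couple neighbouring residues; an integer linear model such as the one described above can add such constraints.

```python
# Hypothetical three-state alphabet: helix, strand, coil.
STATES = ("H", "E", "C")


def consensus(predictions, weights=None):
    """Return a weighted per-residue consensus of equal-length predictions.

    predictions: list of strings over STATES, one per (hypothetical) server.
    weights: optional list of non-negative floats, one per prediction.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")

    result = []
    for i in range(length):
        # Accumulate the weight each state receives at residue i.
        score = dict.fromkeys(STATES, 0.0)
        for pred, w in zip(predictions, weights):
            score[pred[i]] += w
        # Ties are broken by the order of STATES (an arbitrary choice).
        result.append(max(STATES, key=lambda s: score[s]))
    return "".join(result)


if __name__ == "__main__":
    # Toy inputs, not real server output.
    toy = ["HHHEEC", "HHCEEC", "CHHEEC"]
    print(consensus(toy))
    print(consensus(toy, weights=[1.0, 1.0, 3.0]))
```

In this simplified form each residue is decided independently, so the result can contain isolated single-residue helices or strands; ruling those out is one reason to pose the consensus as an optimization problem with constraints rather than as plain voting.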
At each stage of the deterministic and stochastic algorithms, fast side chain optimization steps are introduced to alleviate clashes between side chains and the protein backbone, thus providing better starting points for the optimization algorithms [1]. The final set of predicted structures is clustered using ICON, a novel clustering algorithm based on the traveling salesman problem [6].
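To make the link between a traveling salesman tour and clustering more concrete, the following sketch orders structures so that similar ones sit next to each other and then cuts the ordering at large jumps. It is not the ICON algorithm [6]: the greedy nearest-neighbor tour stands in for a proper traveling salesman solution, and the function names, the `cutoff` parameter and the toy distance matrix are assumptions made for illustration. In practice the distances might be pairwise RMSD values between predicted structures.

```python
def nearest_neighbor_order(dist, start=0):
    """Greedy tour: repeatedly move to the closest unvisited structure.

    dist: symmetric matrix (list of lists) of pairwise distances.
    This is a heuristic, not an exact traveling salesman solution.
    """
    n = len(dist)
    order = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = order[-1]
        # Break distance ties by index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def cut_into_clusters(order, dist, cutoff):
    """Start a new cluster wherever consecutive tour members are far apart."""
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if dist[prev][cur] > cutoff:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


if __name__ == "__main__":
    # Toy data: four "structures" placed on a line at these positions.
    positions = [0.0, 10.0, 1.0, 11.0]
    dist = [[abs(a - b) for b in positions] for a in positions]
    order = nearest_neighbor_order(dist)
    print(order)
    print(cut_into_clusters(order, dist, cutoff=3.0))
```

The design choice illustrated here is that a short tour places similar structures at adjacent positions, so cluster boundaries appear as unusually long steps along the tour.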



[1] Subramani A, Wei Y and Floudas CA (2011) ASTRO-FOLD 2.0: An enhanced framework for protein structure prediction, submitted.

[2] Wei Y and Floudas CA (2011) CONCORD: A consensus method for protein secondary structure prediction, submitted.

[3] Subramani A and Floudas CA (2011), in preparation.

[4] Rajgaria R, Wei Y and Floudas CA (2009) Contact prediction for beta and alpha/beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD, Proteins, 78, 1825-1846.

[5] Subramani A and Floudas CA (2011), in preparation.

[6] Subramani A, DiMaggio PA and Floudas CA (2009) Selecting high quality protein structures from diverse conformational ensembles, Biophysical J, 97, 1728-1736.