(526h) A Novel Approach for Protein Structure Prediction | AIChE

Kieslich, C. A. - Presenter, Texas A&M University
Smadbeck, J., Princeton University
Khoury, G., Princeton University
Tamamis, P., Texas A&M University
Uralcan, B., Princeton University
Floudas, C. A., Princeton University

Protein structure prediction remains a challenging problem in computational biology, especially when structural templates are not easily identified through sequence alignments alone [1]. Despite the existence of several ab initio methods for protein structure prediction, the top-performing methods in the free modeling category at CASP10 utilized template structures. We present a novel platform for protein structure prediction that uses a consensus across multiple low-homology structural templates both to identify structural templates and to provide the starting point for grey-box global optimization.

The approach is hierarchical in nature, represents an update to the framework previously developed by our group [2-5], and includes improved methods for the prediction of secondary structure [3], protein domain definitions, Cβ-tertiary contacts [4], β-sheet topology [5], tertiary structure prediction, and refinement [6]. Secondary structure (SS) prediction is based on an SVM model that takes as features the predictions of four published methods. The SS prediction model consists of three one-vs-all binary classifiers that have been combined to maximize the prediction of helices and strands. The β-sheet topology method has three components: (i) an SVM model for β-contact prediction; (ii) an MILP model for strand pair alignment; and (iii) an MILP model for β-sheet topology. SVM-predicted β-contact probabilities are used as input for the strand pair MILP model, which produces a rank-ordered list of optimal strand pair alignments for each possible pair of β-strands. A rank-ordered list of optimal β-sheet topologies is ultimately generated based on the optimal strand pair alignments. The prediction of tertiary Cβ-contacts is based on the Delaunay triangulation of the Cβ coordinates of structural templates, and a consensus score based on the template Z-scores is used to rank observed contacts.
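As an illustration of the last step, the following minimal Python sketch ranks contacts by a Z-score-weighted consensus across templates. The abstract does not give the scoring formula, so the normalized weighted sum used here is an assumption, as are the function name `rank_contacts` and the input layout. The per-template contact sets are taken as given (in the approach above they would come from the Delaunay triangulation of each template's Cβ coordinates).

```python
from collections import defaultdict


def rank_contacts(template_contacts, template_zscores):
    """Rank residue-residue contacts by a Z-score-weighted consensus.

    template_contacts: dict mapping template id -> set of (i, j) residue pairs
    template_zscores:  dict mapping template id -> positive alignment Z-score

    Hypothetical scoring (not taken from the abstract): each contact receives
    the sum of the Z-scores of the templates in which it is observed, divided
    by the total Z-score over all templates.
    """
    scores = defaultdict(float)
    for template, contacts in template_contacts.items():
        z = template_zscores[template]
        # Canonicalize pairs so (i, j) and (j, i) count as the same contact,
        # and deduplicate so each template votes at most once per contact.
        canonical = {(min(i, j), max(i, j)) for i, j in contacts}
        for pair in canonical:
            scores[pair] += z

    total = sum(template_zscores.values())
    ranked = [(pair, s / total) for pair, s in scores.items()]
    # Highest consensus first; ties broken by residue indices for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    contacts = {
        "t1": {(3, 10), (5, 20)},
        "t2": {(10, 3)},
        "t3": {(5, 20), (3, 10)},
    }
    zscores = {"t1": 6.0, "t2": 3.0, "t3": 1.0}
    for pair, score in rank_contacts(contacts, zscores):
        print(pair, round(score, 2))
```

Under this assumed scoring, a contact observed in every template scores 1.0, and a contact seen only in weakly aligned templates falls toward the bottom of the list.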

The presented approach contains two pipelines for template-based tertiary structure prediction (consensus-based template identification and biclustering-based template identification), as well as ab initio structure prediction using grey-box global optimization with ARGONAUT. The consensus-based template identification uses consistency with predicted secondary structure, consistency with predicted β-sheet topology, and consistency with predicted Cβ contacts to re-rank and select structural templates. Biclustering-based template identification utilizes clustering of templates (columns) according to extracted pairwise distances (rows) in combination with manual clustering of these distances using an alignment confidence derived from the position-specific scoring matrix. The biclustering helps identify consistency in extracted template distances while allowing structural variability between regions with less confident alignments. The final row and column clusters represent sets of similar residue-residue distances with consistency across several templates, and are used to identify a single template that best fits these consensus distances. For ab initio structure prediction, initial models are first generated based on several subsets of the predicted Cβ and β-sheet contacts. Subsequently, ARGONAUT is used to fit and optimize surrogate models to minimize a structure-based objective function according to the pairwise distances of predicted residue contacts. Solutions of the surrogate models are added to the set of predicted structures and used in the next iterations of model fitting and optimization.
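To make the idea of a structure-based objective over predicted contacts concrete, here is a minimal Python sketch of one possible form: a sum of squared violations of a contact distance cutoff. This is not the objective used with ARGONAUT, which the abstract does not specify; the function name `contact_violation`, the squared-penalty form, and the 8 Å cutoff are all assumptions made for illustration. The surrogate fitting and optimization performed by ARGONAUT are not shown.

```python
import math


def contact_violation(coords, contacts, cutoff=8.0):
    """Sum of squared distance violations over predicted contacts.

    coords:   dict mapping residue index -> (x, y, z) Cβ coordinates in Å
    contacts: iterable of (i, j) residue pairs predicted to be in contact
    cutoff:   distance in Å below which a pair counts as satisfied
              (8 Å is an assumed value, not taken from the abstract)
    """
    total = 0.0
    for i, j in contacts:
        d = math.dist(coords[i], coords[j])
        # Only pairs farther apart than the cutoff are penalized.
        excess = max(0.0, d - cutoff)
        total += excess ** 2
    return total


if __name__ == "__main__":
    coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (10.0, 0.0, 0.0)}
    predicted = [(1, 2), (1, 3)]
    print(contact_violation(coords, predicted))
```

A function of this kind can be evaluated for any candidate structure without exposing its analytical form, which is the setting in which a surrogate-based optimizer samples the objective, fits a model to the samples, and proposes new candidates.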

The final step of the approach is the refinement of the generated structural models using Princeton_TIGRESS, as well as a novel molecular dynamics-based refinement method. We present results for the benchmarking of the individual methods, as well as our overall results from the CASP11 competition.

  1. Khoury, G. A.; Smadbeck, J.; Kieslich, C. A.; Floudas, C. A. Protein folding and de novo protein design for biotechnological applications. Trends in Biotechnology 2014, 32 (2), 99-109.
  2. Subramani, A.; Wei, Y.; Floudas, C. A. ASTRO-FOLD 2.0: An enhanced framework for protein structure prediction. AIChE Journal 2012, 58 (5), 1619-1637.
  3. Wei, Y.; Thompson, J.; Floudas, C. A. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 2012, 468 (2139), 831-850.
  4. Rajgaria, R.; McAllister, S. R.; Floudas, C. A. Towards Accurate Residue-Residue Contact Prediction for Alpha Helical Proteins via Integer Linear Optimization. Proteins: Structure, Function, and Bioinformatics 2009, 74 (4), 929-947.
  5. Subramani, A.; Floudas, C. A. β-sheet Topology Prediction with High Precision and Recall for β and Mixed α/β Proteins. PLoS ONE 2012, 7 (3), e32461.
  6. Khoury, G. A.; Tamamis, P.; Pinnaduwage, N.; Smadbeck, J.; Kieslich, C. A.; Floudas, C. A. Princeton_TIGRESS: Protein geometry refinement using simulations and support vector machines. Proteins: Structure, Function, and Bioinformatics 2014, 82 (5), 794-814.