(508f) Novel Framework for Beta-Sheet Topology Prediction Using Low-Homology Template-Based Constraints

Kieslich, C. A., Texas A&M University
Smadbeck, J., Princeton University
Khoury, G. A., Pennsylvania State University-University Park
Floudas, C. A., Princeton University

Accurate prediction of beta-sheet topology remains a major unresolved challenge in protein structure prediction. Current state-of-the-art approaches use sequence alignments, secondary structure assignments, and pairwise potentials to derive rank-ordered lists of solutions [1-3]. Because the combinatorial complexity is large even for a small number of beta strands, mixed-integer linear optimization (MILP) models have been proposed to identify the optimal topology [1-2]. The involvement of non-local tertiary contacts can make predicting beta-sheets from sequence information alone very difficult. We therefore propose a novel framework for beta-sheet topology prediction that uses low-homology structural templates to derive likely beta-strand pairs, which serve as constraints for a MILP model.

Given a query sequence, structural templates are first identified using a modified version of SPARKS-X [4]. Distance constraints are extracted from each template and used as input for CYANA [5] to generate structural models. The initial set of structural templates is then reduced by hierarchical clustering based on the pairwise global distance test (GDT) score, a measure of protein structure similarity. The beta-sheet topology of each template structure is extracted from its sequence alignment to the query and its secondary structure assignment. A final set of structural templates is selected by clustering the template beta-sheet topologies, and a set of observed strand pairs is obtained.
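The template-reduction step can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering with a similarity cutoff; the abstract does not state the linkage rule or the threshold, so the function names `cluster_by_gdt` and `representative`, the cutoff of 60, and the example GDT matrix are all hypothetical.

```python
from itertools import combinations


def cluster_by_gdt(gdt, threshold):
    """Average-linkage agglomerative clustering on a symmetric GDT matrix.

    gdt[i][j] is the pairwise GDT score between templates i and j
    (0-100, higher means more similar). Clusters are merged while the
    best average inter-cluster GDT is at least `threshold`, which is an
    assumed cutoff and not a value taken from the abstract.
    """
    clusters = [[i] for i in range(len(gdt))]

    def avg_similarity(a, b):
        return sum(gdt[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of clusters (x < y by construction).
        (x, y), best = max(
            (((x, y), avg_similarity(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def representative(cluster, gdt):
    """Pick the member with the highest summed GDT to the rest of its cluster."""
    return max(cluster, key=lambda i: sum(gdt[i][j] for j in cluster if j != i))


# Hypothetical pairwise GDT scores for four templates.
example_gdt = [
    [100, 80, 30, 25],
    [80, 100, 35, 20],
    [30, 35, 100, 75],
    [25, 20, 75, 100],
]
example_clusters = cluster_by_gdt(example_gdt, threshold=60)
example_reps = [representative(c, example_gdt) for c in example_clusters]
```

Keeping one representative per cluster is one plausible way to reduce the template set; other selection rules, such as keeping the template with the best alignment score, would fit the same description.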

The presented MILP model uses pairwise potentials calculated by BetaPro [6], together with the template-based constraints derived as described above. Additional constraints, initially proposed by Subramani and Floudas [2], are also imposed to ensure that only biologically relevant topologies are generated. Ultimately, a rank-ordered list of likely beta-sheet topologies is produced. We present results for the application of the proposed framework to all-beta and mixed alpha-beta proteins of the PDBSelect25 data set, as well as to the most difficult targets from recent CASP competitions.
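To make the optimization step concrete, the toy sketch below ranks strand-pairing topologies for a handful of strands. It is not the MILP model of the abstract: exhaustive enumeration stands in for a MILP solver, the scores are invented stand-ins for BetaPro-style pairwise potentials, template constraints are assumed to force a pair to be present, and the rule that each strand has one or two partners is an assumed simplification of the biological-relevance constraints. The function name `rank_topologies` and all parameters are hypothetical.

```python
from itertools import combinations, product


def rank_topologies(n_strands, score, required_pairs=(), max_partners=2, top_k=3):
    """Enumerate strand-pairing topologies and rank them by total score.

    score[(i, j)]   hypothetical pairing score for strands i < j
    required_pairs  pairs forced on (assumed form of a template constraint)
    max_partners    assumed upper bound on partners per strand
    """
    pairs = list(combinations(range(n_strands), 2))
    required = {tuple(sorted(p)) for p in required_pairs}
    feasible = []
    # y[k] = 1 means pairs[k] is formed; enumerate all binary assignments.
    for y in product((0, 1), repeat=len(pairs)):
        chosen = [p for p, on in zip(pairs, y) if on]
        if not required.issubset(chosen):
            continue
        partners = [0] * n_strands
        for i, j in chosen:
            partners[i] += 1
            partners[j] += 1
        # Assumed rule: every strand has at least one and at most
        # max_partners pairing partners.
        if any(c == 0 or c > max_partners for c in partners):
            continue
        feasible.append((sum(score[p] for p in chosen), chosen))
    # Highest total score first gives the rank-ordered list.
    feasible.sort(key=lambda item: item[0], reverse=True)
    return feasible[:top_k]


# Hypothetical scores for four strands (positive favors pairing).
example_score = {
    (0, 1): 2.0, (0, 2): -1.0, (0, 3): 0.5,
    (1, 2): 1.5, (1, 3): -0.5, (2, 3): 1.0,
}
unconstrained = rank_topologies(4, example_score)
with_template = rank_topologies(4, example_score, required_pairs=[(1, 3)])
```

Enumeration grows as 2 to the power of the number of candidate pairs, so it is only usable for a few strands; this growth is the reason a MILP formulation with a dedicated solver is used in practice.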

1. Klepeis, J. L.; Floudas, C. A. Prediction of Beta-Sheet Topology and Disulfide Bridges in Polypeptides. Journal of Computational Chemistry 2003, 24, 191-208.

2. Subramani, A.; Floudas, C. A. β-sheet Topology Prediction with High Precision and Recall for β and Mixed α/β Proteins. PLoS ONE 2012, 7 (3), e32461.

3. Ho, H. K.; Zhang, L.; Ramamohanarao, K.; Martin, S. A survey of machine learning methods for secondary and supersecondary protein structure prediction. Methods in Molecular Biology 2013, 932, 87-106.

4. Yang, Y.; Faraggi, E.; Zhao, H.; Zhou, Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011, 27 (15), 2076-2082.

5. López-Méndez, B.; Güntert, P. Automated protein structure determination from NMR spectra. Journal of the American Chemical Society 2006, 128, 13112-13122.

6. Cheng, J.; Baldi, P. Three-stage prediction of protein beta-sheets by neural networks, alignments and graph algorithms. Bioinformatics 2005, 21, 75-84.