(388e) Novel Biclustering Framework for Tertiary Contact Prediction Using Low-Homology Protein Templates

Conference

AIChE Annual Meeting

Year

2013

Proceeding

2013 AIChE Annual Meeting

Group

Engineering Sciences and Fundamentals

Session

Thermophysical Properties of Biological Systems

Time

Tuesday, November 5, 2013 - 4:27pm to 4:45pm

Authors

Smadbeck, J. - Presenter, Princeton University

Kieslich, C. A., Texas A&M University

Khoury, G. A., Pennsylvania State University-University Park

Floudas, C. A., Princeton University

The prediction of the three-dimensional structure of a protein from its amino acid sequence remains an open problem in molecular biology, with important implications for biological engineering. For sequences that have only low-homology template structures, ranking the templates and generating predicted structures can be particularly difficult. Accurate prediction of long-range amino acid contacts can help rank these low-homology templates for modeling. Additionally, these contacts can be used to constrain dynamic simulations of the structure of low-homology targets. To this end, we have developed a novel tertiary contact prediction method based on biclustering analysis for extracting C_α distance constraints from low-homology templates.

The initial step of our procedure is the identification of template structures for the target sequence using a modified threading algorithm of SPARKS-X [1]. Preliminary models, based on the top template structures, are generated using CYANA [2] in order to remove gaps in the structures that result from unmapped regions of the alignments. To identify persistent structures and topologies within the templates, hierarchical clustering based on pairwise GDT (a structure similarity measure) is performed on the initial CYANA models. The template structures belonging to the three largest clusters of the GDT-based dendrogram are selected for C_α-C_α distance calculation. The resulting matrix of C_α-C_α distances then serves as input to OREO [3-5], an iterative framework for biclustering dense and sparse data matrices via optimal re-ordering of rows and columns.
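The template-selection step can be illustrated with a minimal sketch. The abstract does not state the linkage criterion or the dendrogram cut, so average linkage and the `cutoff` value are assumptions made for illustration, and the function names `average_linkage_clusters` and `select_templates` are hypothetical. The input is taken to be a symmetric matrix of pairwise GDT scores scaled to [0, 1].

```python
def average_linkage_clusters(gdt, cutoff):
    """Agglomerate models until no two clusters have mean pairwise GDT >= cutoff.

    gdt: symmetric list-of-lists of pairwise GDT scores in [0, 1].
    Returns clusters as lists of model indices, largest cluster first.
    Average linkage and the cutoff are assumptions, not the authors' settings.
    """
    clusters = [[i] for i in range(len(gdt))]

    def mean_gdt(a, b):
        # Average similarity over all cross-cluster model pairs.
        return sum(gdt[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, p, q = max(
            (mean_gdt(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if best < cutoff:
            break
        # Merge cluster q into cluster p.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(clusters, key=len, reverse=True)


def select_templates(gdt, cutoff=0.5, n_clusters=3):
    """Return the sorted indices of models in the n_clusters largest clusters."""
    top = average_linkage_clusters(gdt, cutoff)[:n_clusters]
    return sorted(i for cluster in top for i in cluster)
```

Calling `select_templates(gdt)` with the default `n_clusters=3` mirrors the choice of the three largest clusters described above. The C_α-C_α distances would then be computed only for the returned models.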

The final step is to filter the clustered distances to exclude distances of low confidence. We apply three filters: (i) a variance filter, based on the mean/standard deviation of each distance; (ii) a sequence mapping filter, in which contacts involving poorly mapped positions are excluded, according to an accumulation of the position-specific scoring matrix (PSSM) produced by SPARKS-X; (iii) a structure-based filter, which removes distance constraints that deviate significantly from the predicted values during CYANA structure generation. The structure-based filter includes additional constraints derived from CONCORD [6] predicted secondary structure and predicted beta-sheet topology [7], which is essential for identifying conflicting constraints. The remaining clustered distances are considered strong candidates for conserved contacts and are used to generate C_α distance constraints for a final structure generation.
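As an illustration of filter (i), the sketch below keeps a residue pair only when its distance is consistent across templates. The abstract does not give the exact criterion, so the use of the coefficient of variation (standard deviation divided by mean), the threshold `max_cv`, and the mean ± standard deviation bounds are all assumptions. The function name `variance_filter` is hypothetical.

```python
from statistics import mean, pstdev


def variance_filter(distances_by_pair, max_cv=0.1):
    """Keep residue pairs whose C-alpha distance varies little across templates.

    distances_by_pair: dict mapping (i, j) to a list of distances in angstroms,
                       one per template.
    max_cv: assumed upper limit on std/mean; not a value from the abstract.
    Returns {pair: (lower, upper)} distance bounds for the retained pairs.
    """
    kept = {}
    for pair, values in distances_by_pair.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if mu > 0 and sigma / mu <= max_cv:
            # Assumed bound construction: one standard deviation around the mean.
            kept[pair] = (mu - sigma, mu + sigma)
    return kept
```

A pair measured at 6.0 Å in every template passes with bounds (6.0, 6.0), whereas a pair ranging from 5 Å to 15 Å is dropped. The sequence mapping and structure-based filters would be applied to the retained pairs afterwards.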

We present results on a series of free-modeling targets from the Critical Assessment of techniques for protein Structure Prediction 10 (CASP10) competition. The method demonstrates superior performance over other low-homology template-based contact prediction methods in predicting short-, medium-, and long-range contacts for difficult targets. The contacts are then used to generate structures through a constrained molecular dynamics (MD) run, demonstrating how important such contacts are for accurate determination of the structural fold.
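Contact predictions are commonly scored separately by sequence separation. The sketch below uses the customary CASP ranges (short: 6 to 11 residues apart, medium: 12 to 23, long: 24 or more) and reports precision per range. The abstract does not state which metric or ranges were used, so both are assumptions, and the function names are hypothetical.

```python
def contact_range(i, j):
    """Classify a residue pair by sequence separation (customary CASP ranges)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return None  # pairs closer than 6 residues are not scored


def precision_by_range(predicted, native):
    """Fraction of predicted contacts that are native, per separation range.

    predicted, native: sets of (i, j) residue index pairs with i < j.
    Ranges with no predicted contacts are omitted from the result.
    """
    result = {}
    for label in ("short", "medium", "long"):
        in_range = {pair for pair in predicted if contact_range(*pair) == label}
        if in_range:
            result[label] = len(in_range & native) / len(in_range)
    return result
```

For example, with predictions `{(1, 8), (1, 20), (1, 40), (2, 50)}` and native contacts `{(1, 8), (1, 40)}`, the short-range precision is 1.0, the medium-range precision is 0.0, and the long-range precision is 0.5.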

[1] Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of the query and corresponding native properties of templates. Bioinformatics 27:2076-82 (2011)

[2] López-Méndez B, Güntert P. Automated protein structure determination from NMR spectra. J. Am. Chem. Soc. 128:13112–13122 (2006)

[3] DiMaggio PA, McAllister SR, Floudas CA, Feng XJ, Rabinowitz JD, Rabitz HA. Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics 9:458 (2008)

[4] McAllister SR, DiMaggio PA, Floudas CA. Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. J. Global Optim. 45:111-129 (2009)

[5] DiMaggio PA, McAllister SR, Floudas CA, Feng XJ, Li G, Rabinowitz JD, Rabitz HA. Enhancing molecular discovery using descriptor-free rearrangement clustering techniques for sparse data sets. AIChE J. 56(2):405-418 (2010)

[6] Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc. R. Soc. A 468(2139):831-85 (2012)

[7] Subramani A, Floudas CA. β-sheet Topology Prediction with High Precision and Recall for β and Mixed α/β Proteins. PLoS ONE 7(3):e32461 (2012)

Topics

Biological Engineering

Thermodynamics
