(485ah) Secondary Structure Aided Protein Structure Alignment | AIChE

(485ah) Secondary Structure Aided Protein Structure Alignment

Authors 

Shah, S. B. - Presenter, Carnegie Mellon University


Proteins are responsible for most processes and biochemical reactions carried out within the living cell and possess a very broad range of functionality. The functional properties of proteins are strongly dependent on their structural conformations. However, the amino acid sequence of a protein is not able to provide much structural or functional information and comparison of the proteins at the sequence level is often not enough to establish their functional relationship. Comparing and aligning structures of known and unknown proteins, on the other hand, provides more information about the protein folds, similarities between proteins not depicted by the sequence, as well as elucidates functional properties. The applicability of structure alignment is found particularly in the areas of drug design and fold identification and classification.

Structural alignment can be done via several methods, including contact map overlap maximization (MAX-CMO) that aligns proteins to maximize the number of common residue contacts. Xie and Sahinidis [1] recently developed a reduction-based exact algorithm (CMOS), which is the state-of-the-art for the MAX-CMO problem. Computational experiments demonstrate that this algorithm runs significantly faster than prior exact algorithms for MAX-CMO and is also able to resolve some hard CMO instances not solved in the past. In addition, this algorithm produces protein clusters that are in excellent agreement with the SCOP classification. Although the CMOS algorithm is faster than all competing algorithms for MAX-CMO, it still is not fast enough to allow for an all-to-all structure comparison for proteins from the Protein Data Bank in a reasonable computing time.

The present work aims at improving the CMOS algorithm through the incorporation of novel bounding techniques. Proteins comprise of some well known structural motifs (secondary structures) which are the building blocks of protein folds. We represent the 3d structure of the protein as a sequence of these secondary structure motifs. The corresponding sequences for two proteins are then compared by well-established sequence alignment algorithms, such as the Needleman-Wunsch algorithm [2], providing an alignment and a lower bound for the problem. Our preliminary computations indicate that the dynamic programming-based bounds employed in CMOS can be significantly improved by the use of the proposed approach. For the Skolnick data set, the proposed approach leads to a 1.5 fold improvement in the root node bounds of the CMOS algorithm. We further demonstrate the utility of this approach through extensive computational studies.

[1] W. Xie and N.V. Sahinidis. A reduction-based exact algorithm for the contact map overlap problem. Journal of Computational Biology, 14:637-654, 2007.

[2] S. Needleman and C. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48:443-453, 1970.