(192bb) Multi Metric 3D Protein Descriptors: The Correlation Impact of Algebraic Forms and Its Analysis | AIChE

Authors 

Teran, J. - Presenter, Universidad San Francisco de Quito
Marrero-Ponce, Y., Universidad San Francisco de Quito
Macromolecular descriptors distinguish and represent proteins with different functional and structural profiles by encoding their characteristic features, the composition and distribution of their constituent amino acids, and their physicochemical properties. In the present work, we calculated 3D protein descriptors using several algebraic forms in the ℝⁿ space, applying matrices based on various generalized metrics. For these descriptor calculations, vectors of numerical values representing the side-chain properties of each amino acid were used as weighting schemes. Several normalization methods (simple stochastic, mutual probability) were applied to the inter-amino-acid distance matrices to standardize the calculations. The local amino acidic invariant (LAI) is introduced to characterize fragments of interest in proteins and to study their properties. In addition, topological and spatial cut-offs were applied to evaluate inter-amino-acid interactions.
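
As an illustration of the pipeline described above, the following minimal Python sketch builds an inter-amino-acid distance matrix, applies a spatial cut-off, normalizes it in two ways, and evaluates a quadratic form weighted by a side-chain property. It is not the authors' implementation. It assumes that "simple stochastic" means dividing each entry by its row sum and that "mutual probability" means dividing each entry by the sum of the whole matrix. The coordinates, the property values, the cut-off window, and all function names are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions."""
    return [[dist(a, b) for b in coords] for a in coords]

def apply_cutoff(matrix, lower, upper):
    """Keep only distances inside [lower, upper]; zero out the rest."""
    return [[d if lower <= d <= upper else 0.0 for d in row] for row in matrix]

def simple_stochastic(matrix):
    """Assumed definition: each entry divided by the sum of its row."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def mutual_probability(matrix):
    """Assumed definition: each entry divided by the sum of all entries."""
    total = sum(sum(row) for row in matrix)
    return [[v / total if total else 0.0 for v in row] for row in matrix]

def bilinear_form(x, matrix, y):
    """x^T M y; with x == y this is a quadratic form."""
    return sum(x[i] * matrix[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Hypothetical three-residue fragment: one 3D position per residue.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
# Hypothetical side-chain property values used as the weighting vector.
weights = [1.8, -3.5, 2.5]

D = apply_cutoff(distance_matrix(coords), 0.0, 4.5)  # drops the 5.0 distance
S = simple_stochastic(D)
P = mutual_probability(D)

quadratic_index = bilinear_form(weights, S, weights)
print(quadratic_index)
```

Restricting the index lists passed to `bilinear_form` to the residues of one fragment would give a local value in the spirit of the LAI, although the exact definition used in the work is not stated here.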

To assess the utility of the global and local indices, a classification model for predicting the four major protein structural classes was built using Linear Discriminant Analysis (LDA). The developed model correctly classified 92.6% and 92.7% of the proteins in the training and test sets, respectively. The model yielded high values of the generalized squared correlation coefficient (GC2) on both the training and test sets. The statistical parameters derived from the validation procedures support the robustness, stability, and high predictive power of the proposed model. The performance of the LDA model demonstrates that the proposed indices not only encode relevant biochemical information related to the structural classes of proteins, but also offer suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions.
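
For readers unfamiliar with the GC2 statistic, the sketch below shows one way to compute it together with overall accuracy from a confusion matrix. It assumes GC2 is the generalized squared correlation used for multi-class assessment, that is, the chi-square statistic of the K x K confusion matrix divided by N(K - 1), where N is the number of samples. The function names and the confusion matrix are hypothetical and are not the results reported in this work.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    n = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / n

def generalized_squared_correlation(confusion):
    """Chi-square of the K x K confusion matrix divided by N * (K - 1).

    Ranges from 0 (predictions independent of the true classes)
    to 1 (perfect agreement).
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty classes to avoid division by zero
                chi2 += (confusion[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))

# Hypothetical four-class confusion matrix (rows: true, columns: predicted).
example = [
    [23, 1, 1, 0],
    [1, 24, 0, 0],
    [0, 1, 22, 2],
    [1, 0, 1, 23],
]
print(accuracy(example))
print(generalized_squared_correlation(example))
```

Unlike accuracy, this statistic uses every cell of the confusion matrix, so it also reflects how the misclassified proteins are distributed across the other classes.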