(515g) Extraction Of Informative Genes From Underlying Dynamics | AIChE

Authors 

Yang, E. - Presenter, Rutgers - The State University of New Jersey


Gene selection from temporal gene expression data still relies primarily on differences in scale rather than on shape information. While scale remains useful in this experimental paradigm, it ignores the information encoded in the general shape of the expression profiles. Our previous algorithm, SLINGSHOTS[1], exploited this shape information to obtain the characteristic profiles of the genes that respond to an external stimulus, as well as a subset of the genes that make up these characteristic profiles. The shapes of the characteristic profiles were robust even under significantly different parameters, but the number of genes selected was inconsistent, although the intersection between multiple runs with different parameters was statistically significant. What we obtained with different parameters was therefore a selection of genes from a global set that represents the underlying truth.

The issue then becomes whether this global set of genes can be identified given the characteristic profiles. To do so, we have taken elements from model-based gene selection, in which datasets are selected by their ability to be reconstructed from the models. However, instead of requiring that a model be specified beforehand, or utilizing a set of models that form a basis set as in CAGED[2], we have elected to use the elementary profiles selected under SLINGSHOTS as the basis. These elementary profiles are selected by their ability to distinguish between the transcriptional state at time 0, which functions as the control, and the state at the later time points, under the hypothesis that more informative profiles will encompass the set of genes that show the greatest deviations.

The model is then made up of a linear combination of the elementary profiles. It is used to select the genes that correspond to the underlying dynamics and are therefore strong candidates for genes that respond to treatment. An additional benefit is that by using these elementary responses we have essentially fixed the centers of each of the clusters. Given the use of linear mixture models, the coefficients describing the contribution of each elementary model to the reconstruction of the signal indicate which cluster a given expression profile belongs to, thereby integrating the clustering with the initial gene selection. This method is more robust to noise and parameter selection than the original algorithm. This level of robustness is important in the analysis of multiple datasets, especially those that span multiple chip types, given the differences in signal-to-noise ratio as well as the general differences in intensity characteristics of the different chips. Despite changing the parameters to values we would consider non-ideal, incorporating this model allowed over 80% consistency between runs with different parameters. This suggests that the identification of the elementary responses is consistent, and that we have exploited one of the primary strengths of model-based selection: limiting the effect of noise without adversely affecting selection.
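The idea of reconstructing each expression profile as a linear combination of elementary profiles can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the coefficients are fitted or how genes are accepted, so the sketch assumes an ordinary least-squares fit, a coefficient of determination as the reconstruction score, a hypothetical cutoff `r2_min`, and cluster assignment by the largest absolute coefficient. The function names and the toy data are likewise illustrative.

```python
import numpy as np

def fit_profiles(expr, basis):
    """Least-squares fit of each gene profile to a mix of elementary profiles.

    expr:  (n_genes, n_timepoints) expression matrix
    basis: (n_basis, n_timepoints) elementary profiles (assumed given)
    Returns coeffs (n_genes, n_basis) and r2 (n_genes,).
    """
    # Solve basis.T @ coeffs.T ~= expr.T for all genes at once.
    coeffs_t, *_ = np.linalg.lstsq(basis.T, expr.T, rcond=None)
    coeffs = coeffs_t.T
    recon = coeffs @ basis
    ss_res = np.sum((expr - recon) ** 2, axis=1)
    ss_tot = np.sum((expr - expr.mean(axis=1, keepdims=True)) ** 2, axis=1)
    # Guard against division by zero for perfectly flat profiles.
    r2 = 1.0 - ss_res / np.maximum(ss_tot, 1e-12)
    return coeffs, r2

def select_and_cluster(expr, basis, r2_min=0.9):
    """Keep well-reconstructed genes; label each by its dominant basis profile.

    r2_min is a hypothetical cutoff chosen for illustration only.
    """
    coeffs, r2 = fit_profiles(expr, basis)
    selected = np.flatnonzero(r2 >= r2_min)
    clusters = np.argmax(np.abs(coeffs[selected]), axis=1)
    return selected, clusters

# Toy data: two elementary profiles over five time points.
basis = np.array([[0.0, 1.0, 2.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 2.0, 3.0]])
expr = np.array([[0.0, 2.0, 4.0, 2.0, 0.0],    # scaled copy of profile 0
                 [0.0, 0.0, 1.5, 3.0, 4.5],    # scaled copy of profile 1
                 [1.0, -1.0, 1.0, -1.0, 1.0]]) # oscillation matching neither
selected, clusters = select_and_cluster(expr, basis)
print(selected, clusters)
```

In this toy case the first two genes are reconstructed exactly and are assigned to the profile they were built from, while the oscillating third gene is poorly reconstructed and is left out. Because the basis is fixed in advance, the cluster centers do not move between runs, which is the property the text credits for the improved consistency.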

1. Yang E, Maguire T, Yarmush M, Berthiaume F, Androulakis I: Bioinformatics analysis of the early inflammatory response in a rat thermal injury model. BMC Bioinformatics 2007.

2. Ramoni MF, Sebastiani P, Kohane IS: Cluster analysis of gene expression dynamics. Proc Natl Acad Sci U S A 2002, 99:9121-9126.