(560g) Biclustering Via Optimal Ordering

Conference

AIChE Annual Meeting

Year

2007

Proceeding

Food, Pharmaceutical & Bioengineering Division

Session

Thursday, November 8, 2007 - 10:40am to 11:00am

Authors

DiMaggio, P. A. Jr. - Presenter, Princeton University

McAllister, S. R. - Presenter, Princeton University

Floudas, C. A. - Presenter, Princeton University

Feng, X. - Presenter, Princeton University

Rabinowitz, J. D. - Presenter, Princeton University

Rabitz, H. A. - Presenter, Princeton University

Problems of data organization and data clustering are prevalent across a number of different disciplines. The goal of data clustering, regardless of the application, is to organize data in such a way that objects which exhibit ?similar? attributes are grouped together. The definition of similarity depends on the application and may correspond to the direct comparison of values or the degree of correlation among trends or patterns of values. Several methods have been proposed for the clustering of large-scale data. The most frequently used techniques are hierarchical [1] and partitioning [2] clustering because of their availability and ease of implementation. However, these approaches result in suboptimal clusters because the comparisons between terms are evaluated locally.

Rearrangement clustering has emerged as an effective technique for optimally minimizing the sum of the pair-wise distances between rearranged rows and columns. The bond energy algorithm (BEA) was originally proposed as a method for finding ?good? solutions to this problem [3] and it was subsequently discovered that this problem could be formulated as a traveling salesman problem (TSP) which can be solved to optimality [4,5] using existing methods. The concept of biclustering has emerged in the context of microarray experiments where the rearrangement of the rows and columns is needed to identify a bicluster, which corresponds to a subset of genes related under a certain set of conditions. This is particularly useful when a gene is involved in more than one biological process or belongs to a group of genes that are co-expressed under limited conditions. An assortment of techniques and formulations has been proposed, many of which utilize heuristics to identify biclusters and often require normalization to elucidate patterns of interest. A good review on biclustering can be found elsewhere [6].

In this work, we present a method for the optimal rearrangement of both the rows and columns of a data matrix. Various objective functions can be used to quantify the degree of similarity among adjacent rows or columns in the final arrangement. We present both a network flow representation, which can be extended to address sparse data sets, and a traveling salesman problem (TSP) representation to model the actual permutations of the rows and columns. We compared the results of our method with hierarchical clustering for (a) metabolite concentration data [7], (b) colon cancer data [8], and (c) gene expression data [9]. For each of these data sets, our method provides a denser grouping of related metabolites and annotated genes (i.e., rows of the data matrix) than does hierarchical clustering, which suggests that optimal ordering has distinct advantages over local ordering. Our method also reconstructs underlying fundamental patterns in the data. For the metabolite concentration data, the nitrogen and carbon starvation conditions are perfectly separated and the in the colon cancer data the tissue types are partitioned into tumor-rich and normal-rich regions.

[1] M.S. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. PNAS, 95(25):14863-14868, 1998.

[2] J.A. Hartigan and M.A. Wong. Algorithm AS 136: a K-means clustering algorithm. Applied Statistics, 28:100-108, 1979.

[3] W.T. McCormick Jr, P.J. Schweitzer, and T.W. White. Problem decomposition and data reorganization by a clustering technique. Operations Research, 20(5):993-1009, 1972.

[4] J.K. Lenstra. Clustering a data array and the traveling-salesman problem. Operations Research, 22(2):413-414, 1974.

[5] J.K. Lenstra and A.H.G. Rinnooy Kan. Some simple applications of the traveling salesman problem. Operations Research Quarterly, 26(4):717-733, 1975.

[6] S.C. Madeira and A.L. Oliveira. Biclustering algorithms for biological data analysis: A survey. IEE-ACM Trans. Comp. Bio., 1(1):24-45, 2004.

[7] M. J. Brauer, J. Yuan, B. Bennett, W. Lu, E. Kimball, D. Bostein, and J.D. Rabinowitz. Conservation of the metabolomic response to starvation across two divergent microbes. Proc. Natl. Acad. Sci., 103: 19302-19307.

[8] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci., 96:6745-6750, 1999.

[9] R.B. Brem and L. Kruglyak. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci., 102(5):1572-1577, 2005.

Other Sites & Tools

Technical Groups

Technical

Professional/Personal Growth

Societal Needs

Leadership

2024 mRNA Technology Conference

5th Engineering Cosmetics and Consumer Products Conference

Upcoming Conferences & Events

2024 mRNA Technology Conference

5th Engineering Cosmetics and Consumer Products Conference

2024 DIERS Virtual Spring Meeting

2024 Pacific Northwest Student Regional Conference

2024 Western Student Regional Conference

CCPS Middle East Regional Meeting

Hydrogen Fueling Station Safety

Streamlining Permit-to-Work Processes With a Digital Solution

6th Middle East Process Engineering Conference and Exhibition

CEP: April 2024

CEP: March 2024

Explore Areas of Advancement:

Learning Center:

Want to be an Entrepreneur? Personal Stories From Three Successful Entrepreneurs Who Have Traveled This Path.

(560g) Biclustering Via Optimal Ordering

AIChE Annual Meeting

2007

2007 Annual Meeting

Food, Pharmaceutical & Bioengineering Division

Systems Biotechnology

Thursday, November 8, 2007 - 10:40am to 11:00am

Authors

More Conference Links

Cancelation Policy

Code of Conduct

Beware of Hotel and Attendee-list Scams

Code of Conduct

Beware of Hotel and Attendee-list Scams