(237g) Molecular Signatures That Differentiate Cancer Subtype and Predict Clinical Outcome

Price, N. D., University of Illinois at Urbana-Champaign
Hood, L., Institute for Systems Biology
Shmulevich, I., Institute for Systems Biology
Lin, B., Institute for Systems Biology
Urban, N., Fred Hutchinson Cancer Research Center
Drescher, C., Fred Hutchinson Cancer Research Center
Zhang, W., The University of Texas M.D. Anderson Cancer Center

Identifying molecular signatures that stratify disease and predict clinical outcomes is an issue of major significance in cancer research today. To identify such signatures, we applied machine-learning techniques to two new datasets to discover novel multi-parameter markers that 1) differentiate two closely related cancers, leiomyosarcoma (LMS) and gastrointestinal stromal tumor (GIST), and 2) predict the outcome of chemotherapy (either sensitivity or resistance to cisplatin) in women diagnosed with ovarian cancer.

In collaboration with researchers at The University of Texas M.D. Anderson Cancer Center, we generated a dataset measuring the expression of 43,931 oligonucleotides using Agilent arrays on tissue samples from 71 patients. We used a recently developed approach from Rai Winslow's lab at Johns Hopkins, called k-Top Scoring Pairs, which classifies tumors by finding pairs of genes whose relative expression reverses between the cancer subtypes. Using this approach we found simple decision rules of the form: if gene A expression > gene B expression, vote GIST, else LMS (gene names withheld temporarily pending publication). The final classification is then made by a majority vote of three gene pairs. One strong advantage of this relative-expression approach is that each rule compares two genes measured on the same array, so the arrays do not need to be normalized against one another, thereby bypassing an often problematic and arbitrary analysis step. Another strong advantage is that the classification rules stay simple, which makes over-fitting to the current data, at the expense of predictive power on future data, less likely than with more complicated models. Using this computational approach, we identified a multi-gene classifier that can distinguish between LMS and GIST. Differentiating these two closely related sarcomas is important because the recommended treatments differ.
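The decision rules described above can be illustrated with a minimal sketch of the general top-scoring-pairs idea. This is not the authors' implementation: the function names `fit_ktsp` and `predict_ktsp`, the greedy choice of disjoint pairs, and the six-gene toy data are all assumptions made for illustration, and the gene indices do not correspond to any real genes.

```python
from itertools import combinations


def fit_ktsp(samples, labels, k=3):
    """Pick up to k disjoint gene pairs whose ordering best separates two classes.

    samples: list of expression vectors (one list of numbers per sample)
    labels:  list of class labels with exactly two distinct values
    Returns (pairs, classes); each pair (a, b) encodes the rule
    "if x[a] > x[b] vote classes[0], else vote classes[1]".
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("need exactly two classes")
    first = [s for s, y in zip(samples, labels) if y == classes[0]]
    second = [s for s, y in zip(samples, labels) if y == classes[1]]

    scored = []
    for i, j in combinations(range(len(samples[0])), 2):
        # Fraction of each class in which gene i is expressed above gene j.
        p_first = sum(s[i] > s[j] for s in first) / len(first)
        p_second = sum(s[i] > s[j] for s in second) / len(second)
        delta = p_first - p_second
        # Orient the pair so that "x[a] > x[b]" favours classes[0]
        # (exact ties in expression are assumed to be rare).
        pair = (i, j) if delta >= 0 else (j, i)
        scored.append((abs(delta), pair))

    # Highest score first; the sort is stable, so equal scores keep gene order.
    scored.sort(key=lambda item: -item[0])

    pairs, used = [], set()
    for _, (a, b) in scored:
        if a in used or b in used:
            continue  # keep pairs disjoint so each gene votes at most once
        pairs.append((a, b))
        used.update((a, b))
        if len(pairs) == k:
            break
    return pairs, classes


def predict_ktsp(model, x):
    """Majority vote over the selected pairs (use an odd k to avoid tied votes)."""
    pairs, classes = model
    votes_first = sum(x[a] > x[b] for a, b in pairs)
    return classes[0] if 2 * votes_first > len(pairs) else classes[1]


# Hypothetical toy data: 6 "genes", 3 samples per class.
toy_samples = [
    [5, 1, 4, 2, 6, 3], [9, 2, 7, 1, 8, 4], [3, 1, 5, 2, 4, 3],  # GIST
    [1, 5, 2, 4, 3, 6], [2, 9, 1, 7, 4, 8], [1, 3, 2, 5, 3, 4],  # LMS
]
toy_labels = ["GIST"] * 3 + ["LMS"] * 3

model = fit_ktsp(toy_samples, toy_labels, k=3)
new_sample = [8, 3, 6, 2, 7, 1]
print(predict_ktsp(model, new_sample))
# Rescaling a whole array leaves every within-array comparison unchanged.
print(predict_ktsp(model, [10 * v for v in new_sample]))
```

Because each rule only asks which of two genes is higher within one sample, multiplying every value on an array by a constant does not change the vote, which is the property behind the normalization advantage noted above.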
The accuracy of our multi-parameter marker on future cases was estimated at 99% (70.5/71) using leave-one-out cross-validation (LOOCV). (The 0.5 appears because in one case the test was indeterminate: one of the arrays lacked signal for the relevant genes, so that sample was scored as a random guess on a binary variable.) We are in the process of validating this classifier using real-time PCR and have now accumulated a significant number of additional samples to serve as an independent test of these markers. If this independent test is successful, as the cross-validation results lead us to expect, we plan to build a device that implements the test and hope that it will be used in the clinic.

The second project I will present focuses on treatment outcome in patients diagnosed with ovarian cancer and was performed in collaboration with researchers at the Fred Hutchinson Cancer Research Center. We measured the expression of 35,553 oligonucleotides (including specific splice variants) in 22 patient samples using in-house arrays developed at the Institute for Systems Biology. In this study, we identified a two-gene marker that predicts, with a high degree of accuracy on the samples tested so far, whether an ovarian tumor will respond to chemotherapy. Using LOOCV, the estimated accuracy of this marker on future cases was 98% (21.5/22). This is a remarkable result given that the best previously published result for this type of data was 77%. We are in the process of obtaining an independent set of tumor samples on which to validate this highly promising marker. Thus, in silico systems biology techniques for marker identification are proving highly useful for problems of clinical interest.
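The LOOCV estimates quoted above, including the half credit given to an indeterminate test, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the analysis pipeline used in the study: `loocv_accuracy`, `fit_pair`, `predict_pair`, and the toy data are hypothetical, and a missing measurement is represented here by `None`.

```python
def loocv_accuracy(samples, labels, fit, predict):
    """Leave-one-out cross-validation with half credit for indeterminate calls.

    fit(train_samples, train_labels) -> model
    predict(model, x) -> a label, or None when the test is indeterminate
    Returns (credit, accuracy), where credit is the (possibly fractional)
    number of held-out samples counted as correct.
    """
    credit = 0.0
    for i in range(len(samples)):
        # Refit on every sample except the i-th, then test on the i-th.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        guess = predict(fit(train_x, train_y), samples[i])
        if guess is None:
            credit += 0.5  # expected score of a random guess between two classes
        elif guess == labels[i]:
            credit += 1.0
    return credit, credit / len(samples)


def fit_pair(samples, labels):
    """Toy one-pair rule: learn which class more often shows x[0] > x[1]."""
    a, b = sorted(set(labels))
    rate = {}
    for c in (a, b):
        rows = [s for s, y in zip(samples, labels) if y == c and None not in s]
        rate[c] = sum(s[0] > s[1] for s in rows) / len(rows)
    return (a, b) if rate[a] >= rate[b] else (b, a)


def predict_pair(model, x):
    """Return None (indeterminate) when either measurement is missing."""
    if None in x:
        return None
    high, low = model
    return high if x[0] > x[1] else low


# Hypothetical toy data: two "genes" per sample, one sample missing a value.
toy_samples = [[5, 1], [6, 2], [None, 3], [1, 4], [2, 6], [3, 5]]
toy_labels = ["sensitive"] * 3 + ["resistant"] * 3

credit, accuracy = loocv_accuracy(toy_samples, toy_labels, fit_pair, predict_pair)
print(f"{credit}/{len(toy_samples)} = {accuracy:.1%}")
```

The same bookkeeping gives 70.5/71, or about 99%, and 21.5/22, or about 98%, when one held-out sample in each study is scored as a coin flip.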