(597b) F-Test in k-Fold Cross Validation and Its Application to the Discovery of Biological Networks

Conference

AIChE Annual Meeting

Year

2010

Proceeding

In Silico Systems Biology: Intracellular Signaling and Gene Regulation II

Time

Thursday, November 11, 2010 - 8:55am to 9:20am

Authors

Maurya, M. - Presenter

Gupta, S. - Presenter, University of California, San Diego

Subramaniam, S. - Presenter, University of California, San Diego

There has been considerable emphasis in recent years on applying systems approaches to decipher and reconstruct cellular networks using high-throughput data. To avoid over-fitting the data and to ensure that the resulting model has good predictive power, cross-validation is often used in data-driven input-response (or input/output (I/O)) modeling. The F-test is commonly used to compare the fit errors (usually the sum of squared prediction errors (SSE)) of the model on the training and test sets. If a sufficiently large dataset is available, the data can be divided into non-overlapping training and test sets and the F-test can be applied, subject to the assumption that the experimental data points, and hence the prediction errors, are normally distributed. However, when large datasets are not available owing to the cost of conducting experiments, as is often the case for biological systems, k-fold cross-validation (CV) is used. In k-fold CV, the entire dataset is randomly divided into k groups. The model is developed using (k - 1) groups as the training set, and the remaining group is used as the test set. This process is repeated until each of the k groups has been used as a test set once. The mean of the SSE for the test sets is compared with the mean of the SSE for the training sets through an F-test. In this case the training sets overlap: a 1/k fraction of the samples in any training set is exactly the same as a 1/k fraction in the other (k - 1) training sets. Hence, the computation of the degrees of freedom (DOF) for the average SSE for the training sets is not straightforward. To the best of our knowledge, in most existing work on k-fold CV, the comparison between the average SSE for the training and test sets is carried out qualitatively in an ad hoc fashion. In this work, we have developed a rigorous procedure to compute the DOFs for a robust F-test in k-fold cross-validation.
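The k-fold procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses ordinary least squares as a stand-in model and synthetic data, the function names `kfold_sse` and `shared_training_samples` are hypothetical, and it does not implement the DOF procedure developed in this work. It computes the mean training and test SSE over the k folds and counts how many samples two training sets share, which is the overlap that makes the DOF computation non-trivial.

```python
import numpy as np


def kfold_sse(X, y, k=10, seed=0):
    """Mean training SSE and mean test SSE over k folds.

    Assumption: an ordinary least-squares model with an intercept is used
    purely as a stand-in for whatever I/O model is being validated.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly divide the sample indices into k groups.
    folds = np.array_split(rng.permutation(n), k)
    train_sse, test_sse = [], []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Design matrices with an intercept column.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        train_sse.append(float(np.sum((y[train_idx] - A_train @ beta) ** 2)))
        test_sse.append(float(np.sum((y[test_idx] - A_test @ beta) ** 2)))
    return float(np.mean(train_sse)), float(np.mean(test_sse)), folds


def shared_training_samples(folds, a, b):
    """Number of samples common to the training sets of folds a and b."""
    all_idx = np.concatenate(folds)
    train_a = np.setdiff1d(all_idx, folds[a])
    train_b = np.setdiff1d(all_idx, folds[b])
    return int(np.intersect1d(train_a, train_b).size)


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
    mean_train, mean_test, folds = kfold_sse(X, y, k=10)
    print("mean training SSE:", mean_train)
    print("mean test SSE:", mean_test)
    print("samples shared by training sets 0 and 1:",
          shared_training_samples(folds, 0, 1))
```

Note that the two mean SSE values are sums over different numbers of samples, so they cannot be compared directly; an F-test compares them after scaling by their degrees of freedom, and the shared samples counted above are why those degrees of freedom are not simply the training-set sizes.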

We have applied this k-fold CV approach to a partial least squares (PLS)-based method for identifying the interactions between different signaling proteins, using phosphoprotein data from mouse macrophage RAW 264.7 cells provided by the Alliance for Cellular Signaling (AfCS). A value of k = 10 was used. In the PLS-based modeling scheme used here, only one output is used at a time (1), which differs from the traditional way of applying the PLS technique to I/O data. Once the I/O model is deemed robust based on the F-test, significant interactions are selected through a t-test (1, 2) and are used to reconstruct the phosphoprotein signaling network. Important signaling events, such as the activation of glycogen synthase kinase 3 by protein kinase B (Akt), are captured by our reconstructed network. Our analysis approach also generates novel links and testable hypotheses. We will also show the application of the approach to least-squares regression and principal component regression-based techniques for modeling I/O data.
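The idea of modeling one output at a time and keeping only inputs with significant coefficients can be illustrated as follows. This sketch makes several assumptions that are not from the abstract: it swaps in ordinary least squares for PLS, uses the classical coefficient t-statistic, and takes a fixed critical value `t_crit` as a placeholder (in practice it would come from the t distribution with n - p degrees of freedom at the chosen significance level). The function name `significant_inputs` and the synthetic data are hypothetical.

```python
import numpy as np


def significant_inputs(X, Y, t_crit=2.0):
    """Boolean matrix links[i, j]: input i is significant for output j.

    Assumptions: ordinary least squares stands in for PLS, and t_crit is
    a placeholder critical value rather than one derived from the
    t distribution.
    """
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])  # intercept plus inputs
    p = A.shape[1]
    xtx_inv = np.linalg.inv(A.T @ A)
    links = np.zeros((m, Y.shape[1]), dtype=bool)
    for j in range(Y.shape[1]):  # one output at a time
        y = Y[:, j]
        beta = xtx_inv @ A.T @ y
        resid = y - A @ beta
        s2 = resid @ resid / (n - p)  # residual variance estimate
        se = np.sqrt(s2 * np.diag(xtx_inv))  # coefficient standard errors
        t = beta / se
        links[:, j] = np.abs(t[1:]) > t_crit  # skip the intercept
    return links


if __name__ == "__main__":
    # Synthetic data, for illustration only: output 0 depends on input 0,
    # output 1 depends on input 2.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    Y = np.column_stack([
        2.0 * X[:, 0] + rng.normal(scale=0.1, size=200),
        -3.0 * X[:, 2] + rng.normal(scale=0.1, size=200),
    ])
    print(significant_inputs(X, Y))
```

Each `True` entry would correspond to a candidate link from an input to an output in the reconstructed network; inputs with no real effect can still pass the threshold occasionally, at a rate set by the significance level.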

References

1. Gupta, S., M. R. Maurya, and S. Subramaniam. 2010. Identification of crosstalk between phosphoprotein signaling pathways in RAW 264.7 macrophage cells. PLoS Comput Biol. 6:e1000654.

2. Pradervand, S., M. R. Maurya, and S. Subramaniam. 2006. Identification of signaling components required for the prediction of cytokine release in RAW 264.7 macrophages. Genome Biol. 7:R11.

Topics

Biological Engineering
