A Widget-Based Data-Analytic Metabolomics Pipeline

Alden, N., Tufts University
Lee, K., Tufts University
Porokhin, V., Tufts University
Colebrook-Soucie, J., Tufts University
Cokova, E., Tufts University

Owing to recent technological advances in measurement platforms, it is now possible to simultaneously detect and characterize a very large number of metabolites covering a substantial fraction of the small molecules present in a biological sample. This presents an exciting opportunity to develop potentially transformative approaches to study cells and organisms. One major challenge in realizing this potential lies in efficiently processing and analyzing the data. A typical dataset from an untargeted experiment contains thousands of “features,” each of which could correspond to a unique metabolite. While several tools have emerged to pre-process the data, annotating and analyzing the data to extract meaningful biological information remains a challenge.

We present a customizable, software-based environment that allows users to visually assemble their own workflows for interpreting metabolomics data within a relevant biological context. We describe a suite of widgets (software gadgets that plug into and interact with a larger system) organized into a ‘metabolomics toolbox’ within the plug-and-play Orange framework (http://orange.biolab.si). Orange has over 100 widgets, organized in various toolboxes centered on data curation, visualization, and classification. To create a metabolomics analysis pipeline, users select and connect widgets from our metabolomics toolbox and from the other Orange toolboxes. Our plug-and-play visual workflow differs from current web-based approaches and offers several advantages. First, the ability to combine small processing steps into complex analysis pipelines allows the user to create customized and flexible data-processing workflows. This flexibility is important for efficiently analyzing data from different types of metabolomics studies. For example, a comparative study seeking to identify differences in metabolite profiles between two sets of samples requires a different workflow from a study that seeks to characterize all of the metabolites present in a sample. Second, the widgets can be combined with existing data analysis tools (e.g., machine learning) already implemented in Orange to analyze data at different points in the pipeline. Third, the integrated data visualization toolbox within Orange can be used to view relationships in the data in many different ways, thus departing from the traditional method of visualizing metabolite data in the context of metabolic pathways or atlases. Fourth, a custom workflow, once created, can be saved, published, and shared with collaborators, thus facilitating not only data sharing but also method dissemination, which should be of broad benefit to the metabolomics research community.
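
The idea of combining small processing steps into different pipelines can be illustrated with a minimal Python sketch. It is not the Orange API and not the toolbox's implementation: the Feature record, the step functions, the group names, and all numeric values are hypothetical and chosen only for illustration. It shows how a characterization workflow and a comparative workflow might reuse the same filtering step while differing in what follows it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical record for one detected feature (names are assumptions).
@dataclass(frozen=True)
class Feature:
    mz: float                      # mass-to-charge ratio
    intensities: Dict[str, float]  # mean intensity per sample group


Step = Callable[[List[Feature]], List[Feature]]


def pipeline(*steps: Step) -> Step:
    """Chain small steps in order, loosely analogous to connecting widgets."""
    def run(features: List[Feature]) -> List[Feature]:
        for step in steps:
            features = step(features)
        return features
    return run


def min_intensity(threshold: float) -> Step:
    """Keep features that reach the threshold in at least one group."""
    def step(features: List[Feature]) -> List[Feature]:
        return [f for f in features
                if max(f.intensities.values()) >= threshold]
    return step


def fold_change(group_a: str, group_b: str, min_ratio: float) -> Step:
    """Keep features whose two group intensities differ by min_ratio or more."""
    def step(features: List[Feature]) -> List[Feature]:
        kept = []
        for f in features:
            a, b = f.intensities[group_a], f.intensities[group_b]
            lo, hi = min(a, b), max(a, b)
            # A feature absent from one group counts as differing.
            if hi > 0 and (lo == 0 or hi / lo >= min_ratio):
                kept.append(f)
        return kept
    return step


# Illustrative values only; they are not taken from the case studies.
features = [
    Feature(180.063, {"control": 5000.0, "treated": 5200.0}),
    Feature(146.046, {"control": 1500.0, "treated": 6000.0}),
    Feature(89.048, {"control": 200.0, "treated": 300.0}),
]

# Characterization: report every feature above the noise threshold.
characterize = pipeline(min_intensity(1000.0))

# Comparison: same threshold, then keep only features that differ.
compare = pipeline(min_intensity(1000.0),
                   fold_change("control", "treated", 2.0))

characterized = characterize(features)
compared = compare(features)
```

Because each step has the same input and output type, steps can be reordered, removed, or shared between workflows without changing the others, which is the property the visual plug-and-play approach relies on.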

We illustrate this workflow with results from two representative case studies. The datasets describe the metabolite profiles of 1) Chinese hamster ovary (CHO) cells in fed-batch culture during monoclonal antibody production and 2) an in vitro culture of isolates collected from murine cecum. The first dataset involves samples from a culture of a single cell type from an organism (hamster) with an annotated genome in the KEGG database, while the second dataset is more complex, involving samples from a culture of many different microbial species. Using the metabolomics toolbox within Orange, we identify growth-inhibitory metabolites accumulating in CHO cell culture, and discover novel microbiota