(635g) Invited Talk: A Multi-Omics Bioinformatics Workflow for the Integration and Interpretation of Transcriptomics and Metabolomics Data | AIChE

Sarigiannis, D. - Presenter, Aristotle University
Papaioannou, N., Aristotle University of Thessaloniki
Dallas, I., Aristotle University of Thessaloniki
Papageorgiou, T., Aristotle University of Thessaloniki
Schultz, D., Aristotle University of Thessaloniki
Frydas, I., Aristotle University
Karakitsios, S., Aristotle University of Thessaloniki
Blanc, E., University Paris Descartes
Advances in high-throughput technologies that characterize multiple “omes” of biological samples have revealed the need for integrative algorithms for data unification and knowledge discovery. Several methods have been proposed to leverage multi-omics data and derive actionable insights. They can be classified into the following categories: joint pathway analysis, Bayesian, fusion, similarity-based, correlation-based, and other multivariate methods. Choosing among them requires a prior comparison of their performance, but gold standards and consistent performance criteria are lacking (Duan et al., 2019). It is therefore strongly advised to try different options and various visualization techniques to understand the data comprehensively. Herein, we provide a robust workflow developed in the R/Bioconductor architecture that includes methods for integrating and interpreting transcriptomics and metabolomics data. The workflow was validated with a dataset derived from HepaRG cells exposed to di(2-ethylhexyl) phthalate (DEHP). First, the sample preparation and data acquisition protocols for the transcriptomics and metabolomics analyses are presented, along with the data analysis processes for the individual omics data. This is followed by a detailed description of the workflow developed for multi-omics analysis.

For transcriptomic sample analysis, all steps followed the One-Color Microarray-Based Gene Expression Analysis (Low Input Quick Amp Labeling) Protocol version 6.9.1 supplied by Agilent Technologies. A NanoDrop 2000™ spectrophotometer was used to measure sample purity and contamination (A260, 260/280), concentration, the efficacy of fluorescent dye incorporation (concentration of Cy3), total cRNA yield (µg cRNA), and specific activity. In line with the manufacturer's recommendations, the criteria for specific activity (>6 pmol Cy3 per µg cRNA) and yield (>1.65 µg cRNA) were met to ensure optimal microarray results. Samples were hybridized using Agilent’s Gene Expression Hybridization Kit (Agilent 5188-5242) to the Agilent SurePrint G3 Human Gene Expression v3 8 x 60k Microarray Kit, Design ID: 072363 (Agilent Technologies Inc., CA), following the manufacturer's protocol. Slides were visually inspected to ensure adequate washing and removal of debris that could interfere with scanning, placed in a SureScan slide holder (Agilent G4900-60035), and transferred to the Agilent SureScan Microarray Scanner™ (G2600, Agilent Technologies, Inc., CA), where they were analyzed using the GE 1200 one-color protocol. Scan results were subjected to quality control with Feature Extraction Software (Agilent v. …). Raw data were exported as .txt files and imported into R. In R, data were analyzed using the limma package, which has built-in analyses specific to one-color Agilent microarray data.
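The two labeling QC criteria quoted above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the authors' R workflow. It assumes the usual calculation of yield as cRNA concentration times elution volume, and of specific activity as the Cy3 concentration divided by the cRNA concentration; the 30 µL elution volume and all names (`NanoDropReading`, `passes_qc`) are hypothetical placeholders.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_SPECIFIC_ACTIVITY = 6.0  # pmol Cy3 per µg cRNA
MIN_YIELD_UG = 1.65          # µg cRNA


@dataclass
class NanoDropReading:
    """One labeled cRNA sample (hypothetical container for illustration)."""
    sample_id: str
    cy3_pmol_per_ul: float   # Cy3 concentration
    crna_ng_per_ul: float    # cRNA concentration


def crna_yield_ug(reading, elution_volume_ul=30.0):
    """Total cRNA yield in µg; the elution volume is an assumed value."""
    return reading.crna_ng_per_ul * elution_volume_ul / 1000.0


def specific_activity(reading):
    """pmol Cy3 per µg cRNA (ng/µL is converted to µg/µL by the factor 1000)."""
    return reading.cy3_pmol_per_ul / reading.crna_ng_per_ul * 1000.0


def passes_qc(reading, elution_volume_ul=30.0):
    """True when both criteria from the text are met."""
    return (specific_activity(reading) > MIN_SPECIFIC_ACTIVITY
            and crna_yield_ug(reading, elution_volume_ul) > MIN_YIELD_UG)


if __name__ == "__main__":
    # Illustrative values only, not measurements from the study.
    for r in (NanoDropReading("A", 1.2, 100.0), NanoDropReading("B", 0.4, 100.0)):
        print(r.sample_id, passes_qc(r))
```

Samples that fail either criterion would normally be relabeled before hybridization; the sketch only flags them.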

For the metabolomics analysis, an aliquot of each sample was transferred to a new Eppendorf tube after vortexing. Aliquots were dried under a gentle nitrogen flow using a Techne FSC400D sample concentrator (Tequipment, USA) and then resuspended in water/methanol (70:30), followed by vigorous vortexing. Next, samples were centrifuged (Centurion Scientific Limited), and the supernatants were transferred to autosampler vials with inserts. A mixture of multiple standards was added to the solvent to assess system stability for each sample analyzed. Cell samples were analyzed with an Agilent 1290 Infinity HPLC system coupled to an Agilent 6540 HRMS-QTOF LC-MS system. A Fortis Speed Core pH+ C18 column (2.1 × 100 mm, 2.6 µm) from Fortis Technologies (United Kingdom), preceded by a filter column, was used for the cell samples. The Q-TOF system was equipped with a Dual AJS ESI probe and operated in positive and negative modes using the full MS scan mode. Data were acquired between 50 and 1000 m/z in centroid mode at a resolution of 40,000 FWHM. Two Blank samples were injected at the beginning of each analytical batch, followed by 10 runs of pooled QC samples to stabilize the chromatographic column. The second Blank sample contained a mixture of the standards. After every 3 study samples, two Blanks and one pooled QC sample were run to monitor system performance. The UPLC-TOF-MS data of the analyzed samples were obtained with the Agilent MassHunter Workstation Data Acquisition Software v.B.06.01, and the raw data generated from negative and positive ionization were pre-processed as two separate experiments. The msConvert tool included in the ProteoWizard toolkit (Adusumilli and Mallick, 2017) translated the vendor format (.d) data into the open .mzML format. Spectral processing was performed using the Bioconductor R-based packages XCMS (Smith et al., 2006) and CAMERA (Kuhl et al., 2012), running under R version 4.0.0 (http://www.r-project.org/).
Data pre-treatment included multi-filtering, normalization, scaling, log transformation, and batch effects correction. Detected features were annotated using the online compound databases HMDB, LipidMaps, and KEGG as well as an in-house MS/MS spectra library from authentic standards.
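The injection order described for each analytical batch can be written down programmatically, which helps when preparing worklists. The Python sketch below is illustrative only. It assumes that the monitoring block (two Blanks and one pooled QC) is repeated after every three study samples, including a final incomplete group, and that only the second Blank at the start of the batch contains the standards mixture; the function name and run labels are hypothetical.

```python
def build_injection_sequence(sample_ids, n_conditioning_qc=10, block_size=3):
    """Return the run order for one analytical batch as a list of labels."""
    # Two Blanks open the batch; the second one contains the standards mixture.
    sequence = ["Blank", "Blank+Std"]
    # Pooled QC injections used to stabilize the chromatographic column.
    sequence += ["QC"] * n_conditioning_qc
    # Study samples in groups of `block_size`, each followed by monitoring runs.
    for start in range(0, len(sample_ids), block_size):
        sequence += list(sample_ids[start:start + block_size])
        sequence += ["Blank", "Blank", "QC"]
    return sequence


if __name__ == "__main__":
    # Placeholder sample names, not the study's actual sample identifiers.
    for position, label in enumerate(build_injection_sequence(
            ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]), start=1):
        print(position, label)
```

In practice the study samples would usually be randomized before being passed to such a function, so that run order is not confounded with exposure group.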

After an extensive literature search and evaluation of available R packages, we based our multi-omics workflow on the mixOmics and MetaboAnalystR packages. First, a multivariate analysis of the omics data was performed using the functions of the mixOmics R package (Rohart et al., 2017). The mixOmics toolkit includes 19 unsupervised and supervised multivariate methodologies, including the multiblock sPLS method and multiblock sPLS-DA (DIABLO), which generates a predictive model for a categorical variable based on predictors from several datasets. Furthermore, the results of the multivariate analysis were enriched with those from univariate analyses performed, for example, with limma for the transcriptomics data and MetaboAnalystR (Pang et al., 2020) for the metabolomics data. This approach led to a set of biomarkers with the power to discriminate not only samples across several datasets based on their outcome category but also the different outcomes; together they constitute a multi-omics signature that predicts the class of external samples.

MetaboAnalystR is highly recommended for joint pathway analysis of the detected biomarkers because it supports pathway enrichment and topology analysis for 21 model organisms, including Homo sapiens, and hosts a total of ~1600 metabolic pathways. Enrichment analysis evaluates whether the observed genes and metabolites in a particular pathway are significantly enriched (appear more often than expected by random chance) within the dataset; the over-representation analysis (ORA) is based on Fisher’s exact test. Topology analysis, on the other hand, evaluates whether a given gene or metabolite plays an essential role in a biological response based on its position within a pathway. The enriched pathways will be explored based on joint evidence or, for comparison, on the evidence obtained from one particular omics platform. Results can be biased because, while the transcriptome and the proteome are routinely mapped, current metabolomic technologies capture only a small portion of the metabolome; this issue will be addressed using customized background lists. Using a background list minimizes design-related technological and biological biases; for example, it accounts for genes that are not expressed in specific tissues.
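To make the over-representation step concrete, the following Python sketch computes a one-sided Fisher's exact test p-value (the upper tail of the hypergeometric distribution) for a single pathway against a custom background list. It is a generic illustration of the statistic, not the MetaboAnalystR implementation. The function name, identifiers, and toy data are hypothetical, and genes and metabolites are simply pooled as string identifiers.

```python
from math import comb


def ora_p_value(hits, pathway, background):
    """One-sided Fisher's exact test for over-representation.

    hits       -- significant genes/metabolites from the omics analyses
    pathway    -- members of the pathway being tested
    background -- all features that could have been detected (custom list)
    """
    background = set(background)
    # Features outside the background cannot be scored, so they are dropped.
    pathway = set(pathway) & background
    hits = set(hits) & background

    n_background = len(background)
    n_pathway = len(pathway)
    n_hits = len(hits)
    n_overlap = len(hits & pathway)

    # P(X >= n_overlap) when n_hits features are drawn without replacement.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_hits - i)
        for i in range(n_overlap, min(n_pathway, n_hits) + 1)
    )
    return tail / comb(n_background, n_hits)


if __name__ == "__main__":
    # Toy identifiers: g* stand for genes, m* for metabolites.
    background = [f"g{i}" for i in range(1, 7)] + [f"m{i}" for i in range(1, 5)]
    pathway = ["g1", "g2", "m1", "m2"]
    hits = ["g1", "g2", "m1"]
    print(ora_p_value(hits, pathway, background))
```

Restricting the background to features that the platforms could actually measure changes both the population size and the pathway size in the test, which is how a background list reduces the coverage bias mentioned above. With many pathways tested, the resulting p-values would also need a multiple-testing correction.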

Additional integrative bioinformatics approaches, including network analysis, were incorporated to extract as much information as possible from the multi-omics analyses. Furthermore, KEGG Mapper, PathVisio, and the Reactome Functional Interaction network plug-in for Cytoscape will be used for additional mapping.

Multi-omics pathway analysis was conducted with the statistically significant genes and metabolites resulting from the omics analyses of the 3D HepaRG cells after exposure to the pollutants. The analysis resulted in 142 significantly dysregulated pathways related to oxidative stress and lipid metabolism.

In conclusion, the integration and subsequent interpretation of omics data at various biological levels are highly relevant for translating findings for use in risk assessment, because they generate a mechanistic understanding of the underlying processes. By analyzing a real dataset that includes transcriptomics and metabolomics data, we demonstrate the usefulness of our advanced multi-omics workflow in the R programming language, which allowed the identification of potential biomarkers related to the toxicity mechanisms of the investigated pollutants.


ADUSUMILLI, R. & MALLICK, P. 2017. Data Conversion with ProteoWizard msConvert. Methods Mol Biol, 1550, 339-368.

DUAN, R., GAO, L., XU, H., SONG, K., HU, Y., WANG, H., DONG, Y., ZHANG, C. & JIA, S. 2019. CEPICS: A Comparison and Evaluation Platform for Integration Methods in Cancer Subtyping. Front Genet, 10, 966.

KUHL, C., TAUTENHAHN, R., BÖTTCHER, C., LARSON, T. R. & NEUMANN, S. 2012. CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Analytical Chemistry, 84, 283-289.

PANG, Z., CHONG, J., LI, S. & XIA, J. 2020. MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics. Metabolites, 10, 186.

ROHART, F., GAUTIER, B., SINGH, A. & LÊ CAO, K.-A. 2017. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol, 13, e1005752.

SMITH, C. A., WANT, E. J., O’MAILLE, G., ABAGYAN, R. & SIUZDAK, G. 2006. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem, 78, 779-87.