(160be) Comparison of Three High-Level Microarray Statistical Analysis Methods for Disease Mechanism Identification | AIChE



Schultz, D. - Presenter, Aristotle University of Thessaloniki
Frydas, I., Aristotle University
Karakitsios, S., Aristotle University of Thessaloniki
Sarigiannis, D., Aristotle University of Thessaloniki
Transcriptomics is the omics platform that investigates the transcriptome, the complete set of RNA transcripts produced by the genome under specific conditions. The mRNA fraction of the transcriptome reflects gene expression and can be examined using microarrays: because mRNA is the template for protein synthesis, it indicates which genes are actively expressed in an organism. Omics technologies such as transcriptomics have been instrumental in a host of applications, including disease discovery, and the field has advanced considerably in recent years. Microarrays, one of the main techniques in transcriptomics, peaked in use in the early 2010s and remain a sound, cost-effective option that can be paired with targeted RNA-Seq to obtain reliable and robust results. In a microarray experiment, fluorescently labeled mRNA transcripts are hybridized to an array carrying a defined set of complementary short oligonucleotides (“probes”). The fluorescence intensity emitted from each probe location indicates the abundance of the transcript matching that probe sequence. The genes associated with each probe can then be determined, and differentially expressed genes (DEGs; genes whose expression differs between arrays or conditions) can be identified. Microarrays allow tens of thousands of transcripts to be hybridized simultaneously, greatly reducing the effort and cost per gene. Because so many transcripts are measured at once, omics techniques generate huge quantities of data that require advanced statistical methods for analysis. A major issue in transcriptomic research is therefore how best to handle raw data, identify validated DEGs, and generate results with limited rates of Type I and Type II error.
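The Type I error concern can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes roughly 60,000 features per array and a per-gene significance threshold of 0.05, and it uses the Benjamini-Hochberg procedure as one common way to control the false discovery rate. The abstract does not state which multiple-testing correction, if any, was applied, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
# Assumed figures for illustration: ~60,000 features, alpha = 0.05.
n_probes = 60_000
alpha = 0.05

# If no gene were truly differentially expressed, testing every probe at
# alpha would still flag about n_probes * alpha genes by chance alone.
expected_false_positives = n_probes * alpha  # about 3,000 genes


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not experimental results).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Genes would then be called significant when their adjusted p-value falls below the chosen false discovery rate, rather than when the raw p-value falls below 0.05.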
To address this, we compared three statistical techniques, Significance Analysis of Microarrays (SAM; R software), Linear Models for Microarray Data (LIMMA; R software), and the T-test (Agilent GeneSpring™ software), to determine the uniformity of their results. For the present research, we analyzed Agilent™ microarray data from three real datasets generated in experiments within our lab group, which aim to detect the molecular mechanisms involved in metabolic disorders associated with environmental contaminants. In these experiments, HepaRG cells were exposed to either a pharmaceutical (amiodarone) or an environmental pollutant (di-2-ethylhexyl phthalate), which led to transcriptomic alterations in metabolic pathways. Overall, this work supports future efforts to build reliable computational and systems biology models for predicting a host of metabolic diseases.

Amiodarone is a class III antiarrhythmic drug that exerts its therapeutic effect by altering myocardial depolarization and repolarization. It was initially synthesized in the 1960s as an antianginal agent but was later found to be an effective antiarrhythmic. Owing to its high efficacy in treating supraventricular and ventricular arrhythmias, it has become a widely prescribed drug. Amiodarone is, however, associated with various side effects, largely due to its high iodine content. The most common are those causing pregnancy complications and those leading to thyroid dysfunction, which result from its direct toxic effects on the thyroid itself (Latini et al. 1984). Other side effects include pulmonary toxicity, which matters particularly because pulmonary disease frequently co-occurs in patients treated with amiodarone (Ruzieh et al. 2019). Because pulmonary toxicity is dose-dependent, it is of particular concern in patients requiring long-term administration of the drug, especially at doses over 500 mg/day. Furthermore, amiodarone is highly lipophilic and concentrates mainly in adipose tissue, cardiac and skeletal muscle, and the thyroid. Its half-life in the body is approximately 100 days; hence, amiodarone toxicity has the potential to occur months after patients discontinue use (Basaria and Cooper 2005).

Di-2-ethylhexyl phthalate (DEHP) is a colorless, viscous, and lipophilic plasticizer that is frequently used in medical devices, textiles, and manufacturing procedures. DEHP can leach out at any point in the product life cycle and has therefore been quantified in air, water, and soil samples worldwide (Huang et al. 2008). DEHP has been found to be a potent carcinogen and has the potential to cause reproductive and developmental toxicity. Owing to its extensive use, there is mounting evidence that the general population has been or will be exposed to DEHP during their lifetime, and extensive research is currently under way to better understand the kinetics of DEHP toxicity, especially in relation to carcinogenesis and child health (Ito et al. 2019). To date, the toxic effects of DEHP have been found to include endocrine, reproductive, renal, and neural toxicity as well as hepatotoxicity (Rowdhwal and Chen 2018).

During the experiments, HepaRG cells in 2D and 3D culture were exposed to a carrier control, amiodarone, or DEHP. RNA was extracted and stored at -20°C until microarray analysis. Subsequently, samples were processed following the One-Color Microarray-Based Gene Expression Analysis (Low Input Quick Amp Labeling) Protocol version 6.9.1 supplied by Agilent Technologies. Samples were hybridized using Agilent’s Gene Expression Hybridization Kit (Agilent 5188-5242) to the Agilent SurePrint G3 Human Gene Expression v3 8 x 60k Microarray Kit, design ID: 072363 (Agilent Technologies Inc., CA). Microarrays were read using the Agilent SureScan Microarray Scanner™ (G2600, Agilent Technologies, Inc., CA) and analyzed using the GE 1200 one-color protocol in the Agilent Feature Extraction™ software. After feature extraction, the raw data were exported and analyzed with a T-Test, SAM, and LIMMA, a selection based on results found in the literature (Chrominski and Tkacz 2015).

Briefly, a T-Test determines whether two datasets differ significantly by using the mean and variance of each population, and it is currently one of the most frequently used and simplest statistical tests. SAM is a method designed specifically to determine the statistical significance of differences in gene expression between groups. In certain regards, SAM is similar to a T-Test, although SAM uses non-parametric statistics, mainly because microarray data are not normally distributed (Tusher, Tibshirani, and Chu 2001). LIMMA uses linear models to analyze microarray data.
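The relationship between the three statistics can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the implementation used by GeneSpring, the R SAM software, or LIMMA: the function names and the toy log2 intensities are hypothetical, the SAM constant `s0` and the LIMMA prior values `prior_var` and `prior_df` are supplied by hand here (the real packages estimate them from all genes on the array), and SAM's permutation step for judging significance is not shown.

```python
import math
from statistics import mean


def pooled_variance(a, b):
    """Pooled sample variance of two groups (n1 + n2 - 2 degrees of freedom)."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)


def t_statistic(a, b):
    """Classical equal-variance two-sample t-statistic for one gene."""
    se = math.sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


def sam_statistic(a, b, s0):
    """SAM-style relative difference: the t-statistic's denominator plus a
    small positive constant s0 that damps genes with very low variance."""
    se = math.sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / (se + s0)


def moderated_t(a, b, prior_var, prior_df):
    """LIMMA-style moderated t: the gene's variance is shrunk toward a prior
    variance before the standard error is computed."""
    df = len(a) + len(b) - 2
    post_var = (prior_df * prior_var + df * pooled_variance(a, b)) / (prior_df + df)
    se = math.sqrt(post_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


# Hypothetical log2 intensities for one gene (not experimental data).
treated = [8.1, 8.3, 8.2]
control = [7.0, 7.1, 6.9]

t = t_statistic(treated, control)
d = sam_statistic(treated, control, s0=0.1)                     # assumed s0
t_mod = moderated_t(treated, control, prior_var=0.05, prior_df=4)  # assumed prior
```

For this toy gene the replicate variance is very small, so the plain t-statistic is large, while both the SAM-style and moderated statistics are smaller in magnitude. This damping of low-variance genes is one reason the three methods can disagree on which genes are called DEGs.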

As we hypothesized, the results of the three analysis methods varied slightly, and the most robust way to determine DEGs was to select only those found to be statistically significant by all three methods. In our experiments, the method with the fewest outliers and the most robust analysis was SAM, followed by LIMMA and the T-Test. The most consistent results, however, were the common DEGs identified by all three methods, after RT-qPCR validation and correlation of the gene fold changes with copy numbers.
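Selecting only the DEGs shared by all three methods amounts to a set intersection. The sketch below assumes each method has already produced a list of significant gene identifiers; the helper name `consensus_degs` and the placeholder identifiers are hypothetical and do not represent results from this study.

```python
def consensus_degs(*deg_lists):
    """Return the genes called significant by every method supplied."""
    if not deg_lists:
        return set()
    return set.intersection(*(set(genes) for genes in deg_lists))


# Placeholder gene identifiers for illustration only.
t_test_degs = ["gene_A", "gene_B", "gene_D", "gene_E"]
sam_degs = ["gene_B", "gene_D"]
limma_degs = ["gene_B", "gene_C", "gene_D"]

common = sorted(consensus_degs(t_test_degs, sam_degs, limma_degs))
# common == ["gene_B", "gene_D"]
```

The intersection is conservative by design: it lowers the chance of false positives at the cost of discarding genes that only one or two methods detect, which is why follow-up validation such as RT-qPCR remains useful.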


  1. Basaria S and Cooper DS. 2005. Amiodarone and the thyroid. The American Journal of Medicine, 118 (7): 706–714.
  2. Chrominski K and Tkacz M. 2015. Comparison of High-Level Microarray Analysis Methods in the Context of Result Consistency. PLoS ONE, 10 (6): e0128845. doi:10.1371/journal.pone.0128845
  3. Huang PC, Tien CJ, Sun YM, Hsieh CY, and Lee CC. 2008. Occurrence of phthalates in sediment and biota: relationship to aquatic factors and the biota-sediment accumulation factor. Chemosphere, 73 (4): 539–544.
  4. Ito Y, Kamijima M, and Nakajima T. 2019. Di(2-ethylhexyl) phthalate-induced toxicity and peroxisome proliferator-activated receptor alpha: a review. Environ Health Prev Med, 24: 47. https://doi.org/10.1186/s12199-019-0802-z
  5. Latini R, Tognoni G, and Kates RE. 1984. Clinical Pharmacokinetics of Amiodarone. Clin Pharmacokinet, 9: 136–156. https://doi.org/10.2165/00003088-198409020-00002
  6. Rowdhwal S and Chen J. 2018. Toxic effects of Di-2-ethylhexyl Phthalate: An Overview. BioMed Research International, 2018.
  7. Ruzieh M, Moroi MK, Aboujamous NM, Ghahramani M, Naccarelli GV, Mandrola J, and Foy AJ. 2019. Meta-Analysis Comparing the Relative Risk of Adverse Events for Amiodarone Versus Placebo. Am J Cardiol, 124 (12): 1889. Epub 2019 Sep 26.
  8. Tusher V, Tibshirani R, and Chu G. 2001. Significance analysis of microarrays applied to ionizing radiation response. Proceedings of the National Academy of Sciences, 98: 5116–5121.