(398c) Inferring Gene Regulatory Networks from Single Cell Expression Data

Conference

AIChE Annual Meeting

Year

2016

Proceeding

2016 AIChE Annual Meeting

Group

Computing and Systems Technology Division

Session

Complex and Networked Chemical and Biochemical Systems

Time

Tuesday, November 15, 2016 - 3:51pm to 4:09pm

Authors

Papili Gao, N. - Presenter, ETH Zurich

Gunawan, R., ETH Zurich

Recent advances in cell profiling technology, such as RNA sequencing and real-time PCR (polymerase chain reaction), allow researchers to measure the expression of a large set of genes at single cell resolution. The resulting single cell data can be used to answer important scientific questions that were previously out of reach with population-averaged measurements (Sandberg 2013). For example, single cell expression data make it possible to study the functional role of cell-to-cell variability, which arises from the stochastic dynamics of gene expression, in cell lineage decision-making during physiological differentiation. However, taking advantage of the information contained in single cell data requires new computational tools, because existing algorithms were not originally designed for such data.

In this work, we focused on the inference of gene regulatory networks (GRNs) from single cell expression data. More specifically, we considered time-stamped cross-sectional expression datasets, consistent with time series measurements taken using the Fluidigm Biomark platform. Several algorithms have recently been published for such GRN inference, based on Boolean networks (Chen et al. 2014; Moignard et al. 2015), stochastic modelling (Teles et al. 2013), gene co-expression/correlation (Kouno et al. 2013; Moignard et al. 2013; Pina et al. 2015), and nonlinear ordinary differential equation models (Ocone et al. 2015). However, the direct application of these algorithms to time-stamped cross-sectional datasets faces several challenges, for example the requirement of dense time course data and a computational complexity that scales exponentially with the size of the network.

Here, we developed a novel method for inferring the GRN structure, called Sparse Network Inference For Single cell data (SNIFS). SNIFS produces a directed graph model of the GRN by analyzing the time evolution of the distribution of single cell gene expression levels. Briefly, the algorithm begins by computing, for each gene, the change in the distribution of single cell expression levels over time. This change is quantified by the Kolmogorov-Smirnov (KS) distance (Massey 1951) between the distributions at two subsequent time points. The GRN inference then involves solving a linear regression problem of the type y = Xα. More specifically, the KS distance of a gene at each time step (y) is modelled as a linear function of the KS distances of all other genes at the previous time step (X). SNIFS then uses elastic-net regularization (Zou and Hastie 2005) to find the optimal (sparse) solution α by solving the following penalized least squares optimization problem:

min ||y − Xα||₂² + λ(m||α||₁ + (1 − m)||α||₂) subject to α_j ≥ 0.

Note that setting m to 1 or to 0 turns the elastic-net regularization into Lasso or Tikhonov (ridge regression) regularization, respectively. In the implementation of SNIFS, we used GLMNET (Friedman et al. 2010) to solve for the optimal α.
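As a rough illustration of the regression step described above, the following Python sketch computes two-sample KS distances and solves a non-negative elastic-net problem by coordinate descent. It is not the SNIFS implementation (which used GLMNET): the function names, the data layout `snapshots[t][g]`, and the glmnet-style scaling of the objective (a 1/(2n) factor on the loss and a squared, halved L2 term) are assumptions made for this example.

```python
def ks_distance(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # advance both CDFs past every sample equal to the current value
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def build_regression(snapshots, target):
    """snapshots[t][g] holds the single cell expression values of gene g at time t.

    Returns X (KS distances of the other genes at the previous step),
    y (KS distances of the target gene), and the indices of the other genes.
    """
    n_genes = len(snapshots[0])
    # D[k][g]: KS distance of gene g between time points k and k + 1
    D = [[ks_distance(snapshots[k][g], snapshots[k + 1][g]) for g in range(n_genes)]
         for k in range(len(snapshots) - 1)]
    others = [g for g in range(n_genes) if g != target]
    X = [[D[k - 1][g] for g in others] for k in range(1, len(D))]
    y = [D[k][target] for k in range(1, len(D))]
    return X, y, others


def nonneg_elastic_net(X, y, lam, m, n_iter=200):
    """Coordinate descent for the assumed objective
    (1/2n)||y - X a||^2 + lam * (m * ||a||_1 + (1 - m)/2 * ||a||_2^2), with a >= 0.
    """
    n, p = len(X), len(X[0])
    alpha = [0.0] * p
    resid = list(y)  # residual y - X a for the starting point a = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - m)
            if denom == 0.0:
                continue  # all-zero column with a pure L1 penalty: keep a_j = 0
            # covariance of column j with the residual that excludes a_j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * alpha[j]) for i in range(n)) / n
            # soft-threshold, then clip at zero to enforce non-negativity
            new = max(0.0, rho - lam * m) / denom
            delta = new - alpha[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                alpha[j] = new
    return alpha
```

In this sketch, calling `build_regression` once per target gene and passing the result to `nonneg_elastic_net` yields one column of candidate edge weights; `m = 1` gives a Lasso penalty and `m = 0` a ridge penalty, mirroring the role of m in the text.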

We evaluated the performance of SNIFS by inferring 10- and 20-gene random subnetworks of E. coli and yeast GRNs using in silico time-stamped cross-sectional single cell expression datasets. Given the structure of the GRN, we generated single cell expression data by simulating a stochastic differential equation (SDE) model (Pinna et al. 2010):

dx_j = V(β Π_i (1 + α_ij x_i/(x_i + 1)) − θ x_j) dt + σ x_j dW(t)

where x_j represents the mRNA level of gene j, α_ij describes the regulation of the expression of gene j by gene i, β denotes the basal transcriptional rate, θ is the mRNA degradation rate constant, and σ and V are scaling parameters. The term dW(t) denotes the increment of a Wiener process, which accounts for the intrinsic stochastic dynamics of gene expression (Wilkinson 2009). We set α_ij to 1 for activation, to −1 for repression, and to 0 otherwise. For the main datasets in the case study, we further set the parameters as follows: V = 30, β = 1, θ = 0.2, and σ = 0.1. In total, we generated single cell data for 8 equally spaced time points between t = 0.1 and t = 2.
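The data generation step can be sketched with an Euler-Maruyama discretization of the SDE above. The abstract does not state the integration scheme, step size, initial condition, or number of cells per time point, so the choices below (`dt = 0.001`, all genes starting at 1, clipping negative values to zero, and the function names) are assumptions for illustration only.

```python
import math
import random


def simulate_cell(alpha, t_end, dt=0.001, V=30.0, beta=1.0, theta=0.2,
                  sigma=0.1, x0=None, rng=random):
    """Simulate one cell up to t_end; alpha[i][j] is the effect of gene i on gene j."""
    n = len(alpha)
    x = list(x0) if x0 is not None else [1.0] * n  # assumed initial condition
    for _ in range(int(round(t_end / dt))):
        new_x = []
        for j in range(n):
            # product over regulators i of (1 + alpha_ij * x_i / (x_i + 1))
            prod = 1.0
            for i in range(n):
                prod *= 1.0 + alpha[i][j] * x[i] / (x[i] + 1.0)
            drift = V * (beta * prod - theta * x[j])
            # Wiener increment: Gaussian with standard deviation sqrt(dt)
            noise = sigma * x[j] * rng.gauss(0.0, math.sqrt(dt))
            new_x.append(max(x[j] + drift * dt + noise, 0.0))  # assumed clipping
        x = new_x
    return x


def sample_snapshots(alpha, times, n_cells, seed=0):
    """Cross-sectional data: independent cells at each time; result[t][cell][gene]."""
    rng = random.Random(seed)
    return [[simulate_cell(alpha, t, rng=rng) for _ in range(n_cells)]
            for t in times]


# Eight equally spaced time points between t = 0.1 and t = 2, as in the text
TIMES = [0.1 + k * 1.9 / 7 for k in range(8)]
```

Each cell is simulated independently from the same initial state, which mimics a destructive measurement in which no cell is observed at more than one time point.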

We assessed the accuracy of the GRN predictions by computing the areas under the receiver operating characteristic (AUROC) and the precision-recall (AUPR) curves. We compared the GRNs predicted by SNIFS with those predicted from population-averaged expression data by TSNI (Time Series Network Inference) (Bansal et al. 2006) and by a tree-based ensemble regression method called GENIE3 (GEne Network Inference with Ensemble of trees) (Huynh-Thu et al. 2010). The averaged AUROC and AUPR values in Table 1 indicate that, for every value of m tested, SNIFS outperformed both TSNI and GENIE3. This result demonstrates the advantage of using the information contained in single cell distributional data for GRN inference, as done in SNIFS.

	Table 1. Evaluation of GRN Inference using TSNI, GENIE3, and SNIFS
	10-GENE NETWORK						20-GENE NETWORK
	AUROC			AUPR			AUROC			AUPR
m	TSNI	GENIE3	SNIFS	TSNI	GENIE3	SNIFS	TSNI	GENIE3	SNIFS	TSNI	GENIE3	SNIFS
0 (Ridge)	0.41	0.48	0.75	0.10	0.14	0.31	0.41	0.50	0.63	0.06	0.07	0.15
0.1	0.41	0.48	0.76	0.10	0.14	0.31	0.41	0.50	0.68	0.06	0.07	0.19
0.2	0.41	0.48	0.73	0.10	0.14	0.29	0.41	0.50	0.66	0.06	0.07	0.20
0.3	0.41	0.48	0.70	0.10	0.14	0.28	0.41	0.50	0.66	0.06	0.07	0.21
0.4	0.41	0.48	0.67	0.10	0.14	0.27	0.41	0.50	0.65	0.06	0.07	0.22
0.5	0.41	0.48	0.65	0.10	0.14	0.25	0.41	0.50	0.64	0.06	0.07	0.22
0.6	0.41	0.48	0.63	0.10	0.14	0.25	0.41	0.50	0.64	0.06	0.07	0.23
0.7	0.41	0.48	0.61	0.10	0.14	0.25	0.41	0.50	0.63	0.06	0.07	0.23
0.8	0.41	0.48	0.61	0.10	0.14	0.26	0.41	0.50	0.62	0.06	0.07	0.23
0.9	0.41	0.48	0.60	0.10	0.14	0.25	0.41	0.50	0.61	0.06	0.07	0.23
1 (Lasso)	0.41	0.48	0.58	0.10	0.14	0.25	0.41	0.50	0.60	0.06	0.07	0.24
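For readers who want to reproduce this kind of evaluation, the sketch below computes AUROC and an average-precision estimate of AUPR from a list of predicted edge scores and the corresponding true edge labels. The abstract does not specify how the curves were integrated, so the pairwise-comparison AUROC and the average-precision formula used here are assumptions, as are the function names. The average-precision function assumes at least one true edge and does not treat tied scores specially.

```python
def auroc(scores, labels):
    """Probability that a true edge is scored above a non-edge (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits


# Example: four candidate edges, two of which are true
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))              # 0.75
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2
```

An AUROC of 0.5 corresponds to random ranking, which gives a reference point for reading the values in Table 1.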

REFERENCES

Bansal, M., Della Gatta, G. and di Bernardo, D. (2006). Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 22(7), pp.815–822.

Chen, H. et al. (2014). Single-cell transcriptional analysis to uncover regulatory circuits driving cell fate decisions in early mouse development. Bioinformatics, 31(7), pp.1060–1066.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), pp.1–22.

Huynh-Thu, V.A. et al. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS ONE, 5(9), p.e12776.

Kouno, T. et al. (2013). Temporal dynamics and transcriptional control using single-cell gene expression analysis. Genome Biology, 14(10), p.R118.

Massey, F.J. (1951). The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253), pp.68–78.

Moignard, V. et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature Cell Biology, 15(4), pp.363–72.

Moignard, V. et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology, advance on(3).

Ocone, A. et al. (2015). Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics, 31(12), pp.i89–i96.

Pina, C. et al. (2015). Single-Cell Network Analysis Identifies DDIT3 as a Nodal Lineage Regulator in Hematopoiesis. Cell Reports, 11(10), pp.1503–10.

Pinna, A., Soranzo, N. and de la Fuente, A. (2010). From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PLoS ONE, 5(10), p.e12912.

Sandberg, R. (2013). Entering the era of single-cell transcriptomics in biology and medicine. Nature Methods, 11(1), pp.22–24.

Teles, J. et al. (2013). Transcriptional regulation of lineage commitment: a stochastic model of cell fate decisions. PLoS Computational Biology, 9(8), p.e1003197.

Wilkinson, D.J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics, 10(2), pp.122–33.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), pp.301–320.

Topics

Systems Biology

Stem Cell Engineering
