(398c) Inferring Gene Regulatory Networks from Single Cell Expression Data

Authors: 
Papili Gao, N., ETH Zurich
Gunawan, R., ETH Zurich
Recent advances in cell profiling technology, such as RNA-sequencing and real-time PCR (polymerase chain reaction), have enabled researchers to measure the expression of large sets of genes at single cell resolution. The resulting single cell data can be used to answer important scientific questions that were previously out of reach with population-averaged measurements (Sandberg 2013). For example, single cell expression data can reveal the functional role that cell-to-cell variability, arising from the stochastic dynamics of gene expression, plays in cell lineage decision-making during physiological differentiation. However, new computational tools are needed to take advantage of the information contained in single cell data, since existing algorithms were not originally designed for such data.

In this work, we focused on the inference of gene regulatory networks (GRNs) from single cell expression data. More specifically, we considered time-stamped cross-sectional expression datasets, consistent with time series measurements taken using the Fluidigm Biomark platform. Several algorithms have recently been published for such GRN inference, based on Boolean networks (Chen et al. 2014; Moignard et al. 2015), stochastic modelling (Teles et al. 2013), gene co-expression/correlation (Kouno et al. 2013; Moignard et al. 2013; Pina et al. 2015), and nonlinear ordinary differential equation models (Ocone et al. 2015). However, the direct application of these algorithms to time-stamped cross-sectional datasets faces several challenges, for example the requirement of dense time course data and a computational complexity that scales exponentially with the size of the network.

Here, we developed a novel method for inferring the GRN structure, called Sparse Network Inference For Single cell data (SNIFS). SNIFS produces a directed graph model of the GRN by analyzing the time evolution of the distribution of single cell gene expression levels. Briefly, the algorithm begins by computing, for each gene, the change in the distribution of single cell expression levels over time. This change is quantified by the Kolmogorov-Smirnov (KS) distribution distance (Massey 1951) between two subsequent time points, and the GRN inference then involves solving a linear regression problem of the type y=Xα. More specifically, the KS distance of a gene at each time step (y) is modelled as a linear function of the KS distances of all other genes at the previous time step (X). SNIFS then uses elastic-net regularization (Zou and Hastie 2005) to find the optimal (sparse) solution α by solving the following penalized least-squares optimization problem:

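$$\min_{\alpha}\; \lVert y - X\alpha \rVert_2^2 + \lambda \left( m \lVert \alpha \rVert_1 + (1-m) \lVert \alpha \rVert_2^2 \right) \quad \text{subject to } \alpha_j \ge 0.$$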

Note that setting m to 1 or to 0 turns the elastic-net regularization into Lasso or Tikhonov (ridge regression) regularization, respectively. In the implementation of SNIFS, we used GLMNET (Friedman et al. 2010) to solve for the optimal α.
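To make the two steps above concrete, the following minimal Python sketch computes a two-sample KS distance, assembles y and X for one target gene, and solves the non-negative elastic-net problem by coordinate descent. It is an illustration under stated assumptions, not the SNIFS implementation, which relies on GLMNET: the function names (ks_distance, regression_problem, nonneg_elastic_net), the layout of the distance table D, the fixed iteration count, and the absence of any data scaling are all hypothetical choices made for this example.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample KS distance: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def regression_problem(D, target):
    """Build (X, y) for one target gene.

    D[k][g] is the KS distance of gene g between time points k and k+1
    (assumed layout). y holds the target's distances at step k+1, X the
    distances of all other genes at step k.
    """
    others = [g for g in range(len(D[0])) if g != target]
    X = [[D[k][g] for g in others] for k in range(len(D) - 1)]
    y = [D[k + 1][target] for k in range(len(D) - 1)]
    return X, y, others


def nonneg_elastic_net(X, y, lam, m, n_iter=500):
    """Coordinate descent for
        min ||y - X a||_2^2 + lam * (m * ||a||_1 + (1 - m) * ||a||_2^2),  a >= 0.
    """
    n, p = len(X), len(X[0])
    alpha = [0.0] * p
    resid = list(y)  # residual y - X a; equals y while a = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that excludes gene j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * alpha[j]
            denom = col_sq[j] + lam * (1.0 - m)
            # Closed-form 1-D minimizer, projected onto a_j >= 0.
            new = max(0.0, (rho - 0.5 * lam * m) / denom) if denom > 0 else 0.0
            delta = new - alpha[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                alpha[j] = new
    return alpha


# Toy run on an orthogonal design, where each coefficient has a closed form.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, -1.0]
print(nonneg_elastic_net(X_toy, y_toy, lam=1.0, m=1.0))  # Lasso-type penalty
print(nonneg_elastic_net(X_toy, y_toy, lam=1.0, m=0.0))  # ridge-type penalty
```

In this sketch the non-negativity constraint is enforced by clipping each coordinate update at zero, which is valid because the one-dimensional subproblem is convex. GLMNET scales its objective differently, so a given value of λ here is not directly comparable to a GLMNET λ.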

We evaluated the performance of SNIFS by inferring 10- and 20-gene random subnetworks of E. coli and yeast GRNs using in silico time-stamped cross-sectional single cell expression datasets. Given the structure of the GRN, we generated single cell expression data by simulating a stochastic differential equation (SDE) model (Pinna et al. 2010):

$$dx_j = V\left(\beta \prod_i \left(1 + \alpha_{ij}\frac{x_i}{x_i + 1}\right) - \theta x_j\right)dt + \sigma x_j\, dW(t)$$

where x_j represents the mRNA level of gene j, α_ij describes the regulation of the expression of gene j by gene i, β denotes the basal transcriptional rate, θ is the mRNA degradation rate constant, and σ and V are scaling parameters. The term dW(t) denotes the increment of a Wiener process, which accounts for the intrinsic stochastic dynamics of gene expression (Wilkinson 2009). We set α_ij to 1 for activation, to −1 for repression, and to 0 otherwise. For the main datasets in the case study, we further set the parameters to the following: V=30, β=1, θ=0.2, and σ=0.1. In total, we generated single cell data for 8 equally-spaced time points between t = 0.1 and t = 2.
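As an illustration of how such in silico data could be produced, the sketch below integrates the SDE with the Euler-Maruyama scheme and draws an independent set of cells at each time point, mimicking cross-sectional sampling. Several details are assumptions made only for this example and are not stated in the abstract: the step size dt, the initial state x = 1 for every gene, the clipping of mRNA levels at zero, the number of cells, and the function names simulate_cell and snapshot.

```python
import math
import random


def simulate_cell(alpha, t_end, dt=0.001, V=30.0, beta=1.0, theta=0.2,
                  sigma=0.1, x0=None, rng=random):
    """Euler-Maruyama integration of one cell up to time t_end.

    alpha[i][j] is +1 if gene i activates gene j, -1 if it represses it,
    and 0 otherwise. Returns the list of mRNA levels at t_end.
    """
    n = len(alpha)
    x = list(x0) if x0 is not None else [1.0] * n  # assumed initial state
    for _ in range(int(round(t_end / dt))):
        sat = [xi / (xi + 1.0) for xi in x]  # saturating term x_i / (x_i + 1)
        new_x = []
        for j in range(n):
            prod = 1.0
            for i in range(n):
                prod *= 1.0 + alpha[i][j] * sat[i]
            drift = V * (beta * prod - theta * x[j])
            noise = sigma * x[j] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # Clip at zero so mRNA levels stay non-negative (assumption).
            new_x.append(max(0.0, x[j] + drift * dt + noise))
        x = new_x
    return x


def snapshot(alpha, times, n_cells, seed=0):
    """Cross-sectional data: a fresh set of cells is simulated per time point."""
    rng = random.Random(seed)
    return [[simulate_cell(alpha, t, rng=rng) for _ in range(n_cells)]
            for t in times]


# Two-gene toy network: gene 0 activates gene 1, gene 1 represses gene 0.
toy_alpha = [[0, 1],
             [-1, 0]]
times = [0.1 + k * (2.0 - 0.1) / 7 for k in range(8)]  # 8 points in [0.1, 2]
cells = snapshot(toy_alpha, times, n_cells=5)
print(len(cells), len(cells[0]), len(cells[0][0]))  # time points, cells, genes
```

Simulating separate cells for each time point, rather than following one trajectory, reflects that a cell is consumed by the measurement and cannot be profiled twice.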

We assessed the accuracy of the GRN predictions by computing the areas under the receiver operating characteristic (AUROC) and the precision-recall (AUPR) curves. We compared the GRNs predicted by SNIFS with those predicted using the population-averaged expression data by TSNI (Time Series Network Inference) (Bansal et al. 2006), and using a tree-based ensemble regression method called GENIE3 (GEne Network Inference with Ensemble of trees) (Huynh-Thu et al. 2010). The averaged AUROC and AUPR values in Table 1 indicate that, for every value of m tested, SNIFS outperformed both TSNI and GENIE3. This result demonstrates the advantage of considering the information contained in the single cell distributional data for the purpose of GRN inference, as done in SNIFS.
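For readers unfamiliar with these two metrics, the short Python sketch below shows one common way to compute them from a list of edge confidence scores and the corresponding true edge labels. It is only an illustration: the abstract does not specify how edges were scored or which AUPR estimator was used, so the use of average precision for AUPR, the handling of ties, and the function names auroc and aupr are assumptions.

```python
def auroc(scores, labels):
    """Probability that a true edge is ranked above a non-edge (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def aupr(scores, labels):
    """Average precision: mean of the precision at the rank of each true edge.

    Ties in the scores are broken by input order (assumption).
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits


# Four candidate edges; the first and third exist in the reference network.
edge_scores = [0.9, 0.8, 0.3, 0.1]
edge_labels = [1, 0, 1, 0]
print(auroc(edge_scores, edge_labels))  # 3 of 4 (edge, non-edge) pairs ordered correctly
print(aupr(edge_scores, edge_labels))
```

A random ranking gives an AUROC near 0.5, whereas the AUPR of a random ranking is close to the fraction of true edges among all candidate edges, which is small for sparse networks.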

 

Table 1. Evaluation of GRN Inference using TSNI, GENIE3, and SNIFS

           | 10-GENE NETWORK                             | 20-GENE NETWORK
           | AUROC                | AUPR                 | AUROC                | AUPR
m          | TSNI   GENIE3 SNIFS  | TSNI   GENIE3 SNIFS  | TSNI   GENIE3 SNIFS  | TSNI   GENIE3 SNIFS
-----------+----------------------+----------------------+----------------------+--------------------
0 (Ridge)  | 0.41   0.48   0.75   | 0.10   0.14   0.31   | 0.41   0.50   0.63   | 0.06   0.07   0.15
0.1        | 0.41   0.48   0.76   | 0.10   0.14   0.31   | 0.41   0.50   0.68   | 0.06   0.07   0.19
0.2        | 0.41   0.48   0.73   | 0.10   0.14   0.29   | 0.41   0.50   0.66   | 0.06   0.07   0.20
0.3        | 0.41   0.48   0.70   | 0.10   0.14   0.28   | 0.41   0.50   0.66   | 0.06   0.07   0.21
0.4        | 0.41   0.48   0.67   | 0.10   0.14   0.27   | 0.41   0.50   0.65   | 0.06   0.07   0.22
0.5        | 0.41   0.48   0.65   | 0.10   0.14   0.25   | 0.41   0.50   0.64   | 0.06   0.07   0.22
0.6        | 0.41   0.48   0.63   | 0.10   0.14   0.25   | 0.41   0.50   0.64   | 0.06   0.07   0.23
0.7        | 0.41   0.48   0.61   | 0.10   0.14   0.25   | 0.41   0.50   0.63   | 0.06   0.07   0.23
0.8        | 0.41   0.48   0.61   | 0.10   0.14   0.26   | 0.41   0.50   0.62   | 0.06   0.07   0.23
0.9        | 0.41   0.48   0.60   | 0.10   0.14   0.25   | 0.41   0.50   0.61   | 0.06   0.07   0.23
1 (Lasso)  | 0.41   0.48   0.58   | 0.10   0.14   0.25   | 0.41   0.50   0.60   | 0.06   0.07   0.24

REFERENCES

Bansal, M., Gatta, G. Della and di Bernardo, D. (2006). Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 22(7), pp.815–822.

Chen, H. et al. (2014). Single-cell transcriptional analysis to uncover regulatory circuits driving cell fate decisions in early mouse development. Bioinformatics, 31(7), pp.1060–1066.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software, 33(1), pp.1–22.

Huynh-Thu, V.A. et al. (2010). Inferring regulatory networks from expression data using tree-based methods. PloS one, 5(9), p.e12776.

Kouno, T. et al. (2013). Temporal dynamics and transcriptional control using single-cell gene expression analysis. Genome biology, 14(10), p.R118.

Massey, F.J. (1951). The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253), pp.68–78.

Moignard, V. et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature cell biology, 15(4), pp.363–72.

Moignard, V. et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology, advance on(3).

Ocone, A. et al. (2015). Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics, 31(12), pp.i89–i96.

Pina, C. et al. (2015). Single-Cell Network Analysis Identifies DDIT3 as a Nodal Lineage Regulator in Hematopoiesis. Cell reports, 11(10), pp.1503–10.

Pinna, A., Soranzo, N. and de la Fuente, A. (2010). From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PloS one, 5(10), p.e12912.

Sandberg, R. (2013). Entering the era of single-cell transcriptomics in biology and medicine. Nature Methods, 11(1), pp.22–24.

Teles, J. et al. (2013). Transcriptional regulation of lineage commitment--a stochastic model of cell fate decisions. PLoS computational biology, 9(8), p.e1003197.

Wilkinson, D.J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature reviews. Genetics, 10(2), pp.122–33.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), pp.301–320.